Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
TheCreepster1
Helper II
Helper II

Very Slow Performance on Group By

Hi,


Unfortunately, our inventory and sales data reside in different systems so I have had to construct two data queries to bring this information in to Power Query. I have constructed the two queries so that in they are laid out in the same format:

 

TheCreepster1_0-1647599904876.png

I've then appended the two queries and performed a group by on the Location and Product, aggregating the measure fields. However, the volume of data is such that it's taken nearly an hour so far to load the data into the Power BI data model, with no sign that is near to being finished. In total I have approx 13.5m rows of inventory data and 2.5m rows of sales data. Below is the actual M code being generated by Power Query. Is there anything I can do to make this more efficient? I thought the append rather than a merge with full outer join would be a more efficient way of doing this, but so far it's not proving that way!

 

 

let
    #"Sales Query" = SapHana.Database("xxxxxxxx:xxxxx", [Query="select#(tab)CREATION_WEEK as WEEK_ID,#(lf)#(tab)#(tab)ltrim(gsv.PRCTR, '0') as SITE,#(lf)#(tab)#(tab)GENERIC,#(lf)#(tab)#(tab)sum(UNITS) as UNITS,#(lf)#(tab)#(tab)sum(NET_VALUE) as NET_VALUE,#(lf)#(tab)#(tab)sum(NDW) as COST_OF_SALES,#(lf)#(tab)#(tab)sum(GROSS_MARGIN) as GROSS_MARGIN,#(lf)#(tab)#(tab)0 as STOCK#(lf)#(lf)from aggregates.global_single_view gsv#(lf)#(lf)left join ""_SYS_BIC"".""EuropeMerch/CV_MER_LOCATION_CHANNEL_HIERARCHY"" lh on case when gsv.vtweg = '10' then gsv.SOLD_TO else gsv.PRCTR end = lh.LOCATION_JOIN and lh.LOCATION_JOIN != ''#(lf)#(lf)where LEDGER = 'LOCAL'#(lf)and TYPE like 'SALE%'#(lf)and lh.REGION = 'Americas'#(lf)and lh.CHANNEL = 'Direct to Consumer'#(lf)and CREATION_WEEK >201899#(lf)group by CREATION_WEEK,#(lf)#(tab)#(tab)gsv.PRCTR,#(lf)#(tab)#(tab)GENERIC#(lf)", Implementation="2.0"]),
    #"Appended Query" = Table.Combine({#"Sales Query", #"Stock Query"}),
    #"Grouped Rows" = Table.Group(#"Appended Query", {"WEEK_ID", "SITE", "GENERIC"}, {{"UNITS", each List.Sum([UNITS]), type nullable number}, {"NET_VALUE", each List.Sum([NET_VALUE]), type nullable number}, {"COST_OF_SALES", each List.Sum([COST_OF_SALES]), type nullable number}, {"GROSS_MARGIN", each List.Sum([GROSS_MARGIN]), type nullable number}, {"STOCK", each List.Sum([STOCK]), type nullable number}})
in
    #"Grouped Rows"

 

 

1 ACCEPTED SOLUTION
BA_Pete
Super User
Super User

Hi @TheCreepster1 ,

 

Don't append and merge at all.

 

Set your model up in the STAR SCHEMA structure, where your Sales and Inventory tables are the FACT tables, and any common fields (e.g. Location, Product etc.) are aggregated into DIMENSION tables that are then related to both fact tables.

You can then write measures that span both fact tables, as well as display data from both in visuals using the dimension table fields to maintain commonality between them.

https://docs.microsoft.com/en-us/power-bi/guidance/star-schema 

 

Another option would be to have your SQL query written on the source as a view, rather than being a native query from Power Query. Presuming both of your tables are on the same DB/server, your append/group steps should be able to be folded back to the source, which would be significantly quicker at performing these functions than Power Query.

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




View solution in original post

3 REPLIES 3
BA_Pete
Super User
Super User

Hi @TheCreepster1 ,

 

Don't append and merge at all.

 

Set your model up in the STAR SCHEMA structure, where your Sales and Inventory tables are the FACT tables, and any common fields (e.g. Location, Product etc.) are aggregated into DIMENSION tables that are then related to both fact tables.

You can then write measures that span both fact tables, as well as display data from both in visuals using the dimension table fields to maintain commonality between them.

https://docs.microsoft.com/en-us/power-bi/guidance/star-schema 

 

Another option would be to have your SQL query written on the source as a view, rather than being a native query from Power Query. Presuming both of your tables are on the same DB/server, your append/group steps should be able to be folded back to the source, which would be significantly quicker at performing these functions than Power Query.

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




Hi,

 

Don't know why this didn't even cross my mind! Everything was set up to join as a star schema but I had it in my head I needed to smash the inventory and sales together for the end user. Unfortunately the two data sources are coming from different databases, hence the inability to do it in the SQL back end but I've managed to sort it through your first suggestion, thank you!

Vijay_A_Verma
Super User
Super User

Try buffering the table to improve the performance.

Replace 

#"Appended Query" = Table.Combine({#"Sales Query", #"Stock Query"}),

with 

#"Appended Query" = Table.Buffer(Table.Combine({#"Sales Query", #"Stock Query"})),

Helpful resources

Announcements
RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

MayPowerBICarousel

Power BI Monthly Update - May 2024

Check out the May 2024 Power BI update to learn about new features.

LearnSurvey

Fabric certifications survey

Certification feedback opportunity for the community.

Top Solution Authors
Top Kudoed Authors