This is more of a generic question than a specific problem, but what's the most efficient way to filter large data sets within Power Query?
Simplified Example:
Table 1: Job Number and Date
Table 2: Job Number and Qty
I want to see total qty produced within a given timeframe so I believe I have 3 options to get there...
Option 1: Filter Table 1 to desired time frame then merge Table 2 into that query
Option 2: Filter Table 1 to desired time frame then merge Table 1 into Table 2, then filter out the rows that come back null from the merge
Option 3: Filter Table 1 to desired time frame, turn the Job Number column into a list, bring that list into Table 2 (using List.Buffer) and utilise List.Contains to filter - I like this option as it means I can make the date ranges user editable
Example from one of my queries:
#"Filtered Material" = Table.SelectRows(PV_JobStep_Table, each ([StepType] = 113)),
ListQuery = List.Buffer(PV_JobCosts),
#"Filtered PV_JobCosts" = Table.SelectRows(#"Filtered Material", each List.Contains(ListQuery, [JobCode])),
The problem I'm having is that when the tables have lots of columns and contain millions of rows, the reports take up to 10 minutes to refresh and return only a hundred or so rows.
Using a 64-bit ODBC connection into a Progress database, so no option to utilise Query Folding 😞
Thanks,
I like option 3 best as long as your list is < 1,000 items. More than that and it starts to get slow.
Because you are not folding, you have to process 100% of the data no matter what.
If Table 1 isn't huge after the date filter, consider wrapping it in Table.Buffer before you do the merge. Test that out, but save the file first: if it is huge, it could eat up RAM and cause the app to lock up. I speak from experience, having used Table.Buffer on a 3M-row, 30+ column table.
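Roughly like this (a sketch with made-up table and step names; Table1 stands in for your date table):

#"Filtered Dates" = Table.SelectRows(Table1, each [Date] >= RangeStart and [Date] <= RangeEnd),
// Table.Buffer pins the small filtered table in memory so the
// merge below doesn't go back to the source for it
BufferedDates = Table.Buffer(#"Filtered Dates"),
#"Merged Qty" = Table.NestedJoin(BufferedDates, {"Job Number"}, Table2, {"Job Number"}, "Table2", JoinKind.Inner),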
DAX is for Analysis. Power Query is for Data Modeling
Proud to be a Super User!
MCSA: BI Reporting

Thanks Edhans.
Yeah, most of the time it isn't too bad (I consider a report refresh in under 2 minutes acceptable for users), but as you say, with bigger tables of 2M+ rows things start to slow down considerably.
I've played with Table.Buffer a bit to see if that can speed things up, but it seems really hit and miss: depending on where I use it I get a slight speed increase, no change, or it absolutely kills the report.
Main speed considerations/rules I try to stick to (see the sketch after this list):
- Filter as early as possible
- Remove unwanted columns before merging
- As few steps as possible and no duplication of steps (so all my change types, renames etc. together)
- Reference tables rather than going into the database multiple times for the same data
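For example, the shape I aim for at the top of a query (the Qty column here is made up for the example):

// filter rows first, then drop everything I don't need,
// all before any merges, renames or type changes
#"Early Filter" = Table.SelectRows(PV_JobStep_Table, each [StepType] = 113),
#"Keep Columns" = Table.SelectColumns(#"Early Filter", {"JobCode", "Qty"}),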
Are there any other 'speed' guidelines people try and use when query folding isn't an option?