March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
Often, I'll start with a significant dataset and filter/cut out columns to achieve the refined dataset for importing. However, after sitting and waiting for ages for slight adjustments to be made, I realised that I would be better off 'drawing a line in the sand': saving the reduced dataset as an Excel workbook and then pointing Power Query at that. The downside is that if I need to refresh the whole chain, I have to go back to those original Excel workbooks and update them manually.
The effect of this thought process is that I end up with 3-4 different Excel workbook save points that combine, but are not live data. I know that I'm using PQ incorrectly, and the documentation points me towards dataflows, but I feel that isn't what I'm after either.
In short, is there a way to achieve the above using the built in tools?
Hi @Crow2525 ,
I think you may still need to start from the data source and optimize your data source, so that the speed of filtering should be accelerated.
The fastest way to optimize your Power BI report is to limit the number of columns to only the ones you need in your data model. Go through your tables in Power Query and determine what fields are being used. Delete these columns if they are not being used in any of your reports or calculations.
Best Regards,
Stephen Tao
If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.
To get a distinct buffered table column instead of a list:
= Table.Buffer(Table.Distinct(getlist[[Policy Number]]))
--Nate
Reference queries DO cause the original query to evaluate, in case the data has changed since you referenced it. I would move that from the first to the subsequent queries. Meanwhile, to get your distinct list of Policy ID, add a new query named PolicyNumbers, and make the source:
= List.Buffer(List.Distinct(getlist[Policy Number]))
Now you have a buffered list of distinct Policy numbers that you can use either as a list in selecting rows (each List.Contains...) or you can make it into a table column to use for joins.
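A minimal sketch of both uses, assuming PolicyNumbers is the buffered query above and Renewals is a placeholder name for your renewals table:

let
    // Buffered distinct list: built once, reused cheaply
    PolicyNumbers = List.Buffer(List.Distinct(getlist[Policy Number])),
    // 1) Use it to select rows
    Filtered = Table.SelectRows(Renewals, each List.Contains(PolicyNumbers, [Policy Number])),
    // 2) Or turn it into a one-column table to use for a join
    PolicyTable = Table.FromList(PolicyNumbers, Splitter.SplitByNothing(), {"Policy Number"}),
    Joined = Table.NestedJoin(Renewals, {"Policy Number"}, PolicyTable, {"Policy Number"}, "Matches", JoinKind.Inner)
in
    Joined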
I am here to tell you that these techniques will speed your queries right on up.
--Nate
wow, thank you! I was trying an each List.Contains filter based on a distinct list and it was hanging. Replaced the list source with this buffered list code and my filter step resolved almost instantly!
I'm learning PBI and have had a few instances where I try several ways to do something before finding a significantly smoother option. This is the most frustrating one so far, so it's a big relief to find such a succinct solution. But also frustrating that I didn't find it sooner.
Yes indeed, always use each List.Contains(List.Buffer(List), [Column])--especially if you are wanting your SQL to fold--it adds a WHERE COLUMN IN (List Values).
---Nate
1 - getList
let
    // Load the workbook; "Folder" is a parameter holding the folder path
    Source = Excel.Workbook(Web.Contents(Folder & "Consolidated List.xlsm"), null, true),
    // Keep only the item named "data"
    #"SelectData" = Table.SelectRows(Source, each ([Name] = "data")),
    data_DefinedName = #"SelectData"{[Item="data",Kind="DefinedName"]}[Data],
    #"Promoted Headers" = Table.PromoteHeaders(data_DefinedName, [PromoteAllScalars=true]),
    // De-duplicate on the key column
    #"Removed Duplicates" = Table.Distinct(#"Promoted Headers", {"Policy Number"})
in
    #"Removed Duplicates"
2 – Renewal List 2022
let
    Source = #"1 - getList",
    #"▶ Add Allocations" = Source
in
    #"▶ Add Allocations"
Nothing fancy, just grabbing data from a 35MB file with multiple sheets and data tabs. The above is how I have traditionally done this: split the query up, then referenced the first query from the second. The issue I have is that if I add an additional step, e.g. a filter, Power Query appears to start right back at the first referenced query and re-download the 35MB file. If I put a line in the sand, save that query's output as a separate Excel document, and then reference that Excel document in the second query, I get a faster working file.
let
    Source = Excel.Workbook(File.Contents("C:\Users\OneDrive\Projects\Renewal List\Working\1 - getList.xlsx"), null, true)
in
    Source
Can I achieve the same thing using incremental refresh?
Hi @Crow2525 ,
I still have not understood what you are trying to do by putting a line in the sand.
My suggestion was to put the filter on your 1st query, do whatever needs to be done, and then remove that filter. It's best to filter the query as far up as possible, so that the steps that follow run faster and query folding can take place as well.
Reference queries do not redownload; that is what referencing is meant for.
If you were creating a duplicate of the first query, then it would have redownloaded the data.
Incremental refresh works on dates and updates only the data that falls between the specified dates, while keeping the rest of the data static.
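As a sketch of how that works: incremental refresh relies on the reserved RangeStart and RangeEnd date/time parameters to filter the source query per partition (the file path and column name here are placeholders, and a foldable source such as a database is normally recommended):

let
    Source = Excel.Workbook(File.Contents("C:\data\Renewals.xlsx"), null, true),  // hypothetical source
    Data = Source{[Item="Renewals",Kind="Sheet"]}[Data],
    Promoted = Table.PromoteHeaders(Data, [PromoteAllScalars=true]),
    // Power BI substitutes RangeStart/RangeEnd values per partition at refresh time
    Filtered = Table.SelectRows(Promoted, each [Renewal Date] >= RangeStart and [Renewal Date] < RangeEnd)
in
    Filtered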
What is the source of your original data?
Also, 35MB is not a very large dataset and PBI should be able to easily compress it to about 1/4th if the right data types are defined, eventually making the refreshes faster.
Hope this helps.
Thank you,
Vishesh Jain
Proud to be a Super User!
No, reference queries DO redownload. Power Query performs the whole refresh on the referenced query, and only then starts working on the downstream query. If you load a file as a data source, make some transforms, and then make a reference query, the whole original query will refresh before anything happens with the new query. Sometimes it's better to just duplicate the query or the data source, so that one query calls one file once. If you have four queries referencing one query, that one query will refresh four separate times, once for each of the four queries that reference it. A reference query is basically all the steps of the first query, plus the steps in the new query.
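Conceptually, a reference is just step inlining. A sketch of what effectively gets evaluated for the downstream query (the file path and step names are placeholders):

let
    // The referenced query's steps run first, every time this query refreshes
    Source = Excel.Workbook(File.Contents("C:\data\Consolidated List.xlsm"), null, true),
    Data = Source{[Item="data",Kind="DefinedName"]}[Data],
    Promoted = Table.PromoteHeaders(Data, [PromoteAllScalars=true]),
    // ...and only then the downstream query's own steps
    Filtered = Table.SelectRows(Promoted, each [Policy Number] <> null)
in
    Filtered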
https://learn.microsoft.com/en-us/power-bi/guidance/power-query-referenced-queries
--Nate
mmm that's interesting. I do amend the data types, but I've done it in the second query.
My code appears to load the first query and then the second. Are you sure I'm referencing correctly? The documentation on referencing also says that it does load both queries.
"Let's be clear about what this means: When a query references a second query, it's as though the steps in the second query are combined with, and run before, the steps in the first query."
Sorry I didn't answer your question re line in the sand. I think we're on the same page.
When I reference the first query, i.e. :
let Source = #"1 - getList", #"▶ Add Allocations" = Source,
Any further amendments to this query will result in the query re-loading the first query's data source.
Perhaps my terminology is wrong... which is probably confusing the issue.
I'm trying to achieve the same thing that a dataflow is, without having to use the web service:
"You can design the dataflow to encapsulate the source data and transformations. As the dataflow is a persisted store of data in the Power BI service, its data retrieval is fast. So, even when referencing queries result in multiple requests for the dataflow, data refresh times can be improved."
Hi @Crow2525,
I have not fully understood the problem but from what I understood you can look into incremental refresh, which will only refresh a portion of your fact table.
Alternatively, if you want to work on a reduced dataset, then as soon as the data is imported, you can add a step to just keep the top 1000 rows or something and once you are done with the query transformation, remove that step to get the full data.
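That "keep top rows" approach is a one-line step; a sketch, with placeholder names (delete the sampling step once the transforms are final):

let
    Source = Excel.Workbook(File.Contents("C:\data\Renewals.xlsx"), null, true),  // hypothetical source
    Data = Source{0}[Data],
    // Temporary development step: author transforms against a small sample,
    // then remove this step to refresh against the full data
    KeptFirstRows = Table.FirstN(Data, 1000)
in
    KeptFirstRows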
Please correct me if my understanding of the problem is wrong.
Hope this helps.
Thank you,
Vishesh Jain
Proud to be a Super User!
Power Query will always do exactly what you want it to do, so long as you whisper the right words into the query. Can you post some of the code before the "line in the sand", please?
--Nate