The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event: Join us in Stockholm, September 24-27, 2024.
Save €200 with code MSCUST on top of early bird pricing!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
I select a single column from a very large table to operate certain filters, and finally I want to combine it with the original table to get the fields that I omitted.
Solved! Go to Solution.
Hi @carlpaul153 ,
I do not think you will gain performance by:
Just do the filter. Table.SelectRows(). Power Query doesn't hold the entire table in memory and filter it. It works backwards from how you think. It adds records to the table as it processes the original table one record at a time.
I suggest you watch this excellent video on how Power Query thinks.
In no case should you join by an index column in this case. If the column you were joining on was unique, the index will just create a 2nd unique column. An "Index" is not the same as "Indexed" which is something you do to a column in SQL Server or other RDBMS.
DAX is for Analysis. Power Query is for Data Modeling
Proud to be a Super User!
MCSA: BI ReportingThere's 2 answers I can see:
Joining by text is better because: You don't need to spend load times dynamically creating index columns. (and then deleting the original values from the fact table).
Joining by index number is better because: You don't need to store larger values in the fact table in each row. The smaller index number replaces the values. Which can be nice if you have millions upon millions of rows in your fact table. It will shrink the size of it.
Other than that, unless you absolutely need an index value for transformation purposes, it probably doesn't make much difference.
Don't forget, in all cases, use the dimension table for that column in visuals and slicers, even if you join by text. Populating slicer values is much faster doing it on 10 rows for example than it is by selecting 10 row values from millions of rows 🙂
Hi @carlpaul153 ,
I do not think you will gain performance by:
Just do the filter. Table.SelectRows(). Power Query doesn't hold the entire table in memory and filter it. It works backwards from how you think. It adds records to the table as it processes the original table one record at a time.
I suggest you watch this excellent video on how Power Query thinks.
In no case should you join by an index column in this case. If the column you were joining on was unique, the index will just create a 2nd unique column. An "Index" is not the same as "Indexed" which is something you do to a column in SQL Server or other RDBMS.
DAX is for Analysis. Power Query is for Data Modeling
Proud to be a Super User!
MCSA: BI Reporting@edhans when I said about filtering the column, I was referring to doing a long series of complex steps (more than 20), not just a Table.SelectRows () filter; that's why I thought it might be better to just operate that column and then re-combine it, is that still a bad idea?
Regarding that "Power Query doesn't hold the entire table in memory", I wonder then how it is possible that when I navigate between those more than 20 steps that I mentioned, the preview of 1000 rows reflects data from the file in its entirety, that power query would not be able to display them if it only used the first 1000 rows of the first step. I don't know if I explain myself ... Is my doubt understood?
On the other hand, thanks for answering my question about number key vs text key
@carlpaul153 without knowing what you are doing, it is impossible to say. Power Query will optimize steps, especially if you are folding queries to a server. For example, if you change the data type of 10 columns, then remove all but 2 columns, Power Query will often not bother changing the type of the discarded 8 columns. It works backwards, bottom to top to figure out what it needs. That isn't always the case, but that video I linked to above walks you through it. So, without some info on the 20 steps you are doing and the server or source you are connected to, I couldn't begin to answer.
DAX is for Analysis. Power Query is for Data Modeling
Proud to be a Super User!
MCSA: BI ReportingNot sure if I understand your question fully, but...
Folding: Power Bi will attempt to transform your 20 steps into a query to the data source. It will only generate a preview when you select a step. In that I have seen cases where:
Step 1: preview of initial data
Step 2 (grouping): Error getting data (too large)
Step 3 (filter): preview of the data since the filter allowed the data to be displayed.
This largely depends on your data source and if it will fold or not.
From my understanding:
If it doesn't fold, then not using an index is better.
If it folds on a standard database (like SQL), then adding an index should be better (although you might want to write your own SQL for this)
If it folds and is on a data source that supports column compression, then not using an index is better.
@carlpaul153 Check out these articles if you have not already:
The articles are interesting! They show that the merge is always more efficient with keys. But, do you think there will be a difference between merging with a number key (the index) or merging with a text unique value key (phrases or words)?
That is to say, my question is not between keys against not keys, but about numbers against text.
@carlpaul153 I would think it would be more efficient with an Index versus text and I swear I read about that somewhere, I will try to track it down.
Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.
Check out the August 2024 Power BI update to learn about new features.
Learn from experts, get hands-on experience, and win awesome prizes.