Solved: Re: Is it more efficient to combine tables by inde...

carlpaul153 · ‎09-14-2020

I select a single column from a very large table to operate certain filters, and finally I want to combine it with the original table to get the fields that I omitted.

Is this the correct approach or would it be better to work with all the columns from the beginning?
Should I initially create an index and then do the merge, or would combining by the text column be just as efficient?

edhans · ‎09-14-2020

Hi @carlpaul153 ,

I do not think you will gain performance by:

REmoving all but a key column from a table
filter that single column table
merge the filtered column with the original table to get the columns you lost.

Just do the filter. Table.SelectRows(). Power Query doesn't hold the entire table in memory and filter it. It works backwards from how you think. It adds records to the table as it processes the original table one record at a time.

I suggest you watch this excellent video on how Power Query thinks.

In no case should you join by an index column in this case. If the column you were joining on was unique, the index will just create a 2nd unique column. An "Index" is not the same as "Indexed" which is something you do to a column in SQL Server or other RDBMS.

Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling

Proud to be a Super User!

MCSA: BI Reporting

View solution in original post

MacDee123 · ‎03-23-2024

There's 2 answers I can see:

Joining by text is better because: You don't need to spend load times dynamically creating index columns. (and then deleting the original values from the fact table).

Joining by index number is better because: You don't need to store larger values in the fact table in each row. The smaller index number replaces the values. Which can be nice if you have millions upon millions of rows in your fact table. It will shrink the size of it.

Other than that, unless you absolutely need an index value for transformation purposes, it probably doesn't make much difference.

Don't forget, in all cases, use the dimension table for that column in visuals and slicers, even if you join by text. Populating slicer values is much faster doing it on 10 rows for example than it is by selecting 10 row values from millions of rows 🙂

edhans · ‎09-14-2020

Hi @carlpaul153 ,

I do not think you will gain performance by:

REmoving all but a key column from a table
filter that single column table
merge the filtered column with the original table to get the columns you lost.

Just do the filter. Table.SelectRows(). Power Query doesn't hold the entire table in memory and filter it. It works backwards from how you think. It adds records to the table as it processes the original table one record at a time.

I suggest you watch this excellent video on how Power Query thinks.

In no case should you join by an index column in this case. If the column you were joining on was unique, the index will just create a 2nd unique column. An "Index" is not the same as "Indexed" which is something you do to a column in SQL Server or other RDBMS.

Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling

Proud to be a Super User!

MCSA: BI Reporting

carlpaul153 · ‎09-15-2020

@edhans when I said about filtering the column, I was referring to doing a long series of complex steps (more than 20), not just a Table.SelectRows () filter; that's why I thought it might be better to just operate that column and then re-combine it, is that still a bad idea?

Regarding that "Power Query doesn't hold the entire table in memory", I wonder then how it is possible that when I navigate between those more than 20 steps that I mentioned, the preview of 1000 rows reflects data from the file in its entirety, that power query would not be able to display them if it only used the first 1000 rows of the first step. I don't know if I explain myself ... Is my doubt understood?

On the other hand, thanks for answering my question about number key vs text key

edhans · ‎09-15-2020

@carlpaul153 without knowing what you are doing, it is impossible to say. Power Query will optimize steps, especially if you are folding queries to a server. For example, if you change the data type of 10 columns, then remove all but 2 columns, Power Query will often not bother changing the type of the discarded 8 columns. It works backwards, bottom to top to figure out what it needs. That isn't always the case, but that video I linked to above walks you through it. So, without some info on the 20 steps you are doing and the server or source you are connected to, I couldn't begin to answer.

Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling

Proud to be a Super User!

MCSA: BI Reporting

artemus · ‎09-15-2020

Not sure if I understand your question fully, but...

Folding: Power Bi will attempt to transform your 20 steps into a query to the data source. It will only generate a preview when you select a step. In that I have seen cases where:

Step 1: preview of initial data

Step 2 (grouping): Error getting data (too large)

Step 3 (filter): preview of the data since the filter allowed the data to be displayed.

artemus · ‎09-14-2020

This largely depends on your data source and if it will fold or not.

From my understanding:

If it doesn't fold, then not using an index is better.

If it folds on a standard database (like SQL), then adding an index should be better (although you might want to write your own SQL for this)

If it folds and is on a data source that supports column compression, then not using an index is better.

Greg_Deckler · ‎09-14-2020

@carlpaul153 Check out these articles if you have not already:

https://seddryck.wordpress.com/2018/03/22/improving-then-performance-when-merging-tables-with-the-po...

https://blog.crossjoin.co.uk/2018/03/16/improving-the-performance-of-aggregation-after-a-merge-in-po...

@ImkeF , @edhans @HotChilli

Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: DAX For Humans

DAX is easy, CALCULATE makes DAX hard...

carlpaul153 · ‎09-14-2020

The articles are interesting! They show that the merge is always more efficient with keys. But, do you think there will be a difference between merging with a number key (the index) or merging with a text unique value key (phrases or words)?
That is to say, my question is not between keys against not keys, but about numbers against text.

Greg_Deckler · ‎09-14-2020

@carlpaul153 I would think it would be more efficient with an Index versus text and I swear I read about that somewhere, I will try to track it down.

Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: DAX For Humans

DAX is easy, CALCULATE makes DAX hard...

Is it more efficient to combine tables by index than by text?

Helpful resources

Join our Fabric User Panel

Fabric Community Update - June 2025

Join the #PBI10 DataViz contest

Is it more efficient to combine tables by index than by text?

Helpful resources

Join our Fabric User Panel

Fabric Community Update - June 2025