Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started

Reply
carlpaul153
Helper I
Helper I

Is it more efficient to combine tables by index than by text?

I select a single column from a very large table to operate certain filters, and finally I want to combine it with the original table to get the fields that I omitted.

  1. Is this the correct approach or would it be better to work with all the columns from the beginning?
  2. Should I initially create an index and then do the merge, or would combining by the text column be just as efficient?
1 ACCEPTED SOLUTION
edhans
Super User
Super User

Hi @carlpaul153 ,

 

I do not think you will gain performance by:

  1. REmoving all but a key column from a table
  2. filter that single column table
  3. merge the filtered column with the original table to get the columns you lost.

Just do the filter. Table.SelectRows(). Power Query doesn't hold the entire table in memory and filter it. It works backwards from how you think. It adds records to the table as it processes the original table one record at a time.

I suggest you watch this excellent video on how Power Query thinks.

 

In no case should you join by an index column in this case. If the column you were joining on was unique, the index will just create a 2nd unique column. An "Index" is not the same as "Indexed" which is something you do to a column in SQL Server or other RDBMS. 



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

View solution in original post

9 REPLIES 9
MacDee123
New Member

There's 2 answers I can see: 

 

Joining by text is better because:  You don't need to spend load times dynamically creating index columns. (and then deleting the original values from the fact table).

 

 

Joining by index number is better because:  You don't need to store larger values in the fact table in each row. The smaller index number replaces the values.  Which can be nice if you have millions upon millions of rows in your fact table. It will shrink the size of it.

 

Other than that, unless you absolutely need an index value for transformation purposes, it probably doesn't make much difference.

 

Don't forget, in all cases, use the dimension table for that column in visuals and slicers, even if you join by text.  Populating slicer values is much faster doing it on 10 rows for example than it is by selecting 10 row values from millions of rows 🙂

edhans
Super User
Super User

Hi @carlpaul153 ,

 

I do not think you will gain performance by:

  1. REmoving all but a key column from a table
  2. filter that single column table
  3. merge the filtered column with the original table to get the columns you lost.

Just do the filter. Table.SelectRows(). Power Query doesn't hold the entire table in memory and filter it. It works backwards from how you think. It adds records to the table as it processes the original table one record at a time.

I suggest you watch this excellent video on how Power Query thinks.

 

In no case should you join by an index column in this case. If the column you were joining on was unique, the index will just create a 2nd unique column. An "Index" is not the same as "Indexed" which is something you do to a column in SQL Server or other RDBMS. 



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

@edhans when I said about filtering the column, I was referring to doing a long series of complex steps (more than 20), not just a Table.SelectRows () filter; that's why I thought it might be better to just operate that column and then re-combine it, is that still a bad idea?

 

Regarding that "Power Query doesn't hold the entire table in memory", I wonder then how it is possible that when I navigate between those more than 20 steps that I mentioned, the preview of 1000 rows reflects data from the file in its entirety, that power query would not be able to display them if it only used the first 1000 rows of the first step. I don't know if I explain myself ... Is my doubt understood?

 

On the other hand, thanks for answering my question about number key vs text key

@carlpaul153 without knowing what you are doing, it is impossible to say. Power Query will optimize steps, especially if you are folding queries to a server. For example, if you change the data type of 10 columns, then remove all but 2 columns, Power Query will often not bother changing the type of the discarded 8 columns. It works backwards, bottom to top to figure out what it needs. That isn't always the case, but that video I linked to above walks you through it. So, without some info on the 20 steps you are doing and the server or source you are connected to, I couldn't begin to answer.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

Not sure if I understand your question fully, but...

 

Folding: Power Bi will attempt to transform your 20 steps into a query to the data source. It will only generate a preview when you select a step. In that I have seen cases where:

 

Step 1: preview of initial data

Step 2 (grouping): Error getting data (too large)

Step 3 (filter): preview of the data since the filter allowed the data to be displayed.

artemus
Employee
Employee

This largely depends on your data source and if it will fold or not.

 

From my understanding:

If it doesn't fold, then not using an index is better.

If it folds on a standard database (like SQL), then adding an index should be better (although you might want to write your own SQL for this)

If it folds and is on a data source that supports column compression, then not using an index is better.

Greg_Deckler
Super User
Super User

@carlpaul153 Check out these articles if you have not already:

https://seddryck.wordpress.com/2018/03/22/improving-then-performance-when-merging-tables-with-the-po...

https://blog.crossjoin.co.uk/2018/03/16/improving-the-performance-of-aggregation-after-a-merge-in-po...

@ImkeF , @edhans @HotChilli 


Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

The articles are interesting! They show that the merge is always more efficient with keys. But, do you think there will be a difference between merging with a number key (the index) or merging with a text unique value key (phrases or words)?
That is to say, my question is not between keys against not keys, but about numbers against text.

Greg_Deckler
Super User
Super User

@carlpaul153 I would think it would be more efficient with an Index versus text and I swear I read about that somewhere, I will try to track it down.


Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

Helpful resources

Announcements
Fabcon_Europe_Social_Bogo

Europe’s largest Microsoft Fabric Community Conference

Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.

Power BI Carousel June 2024

Power BI Monthly Update - June 2024

Check out the June 2024 Power BI update to learn about new features.

PBI_Carousel_NL_June

Fabric Community Update - June 2024

Get the latest Fabric updates from Build 2024, key Skills Challenge voucher deadlines, top blogs, forum posts, and product ideas.

RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

Top Solution Authors
Top Kudoed Authors