Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now! Learn more

Reply
peterhui50
Helper III
Helper III

Power Query folder combine -- file combining -- faster?

Hi,

 

I have a folder which we do monthly refreshes - 56 files with approx 100,000 KBs each. Each file has around 149 columns each and around 80000 rows of data. Due to the nature of the work, we can't access the SQL server o we work with flat csv files.

 

My question is, if I were to combine these files, may 25 of these into one excel file, the other 25 into another file, and the final 6 in another... would this improve performance?

 

It takes a whole day just to load these files into Power BI report view, and it's a gigantic waste of time since it just slows down the computer.  I've tried to load columns that I need, but it hasn't been all that successful either.

 

any help is appreciated!

 

 

2 ACCEPTED SOLUTIONS
edhans
Super User
Super User

@peterhui50 can you not use this:

=List.ContainsAny({"A".."Z"}, Text.ToList([Column1]))

It returns true or false. It is case sensitive. So the "xxxxx" value returns false since it isn't in A-Z

edhans_0-1612498287053.png

Changing it to this returns Yes or No.

if List.ContainsAny({"A".."Z"}, Text.ToList([Column1])) then "Yes" else "No"

And changing it to this is case insensitive.

if List.ContainsAny({"A".."Z", "a".."z"}, Text.ToList([Column1])) then "Yes" else "No"

 

edhans_1-1612498417076.png

 



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

View solution in original post

edhans
Super User
Super User

@peterhui50 - post back with performance enhancements. I don't know how much more efficient a List.Contains with Text.ToList is vs a single if/then/else with Text.Contains, but it surely must be more efficient than 26 of them!

Thanks to @mahoneypat for asking the right question. I was just thinking originally of a lot of data. Looking at the transformations should have been at the top of my list as well.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

View solution in original post

10 REPLIES 10
edhans
Super User
Super User

I use the heck out of List.Contains*() functions. Seems more versitile and faster than Table.SelectRows(), even when query folding against SQL Server.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

Thank you! I love this list.contains solutions. I've used it for various other queries and performance seemed to have improved. 

 

I have to add conditions in here , but not sure what I've done wrong

 

peterhui50_0-1612561251994.png

 

The File_Code is "A" and FILE is "A1" ... I did a 

 

if List.Contains({"A"},Text.ToList([Files_Code])) and List.Contains({"A1"},Text.ToList([FILES])) then "True" else "False"

 

how can it return TRUE???

edhans
Super User
Super User

@peterhui50 - post back with performance enhancements. I don't know how much more efficient a List.Contains with Text.ToList is vs a single if/then/else with Text.Contains, but it surely must be more efficient than 26 of them!

Thanks to @mahoneypat for asking the right question. I was just thinking originally of a lot of data. Looking at the transformations should have been at the top of my list as well.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

Thank you everyone. The details and solutions are very much appreciated. I love the simple solution of List.ContainsAny and the Text.ToList.

 

I am looking into using the PQ diagnostic tool and that column turns out maybe not be needed. 

Jimmy801
Community Champion
Community Champion

Hello @peterhui50 

 

I did some tests as well.

a content that was stored as xlsx (43MByte) took 16 seconds to load (150 column, 80k rows)

If you have 56 xlsx x 100Mbyte should be loaded in approx. 1 hour. But when you load only 70 columns you need about 12 seconds for a file of 43 MByte.

So immediatly get rid of any columns in the query. But for sure if you have complex transformation it will take some time longer. And as the other already mentioning you have to enhance your transformations.

 

If this post helps or solves your problem, please mark it as solution (to help other users find useful content and to acknowledge the work of users that helped you)
Kudoes are nice too

Have fun

Jimmy

edhans
Super User
Super User

@peterhui50 can you not use this:

=List.ContainsAny({"A".."Z"}, Text.ToList([Column1]))

It returns true or false. It is case sensitive. So the "xxxxx" value returns false since it isn't in A-Z

edhans_0-1612498287053.png

Changing it to this returns Yes or No.

if List.ContainsAny({"A".."Z"}, Text.ToList([Column1])) then "Yes" else "No"

And changing it to this is case insensitive.

if List.ContainsAny({"A".."Z", "a".."z"}, Text.ToList([Column1])) then "Yes" else "No"

 

edhans_1-1612498417076.png

 



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting
mahoneypat
Microsoft Employee
Microsoft Employee

That sounds way slower than it should be for that.  What transformations are you doing in the Transform from example function/query?  Are you doing multiple merges in the transform example or the final query?  If you share your M code, we can help troubleshoot the slowness.  Copy/paste from the advanced editor.

Pat

 





Did I answer your question? Mark my post as a solution! Kudos are also appreciated!

To learn more about Power BI, follow me on Twitter or subscribe on YouTube.


@mahoneypa HoosierBI on YouTube


Thank you so much,

 

I don't know if I can even fit the code on to here, mostly because of the columns, but further inspection, there are no merges. However, there are a lot of conditional steps -- here is just one of them, I think there are 4 more steps that are like this as well. Is there a way to do this simpler?

 

#"Added Conditional Column1" = Table.AddColumn(#"Inserted First Characters", "Files_Code contains text", each if Text.Contains([FILES_CODE], "A") then "YES" else if Text.Contains([FILES_CODE], "B") then "YES" else if Text.Contains([FILES_CODE], "C") then "YES" else if Text.Contains([FILES_CODE], "D") then "YES" else if Text.Contains([FILES_CODE], "E") then "YES" else if Text.Contains([FILES_CODE], "F") then "YES" else if Text.Contains([FILES_CODE], "G") then "YES" else if Text.Contains([FILES_CODE], "H") then "YES" else if Text.Contains([FILES_CODE], "I") then "YES" else if Text.Contains([FILES_CODE], "J") then "YES" else if Text.Contains([FILES_CODE], "K") then "YES" else if Text.Contains([FILES_CODE], "L") then "YES" else if Text.Contains([FILES_CODE], "M") then "YES" else if Text.Contains([FILES_CODE], "N") then "YES" else if Text.Contains([FILES_CODE], "O") then "YES" else if Text.Contains([FILES_CODE], "P") then "YES" else if Text.Contains([FILES_CODE], "Q") then "YES" else if Text.Contains([FILES_CODE], "R") then "YES" else if Text.Contains([FILES_CODE], "S") then "YES" else if Text.Contains([FILES_CODE], "T") then "YES" else if Text.Contains([FILES_CODE], "U") then "YES" else if Text.Contains([FILES_CODE], "V") then "YES" else if Text.Contains([FILES_CODE], "W") then "YES" else if Text.Contains([FILES_CODE], "X") then "YES" else if Text.Contains([FILES_CODE], "Y") then "YES" else if Text.Contains([FILES_CODE], "Z") then "YES" else "NO"), 

I think this can be made much faster.  Two things.

 

1. Try removing your custom column and see how fast the data load.  If that alone takes a long time, a different fix is needed.

 

2.  Try this approach instead for your custom column.  To see how it works, just create a blank query, go to Advanced Editor, and replace the text there with the M code below.

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WKs7PTVUoSa0oUYrViVbKL8lILUJwE/MgAvl5qWB+RWWVUmwsAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Files_Code = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Files_Code", type text}}),
    MyList = List.Buffer({"a", "b", "c", "d", "e"}),
    Custom2 = #"Changed Type",
    #"Added Custom" = Table.AddColumn(Custom2, "Custom", each let mytext = [Files_Code] in if List.Count(List.Select(MyList, each Text.Contains(mytext, _)))>0 then "Y" else "N")
in
    #"Added Custom"

 

I made up some sample data, made the list of letters/words to look up in the MyList step (note that M is case sensitive, so you may want to make your Files_Code column UPPER or lower case before this step.  It then counts how many items in the list are found in your Files_Code and returns Y if any are found.  Please report back if it improves things.

 

Pat 





Did I answer your question? Mark my post as a solution! Kudos are also appreciated!

To learn more about Power BI, follow me on Twitter or subscribe on YouTube.


@mahoneypa HoosierBI on YouTube


edhans
Super User
Super User

I've not done any benchmarking, but to me it seems CSV is faster than XLSX when accessing large amounts of data.

You will have to test and experiment, but I will say this:

  • Remove unnecessary columns early in the process (or from the source is even better)
  • Filter early in the process if anything is filtered out so later transformation happen on the smaller recordset.
  • Consider Azure DataLake for file storage. It seems to me to work faster and more reliably than files in SharePoint with large datasets.


Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

Helpful resources

Announcements
Power BI DataViz World Championships

Power BI Dataviz World Championships

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now!

December 2025 Power BI Update Carousel

Power BI Monthly Update - December 2025

Check out the December 2025 Power BI Holiday Recap!

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.

Top Solution Authors