Shytype
New Member

How to hasten data load after changes.

Hi,

 

I'm fairly new to Power BI and currently building a report that shows historical data.

The data source is a set of csv files stored in a SharePoint site document folder. Each csv file contains about 300k+ rows and is static; a new csv file is simply uploaded weekly.

The requirement is to show up to 18 months of data. I currently have 12 months' worth, and that is already 8 million rows.

Every time the report is refreshed, or when I'm editing the report query, it takes a long time to load.

I've been reading up on incremental refresh, but it doesn't seem to be feasible for my report.


7 REPLIES
christinepayton
Super User

You're going to have some slowness with 8 million rows from SharePoint. SharePoint limits the speed at which you can pull data, for stability reasons - it's not a data warehouse. 🙂

Make sure to remove any columns you're not using as your first step in the query, and keep the number of transformations to a minimum - that can help a bit. You might also consider putting the initial data load into a dataflow, then connecting to that and doing your transforms on the desktop side to "split" the processing load. The connection to dataflows is very fast, so it can sometimes help.
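As an illustration only (the column names below are hypothetical, and spsite/spfolderpath are assumed text parameters like in the other reply), trimming to the needed columns right after the csv is parsed keeps every later step working on a narrower table:

    let
        // Connect to the SharePoint site
        Source = SharePoint.Files(spsite, [ApiVersion = 15]),
        // Keep only the csv files in the target folder
        CsvFiles = Table.SelectRows(Source, each [Folder Path] = spfolderpath and [Extension] = ".csv"),
        // Parse one file and promote the first row to headers
        Parsed = Table.PromoteHeaders(Csv.Document(CsvFiles{0}[Content], [Delimiter = ",", Encoding = 1252, QuoteStyle = QuoteStyle.Csv])),
        // Drop unused columns as the first transformation on the data itself
        // (placeholder column names - replace with the columns the report actually uses)
        Trimmed = Table.SelectColumns(Parsed, {"Date", "Product", "Quantity", "Amount"})
    in
        Trimmed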

danextian
Super User

Hi @Shytype 

 

How did you connect to the csv files? Did you use the combine feature in Power Query or append them to each other? The latter is faster but a manual process. The former is automatic but can be very slow if used on large files.





Dane Belarmino | Microsoft MVP | Proud to be a Super User!

Did I answer your question? Mark my post as a solution!


"Tell me and I’ll forget; show me and I may remember; involve me and I’ll understand."
Need Power BI consultation, get in touch with me on LinkedIn or hire me on UpWork.
Learn with me on YouTube @DAXJutsu or follow my page on Facebook @DAXJutsuPBI.

I used combine to ensure newly uploaded data gets added. I haven't tried append yet but will try. Thank you!

I'm not saying you should append them - that would be a manual process each time there is a new file. Instead, strike a balance between the two approaches. The combine feature doesn't do much other than letting you combine the files without much coding. I would do one of these:

  • Process everything in Desktop:
    • Partition the files into different folders so that each combine has fewer files to work on.
    • Append the result of each combine into one table.
  • Or pre-process the historical/static files that are unlikely to change in a Dataflow:
    • Partition the files the same way as above, with each partition having its own query.
    • Save the Dataflow and refresh it.
    • Connect to the Dataflow from Desktop.
    • Append the Dataflow tables together with the local query (for the current file/s) - see the sketch after the code below.
    • Make sure each query, both from the Dataflow and the local one in Desktop, has the same transformations prior to appending them. Apply additional transformations after appending if needed.

Data from Dataflows is pre-processed, so transformations are not applied to the raw data but to the result of the Dataflow instead. This should take most of the legwork off the Desktop. Dataflows have their own separate refresh and are not triggered when the semantic model itself is refreshed. Refresh the Dataflow if the files within the folder it is connected to have been updated. Note: a Dataflow requires a Pro workspace. Use the code below to combine the files, as using the combine feature in a Dataflow will likely result in a computed entity, which requires Premium. Modify the path, etc.

           

    let
        // Connect to the SharePoint site (spsite and spfolderpath are text parameters)
        Source = SharePoint.Files(spsite, [ApiVersion = 15]),
        // Keep only the files in the target folder
        #"Filtered Rows" = Table.SelectRows(Source, each ([Folder Path] = spfolderpath)),
        // Keep only the csv files
        #"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each [Extension] = ".csv"),
        // Drop everything except the file name and its binary content
        #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows1", {"Name", "Content"}),
        // Parse each csv and promote the first row to headers
        #"Added Custom" = Table.AddColumn(#"Removed Other Columns", "Connect to csv", each Table.PromoteHeaders(Csv.Document([Content], [Delimiter = ",", Encoding = 1252, QuoteStyle = QuoteStyle.Csv])))
    in
        #"Added Custom"

[Screenshot: danextian_3-1711452597633.png]
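Not from the original post, but a rough sketch of the append step described in the list above: once the historical partitions live in a Dataflow, the Desktop model connects to them and stacks them with the local query for the current files. The query names here (HistoricalFromDataflow, CurrentLocal) are placeholders.

    let
        // HistoricalFromDataflow = query connected to the Dataflow entity (pre-processed history)
        // CurrentLocal = local query combining only the current week's csv file(s)
        // Both queries must end with identical column names and types before this step
        Appended = Table.Combine({HistoricalFromDataflow, CurrentLocal})
    in
        Appended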





Dane Belarmino | Microsoft MVP | Proud to be a Super User!

Did I answer your question? Mark my post as a solution!


"Tell me and I’ll forget; show me and I may remember; involve me and I’ll understand."
Need Power BI consultation, get in touch with me on LinkedIn or hire me on UpWork.
Learn with me on YouTube @DAXJutsu or follow my page on Facebook @DAXJutsuPBI.
AUDISU
Resolver III

Hi @Shytype ,
Is this in Power BI Desktop or the Power BI service?

If this is in Power BI Desktop, then try a parameter.
Create a parameter and use it to load only one csv file in Power BI Desktop. Then publish the report to the Power BI service and change the parameter to load all files.
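A minimal sketch of that idea (the parameter name LoadAllFiles and the spsite/spfolderpath parameters are made up here): a true/false parameter gates how many files the folder query returns, so Desktop works with a single file while the service loads everything.

    // Power Query parameter (Home > Manage Parameters): LoadAllFiles, type True/False, default false
    let
        Source = SharePoint.Files(spsite, [ApiVersion = 15]),
        CsvFiles = Table.SelectRows(Source, each [Folder Path] = spfolderpath and [Extension] = ".csv"),
        // In Desktop keep only one file for fast editing; flip the parameter to true after publishing
        Filtered = if LoadAllFiles then CsvFiles else Table.FirstN(CsvFiles, 1)
    in
        Filtered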

 

The other option is that you can try a function in the Power Query editor to clean the data.
Thanks

It's on Desktop, as I'm still building the report. Basically, if I change some queries in Power Query and then apply, it loads all those rows again. I'm worried that when I publish the report it will also exhibit the same long data load every time it refreshes.

Hi @Shytype ,
Just select one file, then do all the data cleaning. Once you complete the data cleaning, you can select all the files.
If you have an idea to create a function for the data cleaning, that will be faster. It will take each file and do the cleaning one by one.
Thanks.
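For example (a hypothetical sketch, not from the thread - column names and the cleaning steps are placeholders), the cleaning steps can be wrapped in a reusable function query that is invoked once per file:

    // fnCleanCsv: takes one file's binary content and returns a cleaned table
    (fileContent as binary) as table =>
    let
        Parsed = Table.PromoteHeaders(Csv.Document(fileContent, [Delimiter = ",", Encoding = 1252, QuoteStyle = QuoteStyle.Csv])),
        // Placeholder typing and filtering - adjust to the actual files
        Typed = Table.TransformColumnTypes(Parsed, {{"Date", type date}, {"Amount", type number}}),
        NoBlanks = Table.SelectRows(Typed, each [Date] <> null)
    in
        NoBlanks

    // Then, in the folder query, invoke it for each file, e.g.:
    // #"Added Custom" = Table.AddColumn(#"Removed Other Columns", "Data", each fnCleanCsv([Content]))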
