Shytype
New Member

How to hasten data load after changes.

Hi,

 

I'm fairly new to Power BI and currently building a report that shows historical data.

The data source is a set of csv files stored in a SharePoint site document folder. Each csv file contains about 300k+ rows and is static; a new csv file is simply uploaded weekly.

The requirement is to show up to 18 months of data. I currently have 12 months' worth, and that is already 8 million rows.

Every time the report is refreshed, or when I'm editing the report query, it takes a long time to load.

I've been reading up on incremental refresh, but it doesn't seem to be feasible for my report.


7 REPLIES
christinepayton
Super User

You're going to have some slowness with 8 million rows from SharePoint. SharePoint limits the speed at which you can pull data, for stability reasons - it's not a data warehouse. 🙂

Make sure to remove any columns you're not using as your first step in the query, and keep the number of transformations to a minimum - that can help a bit. You might also consider putting the initial data load into a dataflow, then connecting to that and doing your transforms on the desktop side to "split" the processing load. The connection to dataflows is very fast, so it can sometimes help.
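As an illustration only (the column names below are hypothetical, and spsite/spfolderpath are assumed text parameters like in the other reply), trimming to the needed columns right after the csv is parsed keeps every later step working on a narrower table:

    let
        // Connect to the SharePoint site
        Source = SharePoint.Files(spsite, [ApiVersion = 15]),
        // Keep only the csv files in the target folder
        CsvFiles = Table.SelectRows(Source, each [Folder Path] = spfolderpath and [Extension] = ".csv"),
        // Parse one file and promote the first row to headers
        Parsed = Table.PromoteHeaders(Csv.Document(CsvFiles{0}[Content], [Delimiter = ",", Encoding = 1252, QuoteStyle = QuoteStyle.Csv])),
        // Drop unused columns as the first transformation on the data itself
        // (placeholder column names - replace with the columns the report actually uses)
        Trimmed = Table.SelectColumns(Parsed, {"Date", "Product", "Quantity", "Amount"})
    in
        Trimmed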

danextian
Super User

Hi @Shytype 

 

How did you connect to the csv files? Did you use the combine feature in Power Query or append them to each other? The latter is faster but a manual process. The former is automatic but can be very slow if used on large files.





Dane Belarmino | Microsoft MVP | Proud to be a Super User!

Did I answer your question? Mark my post as a solution!


"Tell me and I’ll forget; show me and I may remember; involve me and I’ll understand."
Need Power BI consultation, get in touch with me on LinkedIn or hire me on UpWork.
Learn with me on YouTube @DAXJutsu or follow my page on Facebook @DAXJutsuPBI.

I used combine to ensure newly uploaded data gets added. I haven't tried append yet but will try. Thank you!

I'm not saying you should append them - that would be a manual process each time there is a new file. Instead, strike a balance between the two approaches. The combine feature doesn't do much other than letting you combine the files without much coding. I would do one of these:

  • Process everything in Desktop:
    • Partition the files into different folders so that each combine has fewer files to work on.
    • Append the result of each combine into one table.
  • Or pre-process the historical/static files that are unlikely to change in a Dataflow:
    • Partition the files the same way as above, with each partition having its own query.
    • Save the Dataflow and refresh it.
    • Connect to the Dataflow from Desktop.
    • Append the Dataflow tables together with the local query (for the current file/s) - see the sketch after the code below.
    • Make sure each query, both from the Dataflow and the local one in Desktop, has the same transformations prior to appending them. Apply additional transformations after appending if needed.

Data from Dataflows is pre-processed, so transformations are not applied to the raw data but to the result of the Dataflow instead. This should take most of the legwork off the Desktop. Dataflows have their own separate refresh and are not triggered when the semantic model itself is refreshed. Refresh the Dataflow if the files within the folder it is connected to have been updated. Note: a Dataflow requires a Pro workspace. Use the code below to combine the files, as using the combine feature in a Dataflow will likely result in a computed entity, which requires Premium. Modify the path, etc.

           

    let
        // Connect to the SharePoint site (spsite and spfolderpath are text parameters)
        Source = SharePoint.Files(spsite, [ApiVersion = 15]),
        // Keep only the files in the target folder
        #"Filtered Rows" = Table.SelectRows(Source, each ([Folder Path] = spfolderpath)),
        // Keep only the csv files
        #"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each [Extension] = ".csv"),
        // Drop everything except the file name and its binary content
        #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows1", {"Name", "Content"}),
        // Parse each csv and promote the first row to headers
        #"Added Custom" = Table.AddColumn(#"Removed Other Columns", "Connect to csv", each Table.PromoteHeaders(Csv.Document([Content], [Delimiter = ",", Encoding = 1252, QuoteStyle = QuoteStyle.Csv])))
    in
        #"Added Custom"

[Screenshot: danextian_3-1711452597633.png]
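Not from the original post, but a rough sketch of the append step described in the list above: once the historical partitions live in a Dataflow, the Desktop model connects to them and stacks them with the local query for the current files. The query names here (HistoricalFromDataflow, CurrentLocal) are placeholders.

    let
        // HistoricalFromDataflow = query connected to the Dataflow entity (pre-processed history)
        // CurrentLocal = local query combining only the current week's csv file(s)
        // Both queries must end with identical column names and types before this step
        Appended = Table.Combine({HistoricalFromDataflow, CurrentLocal})
    in
        Appended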





Dane Belarmino | Microsoft MVP | Proud to be a Super User!

Did I answer your question? Mark my post as a solution!


"Tell me and I’ll forget; show me and I may remember; involve me and I’ll understand."
Need Power BI consultation, get in touch with me on LinkedIn or hire me on UpWork.
Learn with me on YouTube @DAXJutsu or follow my page on Facebook @DAXJutsuPBI.
AUDISU
Resolver III

Hi @Shytype ,
Is this in Power BI Desktop or the Power BI service?

If this is in Power BI Desktop, then try a parameter.
Create a parameter and use it to load only one csv file in Power BI Desktop. Then publish the report to the Power BI service and change the parameter to load all files.
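A minimal sketch of that idea (the parameter name LoadAllFiles and the spsite/spfolderpath parameters are made up here): a true/false parameter gates how many files the folder query returns, so Desktop works with a single file while the service loads everything.

    // Power Query parameter (Home > Manage Parameters): LoadAllFiles, type True/False, default false
    let
        Source = SharePoint.Files(spsite, [ApiVersion = 15]),
        CsvFiles = Table.SelectRows(Source, each [Folder Path] = spfolderpath and [Extension] = ".csv"),
        // In Desktop keep only one file for fast editing; flip the parameter to true after publishing
        Filtered = if LoadAllFiles then CsvFiles else Table.FirstN(CsvFiles, 1)
    in
        Filtered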

 

The other option is that you can try a function in the Power Query editor to clean the data.
Thanks

It's on Desktop, as I'm still building the report. Basically, if I change some queries in Power Query and then apply, it loads all those rows again. I'm worried that when I publish the report it will also exhibit the same long data load every time it refreshes.

Hi @Shytype ,
Just select one file, then do all the data cleaning. Once you complete the data cleaning, you can select all the files.
If you have an idea to create a function for the data cleaning, that will be faster. It will take each file and do the cleaning one by one.
Thanks.
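For example (a hypothetical sketch, not from the thread - column names and the cleaning steps are placeholders), the cleaning steps can be wrapped in a reusable function query that is invoked once per file:

    // fnCleanCsv: takes one file's binary content and returns a cleaned table
    (fileContent as binary) as table =>
    let
        Parsed = Table.PromoteHeaders(Csv.Document(fileContent, [Delimiter = ",", Encoding = 1252, QuoteStyle = QuoteStyle.Csv])),
        // Placeholder typing and filtering - adjust to the actual files
        Typed = Table.TransformColumnTypes(Parsed, {{"Date", type date}, {"Amount", type number}}),
        NoBlanks = Table.SelectRows(Typed, each [Date] <> null)
    in
        NoBlanks

    // Then, in the folder query, invoke it for each file, e.g.:
    // #"Added Custom" = Table.AddColumn(#"Removed Other Columns", "Data", each fnCleanCsv([Content]))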
