
aAmirkHan
Helper II

How to manage large dataset in Power Query

Hi,

I have a large CSV dataset of up to 400 MB that I have loaded into Power Query. Every time I start cleaning and doing basic data transformations, Power Query takes a long time to refresh the data in the background. How can I make data cleaning and transformation faster? Please advise.

 

Thank you

1 ACCEPTED SOLUTION
Omid_Motamedise
Super User

It depends entirely on the steps you apply, but if you are using the UI and the tasks are taking that long, I recommend you consider the points below:
1. Minimize Data Load
Filter Early: Apply filters as early as possible in your query to reduce the number of rows processed in subsequent steps.
Select Relevant Columns: Use the "Choose Columns" option to remove unnecessary columns right after importing data.
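
For example, here is a minimal M sketch of both ideas (the file path and column names are hypothetical):

let
    // Load the CSV; path, delimiter, and encoding are assumptions for illustration
    Source = Csv.Document(File.Contents("C:\data\sales.csv"), [Delimiter = ",", Encoding = 65001]),
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    // Keep only the columns you actually need, right after import
    KeepCols = Table.SelectColumns(Promoted, {"OrderID", "OrderDate", "Amount"}),
    // Filter rows as early as possible so every later step touches less data
    Filtered = Table.SelectRows(KeepCols, each [Amount] <> null)
in
    Filtered
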
2. Optimize Query Design
Disable Background Preview: Disable "Enable Background Data Preview" in Power Query options to prevent unnecessary refreshes while editing.
Use Table.Buffer: For sorted or grouped operations, wrap the data in Table.Buffer to avoid re-evaluating steps multiple times.
Avoid Repeated Steps: Combine transformations into fewer steps to reduce dependency chains.
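
A small sketch of the Table.Buffer idea, with a tiny inline table standing in for your real upstream steps:

let
    // Hypothetical input; in practice this would be the output of your earlier steps
    Source = Table.FromRecords({
        [Customer = "A", Amount = 10],
        [Customer = "B", Amount = 20],
        [Customer = "A", Amount = 5]
    }),
    // Buffer the table once so the sort below doesn't re-evaluate the upstream chain
    Buffered = Table.Buffer(Source),
    Sorted = Table.Sort(Buffered, {{"Amount", Order.Descending}})
in
    Sorted
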
3. Leverage Native Queries
Push Operations to Source: Use SQL queries or other source-side operations (if supported) to filter, group, or transform data before it reaches Power Query.
Direct Query: When connecting to databases, write native SQL queries to retrieve only the required data.
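
This only applies when the data lives in a database rather than a CSV. As a hedged sketch (the server, database, and table names are made up):

let
    // Hypothetical SQL Server connection
    Source = Sql.Database("myserver", "SalesDB"),
    // Push the filter and aggregation to the server so Power Query receives only the result
    Result = Value.NativeQuery(
        Source,
        "SELECT CustomerID, SUM(Amount) AS Total
         FROM dbo.Orders
         WHERE OrderDate >= '2024-01-01'
         GROUP BY CustomerID"
    )
in
    Result
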
4. Reduce Complexity
Avoid Nested Functions: Minimize the use of nested or custom functions in steps, as they can be slower with large datasets.
Simplify Logic: Break down complex logic into smaller, more efficient operations.
5. Manage Data Types
Set Data Types Once: Apply data type changes in the final steps, as Power Query evaluates this operation each time it's performed.
Remove Duplicates Smartly: Use deduplication only on necessary columns instead of entire rows.
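
A small sketch of typing once at the end (the column names and inline sample stand in for your cleaned data):

let
    // Text columns straight from a CSV-style import (hypothetical sample)
    Source = Table.FromRecords({
        [OrderID = "1", OrderDate = "2024-01-15", Amount = "10.5"],
        [OrderID = "2", OrderDate = "2024-02-03", Amount = "7.25"]
    }),
    // ...do all cleaning and transformation steps first, on the untyped columns...
    // Then set every data type once, in a single final step
    Typed = Table.TransformColumnTypes(Source, {
        {"OrderID", Int64.Type},
        {"OrderDate", type date},
        {"Amount", type number}
    })
in
    Typed
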
6. Handle Merging and Joining Carefully
Sort Before Merge: Sort tables before merging to improve performance.
Reduce Table Size for Joins: Pre-filter data to include only rows that are necessary for merging.
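
A sketch of pre-filtering and buffering the smaller table before a merge (the tables and columns are made-up examples):

let
    Orders = Table.FromRecords({
        [OrderID = 1, CustomerID = "A"],
        [OrderID = 2, CustomerID = "B"]
    }),
    Customers = Table.FromRecords({
        [CustomerID = "A", Region = "East"],
        [CustomerID = "B", Region = "West"],
        [CustomerID = "C", Region = "North"]
    }),
    // Keep only the lookup rows that can actually match before joining
    ActiveCustomers = Table.SelectRows(Customers, each List.Contains(Orders[CustomerID], [CustomerID])),
    // Buffer the (smaller) lookup table so it isn't re-scanned for every Orders row
    Merged = Table.NestedJoin(Orders, {"CustomerID"}, Table.Buffer(ActiveCustomers), {"CustomerID"}, "Customer", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "Customer", {"Region"})
in
    Expanded
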
7. Aggregate Efficiently
Group By Wisely: Avoid grouping by too many columns, and aggregate only the necessary metrics.
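
For example, a minimal Table.Group sketch that groups on a single key and computes only one aggregate (the sample data is hypothetical):

let
    Source = Table.FromRecords({
        [Region = "East", Amount = 10],
        [Region = "East", Amount = 20],
        [Region = "West", Amount = 5]
    }),
    // Group by one key column and compute only the aggregate you need
    Grouped = Table.Group(Source, {"Region"}, {{"TotalAmount", each List.Sum([Amount]), type number}})
in
    Grouped
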
8. Use Power Query Settings
Increase Data Cache Memory: Go to Query Options > Data Load and increase cache memory to improve performance for large datasets.
Enable Parallel Loading: Allow parallel loading of tables in the query options.
9. Test with Sample Data
Work with Smaller Data Samples: Use a filtered subset of data while designing queries, then apply the full dataset for final execution.
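
One way to set this up is a sample toggle at the top of the query (the path and row count are arbitrary):

let
    // Set to false before the final full refresh
    UseSample = true,
    Source = Csv.Document(File.Contents("C:\data\sales.csv"), [Delimiter = ","]),
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    // Design against a small slice, then flip the flag for the real run
    Working = if UseSample then Table.FirstN(Promoted, 10000) else Promoted
in
    Working
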
10. Review Performance Metrics
Use Query Diagnostics: Analyze performance bottlenecks using Power Query's built-in Query Diagnostics feature.
Monitor Data Refresh Times: Identify steps that take the most time and optimize them.


If my answer helped solve your issue, please consider marking it as the accepted solution. It helps others in the community find answers faster—and keeps the community growing stronger!
You can also check out my YouTube channel for tutorials, tips, and real-world Power Query solutions:
https://youtube.com/@omidbi?si=96Bo-ZsSwOx0Z36h


5 REPLIES
v-sgandrathi
Community Support

Hi @aAmirkHan,

Thank you for reaching out to the Microsoft Fabric Community forum.

I trust that @Omid_Motamedise's response is accurate and will address your issue.

If you have any further questions or updates regarding your issue, feel free to ask and we will look into it.

 

If the Super User's answer meets your requirements, please consider marking it as the Accepted Solution to help other members find it more quickly.

 

Regards,
Sahasra.

lbendlin
Super User

400 MB is not large.

 

- how many rows? 

- how many columns?

- are you attempting merges?

- are you attempting sorting?

 

You can use the Power Query Diagnostics options to pinpoint exactly where your process takes the longest. Then you can decide on how to refactor that part.

 

 

Ideally you will want to ingest the CSV files as-is, with no transforms at all.

Thank you for your response to my query.

This single CSV dataset has 25 million rows and 16 columns. I need to do some basic data-cleaning tasks in Power Query.

Please suggest the best way to manage such a dataset in Power Query.

Please show a sanitized version of the query.
