Hi,
I have a large CSV dataset (up to 400 MB) that I have loaded into Power Query. Whenever I start cleaning and doing basic transformations, every step takes a long time to refresh in the background. How can I make data cleaning and transformation faster? Please advise.
Thank you
It completely depends on the steps you apply, but if you are using the UI and your tasks take that long, I recommend you consider the points below:
1. Minimize Data Load
Filter Early: Apply filters as early as possible in your query to reduce the number of rows processed in subsequent steps.
Select Relevant Columns: Use the "Choose Columns" option to remove unnecessary columns right after importing data.
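A minimal M sketch of those two points; the file path, table, and column names are placeholders, not from the original post:

```
let
    Source = Csv.Document(File.Contents("C:\data\input.csv"), [Delimiter = ","]),
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    // Filter rows first, so every later step processes fewer rows
    Filtered = Table.SelectRows(Promoted, each [Country] = "US"),
    // Then keep only the columns you actually need
    Trimmed = Table.SelectColumns(Filtered, {"OrderID", "OrderDate", "Amount"})
in
    Trimmed
```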
2. Optimize Query Design
Disable Background Preview: Turn off "Allow data preview to download in the background" under Options > Data Load to prevent unnecessary preview refreshes while editing.
Use Table.Buffer: For sorted or grouped operations, wrap the data in Table.Buffer to avoid re-evaluating steps multiple times.
Avoid Repeated Steps: Combine transformations into fewer steps to reduce dependency chains.
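A sketch of Table.Buffer around a sort (the "OrderDate" column is hypothetical); buffering holds the sorted result in memory so downstream steps do not trigger a re-sort:

```
let
    Source = Csv.Document(File.Contents("C:\data\sales.csv"), [Delimiter = ","]),
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    // Buffer the sorted table once; later steps read it from memory
    Sorted = Table.Buffer(Table.Sort(Promoted, {{"OrderDate", Order.Ascending}}))
in
    Sorted
```

Note that Table.Buffer loads the whole table into memory, so with a file this size use it selectively, only where a step would otherwise be re-evaluated.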
3. Leverage Native Queries
Push Operations to Source: Use SQL queries or other source-side operations (if supported) to filter, group, or transform data before it reaches Power Query.
Direct Query: When connecting to databases, write native SQL queries to retrieve only the required data.
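This only applies when the source is a database rather than a CSV file. As a sketch, assuming a hypothetical SQL Server source (server, database, and query are illustrative):

```
let
    // The WHERE clause runs on the server, so only the needed rows travel to Power Query
    Source = Sql.Database("myserver", "mydb",
        [Query = "SELECT OrderID, OrderDate, Amount FROM dbo.Orders WHERE OrderDate >= '2024-01-01'"])
in
    Source
```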
4. Reduce Complexity
Avoid Nested Functions: Minimize the use of nested or custom functions in steps, as they can be slower with large datasets.
Simplify Logic: Break down complex logic into smaller, more efficient operations.
5. Manage Data Types
Set Data Types Once: Apply data type changes in the final steps, as Power Query evaluates this operation each time it's performed.
Remove Duplicates Smartly: Use deduplication only on necessary columns instead of entire rows.
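For example (column names are illustrative, and "PreviousStep" stands for whatever step comes before):

```
let
    // One consolidated type-change step, applied late in the query
    Typed = Table.TransformColumnTypes(PreviousStep, {
        {"OrderID", Int64.Type}, {"OrderDate", type date}, {"Amount", type number}}),
    // Deduplicate on the key column only, not on every column of the row
    Deduped = Table.Distinct(Typed, {"OrderID"})
in
    Deduped
```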
6. Handle Merging and Joining Carefully
Sort Before Merge: Sort tables before merging to improve performance.
Reduce Table Size for Joins: Pre-filter data to include only rows that are necessary for merging.
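A sketch of a pre-filtered merge; "Orders", "Customers", and the column names are placeholders:

```
let
    // Shrink both sides before the join: fewer rows and fewer columns
    LeftSmall = Table.SelectColumns(
        Table.SelectRows(Orders, each [Status] = "Open"), {"CustomerID", "Amount"}),
    RightSmall = Table.SelectColumns(Customers, {"CustomerID", "Region"}),
    Merged = Table.NestedJoin(LeftSmall, {"CustomerID"}, RightSmall, {"CustomerID"},
        "Cust", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "Cust", {"Region"})
in
    Expanded
```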
7. Aggregate Efficiently
Group By Wisely: Avoid grouping by too many columns, and aggregate only the necessary metrics.
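For instance, grouping by a single key and computing only one aggregate ("Sales", "Region", and "Amount" are illustrative names):

```
let
    Grouped = Table.Group(Sales, {"Region"},
        {{"TotalAmount", each List.Sum([Amount]), type number}})
in
    Grouped
```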
8. Use Power Query Settings
Increase Data Cache Memory: Go to Query Options > Data Load and increase cache memory to improve performance for large datasets.
Enable Parallel Loading: Allow parallel loading of tables in the query options.
9. Test with Sample Data
Work with Smaller Data Samples: Use a filtered subset of data while designing queries, then apply the full dataset for final execution.
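A sketch of that workflow: keep a Table.FirstN step while designing, then delete it before the full refresh (path and row count are placeholders):

```
let
    Source = Csv.Document(File.Contents("C:\data\input.csv"), [Delimiter = ","]),
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    // Design-time sample; remove this step before the final full-data refresh
    Sample = Table.FirstN(Promoted, 10000)
in
    Sample
```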
10. Review Performance Metrics
Use Query Diagnostics: Analyze performance bottlenecks using Power Query's built-in Query Diagnostics feature.
Monitor Data Refresh Times: Identify steps that take the most time and optimize them.
Hi @aAmirkHan,
Thank you for reaching out to the Microsoft Fabric Community forum.
I trust that @Omid_Motamedise's response is accurate and will address your issue.
If you have any further questions or updates regarding your issue, feel free to ask, and we will look into that.
If the Super User's answer meets your requirements, please consider marking it as the accepted solution to help other members find it more quickly.
Regards,
Sahasra.
400 MB is not large.
- how many rows?
- how many columns?
- are you attempting merges?
- are you attempting sorting?
You can use the Power Query Diagnostics options to pinpoint exactly where your process takes the longest. Then you can decide on how to refactor that part.
Ideally you will want to ingest the CSV files as is - no transforms at all.
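A minimal "ingest as-is" query along those lines, with no transforms beyond promoting headers (the file path and encoding are assumptions):

```
let
    Source = Csv.Document(File.Contents("C:\data\big.csv"),
        [Delimiter = ",", Encoding = 65001, QuoteStyle = QuoteStyle.Csv]),
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true])
in
    Promoted
```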
Thank you for your response to my query.
This single CSV dataset has 25 million rows and 16 columns, and I need to do some basic data-cleaning tasks on it in Power Query.
Please suggest the best way to manage a dataset of this size in Power Query.
Please show a sanitized version of the query.