I'm building a report that uses a big dataflow table (185,000,000 rows) in import mode.
Out of these 185,000,000 rows (roughly 20 GB), I'm using only around 2,000,000 once filtered.
I'm looking for a way to either:
- filter down to my 2,000,000 rows before loading the data, so it does not load all 20 GB of data but only a fraction of that
- partially reload my Power Query (in a perfect world: not reload the data, and only apply the steps after the one I change) so I don't have to reload the whole 20 GB of data every time I change something in the query
Is either possible? I googled both and couldn't find any solution.
Thank you PavanLalwani
Hi, @Arkhos94
If you use a database to store the data, you can limit the amount imported by applying a filter when you connect, so the condition is pushed down to the source. This method still works with a dataflow.
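A minimal M sketch of this idea, assuming a SQL source and placeholder server, database, table, and column names — the point is that Table.SelectRows placed right after the source step can fold to a WHERE clause instead of downloading everything:

```m
// Hypothetical example: filter while connecting, so Power Query can fold
// the condition back to the source instead of importing every row.
let
    Source = Sql.Database("myserver", "mydb"),                 // placeholder server/database
    Components = Source{[Schema = "dbo", Item = "Components"]}[Data],
    // Folds to a WHERE clause when the source supports query folding
    Filtered = Table.SelectRows(Components, each [Country] = "FR")
in
    Filtered
```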
If you use static files such as Excel or CSV: this approach mainly uses Table.FirstN. Excel doesn't have a query engine, so by default all workbook rows are loaded when importing.
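As a hedged sketch of the Table.FirstN approach (file path and row limit are placeholders): the full file is still read from disk, since a flat file has no query engine, but only the first N rows flow into the later steps and the model.

```m
// Hypothetical sketch: cap the rows taken from a flat file.
let
    Source = Csv.Document(File.Contents("C:\data\components.csv"), [Delimiter = ","]),
    Promoted = Table.PromoteHeaders(Source),
    // Keep only the first 2,000,000 rows (placeholder limit)
    Limited = Table.FirstN(Promoted, 2000000)
in
    Limited
```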
If this does not help, please provide more details, your desired output, and a pbix file without private information (or some sample data).
Best Regards
Jianpeng Li
If this post helps, then please consider accepting it as the solution to help other members find it more quickly.
Handling such a large dataset in Power BI can be challenging, but there are ways to optimize your dataflow to reduce loading time and memory usage. Here are solutions to focus on loading only the required 2 million rows, either by filtering before data import or managing the data loading process more efficiently.
If possible, filter the data in the dataflow before it even reaches Power BI. This way, only the necessary rows are loaded into Power BI, significantly reducing memory load.
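One way to do this is to add a filtered entity inside the dataflow itself, so only the reduced rows ever leave the service. A minimal sketch, assuming a hypothetical upstream entity name, column, and country value:

```m
// Hypothetical sketch: a new filtered entity in the dataflow, e.g. a
// computed/linked entity, so downstream reports only see the reduced rows.
let
    Source = AllVehicleComponents,                                    // placeholder upstream entity
    MyCountryOnly = Table.SelectRows(Source, each [Country] = "FR")   // assumed column and value
in
    MyCountryOnly
```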
If you cannot modify the dataflow or use DirectQuery, you can try partitioning the data in Power Query to limit the rows loaded.
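A sketch of limiting rows in Power Query during development, assuming a placeholder entity name: a row-limit value (ideally exposed as a Power Query parameter) applied right after the source step lets you iterate on later steps against a small slice, then raise or remove the limit before publishing.

```m
// Hypothetical sketch: a development-time row limit applied early in the query.
let
    RowLimit = 100000,                         // placeholder; could be a Power Query parameter
    Source = ComponentsEntity,                 // placeholder: the large dataflow entity
    Limited = Table.FirstN(Source, RowLimit)
in
    Limited
```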
Each of these methods has trade-offs, but filtering directly in the dataflow or using DirectQuery will generally be the most efficient. If the underlying data source is too slow for DirectQuery, consider Incremental Refresh to avoid loading the entire dataset repeatedly.
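The standard incremental-refresh pattern can be sketched as follows. RangeStart and RangeEnd are the reserved datetime parameters Power BI supplies; the [LoadDate] column and entity name are assumptions for illustration. With this filter in place, Power BI refreshes only the partitions whose range changed instead of reloading all rows.

```m
// Hypothetical sketch of the incremental-refresh filter using the reserved
// RangeStart/RangeEnd parameters defined in Power BI Desktop.
let
    Source = ComponentsEntity,   // placeholder: the large dataflow entity
    // [LoadDate] is an assumed datetime column used to partition the data
    Filtered = Table.SelectRows(Source, each [LoadDate] >= RangeStart and [LoadDate] < RangeEnd)
in
    Filtered
```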
If this solution brightened your path or made things easier, please consider giving kudos. Your recognition not only uplifts those who helped but inspires others to keep contributing for the good of our community!
Thanks a lot for your input.
To add some information:
The data that allows me to filter my 185 million rows down to 2 million comes from other tables.
My table contains all the components of all vehicles produced by my company (with one column for the component ID and one for the vehicle ID).
I filter down to 2 million rows by merging the table with a "billing" table (to identify the country of each vehicle and keep only my country) and a "component" table (to keep only the components related to some specific functions: tyres, cooling, ...). The billing table is part of the same dataflow as the one I have issues with; the "component" table is local. I cannot reduce the data in the dataflow itself, as it is used by other countries/for other components.
So option 1 (direct filtering in the dataflow) seems unusable, do you agree?
Option 2 seems like a good idea; I will try it and check the next option if it does not improve things enough.
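The merge-based filter described above can be sketched as two inner joins, with all table and column names as placeholders. One caveat worth noting: merging against a local table generally breaks query folding, so the full dataflow table is still read during refresh; the joins only reduce what lands in the model.

```m
// Hypothetical sketch: inner joins keep only rows matching the local
// "billing" and "component" tables, shrinking the large table before
// any further transformation steps.
let
    Source = AllVehicleComponents,        // placeholder: the 185M-row dataflow entity
    WithBilling = Table.NestedJoin(Source, {"VehicleID"}, BillingMyCountry, {"VehicleID"},
                                   "Billing", JoinKind.Inner),
    WithComponents = Table.NestedJoin(WithBilling, {"ComponentID"}, ComponentsOfInterest, {"ComponentID"},
                                      "Component", JoinKind.Inner)
in
    WithComponents
```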