Hi All,
We have a large table with more than 2 billion records.
Is there any data limitation on bringing a table of more than 2 billion records into Dataflow Gen2? Will it be able to load the whole dataset, and where in the cloud will Dataflow Gen2 store this data?
Can anyone answer this?
Thanks,
Sri
Solved!
Hi @Koritala , I am not entirely certain, but based on similar scenarios I have seen, Dataflow Gen2 is capable of handling extremely large datasets, including tables with billions of rows, provided that the overall architecture is designed appropriately.
In summary, there is no fixed limit preventing Dataflow Gen2 from ingesting more than 2 billion records.
It can load the full dataset if transformations preserve query folding and the workload is efficiently designed.
When the destination is a Lakehouse or Warehouse, the data is stored as Delta tables in OneLake.
If this response was helpful, I’d appreciate a kudo.
Please mark it as the accepted solution so other community members can find it faster.
Hi @Koritala ,
For tables of this size (2+ billion rows), Dataflow Gen2 should be used primarily for raw or lightly transformed data, not heavy transformations.
Recommended pattern:
Use Dataflow Gen2 to ingest raw data (or apply only minimal, schema-level transformations such as column selection, renaming, basic type casting).
Store the data in OneLake (Lakehouse/Warehouse) as Delta tables.
Perform heavy transformations, joins, aggregations, and business logic downstream using Spark, Warehouse SQL, or semantic models.
Applying complex transformations during ingestion significantly increases refresh time, compute usage, and failure risk, especially with very large datasets.
In short:
Raw / lightly transformed data → Dataflow Gen2
Complex transformations → Lakehouse / Warehouse / Spark
This aligns with Microsoft Fabric best practices for large-scale data.
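On the ingestion side of this pattern, a 2B+ row source is usually pulled in bounded slices rather than one giant refresh. A minimal Python sketch of that slicing logic (the date range, column name, and batch size here are hypothetical, not from the thread):

```python
# Hypothetical sketch: slicing a very large source table into
# date-bounded batches, so each ingestion run (Dataflow Gen2 refresh
# or pipeline copy activity) pulls a manageable slice.
from datetime import date, timedelta

def batch_ranges(start: date, end: date, days_per_batch: int):
    """Yield (batch_start, batch_end) pairs covering [start, end)."""
    cur = start
    while cur < end:
        nxt = min(cur + timedelta(days=days_per_batch), end)
        yield cur, nxt
        cur = nxt

# Each pair can parameterize a source filter such as
#   WHERE order_date >= batch_start AND order_date < batch_end
for lo, hi in batch_ranges(date(2024, 1, 1), date(2024, 3, 1), 30):
    print(lo, hi)
```

Keeping the per-batch filter as a simple range predicate also gives query folding the best chance of pushing the filter to the source.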
Thanks,
Sai Teja
Hi @Koritala,
While it may work with Dataflow Gen2, I would highly recommend a more scalable approach.
I would drop the dataflows entirely if possible. Use a pipeline to ingest the data into a lakehouse, and then use Spark (either PySpark or Spark SQL) to do your transformations. That will be the most efficient.
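After the initial full load, an incremental (watermark-based) pattern avoids re-reading all 2 billion rows on every run. A small illustrative helper in plain Python (the integer key column and per-run row cap are assumptions for the sketch):

```python
# Hypothetical watermark helper: after the initial full load, each
# pipeline run ingests only rows whose key exceeds the last high-water
# mark, capped at max_rows per run.
from typing import Optional, Tuple

def next_load_window(last_watermark: int, source_max: int,
                     max_rows: int) -> Optional[Tuple[int, int]]:
    """Return the (exclusive-start, inclusive-end) key range to load
    next, or None when the destination is already caught up."""
    if last_watermark >= source_max:
        return None
    end = min(last_watermark + max_rows, source_max)
    return (last_watermark, end)

print(next_load_window(0, 2_000_000_000, 50_000_000))  # → (0, 50000000)
```

The returned range would drive the source query of the copy activity, and the watermark gets persisted after each successful run.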
Proud to be a Super User!