Hi All,
We have a large table with more than 2 billion records.
Is there any data limit on loading a table with more than 2 billion records into Dataflow Gen2? Will it be able to load the whole table? And where in the cloud will Dataflow Gen2 store this data?
Can anyone answer this?
Thanks,
Sri
Hi @Koritala, I am not entirely certain, but based on similar scenarios I have seen, Dataflow Gen2 can handle extremely large datasets, including tables with billions of rows, provided the overall architecture is designed appropriately.
In summary, there is no fixed limit preventing Dataflow Gen2 from ingesting more than 2 billion records.
It can load the full dataset if transformations preserve query folding and the workload is efficiently designed.
When the data destination is a Fabric Lakehouse or Warehouse, the data is stored as Delta Lake tables in OneLake.
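One common way to keep a multi-billion-row load "efficiently designed" is to split it into bounded key ranges, so each batch is a smaller filtered query that can fold to the source and be retried on its own if it fails. This Python sketch only builds the per-range source queries. It assumes a roughly contiguous integer key, and the table name `dbo.big_table`, the column `id` and both helper functions are hypothetical; this is not a Dataflow Gen2 feature or API.

```python
# Hypothetical sketch: split a large table into bounded key ranges so each
# batch can be loaded (and retried) independently instead of in one huge refresh.
# Assumes an integer key that is roughly contiguous between min_key and max_key.


def key_ranges(min_key: int, max_key: int, batch_size: int):
    """Yield inclusive (start, end) ranges covering min_key..max_key."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    start = min_key
    while start <= max_key:
        end = min(start + batch_size - 1, max_key)
        yield start, end
        start = end + 1


def batch_queries(table: str, key_column: str, min_key: int, max_key: int, batch_size: int):
    """Build one source query per key range.

    Table and column names are interpolated directly, so they must come from
    trusted configuration, never from user input.
    """
    return [
        f"SELECT * FROM {table} WHERE {key_column} BETWEEN {lo} AND {hi}"
        for lo, hi in key_ranges(min_key, max_key, batch_size)
    ]


if __name__ == "__main__":
    # Illustrative numbers only: ~2.1 billion keys in 500 million-row batches.
    queries = batch_queries("dbo.big_table", "id", 1, 2_100_000_000, 500_000_000)
    print(len(queries))
    print(queries[0])
```

With roughly 2.1 billion keys and 500 million-row batches this produces five range queries; how you parameterize such a filter inside Fabric is outside the scope of this sketch.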
Hi @Koritala,
For tables of this size (2+ billion rows), Dataflow Gen2 should be used primarily for raw or lightly transformed data, not heavy transformations.
Recommended pattern:
Use Dataflow Gen2 to ingest raw data (or apply only minimal, schema-level transformations such as column selection, renaming, basic type casting).
Store the data in OneLake (Lakehouse/Warehouse) as Delta tables.
Perform heavy transformations, joins, aggregations, and business logic downstream using Spark, Warehouse SQL, or semantic models.
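To make "minimal, schema-level transformations" concrete, here is a rough Python sketch of the single source query such steps amount to when they fold: column selection, renaming and casting, with no joins or aggregations. It is not the SQL that Power Query actually generates; the `ColumnSpec` helper, the table `dbo.orders`, and the column names and types are hypothetical.

```python
# Hypothetical sketch of "schema-level" ingestion: select, rename and cast
# columns in one source query, leaving joins/aggregations for downstream Spark/SQL.
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnSpec:
    source: str    # column name in the source table
    target: str    # column name in the Lakehouse/Warehouse table
    sql_type: str  # target type, e.g. "BIGINT", "DATE"


def schema_level_select(table: str, columns: list[ColumnSpec]) -> str:
    """Build a SELECT that only projects, renames and casts columns."""
    if not columns:
        raise ValueError("at least one column is required")
    projections = ",\n    ".join(
        f"CAST({c.source} AS {c.sql_type}) AS {c.target}" for c in columns
    )
    return f"SELECT\n    {projections}\nFROM {table}"


if __name__ == "__main__":
    spec = [
        ColumnSpec("cust_id", "customer_id", "BIGINT"),
        ColumnSpec("ord_dt", "order_date", "DATE"),
    ]
    print(schema_level_select("dbo.orders", spec))
```

Anything beyond this shape (joins, group-bys, business rules) is what the recommended pattern pushes downstream into Spark, Warehouse SQL or the semantic model.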
Applying complex transformations during ingestion significantly increases refresh time, compute usage, and failure risk, especially with very large datasets.
In short:
Raw / lightly transformed data → Dataflow Gen2
Complex transformations → Lakehouse / Warehouse / Spark
This aligns with Microsoft Fabric best practices for large-scale data.
Thanks,
Sai Teja
Hi @Koritala,
While it may work with Dataflow Gen2, I would highly recommend a more scalable approach.
I would drop the dataflows entirely if possible. Use a pipeline to ingest the data into a lakehouse, and then use Spark (either PySpark or Spark SQL) to do your transformations. That will be the most efficient.