Koritala
Helper V

Data Volume Limitation with Dataflow Gen2

Hi All,

We have a large table with more than 2 billion records.

Is there any data limitation on bringing a table with more than 2 billion records into Dataflow Gen2? Will it be capable of loading the whole dataset, and where in the cloud does Dataflow Gen2 store this data?

Can anyone answer this?

Thanks,

Sri

3 ACCEPTED SOLUTIONS
Zanqueta
Super User

Hi @Koritala , I am not entirely certain, but based on similar scenarios I have seen, Dataflow Gen2 is capable of handling extremely large datasets, including tables with billions of rows, provided that the overall architecture is designed appropriately.

 

In summary:

- There is no fixed limit preventing Dataflow Gen2 from ingesting more than 2 billion records.
- It can load the full dataset if transformations preserve query folding and the workload is efficiently designed.
- All data is stored as Delta Lake files within OneLake.
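A folding-friendly pattern for a table this size is to keep filters on the source side and pull the data in bounded slices rather than one giant query. As a rough illustration in plain Python (the `OrderDate` column name is invented for this sketch, and this is not a Fabric API), here is how per-month source-side predicates for such slices could be generated:

```python
from datetime import date

def month_predicates(start: date, end: date, column: str = "OrderDate"):
    """Generate one source-side WHERE predicate per calendar month, so a
    multi-billion-row table can be pulled in bounded, foldable slices."""
    preds = []
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        ny, nm = (y + 1, 1) if m == 12 else (y, m + 1)
        preds.append(
            f"{column} >= '{date(y, m, 1)}' AND {column} < '{date(ny, nm, 1)}'"
        )
        y, m = ny, nm
    return preds

# Example: split one year of data into 12 monthly slices.
slices = month_predicates(date(2024, 1, 1), date(2024, 12, 1))
```

Each predicate can then be applied as an early filter step, keeping the heavy row reduction on the source system instead of in the dataflow engine.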

 

If this response was helpful, I'd appreciate a kudo.
Please mark it as the correct solution; it helps other community members find answers faster.
Connect with me on LinkedIn


SaiTejaTalasila
Super User

Hi @Koritala ,

 

For tables of this size (2+ billion rows), Dataflow Gen2 should be used primarily for raw or lightly transformed data, not heavy transformations.

 

Recommended pattern:

1. Use Dataflow Gen2 to ingest raw data (or apply only minimal, schema-level transformations such as column selection, renaming, and basic type casting).
2. Store the data in OneLake (Lakehouse/Warehouse) as Delta tables.
3. Perform heavy transformations, joins, aggregations, and business logic downstream using Spark, Warehouse SQL, or semantic models.

 

Applying complex transformations during ingestion significantly increases refresh time, compute usage, and failure risk, especially with very large datasets.

 

In short:

- Raw / lightly transformed data → Dataflow Gen2
- Complex transformations → Lakehouse / Warehouse / Spark

 

This aligns with Microsoft Fabric best practices for large-scale data.
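To make step 1 concrete, the kind of "minimal, schema-level transformation" meant here (select, rename, cast, and nothing else) can be sketched in plain Python. The column names are invented for illustration; in an actual dataflow these would be Power Query steps:

```python
# Ingestion-time shaping only: select, rename, and cast columns.
# Joins, aggregations, and business logic are deliberately left downstream.

KEEP = {"order_id": "OrderId", "amt": "Amount", "ts": "OrderDate"}  # select + rename
CASTS = {"OrderId": int, "Amount": float, "OrderDate": str}         # basic type casting

def shape(record: dict) -> dict:
    """Keep only the mapped columns, rename them, and cast their values."""
    out = {new: record[old] for old, new in KEEP.items()}
    return {col: CASTS[col](val) for col, val in out.items()}

raw = {"order_id": "42", "amt": "19.99", "ts": "2024-06-01", "debug_blob": "..."}
clean = shape(raw)  # drops debug_blob, renames the rest, casts the types
```

Keeping the ingestion step this narrow is what lets query folding survive and the refresh stay cheap at the 2-billion-row scale.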

 

Thanks,

Sai Teja 


tayloramy
Super User

Hi @Koritala

 

While it may work with Dataflow Gen2, I would highly recommend a more scalable approach.

I would drop the dataflows entirely if possible. Use a pipeline to ingest the data into a lakehouse, and then use Spark (either PySpark or Spark SQL) to do your transformations. That will be the most efficient.
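To sketch the shape of that split, land raw data first and aggregate in SQL afterwards, here is a minimal stand-in using Python's built-in sqlite3 (the table and columns are made up; in Fabric the same query would run as Spark SQL or Warehouse SQL against the lakehouse table):

```python
import sqlite3

# Stand-in for the lakehouse: raw rows landed by the pipeline, untransformed.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales_raw (region TEXT, amount REAL)")
con.executemany(
    "INSERT INTO sales_raw VALUES (?, ?)",
    [("EMEA", 100.0), ("EMEA", 50.0), ("APAC", 75.0)],
)

# The heavy transformation happens AFTER ingestion, in the SQL engine,
# not inside the ingestion tool.
totals = dict(
    con.execute(
        "SELECT region, SUM(amount) FROM sales_raw GROUP BY region ORDER BY region"
    ).fetchall()
)
# totals == {'APAC': 75.0, 'EMEA': 150.0}
```

The point of the pattern is that the aggregation engine scales independently of the ingestion step, which is why it holds up at billions of rows where an in-dataflow transformation would not.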

 

 





If you found this helpful, consider giving some Kudos.
If I answered your question or solved your problem, mark this post as the solution!

Proud to be a Super User!

