Koritala
Post Patron

Data volume Limitation With Dataflow Gen2

Hi All,

We have a large table with more than 2 billion records.

Is there any data volume limitation on bringing a table with more than 2 billion records into Dataflow Gen2? Will it be able to load the whole dataset? And where in the cloud does Dataflow Gen2 store this data?

Can anyone answer this?

Thanks,

Sri

3 ACCEPTED SOLUTIONS
Zanqueta
Super User

Hi @Koritala , I am not entirely certain, but based on similar scenarios I have seen, Dataflow Gen2 is capable of handling extremely large datasets, including tables with billions of rows, provided that the overall architecture is designed appropriately.

 

In summary, there is no fixed limit preventing Dataflow Gen2 from ingesting more than 2 billion records.
It can load the full dataset if transformations preserve query folding and the workload is efficiently designed.
All data is stored as Delta Lake files within OneLake.

 

View solution in original post

SaiTejaTalasila
Super User

Hi @Koritala ,

 

For tables of this size (2+ billion rows), Dataflow Gen2 should be used primarily for raw or lightly transformed data, not heavy transformations.

 

Recommended pattern:

- Use Dataflow Gen2 to ingest raw data (or apply only minimal, schema-level transformations such as column selection, renaming, and basic type casting).
- Store the data in OneLake (Lakehouse/Warehouse) as Delta tables.
- Perform heavy transformations, joins, aggregations, and business logic downstream using Spark, Warehouse SQL, or semantic models.

Applying complex transformations during ingestion significantly increases refresh time, compute usage, and failure risk, especially with very large datasets.

In short:

- Raw / lightly transformed data → Dataflow Gen2
- Complex transformations → Lakehouse / Warehouse / Spark

This aligns with Microsoft Fabric best practices for large-scale data.

 

Thanks,

Sai Teja 


tayloramy
Super User

Hi @Koritala

 

While it may work with Dataflow Gen2, I would highly recommend a more scalable approach.

I would drop the dataflows entirely if possible. Use a pipeline to ingest the data into a Lakehouse, and then use Spark (either PySpark or Spark SQL) to do your transformations. That will be the most efficient.

 

 





View solution in original post

