When prepping data in Fabric, how do you decide whether to use Dataflows Gen2 or a notebook (PySpark/SQL)? Is it mainly about data volume, or are there other factors?
Hello!
I wouldn’t decide only by data volume.
For me it’s more about who will own it and how complex the logic is.
If the people maintaining it are analysts or Power BI developers, and the transformations are standard Power Query-style steps, I’d usually go with Dataflows Gen2. It’s easier to understand, easier to hand over, and better for low-code data prep.
If the logic needs a data engineer (PySpark/SQL, custom functions, complex joins, SCD logic, data quality checks, Delta maintenance, or reusable code), then I’d use a notebook.
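To make "SCD logic" concrete, here is a minimal sketch of a Type 2 slowly changing dimension update, the kind of stateful, branching logic that tends to be easier to express and test in code than in visual steps. It is written in plain Python over lists of dicts so it runs anywhere; in a Fabric notebook you would more likely express the same idea with PySpark DataFrames or a Delta merge. The function name `apply_scd2`, the column names (`valid_from`, `valid_to`, `is_current`), and the sample data are all assumptions for illustration, not something from the original post.

```python
from datetime import date


def apply_scd2(dimension, incoming, key, tracked, today):
    """Return a new dimension with SCD Type 2 changes applied.

    dimension: list of dicts holding the key, the tracked columns,
               and valid_from / valid_to / is_current (hypothetical names).
    incoming:  list of dicts holding the key and the tracked columns
               (the latest source snapshot).
    """
    # Copy every row so the caller's dimension is not mutated.
    result = [dict(row) for row in dimension]
    # Index the current version of each business key.
    current = {row[key]: row for row in result if row["is_current"]}

    for src in incoming:
        existing = current.get(src[key])
        if existing is not None and all(existing[c] == src[c] for c in tracked):
            continue  # Unchanged: keep the current version as it is.
        if existing is not None:
            # Changed: close the old version instead of overwriting it.
            existing["valid_to"] = today
            existing["is_current"] = False
        # New key or changed attributes: open a new current version.
        new_row = {key: src[key], **{c: src[c] for c in tracked},
                   "valid_from": today, "valid_to": None, "is_current": True}
        result.append(new_row)
        current[src[key]] = new_row
    return result


# Hypothetical sample data: one changed, one unchanged, one new customer.
dim = [
    {"customer_id": 1, "city": "Oslo", "valid_from": date(2024, 1, 1),
     "valid_to": None, "is_current": True},
    {"customer_id": 2, "city": "Bergen", "valid_from": date(2024, 1, 1),
     "valid_to": None, "is_current": True},
]
snapshot = [
    {"customer_id": 1, "city": "Trondheim"},
    {"customer_id": 2, "city": "Bergen"},
    {"customer_id": 3, "city": "Stavanger"},
]
updated = apply_scd2(dim, snapshot, key="customer_id",
                     tracked=["city"], today=date(2025, 1, 1))
```

Even this small version has three branches (unchanged, changed, new) plus history tracking, which is why this kind of transformation usually ends up with whoever is comfortable maintaining code.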
So my simple rule is:
Dataflows Gen2 = analyst-friendly, visual, low-code, easier handover.
Notebooks = engineer-friendly, code-first, more control, better for complex logic.
Volume matters, but skillset, maintainability, and ownership matter just as much.
In real projects, I often use both and let a pipeline orchestrate them.
Best regards,
Parchitect - Solutions Architect