I'm trying to load large datasets (over 3 million rows) from a Web API into Power BI.
I'm looking for guidance on how to structure and implement large-scale Web API ingestion using OData, especially with Dataflow Gen1. Any best practices, architectural suggestions, or technical tips would be greatly appreciated.
Here is an example of the M code from my Dataflow Gen1:
Hi @CrouchingTiger,
Thank you for posting your query in the Microsoft Fabric Community Forum, and thanks to @ibarrau for sharing valuable insights.
Could you please confirm if your query has been resolved by the provided solutions? This would be helpful for other members who may encounter similar issues.
Thank you for being part of the Microsoft Fabric Community.
Hi. I don't think you can scale with Dataflow Gen1. I would strongly suggest doing it in a Fabric Notebook or a Data Pipeline to handle API sources (that's the best practice). You could even try Dataflow Gen2 just in case (copy-paste the code). For any of those you will need storage (a lakehouse or warehouse) to build and store the table.
The M code looks good; I don't think you can improve it much more. If incremental refresh isn't helping, then you have to accept that this architecture won't scale, and that's why you need another tool.
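As a rough illustration only, here is a minimal sketch of the notebook approach, assuming a hypothetical OData endpoint and placeholder table names (the built-in `spark` session is available in Fabric notebooks):

```python
# Minimal sketch: paginated OData ingestion in a Fabric notebook (PySpark).
# The endpoint URL, page size, and table name are hypothetical placeholders.
import requests

BASE_URL = "https://api.example.com/odata/Orders"   # hypothetical OData entity set
PAGE_SIZE = 50_000

def fetch_all(url: str) -> list[dict]:
    """Follow @odata.nextLink until the feed is exhausted."""
    rows, next_url = [], f"{url}?$top={PAGE_SIZE}"
    while next_url:
        resp = requests.get(next_url, timeout=120)
        resp.raise_for_status()
        payload = resp.json()
        rows.extend(payload.get("value", []))
        next_url = payload.get("@odata.nextLink")    # None when there are no more pages
    return rows

rows = fetch_all(BASE_URL)

# Land the raw rows in the lakehouse as a Delta table (bronze layer).
df = spark.createDataFrame(rows)
df.write.mode("overwrite").format("delta").saveAsTable("bronze_orders")
```

For millions of rows you would probably write each page out as it arrives instead of collecting everything in memory, but the shape is the same: the notebook owns the paging logic and the lakehouse owns the storage.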
I hope that makes sense.
Happy to help!
Hi, @ibarrau
Thank you for your response.
I’ve confirmed that the data loads correctly in Dataflow Gen2, but I still need to test whether the refresh and incremental settings are functioning properly.
Moving forward, I'll need to ingest data not only from APIs but also from sources like SAP HANA. I would appreciate your advice on the most appropriate architectural approach to handle these diverse data sources efficiently.
I'm glad it worked. The most conventional tool for moving data is a data pipeline inside Fabric. Dataflow Gen2 can be used for sources that are not listed among the pipeline connectors, or for those that support Fast Copy. The idea is to move the data into a lakehouse without any transformation; later you can transform it with notebooks (Python or SQL) or with Dataflows Gen2.
Cloud API sources can also be gathered directly with notebooks and Python code. Some APIs can be tricky, and running the code in the cloud is the best way to stay in control of them. Of course, store the result in a lakehouse.
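For a tricky API, a defensive HTTP client helps a lot. Here is a small sketch using the standard retry mechanism from requests/urllib3 (the endpoint and token are placeholders):

```python
# Sketch: HTTP session with retries and backoff for a flaky API.
# The endpoint and bearer token are hypothetical placeholders.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session(token: str) -> requests.Session:
    retry = Retry(
        total=5,                                     # give up after 5 failed attempts
        backoff_factor=2,                            # increasing wait between attempts
        status_forcelist=[429, 500, 502, 503, 504],  # retry throttling and server errors
        allowed_methods=["GET"],
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    session.headers.update({"Authorization": f"Bearer {token}"})
    return session

session = make_session(token="<your-api-token>")
resp = session.get(
    "https://api.example.com/odata/Orders",
    params={"$top": 1000},
    timeout=120,
)
resp.raise_for_status()
print(len(resp.json()["value"]))
```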
You can read about the medallion architecture to get a better understanding of how data flows from the source to the report.
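To make that concrete, the later transformation step could be as simple as a small PySpark notebook that reads the raw (bronze) table and writes a cleaned (silver) one; the table and column names here are assumptions:

```python
# Sketch: bronze -> silver transformation in a Fabric notebook (PySpark).
# Table and column names are hypothetical; adjust to your own lakehouse schema.
from pyspark.sql import functions as F

bronze = spark.read.table("bronze_orders")

silver = (
    bronze
    .dropDuplicates(["OrderId"])                      # drop rows repeated across API pages
    .withColumn("OrderDate", F.to_date("OrderDate"))  # enforce proper types
    .filter(F.col("OrderId").isNotNull())             # basic data quality rule
)

silver.write.mode("overwrite").format("delta").saveAsTable("silver_orders")
```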
I hope that makes sense.
Happy to help!