a_pereira
New Member

Reading large Azure Data Lake parquet files in Power BI

Hi everyone,

 

I would like to build a dashboard in Power BI using a parquet file from Azure Data Lake blob storage.

 

It contains four columns (Date, ID, Product Price, number of stores), and the dashboard would later be filtered by ID.

 

I thought I had found the answer by using the Azure Data Lake Storage Gen2 "Get Data" connector and entering the URL of the parquet file in the data lake, as below:

 

https://<accountname>.dfs.core.windows.net/<container>/<subfolder>

 

I get access to the correct file and I can combine and load the data.

 

Unfortunately, only Import mode is available, and the file (~70 GB) is far too large to be imported into Power BI Desktop.

 

I then found some documentation stating that dataflows with DirectQuery could avoid importing the data while still allowing reports to query Azure Data Lake Storage directly.

 

I found how to connect Azure to Power BI and build dataflows, but I can't manage to reach the parquet file I need.

 

I can't find a way to create the Common Data Model (CDM) folder needed for the dataflow to read the data directly from ADLS (although I did find how to create a dataflow from an existing CDM folder).

 

Could you please help with creating a CDM folder in Azure Data Lake Storage?

I also saw in the documentation that only CSV files are permitted in a CDM folder. Is that so? Can I not use my parquet file as a data source?

 

If this is not the correct approach, could you please suggest a solution for querying or importing large datasets (~70 GB) into Power BI Desktop from Azure Data Lake Storage without modifying the original dataset?

 

Thank you for your help,

AP

7 REPLIES
bcdobbs
Community Champion

I'd agree with @TomMartens

As an alternative to a Spark SQL pool, you could give serverless SQL in Synapse a go. You can create a view over your existing data lake parquet files, which you can then use DirectQuery against.

Have a look at https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/create-use-views
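
A minimal sketch of such a view, assuming a serverless SQL pool in an existing Synapse workspace (the database name, view name and wildcard path are illustrative, and your account needs read access to the storage):

-- Run against the serverless SQL endpoint of the Synapse workspace.
CREATE DATABASE Prices;
GO
USE Prices;
GO
-- The view reads the parquet files in place; nothing is copied or modified.
CREATE VIEW dbo.ProductPrices AS
SELECT *
FROM OPENROWSET(
    BULK 'https://<accountname>.dfs.core.windows.net/<container>/<subfolder>/*.parquet',
    FORMAT = 'PARQUET'
) AS result;

From Power BI Desktop you can then connect to the serverless endpoint (the Azure Synapse Analytics SQL connector or the generic SQL Server connector both work) in DirectQuery mode, so each visual only scans the data it needs.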



Ben Dobbs

LinkedIn | Twitter | Blog

Did I answer your question? Mark my post as a solution! This will help others on the forum!
Appreciate your Kudos!!

Sorry to ask a follow-up question in an old thread. May I know what the licensing requirement is to set up serverless SQL in Synapse? Fabric was recently released, and it seems Synapse is under Fabric's umbrella. Also, do we need a SQL Server license in this case?

Fabric is in preview and you should not use it in prod.
Synapse is a separate product. To use the serverless pool you just need to create a Synapse workspace in Azure. It will come with a default Synapse serverless SQL pool that you can use. It's pay-per-use: you pay per query and data movement.
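
For a rough, illustrative cost calculation (the rate is an assumption; check the Azure Synapse pricing page for current figures): serverless bills per TB of data processed per query, historically on the order of $5/TB with a small per-query minimum. A query that scanned the full ~70 GB dataset would process about 0.07 TB, i.e. roughly $0.35, and with parquet's column pruning plus an ID filter, most queries should process far less.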


--
Riccardo Perico
BI Architect @ Lucient Italia | Microsoft MVP

Blog | GitHub

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Is the link below the one I should look into?
Pricing - Azure Synapse Analytics | Microsoft Azure

Yes, it is.

Serverless is under the Data Warehousing workloads.

 

[Screenshot: Azure Synapse Analytics pricing page, Data Warehousing section]

 

Dedicated is under the same category, but it is optional.


--
Riccardo Perico
BI Architect @ Lucient Italia | Microsoft MVP

Blog | GitHub

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
a_pereira
New Member

Up

Hey @a_pereira,

 

when you are starting to create a new dataflow, select "Define new tables" instead of "Attach a Common Data Model folder".

[Screenshot: the new dataflow dialog showing the "Define new tables" option]

 

I assume this will help, but please be aware that there might be data size limits as well. DirectQuery to a dataflow does not mean the parquet files will be queried in DQ mode; instead, the dataflow itself will be queried.

If you have a Spark/Databricks cluster, I would recommend using DQ against a Spark SQL query.
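
As a sketch of that approach (the table name, path and filter value are illustrative assumptions): register the parquet folder as an external table in Spark SQL, then point Power BI's Spark or Databricks connector at it in DirectQuery mode.

-- Spark SQL, e.g. in a Databricks or Synapse Spark notebook.
-- The parquet files stay in the lake; only table metadata is registered.
CREATE TABLE IF NOT EXISTS product_prices
USING PARQUET
LOCATION 'abfss://<container>@<accountname>.dfs.core.windows.net/<subfolder>/';

-- Power BI then pushes queries like this down to the cluster per visual:
SELECT `Date`, `ID`, `Product Price`, `number of stores`
FROM product_prices
WHERE `ID` = 42;  -- placeholder filter value

One caveat of this design: the cluster has to be running for the report to work, which is worth weighing against the serverless SQL option above.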

Hopefully, this provides some ideas on how to tackle your challenge.

 

Regards,
Tom



Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany
