Hi all,
I'm curious about the best way to work with multiple Parquet files that are partitioned by day.
As our tables grew too big, we had to move away from our SQL database and decided to let our Databricks notebooks write Parquet files to ADLS. Our first idea was to create external tables over those files with PolyBase in our existing Azure SQL Database and have Power BI read from those, but unfortunately Azure SQL Database does not support PolyBase.
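For context, the write side looks roughly like this (a minimal sketch; the path, columns, and the `day` partition column are placeholders for our actual job):

```python
# Minimal sketch of our Databricks write step (path and column names are
# placeholders). `spark` is the session a Databricks notebook provides.
df = spark.createDataFrame(
    [("2021-11-30", 1, 42.0)],
    ["day", "id", "value"],
)

(df.write
   .mode("append")
   .partitionBy("day")   # produces .../day=2021-11-30/part-*.parquet
   .parquet("abfss://data@ourlake.dfs.core.windows.net/events"))
```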
I also tried to read those Parquet files directly with the new Power BI connector, but that failed for the following reason: each folder also contains the "_started", "_committed" and "_SUCCESS" marker files that Spark creates, and with those present the Parquet files can't be combined.
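One workaround I'm considering is a cleanup step after each write that deletes those marker files, so a folder-based connector only sees `*.parquet`. A rough sketch, assuming the usual `dbutils` available in Databricks notebooks and that directory entries are the ones whose names end with a slash:

```python
# Sketch of a post-write cleanup in a Databricks notebook: recursively
# delete the _started_*, _committed_* and _SUCCESS marker files so only
# the actual *.parquet files remain. The path is a placeholder.
def remove_commit_markers(path: str) -> None:
    for entry in dbutils.fs.ls(path):
        if entry.name.endswith("/"):             # subdirectory, e.g. day=2021-11-30/
            remove_commit_markers(entry.path)
        elif not entry.name.endswith(".parquet"):
            dbutils.fs.rm(entry.path)            # _started_*, _committed_*, _SUCCESS

remove_commit_markers("abfss://data@ourlake.dfs.core.windows.net/events")
```

The caveat is that Spark itself uses those markers for its transactional writes, so I'm not sure deleting them is safe if other Spark jobs read the same folder.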
So I think there should be something between the Parquet files and Power BI. With the SQL database we also created views to limit the data size (e.g. only data from the past two years), depending on what each dashboard needs.
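The equivalent of those views could perhaps live in Databricks as a job that materializes a trimmed "serving" copy per dashboard. A sketch of what I have in mind (the `day` column, the paths, and the two-year window are assumptions):

```python
from pyspark.sql import functions as F

# Sketch: materialize a dashboard-sized subset, mirroring the old SQL views.
# Paths and the "day" partition column are placeholders.
events = spark.read.parquet("abfss://data@ourlake.dfs.core.windows.net/events")

recent = events.filter(
    F.to_date("day") >= F.date_sub(F.current_date(), 2 * 365)  # last two years
)

(recent.write
    .mode("overwrite")
    .partitionBy("day")
    .parquet("abfss://data@ourlake.dfs.core.windows.net/events_serving"))
```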
I suppose I'm not the only one with this problem. Could you please share best practices? Any technology or method is welcome.
Good to know: we have an Azure Data Factory pipeline with Databricks in the back end.
Many thanks,
Massimo
Hi @moritzmassimo ,
I noticed that a Parquet connector has been available since September this year. If this case can be simplified to a direct connection between Parquet and Power BI, you may try the Parquet connector. Below is a related reference:
https://parquet.apache.org/documentation/latest/
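If the connector still struggles with the folder, it can help to first confirm that a single partition file is readable on its own, for example with pyarrow outside Power BI. A minimal sketch; the account, container, credential, and file names below are made-up placeholders:

```python
# Quick sanity check outside Power BI: open one day's parquet file with
# pyarrow via adlfs. All names below are placeholders.
import pyarrow.parquet as pq
from adlfs import AzureBlobFileSystem

fs = AzureBlobFileSystem(account_name="ourlake", credential="<key-or-token>")

table = pq.read_table(
    "data/events/day=2021-11-30/part-00000.parquet",  # hypothetical file
    filesystem=fs,
)
print(table.schema)
print(table.num_rows)
```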
Best Regards,
Kelly
Did I answer your question? Mark my reply as a solution!