SofiaGinger
New Member

Dataflow Gen2 on MS Fabric: Optimize column profiling on millions of rows (1-5 million)

Hello, 

I have datasets stored in the hive_metastore of Azure Databricks, each in the range of 1-5 million records. I want to use the Column Profiling feature on the entire dataset after loading the data through the Get Data experience in a Dataflow Gen2 item.

When I switch from "Column profiling based on top 1000 rows" to "Column profiling based on entire data set", profiling takes forever on the 1-million-row dataset, whereas it completes instantaneously when based on the top 1000 rows.

What steps can I take to improve performance when profiling columns on the entire dataset?

SofiaGinger_0-1717502170623.png

 

1 ACCEPTED SOLUTION
miguel
Community Admin

This is something that we don't have full control over. It relies on three specific components:

  • Data source: and the resources available to it
  • Connector: owned by the Databricks team
  • Power Query editor: it runs the queries that the connector tells it to run, then tries to cache the results once they're fully computed

I can pass your feedback to the Databricks team, but beyond what you're already doing today, there's nothing that can improve profiling performance in this scenario.
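
To illustrate why the two profiling settings behave so differently, here is a rough Python sketch. It is not how Power Query or the Databricks connector work internally: `fake_source` and `profile_column` are hypothetical names, the data is synthetic, and the statistics are a simplified subset of what a column profile shows. The point is only that a top-1000 profile can stop pulling rows early, while a full profile has to read every row the source returns.

```python
from itertools import islice
from typing import Any, Dict, Iterator, List, Optional


def fake_source(n_rows: int, fetched: List[int]) -> Iterator[Optional[int]]:
    """Hypothetical stand-in for a remote table; counts rows actually pulled."""
    for i in range(n_rows):
        fetched[0] += 1
        # Every 10th value is empty (None); the rest cycle through 0..999.
        yield None if i % 10 == 0 else i % 1000


def profile_column(values: Iterator[Any], limit: Optional[int] = None) -> Dict[str, int]:
    """Single-pass profile: row count, empty count, distinct non-empty values.

    limit=1000 mimics "top 1000 rows"; limit=None mimics "entire data set".
    """
    count = 0
    empty = 0
    distinct = set()
    for value in islice(values, limit):  # islice(..., None) reads everything
        count += 1
        if value is None or value == "":
            empty += 1
        else:
            distinct.add(value)
    return {"count": count, "empty": empty, "distinct": len(distinct)}


# Profile the first 1000 rows only: the source is asked for 1000 rows.
pulled_top = [0]
top_profile = profile_column(fake_source(1_000_000, pulled_top), limit=1000)
print("top 1000:", top_profile, "rows pulled:", pulled_top[0])

# Profile everything: the source has to deliver all 1,000,000 rows.
pulled_full = [0]
full_profile = profile_column(fake_source(1_000_000, pulled_full))
print("entire set:", full_profile, "rows pulled:", pulled_full[0])
```

In the real scenario, each pulled row also crosses the connector and depends on the resources available to the data source, which is why the full-dataset option is bounded by those components rather than by the editor alone.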


