Hi,
I am using the Databricks connector to connect to a Hive table with tens of millions of rows.
Users would now like to use the column profiling functionality in Power BI.
However, as soon as one switches to profiling the full data set, it no longer works. Looking closely at the queries that are sent to Databricks, one realizes that this cannot work: Power BI is not doing any query folding, so all of the millions of rows have to be retrieved. I would expect Power BI to fold the query and send DISTINCT/GROUP BY statements to the source.
- Is this a known limitation / bug?
- Is this specific to the Databricks connector, or does it apply to all connectors?
- Is there any known workaround?
I have added an idea to fix it: Microsoft Idea · Fix column profiling to use query folding (powerbi.com).
Thanks a lot,
Felix
@fmdus you are asking it to analyze all rows. That cannot be folded. Column profiling is a feature of Power Query, and to do the analysis it must retrieve all records. It is like loading the data: at some point, it has to retrieve the full recordset. All query folding does is put that off until the last possible moment, when either folding breaks, or the data loads, or you ask the tool to do something that cannot be folded, like a column profile on the entire column.
Theoretically, could it generate multiple SQL statements just to show the column quality and profiling of the recordset? Yes. And theoretically, that would hammer the SQL Server with a ton of queries. I suppose you could post that as an Idea in the Ideas forum, but remember, Power Query is a transformation tool, not an analysis tool. At some point, do the analysis in DAX.
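To make the "multiple SQL statements" idea concrete, here is a minimal sketch in Python of what source-side profiling could look like. It is not what Power Query does today. An in-memory SQLite table stands in for the Databricks source, and the table name `sales`, its columns, and the helpers `profile_column` and `value_distribution` are all hypothetical.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Quote an identifier for SQLite (double quotes, doubled if embedded)."""
    return '"' + name.replace('"', '""') + '"'


def profile_column(conn, table, column):
    """Compute column-quality statistics with one aggregate query at the source."""
    t, c = quote_ident(table), quote_ident(column)
    sql = (
        f"SELECT COUNT(*), COUNT({c}), COUNT(DISTINCT {c}), MIN({c}), MAX({c}) "
        f"FROM {t}"
    )
    total, non_null, distinct, min_v, max_v = conn.execute(sql).fetchone()
    return {
        "count": total,
        "null": total - non_null,   # COUNT(col) skips NULLs
        "distinct": distinct,       # COUNT(DISTINCT col) also skips NULLs
        "min": min_v,
        "max": max_v,
    }


def value_distribution(conn, table, column, top_n=10):
    """Return the top_n most frequent values via GROUP BY, evaluated at the source."""
    t, c = quote_ident(table), quote_ident(column)
    sql = (
        f"SELECT {c}, COUNT(*) AS n FROM {t} "
        f"GROUP BY {c} ORDER BY n DESC, {c} LIMIT ?"
    )
    return conn.execute(sql, (top_n,)).fetchall()


# Hypothetical stand-in for the large source table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EU", 10), ("EU", 20), ("US", 30), (None, 40), ("US", 10), ("EU", 50)],
)

region_profile = profile_column(conn, "sales", "region")
region_distribution = value_distribution(conn, "sales", "region")
```

Each call returns only a handful of rows regardless of table size, but it is also one or two scans of the table per profiled column, which is the extra load on the source described above.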
DAX is for Analysis. Power Query is for Data Modeling
Proud to be a Super User!
MCSA: BI Reporting
It isn't that folding no longer works @fmdus - it is that Power Query retrieves 100% of the data, then must do an analysis on it. It is not optimized in any way. Whatever granularity you have when you request the full dataset is what you get, and on a few million records, it can take forever, or simply time out.
I rarely recommend changing from the 1,000-row sample to the full dataset unless your full dataset is 100,000 or fewer records. Even at that level, it gets really slow.
DAX is for Analysis. Power Query is for Data Modeling
Proud to be a Super User!
MCSA: BI Reporting
@edhans yes. I have realized that Power BI retrieves all data. However, I suppose we all agree that for any big data scenario folding will be a must. It is not reasonable to retrieve all rows.