Hi everyone,
I have a report built on a composite model that uses DirectQuery to fetch newly changed values from Dataverse (we have a Power Apps integration where users can change values, and we want those changes to flow from Dataverse into our Power BI report). Generally speaking, I have a solution that works, but it's not efficient and is causing CPU spikes across users.
One of the main issues that comes up time and time again when trying to optimize the report is how the SQL gets generated when there are filters on our main dimension (let's call it DIMENSION_A) that we split the data by. As you can see in the picture below, instead of querying and then filtering through the data model, it sticks the entire list of IDs into the WHERE clause of the DirectQuery statement. Because my priority is speed to display updates on a medium-sized dataset (50-100k rows), I think this WHERE clause hurts performance rather than helping it.
My workaround so far is essentially to create two tables: an "archive" table in import mode and an "update" table in DirectQuery mode. Using DAX, I can get the "new" records after a certain last-modified date and then limit the entries in DIMENSION_A to only those that have recently changed. My DAX formula includes logic to make sure nothing is double-counted. It's the best I could do without an alternative way to prevent these huge WHERE clauses from being generated. I do see performance gains with this option, but it's not ideal.
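A minimal sketch of what such a hybrid measure could look like. All names here (`FactArchive`, `FactUpdates`, `RowKey`, `Value`) are placeholders I've invented for illustration, not taken from the original post, and the dedup logic is only one way to avoid double counting:

```dax
-- Hypothetical: combine an imported "archive" table with a DirectQuery
-- "updates" table. Archive rows that were superseded by a recent update
-- are excluded so nothing is counted twice.
Total Value :=
VAR UpdatedKeys =
    VALUES ( FactUpdates[RowKey] )        -- keys of recently changed rows
VAR ArchivePart =
    CALCULATE (
        SUM ( FactArchive[Value] ),
        FILTER (
            ALL ( FactArchive[RowKey] ),
            NOT FactArchive[RowKey] IN UpdatedKeys   -- drop superseded rows
        )
    )
VAR UpdatePart =
    SUM ( FactUpdates[Value] )            -- fresh values from DirectQuery
RETURN
    ArchivePart + UpdatePart
```

The key design point is that only `FactUpdates` hits the DirectQuery source, so the expensive round trip is limited to the small set of recently modified rows.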
Note: I'm aware that string keys are inefficient and will be converting them to integers. That doesn't change the fact that the list is being put into the WHERE clause.
Does anyone know an approach to prevent this from happening? I've tried several things already, without success.
Your help is greatly appreciated!
Had a similar problem recently, but even worse. They seem to have introduced a limit in Dataverse, which means some of these large generated queries won't even run any more (error: "Too many literal binary expressions in the query. The maximum number is 200").
My workaround (though it's definitely not a solution) was to use virtual relationships with TREATAS and group by a higher granularity (week start dates instead of dates). To fix this, Microsoft would probably need to introduce an option similar to "Assume referential integrity" for composite datasets that shifts the joining to the Formula Engine so it doesn't introduce a massive WHERE clause. I wouldn't hold your breath on that one!
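The TREATAS workaround described above could look roughly like this. The table and column names (`LocalDates`, `DataverseFact`, `WeekStartDate`, `Value`) are illustrative assumptions, not from the original reply:

```dax
-- Hypothetical virtual relationship: instead of a physical relationship
-- that pushes every date key into the generated WHERE clause, apply the
-- imported date table's filter at week granularity via TREATAS.
Updated Value (weekly) :=
CALCULATE (
    SUM ( DataverseFact[Value] ),
    TREATAS (
        VALUES ( LocalDates[WeekStartDate] ),   -- far fewer distinct values
        DataverseFact[WeekStartDate]
    )
)
```

TREATAS still pushes a filter to the DirectQuery source, but because there are far fewer distinct week-start dates than dates (or dimension IDs), the generated literal list stays small enough to run under the 200-expression limit.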
One thing I realize might be a suggestion here is moving the attributes I'm filtering DIMENSION_A on into Dataverse itself. That would work for more traditional characteristics of DIMENSION_A. However, my report often filters on calculated measures that are specific to the user, so that isn't a general solution to the problem.