Power BI is turning 10, and we’re marking the occasion with a special community challenge. Use your creativity to tell a story, uncover trends, or highlight something unexpected.
Get startedJoin us for an expert-led overview of the tools and concepts you'll need to become a Certified Power BI Data Analyst and pass exam PL-300. Register now.
I am using Power BI in direct query mode to create a simple table visualization that aggregates data. However, I have noticed that Power BI generates and sends two separate SQL queries to the database for this simple report. Moreover, instead of using a straightforward SELECT statement with a GROUP BY, the SQL queries contain multiple levels of subqueries, making them inefficient.
The columns in table emp are: id, level, mgrid, l1, l2, l3,l4,l5, l6, l7. It's a flatten employee hierarchy table. The ask is: Find the number of distinct l4, l5, l6 values mapped to each l2 value.
The SQL should be as simple as:
select l2, count(distinct l4) as l4_cnt, count(distinct l5) as l4_cnt, count(distinct l6) as l6_cnt from emp group by l2
But PowerBI generating below two SQLs and sending it to the database in DirectQuery mode, not sure whats going on.
Query-1
select "rows"."l2" as "l2", count(distinct("rows"."l6")) + max("rows"."ea448a3632b541b796945894c8baaf88..1") as "a0", count(distinct("rows"."l4")) + max("rows"."ea448a3632b541b796945894c8baaf88..2") as "a1", count(distinct("rows"."l5")) + max("rows"."ea448a3632b541b796945894c8baaf88..3") as "a2" from ( select "_"."id" as "id", "_"."level" as "level", "_"."mgrid" as "mgrid", "_"."l1" as "l1", "_"."l2" as "l2", "_"."l3" as "l3", "_"."l4" as "l4", "_"."l5" as "l5", "_"."l6" as "l6", "_"."l7" as "l7", case when "_"."l6" is null then 1 else 0 end as "ea448a3632b541b796945894c8baaf88..1", case when "_"."l4" is null then 1 else 0 end as "ea448a3632b541b796945894c8baaf88..2", case when "_"."l5" is null then 1 else 0 end as "ea448a3632b541b796945894c8baaf88..3" from ( select "_"."id", "_"."level", "_"."mgrid", "_"."l1", "_"."l2", "_"."l3", "_"."l4", "_"."l5", "_"."l6", "_"."l7" from ( select cast("$Table"."id" as varchar) as "id", "$Table"."level" as "level", cast("$Table"."mgrid" as varchar) as "mgrid", cast("$Table"."l1" as varchar) as "l1", cast("$Table"."l2" as varchar) as "l2", cast("$Table"."l3" as varchar) as "l3", cast("$Table"."l4" as varchar) as "l4", cast("$Table"."l5" as varchar) as "l5", cast("$Table"."l6" as varchar) as "l6", cast("$Table"."l7" as varchar) as "l7" from "public"."emp" "$Table" ) "_" where strpos((case when "_"."l2" is not null then "_"."l2" else '' end), 'aa6c6f1c-d83f-487c-8fb3-e52445747957') > 0 ) "_" ) "rows" group by "l2" limit 1000001
Query-2
select count(distinct("rows"."l6")) + max("rows"."b899be30b3d84b769a094f88077fb546..1") as "a0", count(distinct("rows"."l4")) + max("rows"."b899be30b3d84b769a094f88077fb546..2") as "a1", count(distinct("rows"."l5")) + max("rows"."b899be30b3d84b769a094f88077fb546..3") as "a2" from ( select "_"."l4" as "l4", "_"."l5" as "l5", "_"."l6" as "l6", case when "_"."l6" is null then 1 else 0 end as "b899be30b3d84b769a094f88077fb546..1", case when "_"."l4" is null then 1 else 0 end as "b899be30b3d84b769a094f88077fb546..2", case when "_"."l5" is null then 1 else 0 end as "b899be30b3d84b769a094f88077fb546..3" from ( select "_"."l4", "_"."l5", "_"."l6" from ( select cast("$Table"."id" as varchar) as "id", "$Table"."level" as "level", cast("$Table"."mgrid" as varchar) as "mgrid", cast("$Table"."l1" as varchar) as "l1", cast("$Table"."l2" as varchar) as "l2", cast("$Table"."l3" as varchar) as "l3", cast("$Table"."l4" as varchar) as "l4", cast("$Table"."l5" as varchar) as "l5", cast("$Table"."l6" as varchar) as "l6", cast("$Table"."l7" as varchar) as "l7" from "public"."emp" "$Table" ) "_" where strpos((case when "_"."l2" is not null then "_"."l2" else '' end), 'aa6c6f1c-d83f-487c-8fb3-e52445747957') > 0 ) "_" ) "rows"
sample report:
I would expect Power BI to run one simple SQL and get the results. Do I need to do any configuration changes for DirectQuery mode? Also, when I used Postgres DB as the source, it ran two queries. Where as if I use AWS Athena as the source, it has sent 4 queries. Looks super inefficient query generation.
Why you are getting this behavior ?
The SQL queries generated by Power BI can appear complex because it needs to account for various factors like:
This 1st query fetchs the database and handle null values through case statements, which I think it is necessary for correct aggregation handling.
For the 2nd query I can see that it performs additional distinct counts and aggregations, which might be due to other report interactions or visual requirements.
So you may need to pre-aggregated tables in your database in this way you can offload complex calculations from Power BI to the database.
Try to use PQ and see the differences :
let
Source = Odbc.Query("dsn=YourDSNName", "
select
l2,
count(distinct l4) as l4_cnt,
count(distinct l5) as l5_cnt,
count(distinct l6) as l6_cnt
from emp
group by l2
")
in
Source
@AmiraBedh Thank you for your response. I've tried using PQ to limit the number of queries, but whenever we use Odbc.query, it switches to "import mode". We want to use "direct query" due to various reasons such as data security compliance and preventing data sprawl.
What is the version of the connector you are using ?
This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.
Check out the June 2025 Power BI update to learn about new features.
User | Count |
---|---|
15 | |
12 | |
8 | |
8 | |
7 |
User | Count |
---|---|
15 | |
13 | |
9 | |
7 | |
6 |