Solved: SQL Query generation in Direct Query

JothyGanesan · ‎06-06-2025

Hi Team, we are generating the report by connecting to Databricks as the source. The tabe volumes are running into millions.

The business ask is to have filters at date and few other columns. The slicers are being used.

We have to use Direct Query to fetch the data as we have the RLS enablement as well at databricks level depending on the user who uses the report.

We see the query generation using "IN" clause when we give a week filter. Can't we get this as data range "BETWEEN" clause instead? This would help to optimize the query to fetch the results in better response rate.

We are unable to do that switch in the queries generated in direct query

Kindly help

Poojara_D12 · ‎06-12-2025

Hi @JothyGanesan

When using DirectQuery mode in Power BI with Databricks as the source, especially with large tables and RLS (Row-Level Security) enforced at the Databricks level, query performance becomes critical. In such scenarios, applying filters—like date range selections—is essential for reducing data volume. However, Power BI's default behavior when using slicers (such as for selecting a week’s worth of dates) is to generate SQL queries using an IN clause that lists all selected dates individually (e.g., WHERE Date IN ('2025-06-10', '2025-06-11', ...)). While functionally correct, this approach is inefficient for large datasets because it results in unnecessarily verbose queries, and Databricks struggles to optimize those effectively compared to range-based predicates.

Unfortunately, in DirectQuery mode, Power BI does not provide a built-in way to force slicers to translate to a BETWEEN or >= AND <= range query instead of an IN clause. This behavior is by design, based on how slicers communicate selected values to the underlying SQL query generator. The only workaround is to use a custom filter mechanism—such as replacing the slicer with two separate date pickers (start and end date)—and then using a calculated column or measure that interprets those values in a BETWEEN-like logic, which Power BI may then translate into a more optimized query. This is more likely to happen if the filter is applied via a custom visual or within the DAX query that defines the report visual.

Another approach, albeit more advanced, is to redesign the semantic model to introduce calculated columns or parameters that facilitate range-based filtering, or to adjust how filters are passed to Databricks via views or stored procedures—though this may not be feasible with RLS enforced at source.

In summary, while Power BI currently lacks a direct toggle to force BETWEEN instead of IN in slicer-generated queries under DirectQuery, you can often get closer to that behavior by using range pickers or custom filtering logic. Hopefully, future updates may offer more control over query shaping in DirectQuery scenarios.

Did I answer your question? Mark my post as a solution, this will help others!
If my response(s) assisted you in any way, don't forget to drop me a "Kudos"

Kind Regards,
Poojara - Proud to be a Super User
Data Analyst | MSBI Developer | Power BI Consultant
Consider Subscribing my YouTube for Beginners/Advance Concepts: https://youtube.com/@biconcepts?si=04iw9SYI2HN80HKS

View solution in original post

v-agajavelly · ‎06-24-2025

Hi @JothyGanesan

Just checking in one last time. Were you able to try out any of the suggestions shared earlier? If your issue is resolved, marking the accepted solution would be a big help to others who might be facing the same scenario.

If you went in a different direction or still need support, feel free to drop a quick update, we’re happy to keep helping.

Regards,
Akhil.

JothyGanesan · ‎06-24-2025

Apologies on the delay. We are still looking out for different solutions to optimize our reporting time as the below options are not directly feasible in using Power App or API to pass the dates to the custom views. The date filter is a integral part of our powerBI report and is one of the primary filters used by our consumers.

We are still finding out ways. Thank you all for your valuable inputs

v-agajavelly · ‎06-19-2025

Hi @JothyGanesan

Just checking in to see if the approach shared above helped with your DirectQuery filtering challenge in Power BI using Databricks. Using a custom SQL view with a parameter_date_range table and a BETWEEN clause can significantly optimize performance and avoid the long IN (...) lists that Power BI tends to generate especially with large datasets.

If you found this solution helpful, please consider accepting it and giving it a kudos (Like) it’s greatly appreciated and helps others find the solution more easily. Looking forward to your update.

Best regards,
Akhil

Poojara_D12 · ‎06-12-2025

Hi @JothyGanesan

When using DirectQuery mode in Power BI with Databricks as the source, especially with large tables and RLS (Row-Level Security) enforced at the Databricks level, query performance becomes critical. In such scenarios, applying filters—like date range selections—is essential for reducing data volume. However, Power BI's default behavior when using slicers (such as for selecting a week’s worth of dates) is to generate SQL queries using an IN clause that lists all selected dates individually (e.g., WHERE Date IN ('2025-06-10', '2025-06-11', ...)). While functionally correct, this approach is inefficient for large datasets because it results in unnecessarily verbose queries, and Databricks struggles to optimize those effectively compared to range-based predicates.

Unfortunately, in DirectQuery mode, Power BI does not provide a built-in way to force slicers to translate to a BETWEEN or >= AND <= range query instead of an IN clause. This behavior is by design, based on how slicers communicate selected values to the underlying SQL query generator. The only workaround is to use a custom filter mechanism—such as replacing the slicer with two separate date pickers (start and end date)—and then using a calculated column or measure that interprets those values in a BETWEEN-like logic, which Power BI may then translate into a more optimized query. This is more likely to happen if the filter is applied via a custom visual or within the DAX query that defines the report visual.

Another approach, albeit more advanced, is to redesign the semantic model to introduce calculated columns or parameters that facilitate range-based filtering, or to adjust how filters are passed to Databricks via views or stored procedures—though this may not be feasible with RLS enforced at source.

In summary, while Power BI currently lacks a direct toggle to force BETWEEN instead of IN in slicer-generated queries under DirectQuery, you can often get closer to that behavior by using range pickers or custom filtering logic. Hopefully, future updates may offer more control over query shaping in DirectQuery scenarios.

Did I answer your question? Mark my post as a solution, this will help others!
If my response(s) assisted you in any way, don't forget to drop me a "Kudos"

Kind Regards,
Poojara - Proud to be a Super User
Data Analyst | MSBI Developer | Power BI Consultant
Consider Subscribing my YouTube for Beginners/Advance Concepts: https://youtube.com/@biconcepts?si=04iw9SYI2HN80HKS

v-agajavelly · ‎06-12-2025

Hi @JothyGanesan

Thank you @Nasif_Azam for your detailed explanation about DirectQuery mode in Power BI, especially with large datasets.

Yes, this 6th option using a custom SQL view in Databricks with parameters can definitely help with generating a BETWEEN clause instead of the default IN that Power BI often uses in DirectQuery mode.

You create a simple table in Databricks (say, parameter_date_range) that holds your desired date range like a start and end date. Then, you create a SQL view that joins your main fact table with this parameter table and filters it using a BETWEEN clause.
For example, let’s say your fact table is called fact_sales, and you want to filter it by a date range. First, you create a small parameter table.

CREATE OR REPLACE TABLE parameter_date_range (start_date DATE,end_date DATE);
Set the desired range (could be automated or updated externally)

DELETE FROM parameter_date_range, INSERT INTO parameter_date_range VALUES (DATE('2025-06-01'), DATE('2025-06-07')), Then, create a view using this parameter:

CREATE OR REPLACE VIEW v_filtered_sales AS
SELECT s.*
FROM fact_sales s
JOIN parameter_date_range p
ON s.sale_date BETWEEN p.start_date AND p.end_date.

Now, when Power BI connects to v_filtered_sales using DirectQuery, it will only query rows between the given dates, and Databricks will use a clean BETWEEN clause behind the scenes.

This completely avoids the issue where Power BI sends a long list of IN (...) values, which can slow things down, especially with large tables.
To update the Date Range. You have a few options here depending on how dynamic you want it to be:

If the range changes infrequently, just update the parameter_date_range table manually or via a Databricks job.
If it needs to reflect the user's selection in Power BI, then you'd need a small write-back mechanism (for example, a Power App or API call that updates the table when the user selects a date range).

This method ensures Databricks always uses a BETWEEN filter, which is faster and cleaner, and you get full control over how the date filter is applied, outside of Power BI’s auto-generated SQL. Which works well with RLS, large datasets, and lets Databricks handle the heavy lifting.

------------------------------------------------------------------------------------------------------------------------------
If this response helps, consider marking it as “Accept as solution” and giving a “kudos” to assist other community members.

Regards,
Akhil.

JothyGanesan · ‎06-09-2025

Thank you @Nasif_Azam

Is it possible for you to explain the 6th option of parameter in the view usage? (Custom SQL View in Databricks)

Nasif_Azam · ‎06-12-2025

Sure! The 6th option using a parameterized custom SQL view in Databricks to improve how Power BI interacts with your data when using DirectQuery:

Objective

To prevent Power BI from sending inefficient IN clause queries when using slicers with multiple values, and instead leverage optimized BETWEEN filters directly in the SQL layer of Databricks.

What Is a Parameterized View in Databricks?

Databricks (via Spark SQL) doesn't support true parameterized views like traditional SQL Server or Oracle (i.e., you can't pass dynamic parameters directly to a view).

However, you can emulate this behavior in one of these ways:

Approach A: Use a Dedicated View With a Date Range Filter

You create a view that already includes filtering logic based on a static or expected date range (e.g., “last 7 days”):

CREATE OR REPLACE VIEW v_fact_sales_last7days AS
SELECT *
FROM fact_sales
WHERE order_date BETWEEN current_date() - INTERVAL 7 DAYS AND current_date();

Use Case: Great for dashboards always showing recent activity without needing slicers. Power BI's DirectQuery now just fetches data directly without sending complex queries.

Approach B: Use a Control Table for Filtering

If you need dynamic filtering from Power BI, you can simulate parameters by joining with a single-row control table:

-- Assume you have this control table
CREATE OR REPLACE TEMP VIEW control_date_range AS
SELECT date_start, date_end FROM parameters_table WHERE report_id = 'SalesReport1';

-- Create your view
CREATE OR REPLACE VIEW v_filtered_fact_sales AS
SELECT f.*
FROM fact_sales f
JOIN control_date_range p
  ON f.order_date BETWEEN p.date_start AND p.date_end;

Approach C: Use Power BI to Write Values to a Helper Table

You can use Power BI with a writeback-enabled tool (e.g., Power Apps or a Fabric Warehouse trigger) to write the user's selected date range into a helper table (slicer_values), which Databricks then reads from in the view.

Things to remember

This method doesn't work out-of-the-box with Power BI slicers — you need some workaround to send the selected date range to Databricks.
Ideal for pre-filtered dashboards or static views used in embedded analytics.
Avoid over-complicating unless the performance gains are significant (e.g., in cases with millions of rows and frequent access).
For ultimate control, switch from DirectQuery to Import mode if RLS and performance can be handled in the model. You’ll avoid SQL inefficiencies altogether.

If you found this solution helpful, please consider accepting it and giving it a kudos (Like) it’s greatly appreciated and helps others find the solution more easily.

Best Regards,
Nasif Azam

Did I answer your question?
If so, mark my post as a solution!
Also consider helping someone else in the forums!

Proud to be a Super User!

v-agajavelly · ‎06-08-2025

Hi @JothyGanesan

Thankyou @Nasif_Azam for taking time and providing solution.

we would like to follow up to see if the solution provided by the super user resolved your issue. Please let us know if you need any further assistance.
If our super user response resolved your issue, please mark it as "Accept as solution" and click "Yes" if you found it helpful.

Regards,
Akhil.

Nasif_Azam · ‎06-06-2025

Hey @JothyGanesan ,

This is a common concern when working with DirectQuery mode in Power BI, especially with large datasets like the millions of rows in your Databricks tables and RLS (Row-Level Security) enforced at the source.

Why is Power BI generating IN instead of BETWEEN?

Power BI’s DirectQuery engine typically uses the IN clause when:

You select multiple distinct values in a slicer (e.g., selecting individual dates),
The slicer is set to list selection instead of a continuous range.

This can happen even if the values represent a range (like a week's worth of dates), resulting in inefficient queries such as:

SELECT ... 
FROM your_table
WHERE date_column IN ('2025-06-01', '2025-06-02', ..., '2025-06-07')

Instead of:

WHERE date_column BETWEEN '2025-06-01' AND '2025-06-07'

The latter is more optimal, especially for columnar stores like Databricks.

Recommendations to Switch to BETWEEN-style Querying

1. Use a Continuous Date Range Slicer

Default Slicer Behavior: If you're using a slicer on a date field, switch its style from "List" or "Dropdown" to "Between".
- In Power BI:
  - Select the slicer visual.
  - Under the Visualizations pane, choose the slicer type as "Between" (slider-style).
- This encourages Power BI to generate a BETWEEN clause instead of an IN clause.

Power BI doesn’t always guarantee SQL-level optimization just by changing slicer style, but this is the most straightforward way to hint that you're filtering by range.

2. Create a Date Range Parameter Table

If slicer still results in IN, create a custom parameter using DAX:

DateFilter = 
VAR MinDate = MIN('DateTable'[Date])
VAR MaxDate = MAX('DateTable'[Date])
RETURN
    FILTER('FactTable', 'FactTable'[DateColumn] >= MinDate && 'FactTable'[DateColumn] <= MaxDate)

Then use this in a calculated table or measure to control visual-level filters and optimize query folding behavior.

3. Avoid Complex Calculated Columns on the Date Field

Make sure the Date field used in slicers is a direct column from your model or a related date dimension table.
Avoid wrapping it in calculated columns or FORMAT(...) functions in DAX, as this can block Power BI from pushing filters down effectively.

4. Databricks Query Diagnostics

Enable Performance Analyzer or Query Diagnostics in Power BI Desktop:

It will show you exactly what SQL is being sent to Databricks.
This helps confirm whether IN vs BETWEEN is being generated after slicer adjustments.

5. Report-Level Filtering

If the requirement is to apply the same date range across the report:

Use Report Filters (not just slicers) with is after or equal to and is before or equal to logic.
This too may help produce a BETWEEN clause or equivalent predicate in the SQL.

6. Custom SQL View in Databricks

If Power BI's auto-generated queries still don't satisfy you:

Create a Databricks view that accepts parameters or performs pre-filtering using BETWEEN.
Use DirectQuery on that view instead.

Things to Remember:

You can’t directly control how Power BI generates SQL in DirectQuery mode.
But by carefully configuring slicer styles, parameters, and date table modeling, you can influence it.
Using query folding-friendly constructs and avoiding DAX wrapping increases the chances of optimization.

If you found this solution helpful, please consider accepting it and giving it a kudos (Like) it’s greatly appreciated and helps others find the solution more easily.

Best Regards,
Nasif Azam

Did I answer your question?
If so, mark my post as a solution!
Also consider helping someone else in the forums!

Proud to be a Super User!

SQL Query generation in Direct Query

Objective

What Is a Parameterized View in Databricks?

Approach A: Use a Dedicated View With a Date Range Filter

Approach B: Use a Control Table for Filtering

Approach C: Use Power BI to Write Values to a Helper Table

Things to remember

Helpful resources

Power BI Dataviz World Championships

Power BI Monthly Update - December 2025

FabCon Atlanta 2026

FabCon is coming to Atlanta

SQL Query generation in Direct Query

Objective

What Is a Parameterized View in Databricks?

Approach A: Use a Dedicated View With a Date Range Filter

Approach B: Use a Control Table for Filtering

Approach C: Use Power BI to Write Values to a Helper Table

Things to remember

Helpful resources

Power BI Dataviz World Championships

Power BI Monthly Update - December 2025

FabCon Atlanta 2026