Join us for an expert-led overview of the tools and concepts you'll need to pass exam PL-300. The first session starts on June 11th. See you there!
Get registeredPower BI is turning 10! Let’s celebrate together with dataviz contests, interactive sessions, and giveaways. Register now.
Hi Team, we are generating the report by connecting to Databricks as the source. The tabe volumes are running into millions.
The business ask is to have filters at date and few other columns. The slicers are being used.
We have to use Direct Query to fetch the data as we have the RLS enablement as well at databricks level depending on the user who uses the report.
We see the query generation using "IN" clause when we give a week filter. Can't we get this as data range "BETWEEN" clause instead? This would help to optimize the query to fetch the results in better response rate.
We are unable to do that switch in the queries generated in direct query
Kindly help
When using DirectQuery mode in Power BI with Databricks as the source, especially with large tables and RLS (Row-Level Security) enforced at the Databricks level, query performance becomes critical. In such scenarios, applying filters—like date range selections—is essential for reducing data volume. However, Power BI's default behavior when using slicers (such as for selecting a week’s worth of dates) is to generate SQL queries using an IN clause that lists all selected dates individually (e.g., WHERE Date IN ('2025-06-10', '2025-06-11', ...)). While functionally correct, this approach is inefficient for large datasets because it results in unnecessarily verbose queries, and Databricks struggles to optimize those effectively compared to range-based predicates.
Unfortunately, in DirectQuery mode, Power BI does not provide a built-in way to force slicers to translate to a BETWEEN or >= AND <= range query instead of an IN clause. This behavior is by design, based on how slicers communicate selected values to the underlying SQL query generator. The only workaround is to use a custom filter mechanism—such as replacing the slicer with two separate date pickers (start and end date)—and then using a calculated column or measure that interprets those values in a BETWEEN-like logic, which Power BI may then translate into a more optimized query. This is more likely to happen if the filter is applied via a custom visual or within the DAX query that defines the report visual.
Another approach, albeit more advanced, is to redesign the semantic model to introduce calculated columns or parameters that facilitate range-based filtering, or to adjust how filters are passed to Databricks via views or stored procedures—though this may not be feasible with RLS enforced at source.
In summary, while Power BI currently lacks a direct toggle to force BETWEEN instead of IN in slicer-generated queries under DirectQuery, you can often get closer to that behavior by using range pickers or custom filtering logic. Hopefully, future updates may offer more control over query shaping in DirectQuery scenarios.
Thank you @Nasif_Azam for your detailed explanation about DirectQuery mode in Power BI, especially with large datasets.
Yes, this 6th option using a custom SQL view in Databricks with parameters can definitely help with generating a BETWEEN clause instead of the default IN that Power BI often uses in DirectQuery mode.
You create a simple table in Databricks (say, parameter_date_range) that holds your desired date range like a start and end date. Then, you create a SQL view that joins your main fact table with this parameter table and filters it using a BETWEEN clause.
For example, let’s say your fact table is called fact_sales, and you want to filter it by a date range. First, you create a small parameter table.
CREATE OR REPLACE TABLE parameter_date_range (start_date DATE,end_date DATE);
Set the desired range (could be automated or updated externally)
DELETE FROM parameter_date_range, INSERT INTO parameter_date_range VALUES (DATE('2025-06-01'), DATE('2025-06-07')), Then, create a view using this parameter:
CREATE OR REPLACE VIEW v_filtered_sales AS
SELECT s.*
FROM fact_sales s
JOIN parameter_date_range p
ON s.sale_date BETWEEN p.start_date AND p.end_date.
Now, when Power BI connects to v_filtered_sales using DirectQuery, it will only query rows between the given dates, and Databricks will use a clean BETWEEN clause behind the scenes.
This completely avoids the issue where Power BI sends a long list of IN (...) values, which can slow things down, especially with large tables.
To update the Date Range. You have a few options here depending on how dynamic you want it to be:
This method ensures Databricks always uses a BETWEEN filter, which is faster and cleaner, and you get full control over how the date filter is applied, outside of Power BI’s auto-generated SQL. Which works well with RLS, large datasets, and lets Databricks handle the heavy lifting.
------------------------------------------------------------------------------------------------------------------------------
If this response helps, consider marking it as “Accept as solution” and giving a “kudos” to assist other community members.
Regards,
Akhil.
Thank you @Nasif_Azam
Is it possible for you to explain the 6th option of parameter in the view usage? (Custom SQL View in Databricks)
Sure! The 6th option using a parameterized custom SQL view in Databricks to improve how Power BI interacts with your data when using DirectQuery:
To prevent Power BI from sending inefficient IN clause queries when using slicers with multiple values, and instead leverage optimized BETWEEN filters directly in the SQL layer of Databricks.
Databricks (via Spark SQL) doesn't support true parameterized views like traditional SQL Server or Oracle (i.e., you can't pass dynamic parameters directly to a view).
However, you can emulate this behavior in one of these ways:
You create a view that already includes filtering logic based on a static or expected date range (e.g., “last 7 days”):
CREATE OR REPLACE VIEW v_fact_sales_last7days AS SELECT * FROM fact_sales WHERE order_date BETWEEN current_date() - INTERVAL 7 DAYS AND current_date();
Use Case: Great for dashboards always showing recent activity without needing slicers. Power BI's DirectQuery now just fetches data directly without sending complex queries.
If you need dynamic filtering from Power BI, you can simulate parameters by joining with a single-row control table:
-- Assume you have this control table CREATE OR REPLACE TEMP VIEW control_date_range AS SELECT date_start, date_end FROM parameters_table WHERE report_id = 'SalesReport1'; -- Create your view CREATE OR REPLACE VIEW v_filtered_fact_sales AS SELECT f.* FROM fact_sales f JOIN control_date_range p ON f.order_date BETWEEN p.date_start AND p.date_end;
You can use Power BI with a writeback-enabled tool (e.g., Power Apps or a Fabric Warehouse trigger) to write the user's selected date range into a helper table (slicer_values), which Databricks then reads from in the view.
This method doesn't work out-of-the-box with Power BI slicers — you need some workaround to send the selected date range to Databricks.
Ideal for pre-filtered dashboards or static views used in embedded analytics.
Avoid over-complicating unless the performance gains are significant (e.g., in cases with millions of rows and frequent access).
For ultimate control, switch from DirectQuery to Import mode if RLS and performance can be handled in the model. You’ll avoid SQL inefficiencies altogether.
If you found this solution helpful, please consider accepting it and giving it a kudos (Like) it’s greatly appreciated and helps others find the solution more easily.
Best Regards,
Nasif Azam
Thankyou @Nasif_Azam for taking time and providing solution.
we would like to follow up to see if the solution provided by the super user resolved your issue. Please let us know if you need any further assistance.
If our super user response resolved your issue, please mark it as "Accept as solution" and click "Yes" if you found it helpful.
Regards,
Akhil.
Hey @JothyGanesan ,
This is a common concern when working with DirectQuery mode in Power BI, especially with large datasets like the millions of rows in your Databricks tables and RLS (Row-Level Security) enforced at the source.
Why is Power BI generating IN instead of BETWEEN?
Power BI’s DirectQuery engine typically uses the IN clause when:
You select multiple distinct values in a slicer (e.g., selecting individual dates),
The slicer is set to list selection instead of a continuous range.
This can happen even if the values represent a range (like a week's worth of dates), resulting in inefficient queries such as:
SELECT ... FROM your_table WHERE date_column IN ('2025-06-01', '2025-06-02', ..., '2025-06-07')
Instead of:
WHERE date_column BETWEEN '2025-06-01' AND '2025-06-07'
The latter is more optimal, especially for columnar stores like Databricks.
Recommendations to Switch to BETWEEN-style Querying
1. Use a Continuous Date Range Slicer
Default Slicer Behavior: If you're using a slicer on a date field, switch its style from "List" or "Dropdown" to "Between".
In Power BI:
Select the slicer visual.
Under the Visualizations pane, choose the slicer type as "Between" (slider-style).
This encourages Power BI to generate a BETWEEN clause instead of an IN clause.
Power BI doesn’t always guarantee SQL-level optimization just by changing slicer style, but this is the most straightforward way to hint that you're filtering by range.
2. Create a Date Range Parameter Table
If slicer still results in IN, create a custom parameter using DAX:
DateFilter = VAR MinDate = MIN('DateTable'[Date]) VAR MaxDate = MAX('DateTable'[Date]) RETURN FILTER('FactTable', 'FactTable'[DateColumn] >= MinDate && 'FactTable'[DateColumn] <= MaxDate)
Then use this in a calculated table or measure to control visual-level filters and optimize query folding behavior.
3. Avoid Complex Calculated Columns on the Date Field
Make sure the Date field used in slicers is a direct column from your model or a related date dimension table.
Avoid wrapping it in calculated columns or FORMAT(...) functions in DAX, as this can block Power BI from pushing filters down effectively.
4. Databricks Query Diagnostics
Enable Performance Analyzer or Query Diagnostics in Power BI Desktop:
It will show you exactly what SQL is being sent to Databricks.
This helps confirm whether IN vs BETWEEN is being generated after slicer adjustments.
5. Report-Level Filtering
If the requirement is to apply the same date range across the report:
Use Report Filters (not just slicers) with is after or equal to and is before or equal to logic.
This too may help produce a BETWEEN clause or equivalent predicate in the SQL.
6. Custom SQL View in Databricks
If Power BI's auto-generated queries still don't satisfy you:
Create a Databricks view that accepts parameters or performs pre-filtering using BETWEEN.
Use DirectQuery on that view instead.
Things to Remember:
You can’t directly control how Power BI generates SQL in DirectQuery mode.
But by carefully configuring slicer styles, parameters, and date table modeling, you can influence it.
Using query folding-friendly constructs and avoiding DAX wrapping increases the chances of optimization.
If you found this solution helpful, please consider accepting it and giving it a kudos (Like) it’s greatly appreciated and helps others find the solution more easily.
Best Regards,
Nasif Azam
User | Count |
---|---|
6 | |
4 | |
2 | |
2 | |
2 |
User | Count |
---|---|
6 | |
3 | |
3 | |
2 | |
2 |