Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Enhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.

Reply
dhananjay_pbi
Frequent Visitor

Direct Query mode not applicable for merging two tables with left outer join in azure databricks

We have a large dataset and our requirement is to update a table data and refresh the report. However due to the large size of dataset, in IMPORT mode, it took us around 15-20 mins for dataset refresh.

We changed the tables to use direct query storage mode, but unable to merge two tables with left outer join. After using Merge Query, it's changing table storage to import mode.
 
I tried to use azure databricks SQL Table function but this does not show up in power bi table navigator. Any specific permission is required for power bi to access SQL table function in azure databricks - https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/sql-ref-syntax-ddl-create-sql...
 
Thanks
Dhananjay
 

 

3 REPLIES 3
AntrikshSharma
Super User
Super User

@dhananjay_pbi Are these tables coming from 2 different databases? If yes, then it is not possible, for DQ to work there should be a single SQL request to a database, in your case PQ has to fetch data from both databases and merge them in PQ.

Thanks @AntrikshSharma  & @daXtreme for your response. Let me eloborate complete scenatio that I am trying to achieve so that you could recommend optimum solution suppored by power bi.

 

We are trying to use power bi client in frontend to display reports. User should be able to update schedule dates and visualize updated report. We use Power BI rest api to refresh dataset & report.

 

We are using DB – Azure Databricks – SQL Endpoint

Frontend – React app with power bi client integration

 

Schedule table (record count – approx. 5 Million)

 

ScheduleID

Start_Date

End_Date

Region

1

Nov-15-2022

Nov-20-2022

R1

2

Nov-15-2022

Nov-25-2022

R1

3

Nov-20-2022

Nov-30-2022

R1

4

Dec-20-2022

Dec-25-2022

R1

5

Dec-20-2022

Dec-25-2022

R2

 

PlannedSchedule table (updated by user through frontend app)

 

ScenarioID

Planned_Start_Date

Planned_End_Date

ScheduleID (FK)

S1

Nov-17-2022

Nov-22-2022

1

S1

Nov-17-2022

Nov-27-2022

2

S2

Nov-20-2022

Nov-30-2022

3

 

Power BI report View:

 

Filter1

 

Scenario ID

(Blank)

S1 - Selected

S2

 

Filter2

 

Region

(Blank)

R1 - Selected

R2

 

Result

 

Planned Schedules (S1 selected)

 

ScheduleID

Planned_Start_Date

Planned_End_Date

Remarks

1

Nov-17-2022

Nov-22-2022

Fetch dates from PlannedSchedule table

2

Nov-17-2022

Nov-27-2022

Fetch dates from PlannedSchedule table

3

Nov-20-2022

Nov-30-2022

Fetch dates from Schedule table

4

Dec-20-2022

Dec-25-2022

Fetch dates from Schedule table

 

Filter1

 

Scenario ID

(Blank)

S1

S2 - Selected

 

Filter2

 

Region (Single Select & Mandatory)

R1 - Selected

R2

 

Result

 

Planned Schedules (S2 selected)

 

ScheduleID

Planned_Start_Date

Planned_End_Date

Remarks

1

Nov-15-2022

Nov-20-2022

Fetch dates from Schedule table

2

Nov-15-2022

Nov-25-2022

Fetch dates from Schedule table

3

Nov-20-2022

Nov-30-2022

Fetch dates from PlannedSchedule table

4

Dec-20-2022

Dec-25-2022

Fetch dates from Schedule table

 

Approach 1: I tried creating view with left join but was not able to get this result so tried using SQL Table function in azure databricks

 

CREATE FUNCTION schedule_forecast(SCENARIO_ID STRING)

    RETURNS TABLE(ScheduleID STRING, PLANNED_ START_DATE, PLANNED_FINISH_DATE)

    RETURN SELECT

                        s.ID,

                        ps.PLANNED_ START_DATE,

                        ps.PLANNED_ END_DATE,

                  FROM schedule as s

                  LEFT JOIN Planned_Schedule as ps ON s.ID = ps.ID AND ps. SCENARIO_ID = schedule_forecast. SCENARIO_ID;

 

Able to get expected result

 

select * from schedule_forecast ('S1')

select * from schedule_forecast ('S2')

 

I was hoping, Power Bi allows me to direct query based on SQL table function based on scenario filter as dynamic parameter. But This function is not showing up in power bi source navigator. Would this approach supported in power bi?

 

Approach 2:

 

Direct query to Schedule

Direct query to PlannedSchedule

 

Merge query as new and display expected result as above. This works but Merged query is changing to storage mode hence Its required to refresh.

 

Approach 3: Use partitioning concept to refresh dataset partially. I have not tried this yet. Let me know if this is only the option do achieve this result.

 

Approach 4: Use Import mode and refresh dataset but it takes around 20-30 mins.

 

Thanks,

 

Best Regards

Dhananjay

 

daXtreme
Solution Sage
Solution Sage

Hi @dhananjay_pbi 

 

First of all, you should optimize the transformations in Power Query for speed when you import data. Second, the source should be fast itself and this, in turn, calls for SQL optimization (if the source is a SQL query/sproc or a view). If this does not work, then, and only then, should you turn to the DQ mode.

Helpful resources

Announcements
July 2025 community update carousel

Fabric Community Update - July 2025

Find out what's new and trending in the Fabric community.

July PBI25 Carousel

Power BI Monthly Update - July 2025

Check out the July 2025 Power BI update to learn about new features.

Top Kudoed Authors