Re: Conditional Join in PowerQuery Advanced editor...

McSarah · ‎08-27-2020

I am hoping to replicate the following SQL self join logic in Power Query (or DAX I guess) -- NOT as several sequential steps (join on A, create conditional column for B, filter on B conditional column, create conditional column for C, filter on C conditional column...which I could easily do in the GUI), but in a way that actually avoids loading the unfiltered data on the initial A join -- because that intermediate data is huge and *extremely* slow to load, while the data with all three joins applied is manageable. Most of the other threads I've seen on conditional joins seem to apply the conditions as additional steps, not in the initial join itself.

Here's the SQL:

SELECT
    A.ID, A.Date, A.State,
    B.ID, B.Date, B.State
FROM DailyLog A
FULL OUTER JOIN DailyLog B
    ON A.ID = B.ID AND A.Date <> B.Date AND A.State <> B.State

Basically, I need to isolate records that show state changes for a given ID between any two dates. It might help to know that most IDs are present on most dates, but most IDs do not change state between most dates. Any date could have an ID that's not present on any other date.

My thought was to do this transformation in Power Query, then use DAX to retrieve pertinent data for the IDs/Dates/States that actually land in this table. These data are not practical to load into SQL, or I'd do this data prep there.

Icey · ‎08-31-2020

Hi @McSarah ,

If you connect to SQL Server, how about typing your SQL code here?

Reference: Connect Power BI to SQL Server

Best Regards,

Icey

If this post helps, then please consider accepting it as the solution to help other members find it more quickly.

McSarah · ‎08-31-2020

Unfortunately, I am not connecting to SQL Server in this case. These data come from flat files. If I were connecting to SQL, I would do this join there.

edhans · ‎08-31-2020

@McSarah then you cannot do this. Power Query must read all data in the flat files to know what to pull. There is no server available to handle that filtering for you.

Also, you should never use that advanced SQL box. It has a number of bad side effects:

Can cause refresh issues in the service as your permissions have to be elevated.
It breaks all further query folding in Power Query.
It prevents Incremental Refresh from working at all.

Either do the full query in Power Query, or create a view on the SQL Server. Both will avoid all of the issues listed above.

Please mark one of these posts as the solution so this thread can be marked as solved to assist others that may be searching for similar info.

Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling

Proud to be a Super User!

MCSA: BI Reporting

McSarah · ‎08-31-2020

I am not trying to filter on the initial data pull from the files, I'm trying to filter inside a join downstream from the import -- I am comparing/ self joining tables after I've already imported and merged the source files into a single tall table. Power BI can handle the initial imports just fine; what I'm trying to avoid is getting an extremely large intermediate product on the downstream self join in Power Query.

edhans · ‎08-31-2020

Perhaps if you shared some data and told us what your end goal was, rather than saying "this is how I do something in SQL, but Power Query won't do it", we could help. Power BI and SQL Server have a lot of the same logic, but self-joins in Power BI can be very inefficient. There may be other ways.

But starting off with trying to make Product A behave the same as Product B can often take us down the wrong path.

How to get good help fast. Help us help you.
How to Get Your Question Answered Quickly
How to provide sample data in the Power BI Forum

McSarah · ‎08-31-2020

I just need to process the join I put in my original question in an efficient way.

For the purposes of this question we can assume that DailyLog is the only table, and it only contains three columns -- ProductID, Date, State.

The underlying table, DailyLog, is itself the result of a PowerQuery append operation on many underlying flat files--one per date--with identical format (ProductID, Date, State). These files were imported using a folder connection.
The need is to isolate all records where a product ID has different states on different days (that is, different states in different source flat files).
The issue is that the intermediate PowerQuery product created by any one of the join conditions is huge because every component appears many times in the underlying table, while the combination of the three join conditions is relatively small. I can't just join on ProductID, or date<>date, or state<>state without blowing up the data. I want PBI to evaluate all three at once one way or another.
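To make the size difference concrete, here is a minimal sketch in Python (used purely for illustration; it is not Power Query M). The toy data, product IDs, and the helper `self_join_count` are all hypothetical. It counts only the matched row pairs a self join would emit, and leaves out the unmatched rows a FULL OUTER JOIN would add.

```python
from itertools import product

# Hypothetical toy data: 3 products logged on 4 dates; only "P2" changes state.
dates = ["2020-08-01", "2020-08-02", "2020-08-03", "2020-08-04"]
daily_log = []
for d in dates:
    daily_log.append(("P1", d, "Active"))
    daily_log.append(("P2", d, "Active" if d < "2020-08-03" else "Retired"))
    daily_log.append(("P3", d, "Active"))


def self_join_count(rows, condition):
    """Count the matched row pairs (a, b) a self join would emit."""
    return sum(1 for a, b in product(rows, rows) if condition(a, b))


# Join on ProductID alone: every row pairs with every row of the same product.
id_only = self_join_count(daily_log, lambda a, b: a[0] == b[0])

# All three conditions evaluated together.
all_three = self_join_count(
    daily_log,
    lambda a, b: a[0] == b[0] and a[1] != b[1] and a[2] != b[2],
)

print(id_only, all_three)  # prints "48 8"
```

With 12 input rows, the ID-only join already yields 48 pairs while the full condition yields 8. The ID-only count grows with the square of the number of dates per product, which is why the intermediate step gets so large on real data.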

If it were a regular join on three columns, I would just create one calculated column that combines the three source columns into one, then self join on that, but since the needed join is conditional, I'm having trouble wrapping my head around how this could work.

Here's how I would do this in SQL:

SELECT
    A.ProductID, A.Date, A.State,
    B.ProductID, B.Date, B.State
FROM DailyLog A
FULL OUTER JOIN DailyLog B
    ON A.ProductID = B.ProductID AND A.Date <> B.Date AND A.State <> B.State
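One possible way to keep the intermediate result small is to shrink the table before the self join instead of putting the conditions inside it. The sketch below is in Python for illustration only (it is not Power Query M), and the function name `state_change_pairs` and the sample rows are hypothetical. It first drops every product that only ever has one state, since such a product can never satisfy State <> State, and then compares rows only within the remaining products. It returns matched pairs only, so the unmatched rows of a FULL OUTER JOIN are not reproduced.

```python
from collections import defaultdict


def state_change_pairs(rows):
    """Return (a, b) row pairs with the same ProductID but different Date and State.

    rows: iterable of (product_id, date, state) tuples.
    """
    # Step 1: group rows by product ID.
    by_product = defaultdict(list)
    for row in rows:
        by_product[row[0]].append(row)

    pairs = []
    for product_rows in by_product.values():
        # Step 2: skip products with a single distinct state. They cannot
        # produce a match, so they never enter the pairwise comparison.
        if len({state for _, _, state in product_rows}) < 2:
            continue
        # Step 3: compare rows only within the surviving product.
        for a in product_rows:
            for b in product_rows:
                if a[1] != b[1] and a[2] != b[2]:
                    pairs.append((a, b))
    return pairs


sample = [
    ("P1", "2020-08-01", "Active"),
    ("P1", "2020-08-02", "Active"),
    ("P2", "2020-08-01", "Active"),
    ("P2", "2020-08-02", "Retired"),
    ("P3", "2020-08-01", "Active"),
]
result = state_change_pairs(sample)
for a, b in result:
    print(a, b)
```

In Power Query the analogous steps would presumably be a Group By on ProductID with a distinct count of State, a filter on that count, and then the merge on the reduced table. Whether that is fast enough on the real data is an assumption that would need testing.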

edhans · ‎08-31-2020

Well, without data, I am afraid we are talking in circles. I'd like to try to help, but I'm not great at just verbalizing or writing down how to do something without some hard data to show what I'm doing. The "picture is worth a thousand words" concept.

Perhaps someone more versed in SQL can do what you said in their head, then do the Power Query part in their head, then write down the answer you need.

edhans · ‎08-27-2020

I am not aware of a way to set conditions like that in the join, but if you are connecting to SQL Server, this shouldn't be an issue. If you go "the long way" it should fold all of that, and SQL Server will do all of the work for you.

If this is not a SQL Server connection, but you are replicating SQL logic on text files, it won't matter. Processing it in the join or later is the same thing. Your statement "These data are not practical to load into SQL, or I'd do this data prep there." suggests you are doing this outside of SQL Server.

So if there is no db engine to do the processing, the Power Query mashup engine has to do all of the work. If you have 1,000,000 records and only need 10 records, it will have to read and discard the other 999,990 records. No way around that. That is the beauty of query folding with a server. You tell the server to do all of the work and just return 10 records.
