I am hoping to replicate the following SQL self-join logic in Power Query (or DAX, I guess) -- NOT as several sequential steps (join on A, create a conditional column for B, filter on the B conditional column, create a conditional column for C, filter on the C conditional column... which I could easily do in the GUI), but in a way that actually avoids loading the unfiltered data from the initial A join, because that intermediate data is huge and *extremely* slow to load, while the data with all three join conditions applied is manageable. Most of the other threads I've seen on conditional joins seem to apply the conditions as additional steps, not in the initial join itself.
Here's the SQL:
SELECT
A.ID ,A.Date ,A.State
,B.ID ,B.Date,B.State
FROM DailyLog A
FULL OUTER JOIN DailyLog B
ON A.ID = B.ID AND A.Date <> B.Date AND A.State <> B.State
Basically, I need to isolate records that show state changes for a given ID between any two dates. It might help to know that most IDs are present on most dates, but most IDs do not change state between most dates. Any date could have an ID that's not present on any other date.
My thought was to do this transformation in Power Query, then use DAX to retrieve pertinent data for the IDs/Dates/States that actually land in this table. These data are not practical to load into SQL, or I'd do this data prep there.
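To make the intended output concrete, here is a minimal Python sketch of the join condition in the SQL above (same ID, different date, different state). It is not Power Query code. The function name `state_change_pairs` and the sample rows are hypothetical, and it returns only the matched pairs, not the NULL-padded unmatched rows a FULL OUTER JOIN would also return.

```python
from collections import defaultdict
from itertools import permutations

def state_change_pairs(rows):
    """Return (a, b) row pairs with the same ID, different dates and different states.

    `rows` is an iterable of (id, date, state) tuples. Only matched pairs are
    returned; the unmatched rows of a FULL OUTER JOIN are omitted.
    """
    by_id = defaultdict(list)
    for row in rows:
        by_id[row[0]].append(row)

    pairs = []
    for group in by_id.values():
        # Compare rows only within one ID, never across the whole table.
        for a, b in permutations(group, 2):
            if a[1] != b[1] and a[2] != b[2]:
                pairs.append((a, b))
    return pairs

# Hypothetical sample data: only ID 2 changes state.
daily_log = [
    (1, "2020-01-01", "Open"),
    (1, "2020-01-02", "Open"),
    (2, "2020-01-01", "Open"),
    (2, "2020-01-02", "Closed"),
    (3, "2020-01-02", "Open"),
]

for a, b in state_change_pairs(daily_log):
    print(a, b)
```

Each state change appears twice, once in each direction, which matches what the symmetric SQL condition would produce.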
Hi @McSarah ,
If you connect to SQL Server, how about typing your SQL code here?
Reference: Connect Power BI to SQL Server
Best Regards,
Icey
If this post helps, please consider accepting it as the solution to help other members find it more quickly.
Unfortunately, I am not connecting to SQL Server in this case. These data come from flat files. If I were connecting to SQL, I would do this join there.
@McSarah then you cannot do this. Power Query must read all data in the flat files to know what to pull. There is no server available to handle that filtering for you.
Also, you should never use that advanced SQL box. It has a number of bad side effects:
Either do the full query in Power Query, or create a view on the SQL Server. Both will avoid all of the issues listed above.
Please mark one of these posts as the solution so this thread can be marked as solved to assist others that may be searching for similar info.
DAX is for Analysis. Power Query is for Data Modeling
Proud to be a Super User!
MCSA: BI Reporting
I am not trying to filter on the initial data pull from the files, I'm trying to filter inside a join downstream from the import -- I am comparing/self joining tables after I've already imported and merged the source files into a single tall table. Power BI can handle the initial imports just fine; what I'm trying to avoid is getting an extremely large intermediate product on the downstream self join in Power Query.
Perhaps if you shared some data and told us what your end goal was, instead of saying "this is how I do something in SQL, but Power Query won't do it", we could help. Power BI and SQL Server have a lot of the same logic, but self-joins in Power BI can be very inefficient. There may be other ways.
But starting off with trying to make Product A behave the same as Product B can often take us down the wrong path.
How to get good help fast. Help us help you.
How to Get Your Question Answered Quickly
How to provide sample data in the Power BI Forum
I just need to process the join I put in my original question in an efficient way.
For the purposes of this question we can assume that DailyLog is the only table, and it only contains three columns -- ProductID, Date, State.
If it were a regular join on three columns, I would just create one calculated column that combines the three source columns into one, then self join on that, but since the needed join is conditional, I'm having trouble wrapping my head around how this could work.
Here's how I would do this in SQL:
SELECT
A.ProductID ,A.Date ,A.State
,B.ProductID ,B.Date,B.State
FROM DailyLog A
FULL OUTER JOIN DailyLog B
ON A.ProductID = B.ProductID AND A.Date <> B.Date AND A.State <> B.State
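Since most IDs reportedly do not change state, one possible way to shrink the self-join input is to first drop every ProductID that only ever has a single distinct State. The Python sketch below illustrates that idea only. It is not Power Query code, the name `rows_for_changed_ids` and the sample rows are hypothetical, and it assumes the unmatched rows of the FULL OUTER JOIN are not needed downstream.

```python
from collections import defaultdict

def rows_for_changed_ids(rows):
    """Keep only rows whose ProductID appears with more than one distinct State."""
    rows = list(rows)  # the data is traversed twice below
    states_by_id = defaultdict(set)
    for product_id, _date, state in rows:
        states_by_id[product_id].add(state)

    # IDs with a single state can never satisfy A.State <> B.State.
    changed = {pid for pid, states in states_by_id.items() if len(states) > 1}
    return [row for row in rows if row[0] in changed]

# Hypothetical sample data: only ProductID 2 changes state.
daily_log = [
    (1, "2020-01-01", "Open"),
    (1, "2020-01-02", "Open"),
    (2, "2020-01-01", "Open"),
    (2, "2020-01-02", "Closed"),
    (3, "2020-01-02", "Open"),
]

print(rows_for_changed_ids(daily_log))
```

The self join would then run on the reduced table only. In Power Query the equivalent would presumably be a Group By on ProductID followed by a filter, but that mapping is an assumption and is not tested here.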
Well, without data, I am afraid we are talking in circles. I'd like to try to help, but I'm not great at just verbalizing or writing down how to do something without some hard data to show what I'm doing. The "picture is worth a thousand words" concept.
Perhaps someone more versed in SQL can do what you said in their head, then do the Power Query part in their head, then write down the answer you need.
I am not aware of a way to set conditions like that in the join, but if you are connecting to SQL Server, this shouldn't be an issue. If you go "the long way", it should fold all of that and SQL Server will do all of the work for you.
If this is not a SQL Server connection, but you are replicating SQL logic on text files, it won't matter. Processing it in the join or later is the same thing. Your statement "These data are not practical to load into SQL, or I'd do this data prep there." suggests you are doing this outside of SQL Server.
So if there is no db engine to do the processing, the Power Query mashup engine has to do all of the work. If you have 1,000,000 records and only need 10 records, it will have to read and discard the other 999,990 records. No way around that. That is the beauty of query folding with a server: you tell the server to do all of the work and just return 10 records.
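The read-and-discard point can be illustrated with a small Python sketch. It is an analogy only and does not model how the mashup engine works internally. The function name `scan_and_filter` and the generated records are hypothetical.

```python
def scan_and_filter(records, keep):
    """Read every record, keep those where keep(record) is true, and count the reads."""
    rows_read = 0
    kept = []
    for record in records:
        rows_read += 1  # every record is touched, wanted or not
        if keep(record):
            kept.append(record)
    return kept, rows_read

# Hypothetical data: 1,000,000 records, of which 10 are wanted.
records = range(1_000_000)
kept, rows_read = scan_and_filter(records, lambda r: r < 10)
print(len(kept), rows_read)
```

Without a server to push the filter down to, the number of rows read stays at the full table size no matter how few rows are kept.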