Advance your Data & AI career with 50 days of live learning, dataviz contests, hands-on challenges, study groups & certifications and more!
Get registeredGet Fabric Certified for FREE during Fabric Data Days. Don't miss your chance! Request now
Hi,
I am trying to merge two queries. Each query has three keys, I have tried merging with multiple columns and I have merged columns and then tried the merge the two query. Both approaches with no luck.
Actually the problem only manifest itself when you expand the resulting query. I seem to loose records.
when i click on the merged table column i can see the record it matched on but after i expand the table is empty.
I have seen a simular post dated dec 2015 but it had no resolution other than to send a frown.
any help would be appreciated.
Thanks
What type of join are you doing in the Merge? Depending on what type you have chosen will depend on whether you should lose rows on the Left or right of the join. There is also a type to keep all rows, but you need to be aware of the data gaps this will create when rows can't be matched.
Thanks Ross,
I am trying to do an inner join and I am aware of the consequences of the various joins. The situation is this.
File one:
h2_id Date Code
123 12/2/2005 C18
123 12/4/2015 C00
File two:
h2_id Date Code
123 12/2/2005 C18
123 12/3/2012 C18
This is a simplified picture but I need to rank both file and get separate queries with the first and second ranks.
I used this grouping command to get a ranking:
= Table.Group(#"Sorted Rows", {"h2_id"}, {{"Count", each Table.AddIndexColumn( _, "Index",1,1), type table}})
So now I have:
h2_id Date Code Rank
123 12/2/2005 C18 1
123 12/4/2015 C00 2
And
h2_id Date Code Rank
123 12/2/2005 C18 1
123 12/3/2012 C18 2
Now I am trying to get a join on the highlighted records! Is there something peculiar about the index column. I can see a “one” but is it really a one? If I explicitly create “Ones” in each of the columns I get a successful join. I am going to try a
different method to get my first and second ranking by doing a group and Min aggregate.
Any advice is welcome.
Fred
The highlighting didn't survive my copy/paste but you can see i should get a match on the first record in each file
I'm quite lost at what you are trying to achieve. You have created a Rank field in both tables, but they are a different type of ranking correct? What is your expected output?
Ross. The second file is a subset of the first. All records of the first exist in the second. the first file has aound 16 thousand reocrds and the seciond file has about two thousend. The final result is likely to be a couple of hundred. So I work for the cancer council victoria and files are all to do with types fo cancer. So the first file has all of the cancers people in aparticular study have had adn the second file is just one type of cancer. but even the one type people can have mutiple times. so the relationship between the two file is m:n. I am looking for all the people that had second cancer after the first theta is different from the first one.
In my example it was just the first step of joining the two file on the first cancer that is failing. It should have matched on
123 12/2/2005 C18 1
But when I use the rank created with Table.AddIndexColumn it does not join correctly. I have tried it on a small sample just imported from excel. If I create the index column in excel it joins correctly.
The full blown files come from a SQL server DW so creating the index column before hand is noyt an option.
Hope i made myself a little clearer.
I have a small pbix file an Excel file that illustrates the point. How can i attach this example to this post?
Still trying to get a handle on the problem. Based on your description:
File 1 contains every instance of a person having cancer in your study.
File 2 is the same as file 1, but is filtered down on a specific cancer.
The object is to achieve a listing of People (H2_ID?) from File 2, who have had another cancer (Code?) occur after the instance shown in File 2?
Yes nearly there, The brief looked simple enough to me as well until I asked some questions and the in and outs come in to play.
The object is to achieve a Count of People (H2_ID?) by Code from File 2, who have had another cancer (Code?) occur after the instance shown in File 2 and is the first cancer they have had in file 2 is the first cancer in file 1.
Perhaps an alternate method? Here is a quick example:
"File One"
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjQyVtIBkvpAaGBgCmQ7G1ooxeogyRgDZQyNsMmYgGTAegwMlGJjAQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [h2_id = _t, Date = _t, Code = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"h2_id", Int64.Type}, {"Date", type date}, {"Code", type text}}),
#"Sorted Rows" = Table.Sort(#"Changed Type",{{"Date", Order.Ascending}})
in
#"Sorted Rows"
"File One First Entry" (Patients with 1 or less cancers are removed)
let
Source = #"File One",
#"Grouped Rows" = Table.Group(Source, {"h2_id"}, {{"First Code", each List.First([Code]), type text}, {"Count", each Table.RowCount(_)}}),
#"Filtered Rows" = Table.SelectRows(#"Grouped Rows", each ([Count] > 1))
in
#"Filtered Rows"
"File Two"
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjQyVtIBkvpAaGBgCmQ7G1ooxeogyRgDZQyNYDKxAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [h2_id = _t, Date = _t, Code = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"h2_id", Int64.Type}, {"Date", type date}, {"Code", type text}}),
#"Changed Type1" = Table.TransformColumnTypes(#"Changed Type",{{"Date", type date}}),
#"Merged Queries" = Table.NestedJoin(#"Changed Type1",{"h2_id", "Code"},#"File One First Entry",{"h2_id", "First Code"},"File One First Entry",JoinKind.LeftOuter),
#"Expanded File One First Entry" = Table.ExpandTableColumn(#"Merged Queries", "File One First Entry", {"Count"}, {"Count"})
in
#"Expanded File One First Entry"
File One, is designed to be the complete list you had mentioned. From here we reference this into a new table called "File One First Entry" with the purpose of only taking the first cancer entry for each person. Each row should have have the Person's ID, the First Cancer recorded, and how many cancer findings were recorded in total. Since we don't need anyone with 1 finding or less, we filter them out.
File Two is loaded with a subsequent Left outer join to the "File One First Entry" table. This join will only succeed if both the Person ID and Cancer match. This means the only matches you will receive will be people with the Cancer listed in 'File Two' and where that was their first cancer match. Unfortunately this will also capture people who had the cancer first, but also had it in all subsequent findings.
I know this is deviating from your original design, however it might help resolve your joining problem.
I will try your appoach and built on that. I will not be able to do anything tomorrow but will let you know later this week how it went.
Thanks very much for your interest and help.
Fred
Advance your Data & AI career with 50 days of live learning, contests, hands-on challenges, study groups & certifications and more!
Check out the October 2025 Power BI update to learn about new features.