loosing rows after expanding merge query

Fnoorda · ‎02-15-2018

Hi,

I am trying to merge two queries. Each query has three keys, I have tried merging with multiple columns and I have merged columns and then tried the merge the two query. Both approaches with no luck.

Actually the problem only manifest itself when you expand the resulting query. I seem to loose records.

when i click on the merged table column i can see the record it matched on but after i expand the table is empty.

I have seen a simular post dated dec 2015 but it had no resolution other than to send a frown.

any help would be appreciated.

Thanks

Anonymous · ‎02-15-2018

What type of join are you doing in the Merge? Depending on what type you have chosen will depend on whether you should lose rows on the Left or right of the join. There is also a type to keep all rows, but you need to be aware of the data gaps this will create when rows can't be matched.

Fnoorda · ‎02-18-2018

Thanks Ross,

I am trying to do an inner join and I am aware of the consequences of the various joins. The situation is this.

File one:

h2_id Date Code

123 12/2/2005 C18

123 12/4/2015 C00

File two:

h2_id Date Code

123 12/2/2005 C18

123 12/3/2012 C18

This is a simplified picture but I need to rank both file and get separate queries with the first and second ranks.

I used this grouping command to get a ranking:

= Table.Group(#"Sorted Rows", {"h2_id"}, {{"Count", each Table.AddIndexColumn( _, "Index",1,1), type table}})

So now I have:

h2_id Date Code Rank

123 12/2/2005 C18 1

123 12/4/2015 C00 2

And

h2_id Date Code Rank

123 12/2/2005 C18 1

123 12/3/2012 C18 2

Now I am trying to get a join on the highlighted records! Is there something peculiar about the index column. I can see a “one” but is it really a one? If I explicitly create “Ones” in each of the columns I get a successful join. I am going to try a

different method to get my first and second ranking by doing a group and Min aggregate.

Any advice is welcome.

Fred

Fnoorda · ‎02-18-2018

The highlighting didn't survive my copy/paste but you can see i should get a match on the first record in each file

Anonymous · ‎02-18-2018

I'm quite lost at what you are trying to achieve. You have created a Rank field in both tables, but they are a different type of ranking correct? What is your expected output?

Fnoorda · ‎02-18-2018

Ross. The second file is a subset of the first. All records of the first exist in the second. the first file has aound 16 thousand reocrds and the seciond file has about two thousend. The final result is likely to be a couple of hundred. So I work for the cancer council victoria and files are all to do with types fo cancer. So the first file has all of the cancers people in aparticular study have had adn the second file is just one type of cancer. but even the one type people can have mutiple times. so the relationship between the two file is m:n. I am looking for all the people that had second cancer after the first theta is different from the first one.

In my example it was just the first step of joining the two file on the first cancer that is failing. It should have matched on

123 12/2/2005 C18 1

But when I use the rank created with Table.AddIndexColumn it does not join correctly. I have tried it on a small sample just imported from excel. If I create the index column in excel it joins correctly.

The full blown files come from a SQL server DW so creating the index column before hand is noyt an option.

Hope i made myself a little clearer.

Fnoorda · ‎02-18-2018

I have a small pbix file an Excel file that illustrates the point. How can i attach this example to this post?

Anonymous · ‎02-18-2018

Still trying to get a handle on the problem. Based on your description:

File 1 contains every instance of a person having cancer in your study.

File 2 is the same as file 1, but is filtered down on a specific cancer.

The object is to achieve a listing of People (H2_ID?) from File 2, who have had another cancer (Code?) occur after the instance shown in File 2?

Fnoorda · ‎02-18-2018

Yes nearly there, The brief looked simple enough to me as well until I asked some questions and the in and outs come in to play.

The object is to achieve a Count of People (H2_ID?) by Code from File 2, who have had another cancer (Code?) occur after the instance shown in File 2 and is the first cancer they have had in file 2 is the first cancer in file 1.

Anonymous · ‎02-18-2018

Perhaps an alternate method? Here is a quick example:

"File One"

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjQyVtIBkvpAaGBgCmQ7G1ooxeogyRgDZQyNsMmYgGTAegwMlGJjAQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [h2_id = _t, Date = _t, Code = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"h2_id", Int64.Type}, {"Date", type date}, {"Code", type text}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type",{{"Date", Order.Ascending}})
in
    #"Sorted Rows"

"File One First Entry" (Patients with 1 or less cancers are removed)

let
    Source = #"File One",
    #"Grouped Rows" = Table.Group(Source, {"h2_id"}, {{"First Code", each List.First([Code]), type text}, {"Count", each Table.RowCount(_)}}),
    #"Filtered Rows" = Table.SelectRows(#"Grouped Rows", each ([Count] > 1))
in
    #"Filtered Rows"

"File Two"

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjQyVtIBkvpAaGBgCmQ7G1ooxeogyRgDZQyNYDKxAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [h2_id = _t, Date = _t, Code = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"h2_id", Int64.Type}, {"Date", type date}, {"Code", type text}}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Changed Type",{{"Date", type date}}),
    #"Merged Queries" = Table.NestedJoin(#"Changed Type1",{"h2_id", "Code"},#"File One First Entry",{"h2_id", "First Code"},"File One First Entry",JoinKind.LeftOuter),
    #"Expanded File One First Entry" = Table.ExpandTableColumn(#"Merged Queries", "File One First Entry", {"Count"}, {"Count"})
in
    #"Expanded File One First Entry"

File One, is designed to be the complete list you had mentioned. From here we reference this into a new table called "File One First Entry" with the purpose of only taking the first cancer entry for each person. Each row should have have the Person's ID, the First Cancer recorded, and how many cancer findings were recorded in total. Since we don't need anyone with 1 finding or less, we filter them out.

File Two is loaded with a subsequent Left outer join to the "File One First Entry" table. This join will only succeed if both the Person ID and Cancer match. This means the only matches you will receive will be people with the Cancer listed in 'File Two' and where that was their first cancer match. Unfortunately this will also capture people who had the cancer first, but also had it in all subsequent findings.

I know this is deviating from your original design, however it might help resolve your joining problem.

Fnoorda · ‎02-18-2018

I will try your appoach and built on that. I will not be able to do anything tomorrow but will let you know later this week how it went.

Thanks very much for your interest and help.

Fred

loosing rows after expanding merge query

Helpful resources

Fabric Data Days

Power BI Monthly Update - October 2025

FabCon Atlanta 2026

How to Get Your Question Answered Quickly

Fabric Data Days starts November 4th!

loosing rows after expanding merge query

Helpful resources

Fabric Data Days

Power BI Monthly Update - October 2025

FabCon Atlanta 2026

How to Get Your Question Answered Quickly