Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
bigrig33
New Member

5000 Rows of Data with duplicate data

Apologies if something similar has been posted. 

 

I have a table with 5281 rows of data. I have taken screenshot, data hid as priviliged. so the first two rows are the same person and he did two types of CPD training on different dates, I would like just one row but that row to show both his trainings and both start dates as I need statistics from this. So rows 2-9 there are 4 different people but they have done two CPD trainings. Now not all 5281 row are duplicates, some are unique and also some entries have 3 entries. How do I just have one row for the duplicates containing all the data and I don't want to delete any information. Any help greatly appreciated. Capture duplicates.PNG

12 REPLIES 12
v-xinruzhu-msft
Community Support
Community Support

Hi @bigrig33 

You can group by the fullname and use Text.Combine() to combine them, you can refer to the following code to advanced editor.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlLSUUrUKEnVBNJGBkbG+ob6pkqxOlAJhKAZWNAYU7UhQgJd0ARTtRFYwhRVtbFSbCwA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, Type = _t, Date = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Type", type text}, {"Date", type date}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each Text.From([Date])),
    #"Grouped Rows1" = Table.Group(#"Added Custom", {"ID"}, {{"Count", each  Text.Combine([Type],","), type text}, {"aa", each Text.Combine([Custom],","), type text}})
in
    #"Grouped Rows1"

 Output

vxinruzhumsft_0-1697685263303.png

Best Regards!

Yolo Zhu

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

mussaenda
Super User
Super User

Hi @bigrig33 ,

 

Better to Pivot the data with the Traning and Start Date as the Value

 

ooooor

 

You can have a dim table of the entries and the table on your screenshot will be the fact table

@mussaenda Many thanks for replies, but again my stats will be skewed. So take the first two entires on screenshot, same full name, same payno, but they did two trainings on two seperate days, is there any way of getting all that info on the one line, without any data being lost.  Like for Instance as i have over 5000 rows, I will have the same fullname duplicated 3 and 4 times because they have done 3 or 4 different trainings, but I still need to get stats on all the trainings they did, but I just want one entry of the Fullname and also the Payno not 3 or 4, if that makes sense. Regards..

I understand, but have you tried the suggestion? That is why it is important to have a star schema when building your data. For situations like this.

@mussaenda i tried it but unfortunately doesn't seem to be working, see I receive this data on two seperate sheets, I don't create the data, so alot of cleanup, but ultimately, I want to know who has completed training and who hasn't but I don't want 3 or 4 records per person, I just want one row to show all the information relating to that person including there trainings and start dates for the training

Please provide a sample data that we can copy to provide with a solution

Hi @mussaenda 

So I had to edit data as some sensitive, but you will get the jist. I have multipe entries of similar data based over 5200 rows, I want just one row per entry per payno but i want all the data to follow, so lets take the first two lines, can I have one entry for those two lines,but both training entries and also start dates on the same line. 

 

 

Capture1910.PNG

Hi @bigrig33 ,


you can do like this to have one entry

mussaenda_0-1697710569515.png
or you can do like this

mussaenda_1-1697710747054.png

 

or you can do like this and create a star schema

mussaenda_2-1697710823942.png

mussaenda_3-1697710833262.png

 

other than those, i hope others will meet your requirements.

Hope this helps

 

Greg_Deckler
Super User
Super User

@bigrig33 In Power Query Editor, select all of the columns where there is duplicate information. (Everything except Training and Start Date in your case). Right-click one of those column headers and select "Remove duplicates".

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcspPUtJR8s9LBZIh5flA0jnARSOguDI5Iz8nP71SEyhiaKBvBEJGhkqxOji0YCoLKUpMriTVbPfU/KL0VGKMx6USv/m+mXkpWByFpjAWAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t, Column2 = _t, Column3 = _t, Column4 = _t, Column5 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type date}}),
    #"Removed Duplicates" = Table.Distinct(#"Changed Type", {"Column1", "Column2", "Column3"})
in
    #"Removed Duplicates"

 


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

@Greg_Deckler many thanks for the response, so when I highlight those columns, remove duplicates isn't coming up as an option. It's only if I highlight the first column will remove duplicates happen, but If i remove duplicates from that, it removes the stats I need. Any way around this. Regards..

@bigrig33 Not sure why that isn't coming up. I click the first column, hold down Shift, select the last column ( or use Ctrl to select multiple, discontiguous columns). Right-click the header of one of the columns and it shows up (see screen shot). But regardless, the code you want is simply this (you can add it in Advanced Editor):

#"Removed Duplicates" = Table.Distinct(#"Previous Step", {"Column1", "Column2", "Column3"})

Greg_Deckler_0-1697637737678.png

 


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

@Greg_Deckler again really appreciate you taking the time to respond, but I'm really a novice with excel. So I suppose I'm looking for it to be as straighforward as to take the first two entries on screenshot, same full name, same payno, but they did two trainings on two seperate days, is there any way of getting all that info on the one line, without any data being lost.  Like for Instance as i have over 5000 rows, I will have the same fullname duplicated 3 and 4 times because they have done 3 or 4 different trainings, but I still need to get stats on all the trainings they did, but I just want one entry of the Fullname and also the Payno not 3 or 4, if that makes sense. Regards..

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors