Anonymous
Not applicable

Deduplication troubles

I'm sure I'm missing something simple, but I'd appreciate the help. I'm importing a big XML file that has some duplicates in it.

I changed some info for privacy, but the main point is that I get duplicates on the IP and ID columns. For example, in the first two lines you have ID 105146 twice. In the Tracking column, one has AGENT and one has IP. I want to deduplicate all of these by removing the line with IP in the Tracking column.

 

Help would be appreciated.  Thanks.

 

(Screenshot: lfkent73_0-1687159080729.png)

 

5 REPLIES
Anonymous
Not applicable

BUMP

Anonymous
Not applicable

Hi @Anonymous,

Can you please share some dummy data that keeps the raw data structure, along with the expected results? It would help us clarify your scenario and test a formula.

How to Get Your Question Answered Quickly  

Regards,

Xiaoxin Sheng

Anonymous
Not applicable

The raw data looks like this:

Name     Tracking  Owner  Application  IP           QID
server1  Agent     john   windows      1.2.3.4      112334
server1  IP        john   windows      1.2.3.4      112334
server2  Agent     john   linux        5.6.7.8      113445
server2  IP        john   linux        5.6.7.8      113445
server3  Agent     john   sql          10.11.12.13  115667
server3  IP        john   sql          10.11.12.13  115667

 

There are rows which are duplicates except for the Tracking column, which differs between the two duplicate rows.

I need to find duplicate rows and remove the row with IP in the Tracking column. The end result would be this:

 

Name     Tracking  Owner  Application  IP           QID
server1  Agent     john   windows      1.2.3.4      112334
server2  Agent     john   linux        5.6.7.8      113445
server3  Agent     john   sql          10.11.12.13  115667

 

Anonymous
Not applicable

Hi @Anonymous,

I think this may be related to your source data structure. For this scenario, you can add a filter on the category to filter out blank records, or filter on the Tracking field where it equals IP.

Regards,

Xiaoxin Sheng

Anonymous
Not applicable

You're right, the source data contains the duplication, but I am pulling it from an API and have no ability to clean it up before ingesting the data.

 

I considered just filtering out rows where Tracking equals IP; however, there are plenty of rows where the IP row is not a duplicate, so I need to keep them in the data. I need to find duplicate rows based on matching IP and ID and remove the line which has IP in the Tracking column.
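
To make the rule concrete, here is a minimal sketch of the logic in plain Python. It is not a Power Query formula and not a confirmed solution from this thread. It assumes the column names from the sample data above (Tracking, IP, QID), that the Tracking value is spelled exactly "IP", and that a row counts as a duplicate when another row with the same IP and QID has a different Tracking value. The function name and the server4 row are hypothetical, added only to show that a non-duplicated IP row is kept.

```python
def drop_ip_tracked_duplicates(rows):
    """Drop rows tracked as "IP" when a non-IP row shares the same IP and QID."""
    # Collect every (IP, QID) pair that has at least one row not tracked as "IP".
    pairs_with_other_tracking = {
        (row["IP"], row["QID"]) for row in rows if row["Tracking"] != "IP"
    }
    kept = []
    for row in rows:
        key = (row["IP"], row["QID"])
        # Skip the IP-tracked row only when it duplicates a non-IP row.
        if row["Tracking"] == "IP" and key in pairs_with_other_tracking:
            continue
        kept.append(row)
    return kept


columns = ["Name", "Tracking", "Owner", "Application", "IP", "QID"]
raw = [
    ("server1", "Agent", "john", "windows", "1.2.3.4", "112334"),
    ("server1", "IP", "john", "windows", "1.2.3.4", "112334"),
    ("server2", "Agent", "john", "linux", "5.6.7.8", "113445"),
    ("server2", "IP", "john", "linux", "5.6.7.8", "113445"),
    ("server3", "Agent", "john", "sql", "10.11.12.13", "115667"),
    ("server3", "IP", "john", "sql", "10.11.12.13", "115667"),
    # Hypothetical row, not from the thread: tracked by IP only, so it must stay.
    ("server4", "IP", "john", "linux", "20.21.22.23", "116778"),
]
rows = [dict(zip(columns, values)) for values in raw]

result = drop_ip_tracked_duplicates(rows)
print([(row["Name"], row["Tracking"]) for row in result])
# [('server1', 'Agent'), ('server2', 'Agent'), ('server3', 'Agent'), ('server4', 'IP')]
```

Under these assumptions, two rows that share an IP and QID and are both tracked as IP would both be kept, since neither duplicates a non-IP row.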

 

 
