Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Calling all Data Engineers! Fabric Data Engineer (Exam DP-700) live sessions are back! Starting October 16th. Sign up.

Reply
Jack007
Frequent Visitor

Cleaning contaminated data

I have a dataset with about 8 million rows with both customer names and order numbers contaminated.  

 

I can clean it manually by searching for the customer name and getting the correct order number(Aaaa).  Then I filter rows that contain text Aaaa. Replace empty cells with Jack, can anyone help me automate the process?

 

Order numCustomer nameOrder Date
AaaaJack2021/1/1
BBBBJack2021/1/2
BBBB/178451=####  ####2021/1/3
Aaaa_46464646?????2021/1/4

 

Order numCustomer nameOrder Date
AaaJack2021/1/1
BBBBJack2021/1/2
AaaJack2021/1/3
BBBB Jack2021/1/4
1 REPLY 1
lbendlin
Super User
Super User

Learn about ReplaceValue and FillDown.

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WckxMTFTSUfJKTM5WitWB8ONNzCAQKKEUGwsA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Order num" = _t, name = _t]),
    #"Replaced Value" = Table.ReplaceValue(Source,"",null,Replacer.ReplaceValue,{"name"}),
    #"Filtered Rows" = Table.SelectRows(#"Replaced Value", each Text.StartsWith([Order num], "Aaa")),
    #"Replaced Value1" = Table.ReplaceValue(#"Filtered Rows",each [Order num],each Text.Start([Order num],3),Replacer.ReplaceValue,{"Order num"}),
    #"Filled Down" = Table.FillDown(#"Replaced Value1",{"name"})
in
    #"Filled Down"

How to use this code: Create a new Blank Query. Click on "Advanced Editor". Replace the code in the window with the code provided here. Click "Done".

Helpful resources

Announcements
FabCon Global Hackathon Carousel

FabCon Global Hackathon

Join the Fabric FabCon Global Hackathon—running virtually through Nov 3. Open to all skill levels. $10,000 in prizes!

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.