Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
I am working with a bad quality dataset that has many annoying errors. Please help me with one that I haven't solved yet.
There is two columns: SERIAL-NUMBER and DESCRIPTION
- Serialnumber should be in format ####-#### for example 1230-0123.
- Description is just a text
The problem:
There is several rows where description is written into serial-number column. Serial-number may be mentioned in the middle of that text. I want to find and keep only the serial number and remove all extra text. Serial-number to keep can be recognized by its' format.
Question:
How to only keep value that is in format ####-#### and remove all other text from the serial-number column?
Solved! Go to Solution.
Thanks to all who answered!
In fact, I took a part from everybody's answer to build my own solution.
I created a new column where...
1...all acharacters but digits and used delimeters are removed.
2...text is splitted by any defined delimeter.
3...text is combined back with delimeter "-", but only if format is correct (4 digits + 4 digits)
Thanks to all who answered!
In fact, I took a part from everybody's answer to build my own solution.
I created a new column where...
1...all acharacters but digits and used delimeters are removed.
2...text is splitted by any defined delimeter.
3...text is combined back with delimeter "-", but only if format is correct (4 digits + 4 digits)
Hi @SamiS ,
Glad to hear that it can be solved by yourself with others' ideas and thanks for your sharing your workaround.
You can accept your reply as solution, that way, other community members could easily find the answer when they get same issues.
Best Regards,
Community Support Team _ Yingjie Li
Hi,
if you have only that hyphen in your text, you can easily achieve this:
You have to:
- extract text before delimiter -
- extract last 4 characters
- extract text after delimiter -
- extract first 4 characters
- merge the two column
- remove columns not necessary
That's done
If this post is useful to help you to solve your issue consider giving the post a thumbs up
and accepting it as a solution !
NewStep=let lst={"0".."9","#","-"},fx=(x,y,z)=>Splitter.SplitTextByCharacterTransition(x,y,z) in Table.TransformColulmns(PreviousStepName,{"Serial Number",each List.RemoveNulls(List.TransformMany(fx(lst,each not List.Contains(lst,_),_),each fx((x)=>not List.Contains(lst,x),lst,_),(x,y)=>let a=Text.SplitAny(y,"-#") in if List.Count(a)=2 and Text.Length(a{0})=4 and Text.Length(a{1})=4 then Text.Combine(a,"-") else null)){0}?})
I didn't get this to work. After fixing typos teh first error was something like "there is 3 arguments for a function that excepts 2". I fixed it and then there was an other one "cannot convert a value of type Function to type List".
However, I got some ideas how to approach this from your answer. Thank you!
Hi,
Here is an example of existing data:
And here is how it should look like:
Those are shop floor quality notes from manufacturing operators. Unfortunately system hasn't forced operators to fill fields correctly.
I already have filled empty description fields by adding custom column:
= Table.AddColumn(#"Renamed Columns2", "NewDescription", each if [Description] = null then [Serial-Number] else [Description])
Next I want to remove extra strings from Serial-Number and keep actual serial number only.
If there are not numbers or hyphens in the other text, you can use this expression in your custom column.
= Text.Select([Description], {"-"} & {"0".."9"})
Pat
To learn more about Power BI, follow me on Twitter or subscribe on YouTube.
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.