Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
SamiS
Frequent Visitor

Replace or remove values that are not in specified format. (Find and keep values)

I am working with a bad quality dataset that has many annoying errors. Please help me with one that I haven't solved yet.

 

There is two columns:   SERIAL-NUMBER and DESCRIPTION

- Serialnumber should be in format  ####-####  for example 1230-0123.

- Description is just a text

 

The problem:

There is several rows where description is written into serial-number column. Serial-number may be mentioned in the middle of that text. I want to find and keep only the serial number and remove all extra text. Serial-number to keep can be recognized by its' format.

 

Question:

How to only keep value that is in format ####-#### and remove all other text from the serial-number column?

 

SamiS_0-1646040351587.png

 

 

1 ACCEPTED SOLUTION
SamiS
Frequent Visitor

Thanks to all who answered!

In fact, I took a part from everybody's answer to build my own solution.

 

SamiS_0-1646126540346.png

 

I created a new column where...

1...all acharacters but digits and used delimeters are removed.

2...text is splitted by any defined delimeter.

3...text is combined back with delimeter "-", but only if format is correct (4 digits + 4 digits)

 

 

View solution in original post

8 REPLIES 8
SamiS
Frequent Visitor

Thanks to all who answered!

In fact, I took a part from everybody's answer to build my own solution.

 

SamiS_0-1646126540346.png

 

I created a new column where...

1...all acharacters but digits and used delimeters are removed.

2...text is splitted by any defined delimeter.

3...text is combined back with delimeter "-", but only if format is correct (4 digits + 4 digits)

 

 

v-yingjl
Community Support
Community Support

Hi @SamiS ,

Glad to hear that it can be solved by yourself with others' ideas and thanks for your sharing your workaround.

 

You can accept your reply as solution, that way, other community members could easily find the answer when they get same issues.

 

Best Regards,
Community Support Team _ Yingjie Li

serpiva64
Super User
Super User

Hi,

if you have only that hyphen in your text, you can easily achieve this:

serpiva64_0-1646055057734.png

You have to:

- extract text before delimiter - 

serpiva64_1-1646055161149.png

- extract last 4 characters

serpiva64_2-1646055238034.png

- extract text after delimiter - 

- extract first 4 characters

- merge the two column

serpiva64_3-1646055378424.png

- remove columns not necessary

That's done

If this post is useful to help you to solve your issue consider giving the post a thumbs up 

 and accepting it as a solution !

 

 

 

wdx223_Daniel
Super User
Super User

NewStep=let lst={"0".."9","#","-"},fx=(x,y,z)=>Splitter.SplitTextByCharacterTransition(x,y,z) in Table.TransformColulmns(PreviousStepName,{"Serial Number",each List.RemoveNulls(List.TransformMany(fx(lst,each not List.Contains(lst,_),_),each fx((x)=>not List.Contains(lst,x),lst,_),(x,y)=>let a=Text.SplitAny(y,"-#") in if List.Count(a)=2 and Text.Length(a{0})=4 and Text.Length(a{1})=4 then Text.Combine(a,"-") else null)){0}?})

I didn't get this to work. After fixing typos teh first error was something like "there is 3 arguments for a function that excepts 2". I fixed it and then there was an other one "cannot convert a value of type Function to type List".

 

However, I got some ideas how to approach this from your answer. Thank you!

BeaBF
Impactful Individual
Impactful Individual

@SamiS  can you paste some data and an example of the expected result?

 

BF

 

SamiS
Frequent Visitor

Hi,

 

Here is an example of existing data:

SamiS_0-1646049730915.png

 

And here is how it should look like:

SamiS_1-1646049827110.png

 

Those are shop floor quality notes from manufacturing operators. Unfortunately system hasn't forced operators to fill fields correctly.

 

I already have filled empty description fields by adding custom column:

= Table.AddColumn(#"Renamed Columns2", "NewDescription", each if [Description] = null then [Serial-Number] else [Description])

 

Next I want to remove extra strings from Serial-Number and keep actual serial number only.

If there are not numbers or hyphens in the other text, you can use this expression in your custom column.

 

= Text.Select([Description], {"-"} & {"0".."9"})

 

Pat





Did I answer your question? Mark my post as a solution! Kudos are also appreciated!

To learn more about Power BI, follow me on Twitter or subscribe on YouTube.


@mahoneypa HoosierBI on YouTube


Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors