Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now! Learn more

Reply
EpicTriffid
Helper IV
Helper IV

Extracting result of Regular Expression using R

Hi all,

 

I've spent all day on this and don't feel any closer to achieving what I need. I don't really understand R or Python. I have a use case where I'm expecting to need to search a field for a specific unique ID. My data looks like the below:

 

EpicTriffid_0-1687963019622.png

 

In this case, I need to extract where is says 55133025_Preq_1. The ID will change in the future, but that is the basic format it will appear in. 8 digits, an underscore, a few letters (which will be either pReq or DCTReq), and underscore, then a number that will gradually increase over time. I have the regular expression, which is "[0-9]+_[A-Za-z]+_[0-9]+"

 

Using the following R script, I have gotten to the point where i can identify those fields with the ID in them:

 

# 'dataset' holds the input data for this script
pattern <- "[0-9]+_[A-Za-z]+_[0-9]+"
hasUID <- function(x) {grepl(pattern, as.character(x), ignore.case=TRUE)}
output <- within(dataset,{UID=hasUID(dataset$Column1)})

 

Which gives me:

EpicTriffid_1-1687963268424.png

 

Now what I need is for the new column to actually contain the extracted ID, not just to tell me if it exists in the field. This is what I just cannot get it to do. I am aware there is the Javascript method, but this is a report that will be in the Power BI Service, meaning I can't use that method. I am happy to use Python, but i know even less about that than I do R. 

 

Any help is much much appreciated.

 

7 REPLIES 7
Ahmedx
Super User
Super User

plse try this cod:

# 'dataset' holds the input data for this script
pattern <- "[0-9]+_[A-Za-z]+_[0-9]+"
extractUID <- function(x) {
  match <- regexpr(pattern, as.character(x), ignore.case = TRUE)
  if (match != -1) {
    substring(x, match, match + attr(match, "match.length") - 1)
  } else {
    NA
  }
}
output <- within(dataset, {UID <- sapply(Column1, extractUID)})
Ahmedx
Super User
Super User

your problem can be solved in Power query like this:

[ y = 
Text.Contains([Column1],"55133025_Preq_1"),
to = if y= true then "55133025_Preq_1" else null][to]

Screen Capture #1347.png

Hi @Ahmedx 

 

Unfortunately, this doesn't work as the ID I've used above is an example. I would expect it to change all the time. Hence the need for a regular expression.

you can specify how it will change and in which column it will happen

This picture might be a better example of the data I'll be using:

 

EpicTriffid_0-1688026163378.png

 

pls try

Text.Combine( List.LastN(
Splitter.SplitTextByCharacterTransition(List.Transform({1..47},Character.FromNumber) & List.Transform({58..127},Character.FromNumber),{"0".."9"})([Column1]),2))

Screen Capture #1350.png

No, I'm sorry, this doesn't help either. The ID can appear anywhere in the string, not just what I have shown above, hence needing to use R for the RegEx

Helpful resources

Announcements
Power BI DataViz World Championships

Power BI Dataviz World Championships

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now!

December 2025 Power BI Update Carousel

Power BI Monthly Update - December 2025

Check out the December 2025 Power BI Holiday Recap!

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.