Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

The Power BI Data Visualization World Championships is back! It's time to submit your entry. Live now!

Reply
EpicTriffid
Helper IV
Helper IV

Extracting result of Regular Expression using R

Hi all,

 

I've spent all day on this and don't feel any closer to achieving what I need. I don't really understand R or Python. I have a use case where I'm expecting to need to search a field for a specific unique ID. My data looks like the below:

 

EpicTriffid_0-1687963019622.png

 

In this case, I need to extract where is says 55133025_Preq_1. The ID will change in the future, but that is the basic format it will appear in. 8 digits, an underscore, a few letters (which will be either pReq or DCTReq), and underscore, then a number that will gradually increase over time. I have the regular expression, which is "[0-9]+_[A-Za-z]+_[0-9]+"

 

Using the following R script, I have gotten to the point where i can identify those fields with the ID in them:

 

# 'dataset' holds the input data for this script
pattern <- "[0-9]+_[A-Za-z]+_[0-9]+"
hasUID <- function(x) {grepl(pattern, as.character(x), ignore.case=TRUE)}
output <- within(dataset,{UID=hasUID(dataset$Column1)})

 

Which gives me:

EpicTriffid_1-1687963268424.png

 

Now what I need is for the new column to actually contain the extracted ID, not just to tell me if it exists in the field. This is what I just cannot get it to do. I am aware there is the Javascript method, but this is a report that will be in the Power BI Service, meaning I can't use that method. I am happy to use Python, but i know even less about that than I do R. 

 

Any help is much much appreciated.

 

7 REPLIES 7
Ahmedx
Super User
Super User

plse try this cod:

# 'dataset' holds the input data for this script
pattern <- "[0-9]+_[A-Za-z]+_[0-9]+"
extractUID <- function(x) {
  match <- regexpr(pattern, as.character(x), ignore.case = TRUE)
  if (match != -1) {
    substring(x, match, match + attr(match, "match.length") - 1)
  } else {
    NA
  }
}
output <- within(dataset, {UID <- sapply(Column1, extractUID)})
Ahmedx
Super User
Super User

your problem can be solved in Power query like this:

[ y = 
Text.Contains([Column1],"55133025_Preq_1"),
to = if y= true then "55133025_Preq_1" else null][to]

Screen Capture #1347.png

Hi @Ahmedx 

 

Unfortunately, this doesn't work as the ID I've used above is an example. I would expect it to change all the time. Hence the need for a regular expression.

you can specify how it will change and in which column it will happen

This picture might be a better example of the data I'll be using:

 

EpicTriffid_0-1688026163378.png

 

pls try

Text.Combine( List.LastN(
Splitter.SplitTextByCharacterTransition(List.Transform({1..47},Character.FromNumber) & List.Transform({58..127},Character.FromNumber),{"0".."9"})([Column1]),2))

Screen Capture #1350.png

No, I'm sorry, this doesn't help either. The ID can appear anywhere in the string, not just what I have shown above, hence needing to use R for the RegEx

Helpful resources

Announcements
FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.