Extracting result of Regular Expression using R

EpicTriffid · ‎06-28-2023

Hi all,

I've spent all day on this and don't feel any closer to achieving what I need. I don't really understand R or Python. I have a use case where I'm expecting to need to search a field for a specific unique ID. My data looks like the below:

In this case, I need to extract where is says 55133025_Preq_1. The ID will change in the future, but that is the basic format it will appear in. 8 digits, an underscore, a few letters (which will be either pReq or DCTReq), and underscore, then a number that will gradually increase over time. I have the regular expression, which is "[0-9]+_[A-Za-z]+_[0-9]+"

Using the following R script, I have gotten to the point where i can identify those fields with the ID in them:

# 'dataset' holds the input data for this script
pattern <- "[0-9]+_[A-Za-z]+_[0-9]+"
hasUID <- function(x) {grepl(pattern, as.character(x), ignore.case=TRUE)}
output <- within(dataset,{UID=hasUID(dataset$Column1)})

Which gives me:

Now what I need is for the new column to actually contain the extracted ID, not just to tell me if it exists in the field. This is what I just cannot get it to do. I am aware there is the Javascript method, but this is a report that will be in the Power BI Service, meaning I can't use that method. I am happy to use Python, but i know even less about that than I do R.

Any help is much much appreciated.

Ahmedx · ‎06-29-2023

plse try this cod:

# 'dataset' holds the input data for this script
pattern <- "[0-9]+_[A-Za-z]+_[0-9]+"
extractUID <- function(x) {
  match <- regexpr(pattern, as.character(x), ignore.case = TRUE)
  if (match != -1) {
    substring(x, match, match + attr(match, "match.length") - 1)
  } else {
    NA
  }
}
output <- within(dataset, {UID <- sapply(Column1, extractUID)})

Ahmedx · ‎06-28-2023

your problem can be solved in Power query like this:

[ y = 
Text.Contains([Column1],"55133025_Preq_1"),
to = if y= true then "55133025_Preq_1" else null][to]

EpicTriffid · ‎06-29-2023

Hi @Ahmedx

Unfortunately, this doesn't work as the ID I've used above is an example. I would expect it to change all the time. Hence the need for a regular expression.

Ahmedx · ‎06-29-2023

you can specify how it will change and in which column it will happen

EpicTriffid · ‎06-29-2023

This picture might be a better example of the data I'll be using:

Ahmedx · ‎06-29-2023

pls try

Text.Combine( List.LastN(
Splitter.SplitTextByCharacterTransition(List.Transform({1..47},Character.FromNumber) & List.Transform({58..127},Character.FromNumber),{"0".."9"})([Column1]),2))

EpicTriffid · ‎06-29-2023

No, I'm sorry, this doesn't help either. The ID can appear anywhere in the string, not just what I have shown above, hence needing to use R for the RegEx