This time we’re going bigger than ever. Fabric, Power BI, SQL, AI and more. We're covering it all. You won't want to miss it.
Learn moreDid you hear? There's a new SQL AI Developer certification (DP-800). Start preparing now and be one of the first to get certified. Register now
Hi all
I'm trying to do some pattern matching in the query editor. The intention being that I can then merge a reformatted postcode to an enormous table, which I do not want to upload in it's entirety to PBI online.
I have got validation working fine with grepl, but I need to ensure a common 8 character format. But I am getting an error in the column, the R scri[pt executes, but the column returns "error" on all rows.
This is where I am:
# 'dataset' holds the input data for this script
pattern1 <- "([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|A-Za-z][A-Ha-hJ-Yj-y][0-9][A-Za-z]?))))"
pattern2 <- "[0-9][A-Za-z]{2}"
matches <- function(x) {gregexpr(pattern2 , as.character(x), perl=TRUE)}
part2 <- regmatches(dataset$post_code, matches(dataset$post_code))
PCDS <- within(dataset, {PCDS = part2})
I think the problem is that grepexpr returns a multidimensional array. I need the 1st index to go in the column "PCDS" but can't get the syntax right.
Please help! Thanks in advance!
Solved! Go to Solution.
Fixed it! The regex could be more comprehensive and it's very step by step, but this R script converts UK postcodes matching the regex pattern to an 8 character format in the query editor.
# 'dataset' holds the input data for this script
# returns string w/o leading or trailing whitespace
trim <- function (x) gsub("\\s+|\\s+", "", x)
ws <- function(x){
if(x>0){strrep(" ",x)}
else {""}
}
pattern <- "^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|A-Za-z][A-Ha-hJ-Yj-y][0-9][A-Za-z]?)))) ?[0-9][A-Za-z]{2}"
match <- function(x) {gregexpr(pattern , as.character(x), perl=TRUE)}
part <- function(x) {c(regmatches(x, match(x)))}
pCodeList <- part(dataset$post_code)
pCodeVector <- as.character(pCodeList)
for(i in 1:length(pCodeVector))
{
pCodeVector[i] <- trim(pCodeVector[i])
pCodeLength <- nchar(pCodeVector[i])
pCodeRight <- toupper(substr(pCodeVector[i], pCodeLength - 2, pCodeLength))
pCodeLeft <- toupper(substr(pCodeVector[i],1, pCodeLength -3))
pCodeVector[i] <- paste(pCodeLeft, ws((8 - nchar(pCodeLeft)) - nchar(pCodeRight)), pCodeRight, sep='')
}
PCDS <- within(dataset, {PCDS = (pCodeVector)})Development discussion warmly welcomed.
Beth
x
@Anonymous,
Don't know to much about R script and the error message you have encountered with, however, you may modify the last statement with below:
dataset$PCDS <- within(dataset, {dataset$PCDS = part2})
Community Support Team _ Jimmy Tao
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Unfortunately putting "dataset$" in either of those positions returns an empty table which then causes an error in the query editor. 😞
I'm still stuck with:
# 'dataset' holds the input data for this script
pattern1 <- "([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|A-Za-z][A-Ha-hJ-Yj-y][0-9][A-Za-z]?))))"
pattern2 <- "[0-9][A-Za-z]{2}"
match1 <- function(x) {gregexpr(pattern1 , as.character(x), perl=TRUE)}
part1 <- function(x) {c(regmatches(x, match1(x)))}
PCDS <- within(dataset, {PCDS = part1(post_code)})After playing around in the R shell:
> post_code<-c("A12 3BC","A123BC","A12 3BC","a12 3bc","1234")
> pattern1 <- "([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|A-Za-z][A-Ha-hJ-Yj-y][0-9][A-Za-z]?))))"
> match1 <- function(x) {gregexpr(pattern1 , as.character(x), perl=TRUE)}
> part1 <- function(x) {c(regmatches(x, match1(x)))}
> part1(post_code)
[[1]]
[1] "A12"
[[2]]
[1] "A12"
[[3]]
[1] "A12"
[[4]]
[1] "a12"
[[5]]
character(0)
> post_code
[1] "A12 3BC" "A123BC" "A12 3BC" "a12 3bc" "1234" As you can see the data has a very different arrangement between post_code and part1(postcode). So I'm stilthinking down the lines that the vector just needs reorganising.
Problem is, I learned R last week. This didn't work!
output1<-vector(mode="character", length = 0)
for(val in part1){
if(length(val)==0){
output1<=c(part1,"Not UK Postcode")
}
else{
output1<-val
}
}So, if you know anything about R vector manipulationm, please help!
xx
Fixed it! The regex could be more comprehensive and it's very step by step, but this R script converts UK postcodes matching the regex pattern to an 8 character format in the query editor.
# 'dataset' holds the input data for this script
# returns string w/o leading or trailing whitespace
trim <- function (x) gsub("\\s+|\\s+", "", x)
ws <- function(x){
if(x>0){strrep(" ",x)}
else {""}
}
pattern <- "^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|A-Za-z][A-Ha-hJ-Yj-y][0-9][A-Za-z]?)))) ?[0-9][A-Za-z]{2}"
match <- function(x) {gregexpr(pattern , as.character(x), perl=TRUE)}
part <- function(x) {c(regmatches(x, match(x)))}
pCodeList <- part(dataset$post_code)
pCodeVector <- as.character(pCodeList)
for(i in 1:length(pCodeVector))
{
pCodeVector[i] <- trim(pCodeVector[i])
pCodeLength <- nchar(pCodeVector[i])
pCodeRight <- toupper(substr(pCodeVector[i], pCodeLength - 2, pCodeLength))
pCodeLeft <- toupper(substr(pCodeVector[i],1, pCodeLength -3))
pCodeVector[i] <- paste(pCodeLeft, ws((8 - nchar(pCodeLeft)) - nchar(pCodeRight)), pCodeRight, sep='')
}
PCDS <- within(dataset, {PCDS = (pCodeVector)})Development discussion warmly welcomed.
Beth
x
Check out the April 2026 Power BI update to learn about new features.
Sign up to receive a private message when registration opens and key events begin.
If you have recently started exploring Fabric, we'd love to hear how it's going. Your feedback can help with product improvements.
| User | Count |
|---|---|
| 34 | |
| 32 | |
| 25 | |
| 21 | |
| 16 |
| User | Count |
|---|---|
| 61 | |
| 49 | |
| 28 | |
| 23 | |
| 23 |