Check your eligibility for this 50% exam voucher offer and join us for free live learning sessions to get prepared for Exam DP-700.
Get StartedDon't miss out! 2025 Microsoft Fabric Community Conference, March 31 - April 2, Las Vegas, Nevada. Use code MSCUST for a $150 discount. Prices go up February 11th. Register now.
TLDR: Using Power Query’s browser, a substring can be extracted now based on a pattern using regex in Web.Script.
In my previous post, I talked about how to use javascript inside Power Query. In this post, I want to talk about one of the practical examples for doing so. Regex which stands for Regular Expression is a very powerful string-searching algorithm technique used in the browser for substring extraction based on a pattern.
Since js can run inside pqwry, it means that regex extraction is natively possible without needing any other external package at all.
A regex extraction can be done as following where it is extracts out all the substring with digits.
let
regex=let fx=(input)=>
Web.Page(
"<script>
var x='"&input&"'; // this is the input string for regex
var b=x.match(/\d{3}/gm); // specify the desired regular expression inside string.match()
//https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match
document.write(b);
</script>"){0}[Data]{0}[Children]{1}[Children]{0}[Text]
in
fx,
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("HYqxCoAwEEN/JXTuoq06+BtupcNpDy1oB6+Kn+8phCS8JAQzbVmgIhz54QSpZy4rpGmd70aIej+YaIMhRT8jJZ3XLOnr3rUWs473kvi/g05G3Rg37ReLifEF", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each regex([Column1]))
in
#"Added Custom"
How to test regex and performance ? Go to https://regex101.com/ and select ECMAScript (JavaScript) as FLAVOR
Regex resources - https://www.regular-expressions.info/tutorial.html
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions
Some of the frequently used patterns are listed below
String | Regex | Alternative Regex | Explanation | Returns what? | Workaround |
This is a mixed string s12345; s345678 | /\d/gm | /[0-9]/gm | Matches all individual digits | 1,2,3,4,5,3,4,5,6,7,8 | |
This is a mixed string s12345; s345678 | /\d+/gm | /[0-9]+/gm | Matches digits | 12345,345678 | |
This is a mixed string s12345; s345678 | /\d+/ | /[0-9]+/ | Matches digit and returns only first match | 12345 | |
This is a mixed string s12345; s345678 | /\d{3}/gm | /[0-9]{3}/gm | Matches 3 consecutive digits | 123,345,678 | |
This is a test string 123 345 | /[a-zA-Z0-9]/gm | Matches a single character in the range between a-z A-Z 0-9 | T,h,i,s,i,s,a,t,e,s,t,s,t,r,i,n,g,1,2,3,3,4,5 | ||
This is a test string 123 345 | /[a-zA-Z]+/gm | Matches a single character in the range between a z unlimited times | This,is,a,test,string | ||
This returns all word 123 345; except non-alphanumeric characters | /\b\w+\b/gm | Matches any alphanumeric characters equivalent to [a-zA-Z0-9_] | This,returns,all,word,123,345,except,non,alphanumeric,characters | ||
This extracts any word that are only 4 characters long a12345 1123 a112 | /\b\d{4}\b/gm | /\b[^a-zA-Z\W]{4}\b/gm | Matches only 4 digit numbers | 1123 | |
This extracts any word that are only 4 characters long a12345 1123 a112 | /\b\w{4}\b/gm | /\b[^\W]{4}\b/gm | Matches word only 4 characters in length | This,word,that,only,long,1123,a112 | |
This extracts any word that are only 4 characters long a12345 1123 a112 | /\b[^\d\W]{4}\b/gm | Matches only 4 character words that are not digit | This,word,that,only,long | ||
Journal entry for tran12345, tran3456, tran, 123456 | /tran[0-9]+/gm | /tran\d+/gm | Matches string that starts with "tran" and unlimited digit | tran12345,tran3456 | |
Power BI consists of power query and DAX | /\bPower\b/gm | Matches the word - "Power", case sensitive | Power | ||
Power BI consists of power query and DAX | /\bPower\b/gmi | Matches the word - "Power", case insensitive | Power,power | ||
Power BI consists of power query and DAX | /\bpower\b/gmi | Matches the word - "Power", case insensitive | Power,power | ||
This is a mixed string s12345; s34567 1234 four 3587 | /\b[^a-zA-Z\W]{4}\b/gm | Matches only 4 digit numbers | 1234,3587 | ||
It costs 50€ in Germany, 72$ in Canada, 370¥ in China and 4300₹ in India, 111$€₹ | /\d+(?=\$|€|₹)/gm | Poitive Lookahead - A(?=B)Match A only when A is followed by B Poitive Lookahead - A(?=B|C)Match A only when A is followed by B or C Poitive Lookahead - A(?=BC)Match A only when A is followed by B and C - in the example, returns all digits only when followed by $ or € or ₹ - $ symbol is escaped in regex with \$ |
50,72,4300,111 | ||
It costs 50€ in Germany, 72$ in Canada, 370¥ in China and 4300₹ in India, 111$€₹ | /\d+(?=\$€₹)/gm | Poitive Lookahead - A(?=B)Match A only when A is followed by B Poitive Lookahead - A(?=B|C)Match A only when A is followed by B or C Poitive Lookahead - A(?=BC)Match A only when A is followed by B and C -in the example, returns all digits only when followed by $ and € and ₹ |
111 | ||
The price 30 lt of milk is 110$ | /\d+\b(?!\$)/gm | Negative Looahead- A(?!B)Match A only when A is not followed by B - in the example, returns all digits when not followed by $ |
30 | ||
1 steak costs $30; 2 steaks cost $55 | /(?<=\$)\d+/gm | Positive lookbehind: (?<=B)A Match A when preceded by B - does not work in PQ probably because PQ's browser not compatible with lookbehind |
30,55 | var a = '1 steak costs $30; 2 steaks cost $55'; var b = a.match(/\$\d+/gm); var c = b.toString().replace(/\$/gm,'') PQ Code attached [Code1] |
|
The cost of 3 repairing and 4maintenance jobs are respectively $700 and $500 test123 456test789 | /(?<!\$)\b\d+/gm | Negative Lookbehind: (?<!B)A Match A not preceded by B - does not work in PQ PQ probably because PQ's browser not compatible with lookbehind |
3,4,456 | var a = 'The cost of 3 repairing and 4maintenance jobs are repectively $700 and $500 test123 456test789'; var b = a.match(/((?!\$)^)\d+|(?:[^\$])\b\d+/gm); var c = b.toString().replace(/ /gm,'') PQ Code attached [Code2] |
|
power_bi | /(power)/gm | Capturing Group; group syntax (..) | power | ||
power_bi | /(?:power)/gm | Non-capturing Group; group syntax (..) Capturing and non-capturing groups return the same , but non-capturing groups are utilized in Lookarounds |
power |
//code1
let fx=(input)=>
Web.Page(
"<script>
var x='"&input&"';
var b = x.match(/\$\d+/gm);
var c = b.toString().replace(/\$/gm,'');
document.write(c);
</script>"){0}[Data]{0}[Children]{1}[Children]{0}[Text],
result = fx("1 steak costs $30; 2 steaks cost $55")
in
result
//code2
let fx=(input)=>
Web.Page(
"<script>
var x='"&input&"';
var b = x.match(/((?!\$)^)\d+|(?:[^\$])\b\d+/gm);
var c = b.toString().replace(/ /gm,'');
document.write(c);
</script>"){0}[Data]{0}[Children]{1}[Children]{0}[Text],
result = fx("The cost of 3 repairing and 4maintenance jobs are respectively $700 and $500 test123 456test789")
in
result
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.