Explore and share Fabric Notebooks to boost Power BI insights in the new community notebooks gallery.
Check it out now!Microsoft is giving away 50,000 FREE Microsoft Certification exam vouchers. Get Fabric certified for FREE! Learn more
TLDR: Using Power Query’s browser, a substring can be extracted now based on a pattern using regex in Web.Script.
In my previous post, I talked about how to use javascript inside Power Query. In this post, I want to talk about one of the practical examples for doing so. Regex which stands for Regular Expression is a very powerful string-searching algorithm technique used in the browser for substring extraction based on a pattern.
Since js can run inside pqwry, it means that regex extraction is natively possible without needing any other external package at all.
A regex extraction can be done as following where it is extracts out all the substring with digits.
let
regex=let fx=(input)=>
Web.Page(
"<script>
var x='"&input&"'; // this is the input string for regex
var b=x.match(/\d{3}/gm); // specify the desired regular expression inside string.match()
//https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match
document.write(b);
</script>"){0}[Data]{0}[Children]{1}[Children]{0}[Text]
in
fx,
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("HYqxCoAwEEN/JXTuoq06+BtupcNpDy1oB6+Kn+8phCS8JAQzbVmgIhz54QSpZy4rpGmd70aIej+YaIMhRT8jJZ3XLOnr3rUWs473kvi/g05G3Rg37ReLifEF", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each regex([Column1]))
in
#"Added Custom"
How to test regex and performance ? Go to https://regex101.com/ and select ECMAScript (JavaScript) as FLAVOR
Regex resources - https://www.regular-expressions.info/tutorial.html
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions
Some of the frequently used patterns are listed below
String | Regex | Alternative Regex | Explanation | Returns what? | Workaround |
This is a mixed string s12345; s345678 | /\d/gm | /[0-9]/gm | Matches all individual digits | 1,2,3,4,5,3,4,5,6,7,8 | |
This is a mixed string s12345; s345678 | /\d+/gm | /[0-9]+/gm | Matches digits | 12345,345678 | |
This is a mixed string s12345; s345678 | /\d+/ | /[0-9]+/ | Matches digit and returns only first match | 12345 | |
This is a mixed string s12345; s345678 | /\d{3}/gm | /[0-9]{3}/gm | Matches 3 consecutive digits | 123,345,678 | |
This is a test string 123 345 | /[a-zA-Z0-9]/gm | Matches a single character in the range between a-z A-Z 0-9 | T,h,i,s,i,s,a,t,e,s,t,s,t,r,i,n,g,1,2,3,3,4,5 | ||
This is a test string 123 345 | /[a-zA-Z]+/gm | Matches a single character in the range between a z unlimited times | This,is,a,test,string | ||
This returns all word 123 345; except non-alphanumeric characters | /\b\w+\b/gm | Matches any alphanumeric characters equivalent to [a-zA-Z0-9_] | This,returns,all,word,123,345,except,non,alphanumeric,characters | ||
This extracts any word that are only 4 characters long a12345 1123 a112 | /\b\d{4}\b/gm | /\b[^a-zA-Z\W]{4}\b/gm | Matches only 4 digit numbers | 1123 | |
This extracts any word that are only 4 characters long a12345 1123 a112 | /\b\w{4}\b/gm | /\b[^\W]{4}\b/gm | Matches word only 4 characters in length | This,word,that,only,long,1123,a112 | |
This extracts any word that are only 4 characters long a12345 1123 a112 | /\b[^\d\W]{4}\b/gm | Matches only 4 character words that are not digit | This,word,that,only,long | ||
Journal entry for tran12345, tran3456, tran, 123456 | /tran[0-9]+/gm | /tran\d+/gm | Matches string that starts with "tran" and unlimited digit | tran12345,tran3456 | |
Power BI consists of power query and DAX | /\bPower\b/gm | Matches the word - "Power", case sensitive | Power | ||
Power BI consists of power query and DAX | /\bPower\b/gmi | Matches the word - "Power", case insensitive | Power,power | ||
Power BI consists of power query and DAX | /\bpower\b/gmi | Matches the word - "Power", case insensitive | Power,power | ||
This is a mixed string s12345; s34567 1234 four 3587 | /\b[^a-zA-Z\W]{4}\b/gm | Matches only 4 digit numbers | 1234,3587 | ||
It costs 50€ in Germany, 72$ in Canada, 370¥ in China and 4300₹ in India, 111$€₹ | /\d+(?=\$|€|₹)/gm | Poitive Lookahead - A(?=B)Match A only when A is followed by B Poitive Lookahead - A(?=B|C)Match A only when A is followed by B or C Poitive Lookahead - A(?=BC)Match A only when A is followed by B and C - in the example, returns all digits only when followed by $ or € or ₹ - $ symbol is escaped in regex with \$ |
50,72,4300,111 | ||
It costs 50€ in Germany, 72$ in Canada, 370¥ in China and 4300₹ in India, 111$€₹ | /\d+(?=\$€₹)/gm | Poitive Lookahead - A(?=B)Match A only when A is followed by B Poitive Lookahead - A(?=B|C)Match A only when A is followed by B or C Poitive Lookahead - A(?=BC)Match A only when A is followed by B and C -in the example, returns all digits only when followed by $ and € and ₹ |
111 | ||
The price 30 lt of milk is 110$ | /\d+\b(?!\$)/gm | Negative Looahead- A(?!B)Match A only when A is not followed by B - in the example, returns all digits when not followed by $ |
30 | ||
1 steak costs $30; 2 steaks cost $55 | /(?<=\$)\d+/gm | Positive lookbehind: (?<=B)A Match A when preceded by B - does not work in PQ probably because PQ's browser not compatible with lookbehind |
30,55 | var a = '1 steak costs $30; 2 steaks cost $55'; var b = a.match(/\$\d+/gm); var c = b.toString().replace(/\$/gm,'') PQ Code attached [Code1] |
|
The cost of 3 repairing and 4maintenance jobs are respectively $700 and $500 test123 456test789 | /(?<!\$)\b\d+/gm | Negative Lookbehind: (?<!B)A Match A not preceded by B - does not work in PQ PQ probably because PQ's browser not compatible with lookbehind |
3,4,456 | var a = 'The cost of 3 repairing and 4maintenance jobs are repectively $700 and $500 test123 456test789'; var b = a.match(/((?!\$)^)\d+|(?:[^\$])\b\d+/gm); var c = b.toString().replace(/ /gm,'') PQ Code attached [Code2] |
|
power_bi | /(power)/gm | Capturing Group; group syntax (..) | power | ||
power_bi | /(?:power)/gm | Non-capturing Group; group syntax (..) Capturing and non-capturing groups return the same , but non-capturing groups are utilized in Lookarounds |
power |
//code1
let fx=(input)=>
Web.Page(
"<script>
var x='"&input&"';
var b = x.match(/\$\d+/gm);
var c = b.toString().replace(/\$/gm,'');
document.write(c);
</script>"){0}[Data]{0}[Children]{1}[Children]{0}[Text],
result = fx("1 steak costs $30; 2 steaks cost $55")
in
result
//code2
let fx=(input)=>
Web.Page(
"<script>
var x='"&input&"';
var b = x.match(/((?!\$)^)\d+|(?:[^\$])\b\d+/gm);
var c = b.toString().replace(/ /gm,'');
document.write(c);
</script>"){0}[Data]{0}[Children]{1}[Children]{0}[Text],
result = fx("The cost of 3 repairing and 4maintenance jobs are respectively $700 and $500 test123 456test789")
in
result
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.