Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

smpa01

Using JavaScript in power query for regex- Part2

TLDR: Using Power Query’s browser, a substring can be extracted now based on a pattern using regex in Web.Script.

 

In my previous post, I talked about how to use javascript inside Power Query. In this post, I want to talk about one of the practical examples for doing so. Regex which stands for Regular Expression is a very powerful string-searching algorithm technique used in the browser for substring extraction based on a pattern.

Since js can run inside pqwry, it means that regex extraction is natively possible without needing any other external package at all.

 

A regex extraction can be done as following where it is extracts out all the substring with digits.

 

 

 

 

let
   regex=let   fx=(input)=>
    Web.Page(
        "<script>
            var x='"&input&"'; // this is the input string for regex
            var b=x.match(/\d{3}/gm); // specify the desired regular expression inside string.match()
                                    //https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match 
            document.write(b);
        </script>"){0}[Data]{0}[Children]{1}[Children]{0}[Text]

in
fx,
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("HYqxCoAwEEN/JXTuoq06+BtupcNpDy1oB6+Kn+8phCS8JAQzbVmgIhz54QSpZy4rpGmd70aIej+YaIMhRT8jJZ3XLOnr3rUWs473kvi/g05G3Rg37ReLifEF", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each regex([Column1]))
in
    #"Added Custom" 

 

 

 

 

 

smpa01_0-1634141754734.png

 

 

How to test regex and performance ?  Go to  https://regex101.com/ and select ECMAScript (JavaScript) as FLAVOR

 

Regex resources - https://www.regular-expressions.info/tutorial.html

                             https://docs.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-refere...

                            https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions

 

 

 

 Some of the frequently used patterns are listed below

 

String Regex Alternative Regex Explanation Returns what? Workaround
This is a mixed string s12345; s345678 /\d/gm /[0-9]/gm Matches all individual digits 1,2,3,4,5,3,4,5,6,7,8  
This is a mixed string s12345; s345678 /\d+/gm /[0-9]+/gm Matches digits 12345,345678  
This is a mixed string s12345; s345678 /\d+/ /[0-9]+/ Matches digit and returns only first match 12345  
This is a mixed string s12345; s345678 /\d{3}/gm /[0-9]{3}/gm Matches 3 consecutive digits 123,345,678  
This is a test string 123 345 /[a-zA-Z0-9]/gm   Matches a single character in the range between a-z A-Z 0-9 T,h,i,s,i,s,a,t,e,s,t,s,t,r,i,n,g,1,2,3,3,4,5  
This is a test string 123 345 /[a-zA-Z]+/gm   Matches a single character in the range between a z unlimited times This,is,a,test,string  
This returns all word 123 345; except non-alphanumeric characters /\b\w+\b/gm   Matches any alphanumeric characters equivalent to [a-zA-Z0-9_] This,returns,all,word,123,345,except,non,alphanumeric,characters  
This extracts any word that are only 4 characters long a12345 1123 a112 /\b\d{4}\b/gm /\b[^a-zA-Z\W]{4}\b/gm Matches only 4 digit numbers 1123  
This extracts any word that are only 4 characters long a12345 1123 a112 /\b\w{4}\b/gm /\b[^\W]{4}\b/gm Matches word only 4 characters in length This,word,that,only,long,1123,a112  
This extracts any word that are only 4 characters long a12345 1123 a112 /\b[^\d\W]{4}\b/gm   Matches only 4 character words that are not digit This,word,that,only,long  
Journal entry for tran12345, tran3456, tran, 123456 /tran[0-9]+/gm /tran\d+/gm Matches string that starts with "tran" and unlimited digit tran12345,tran3456  
Power BI consists of power query and DAX /\bPower\b/gm   Matches the word - "Power", case sensitive Power  
Power BI consists of power query and DAX /\bPower\b/gmi   Matches the word - "Power", case insensitive Power,power  
Power BI consists of power query and DAX /\bpower\b/gmi   Matches the word - "Power", case insensitive Power,power  
This is a mixed string s12345; s34567 1234 four 3587 /\b[^a-zA-Z\W]{4}\b/gm   Matches only 4 digit numbers 1234,3587  
It costs 50€ in Germany, 72$ in Canada, 370¥ in China and 4300₹ in India, 111$€₹ /\d+(?=\$|€|₹)/gm   Poitive Lookahead - A(?=B)Match A only when A is followed by B
Poitive Lookahead - A(?=B|C)Match A only when A is followed by B or C
Poitive Lookahead - A(?=BC)Match A only when A is followed by B and C
- in the example, returns all digits only when followed by $ or € or ₹
- $ symbol is escaped in regex with \$
50,72,4300,111  
It costs 50€ in Germany, 72$ in Canada, 370¥ in China and 4300₹ in India, 111$€₹ /\d+(?=\$€₹)/gm   Poitive Lookahead - A(?=B)Match A only when A is followed by B
Poitive Lookahead - A(?=B|C)Match A only when A is followed by B or C
Poitive Lookahead - A(?=BC)Match A only when A is followed by B and C
 -in the example, returns all digits only when followed by $ and € and ₹
111  
The price 30 lt of milk is 110$ /\d+\b(?!\$)/gm   Negative Looahead- A(?!B)Match A only when A is not followed by B
- in the example, returns all digits when not followed by $
30  
1 steak costs $30; 2 steaks cost $55 /(?<=\$)\d+/gm   Positive lookbehind: (?<=B)A Match A when preceded by B
- does not work in PQ probably because PQ's browser not compatible with lookbehind
30,55 var a = '1 steak costs $30; 2 steaks cost $55';
var b = a.match(/\$\d+/gm);
var c = b.toString().replace(/\$/gm,'')

PQ Code attached [Code1]
The cost of 3 repairing and 4maintenance jobs are respectively $700 and $500 test123 456test789 /(?<!\$)\b\d+/gm   Negative Lookbehind: (?<!B)A Match A not preceded by B
- does not work in PQ PQ probably because PQ's browser not compatible with lookbehind
3,4,456 var a = 'The cost of 3 repairing and 4maintenance jobs are repectively $700 and $500 test123 456test789';
var b = a.match(/((?!\$)^)\d+|(?:[^\$])\b\d+/gm);
var c = b.toString().replace(/ /gm,'')

PQ Code attached [Code2]
power_bi /(power)/gm   Capturing Group; group syntax (..) power  
power_bi /(?:power)/gm   Non-capturing Group; group syntax (..)
Capturing and non-capturing groups return the same , but non-capturing groups are utilized in Lookarounds
power  

 

 

 

//code1
let   fx=(input)=>
    Web.Page(
        "<script>
            var x='"&input&"';
            var b = x.match(/\$\d+/gm);
            var c = b.toString().replace(/\$/gm,'');
            document.write(c);
        </script>"){0}[Data]{0}[Children]{1}[Children]{0}[Text],
        result = fx("1 steak costs $30; 2 steaks cost $55")

in
result
//code2
let   fx=(input)=>
    Web.Page(
        "<script>
            var x='"&input&"';
            var b = x.match(/((?!\$)^)\d+|(?:[^\$])\b\d+/gm);
            var c = b.toString().replace(/ /gm,'');
            document.write(c);
        </script>"){0}[Data]{0}[Children]{1}[Children]{0}[Text],
        result = fx("The cost of 3 repairing and 4maintenance jobs are respectively $700 and $500 test123 456test789")

in
result

 

 

 

 

Comments