Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Enhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.

Reply
sun-and-sea
Frequent Visitor

Filter URL in column using regex & R script

Hi everyone,

 
I'm trying to create a regex expression that only select web URLs with 3 or more non-consecutive forward slash and nothing has worked so far. All these URLs begin with "page.com/language (en or fr)/content_type". Here are examples of the desired results and the unwanted results:
 
Desired result: page.com/language (en or fr)/content_type/category/subcategory/year/content
 
Unwanted results: page.com/language (en or fr)/content_type/category/ | page.com/language (en or fr)/content_type/category/subcategory/ | page.com/language (en or fr)/content_type/category/year
 
Notice the desired results only contain the content pages... the desired results have 3 non-consecutive forward slash after the "page.com/language/content_type" portion...
 
Note: many pages have "page.com/language (en or fr)/content_type/category/year/content" as the URL format. Not all pages have a "/subcategory" component.
 
Any help from you or the community is much appreciated!
4 REPLIES 4
sun-and-sea
Frequent Visitor

Hi @smpa01 , thank you for your advice. I'm good with using Javascript if I can use Google Analytics and SQL as the data sources, since I need to parse URL from multiple web pages (same domain, different sites) that follow the above format I have mentioned above. I did read through your posted articles but find it hard to follow the steps outlined.

sun-and-sea
Frequent Visitor

Thank you @smpa01 ! The format of this URL column in a table imported from Google Analytics. I'm also not familiar with JSON and just need the regex syntax so I can put this into R script.

@sun-and-sea  I meant js(javascript) and not JSON. Since I found out about the native capacity, I have not been using stringR anymore. I am sorry, someone else will help you.

Did I answer your question? Mark my post as a solution!
Proud to be a Super User!
My custom visualization projects
Plotting Live Sound: Viz1
Beautiful News:Viz1, Viz2, Viz3
Visual Capitalist: Working Hrs
smpa01
Super User
Super User

@sun-and-sea  js and regex can be natively written in PQ now. 

 

https://community.powerbi.com/t5/Community-Blog/How-to-use-JavaScript-inside-power-query-for-data-ex...

 

https://community.powerbi.com/t5/Community-Blog/Using-JavaScript-in-power-query-for-regex-Part2/ba-p...

 

In this case, you only want the string that has a total forwardslash count=6 which you can write by writing js and combining regex in it

 

 

var x='"&input&"';
var b = (x.match(/\//gm)).length;
document.write(b)

 

 

The full code

 

 

let
   fx=(input)=>
    Web.Page(
        "<script>
            var x='"&input&"';
            var b = (x.match(/\//gm)).length;
           document.write(b);
        </script>"){0}[Data]{0}[Children]{1}[Children]{0}[Text],
        Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WKkhMT9VLzs/Vz0nMSy8FchQ0UvMU8osU0oo09ZPz80pS80riSyoLUvWTE0tS0/OLKvWLS5Pg7MrUxCKYMqVYHbLMI1cfsjvINQPkfqXYWAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Filtered Rows" = Table.SelectRows(Source, each (fx([Column1]) = "6"))
in
    #"Filtered Rows"

 

 

 

 

 

Did I answer your question? Mark my post as a solution!
Proud to be a Super User!
My custom visualization projects
Plotting Live Sound: Viz1
Beautiful News:Viz1, Viz2, Viz3
Visual Capitalist: Working Hrs

Helpful resources

Announcements
July 2025 community update carousel

Fabric Community Update - July 2025

Find out what's new and trending in the Fabric community.

July PBI25 Carousel

Power BI Monthly Update - July 2025

Check out the July 2025 Power BI update to learn about new features.