Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Be one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now

Reply
Anonymous
Not applicable

REGEX in Power Query using Python Transformation

I am struggling to get a  REGEX expression working inside Power Query using the "Run Python Script" transformation.

I  do have a simple REGEX  tranformation that uses the Python REGEX "findall()" method  working as expected  in the attached pbix file (link here)

Working example using findall.

 

 

dataset['Object_Name'] = dataset['M_Code'].str.findall('(?:Name|Schema|Item)(?: = \")(\w+_*)(?:\")')

 

 

I am, however strugging to get a more advanced REGEX, that uses "finditer" method working.  The REGEX works fine inside a Jupyter Notebook (link to Google Colab here)

REGEX working inside Notebook

 

 

text = """section Section1;

shared #"EDWP_D2PM CUSTOMER_DIM_CURR" = let
    Source = DB2.Database("6170.2671.88280.372", "bcudb", [Implementation = "Microsoft"]),
    Navigation = Source{[Schema = "TREDWP_DP2PM", Item = "CUST_DIM_CURR"]}[Data],
 
shared #"PPDD2PM CUSTOMER_CURR" = let
    Source = DB2.Database("45561.7550.26571.88280.372", "bcudb", [Implementation = "Microsoft"]),
    Navigation = Source{[Schema = "TREDWP_D2PM", Item = "CUSTOMER_DIM"]}[Data],
"""


import re
import pandas as pd
r = re.compile('(?i)((?:Schema\s*=\s*\")(?P<Schema_Item>\w+_*).*(?:Item\s*=\s*\")(?P<Table_Item>\w+_*)|(?:Schema\s*=\s*\")(?P<Schema_Table>\w+_*).*(?:Name\s*=\s*\")(?P<Tables_Name>\w+_*))')
matches =[m.groupdict() for m in r.finditer(text)]
print(matches)

 

 

when I run this REGEX  inside the Power Query  Python Transformation I get the following error 

 

 

DataSource.Error: ADO.NET: Python script error.
<pi>TypeError: expected string or bytes-like object
</pi>
Details:
    DataSourceKind=Python
    DataSourcePath=Python
    Message=Python script error.
<pi>TypeError: expected string or bytes-like object
</pi>
    ErrorCode=-2147467259

 

 

This identifes that I have a type error, and Power Query is expecting a string format. The working REGEX does have "str.findall" in the code, but I am struggling to identify where to "chain/add" the string conversion into the non-working REGEX to make it work.

Any suggestions on a fix would be much appreciated.

4 REPLIES 4
Anonymous
Not applicable

@Anonymous Thanks for your help with this one. But I think rather than trying to figure out how to do this REGEX inside of Power BI, I am better off doing the REGEX in a database, and then having Power BI connect to the database.  Cheers Steve.

Let me know what you want me to do with this question. Shall  I leave it open ? 

 

Anonymous
Not applicable

Hi @Anonymous ,

Could you please provide the applied field on Phython visual with screenshot and completed Phython codes? Please review the following thread and check whether it can also resolve your problem.

TypeError: expected string or bytes-like object pandas variable

In addition, you can refer the content in the following links to create word cloud by using phython visual.

Simple word cloud in Python

Generating WordClouds in Python

Solved: Python script error in Power Query Editor PythonUn... - Microsoft Power BI Community

 

Best Regards

Community Support Team _ Polly

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

 

Anonymous
Not applicable

@Anonymous I am not using a Python visual, only the Python Transformation inside Power Query. I have read the attached links. They don't appear to provides any obvious answers. The  REGEX  that works in the Notebook, but not in the Python transformation in Power Query is linked above (both in the Notebook & PBIX file). Let me know if you need additional information

Anonymous
Not applicable

Hi @Anonymous ,

It is so strange. Could you please provide a screenshot with the REGEX  that works in the Notebook?  Could you please try to change the data type before regex?

 

for says in says:
say = re.search('<p>(,*?)</p>',say)

 

Change the above to the following:

 

for says in says:
say=str(say)
say=re.search('<p>(.*?)</p>',say)

 

 

What is the rule in the compile? Is there only one parameter in finditer? What is the version of Power BI Desktop? Please try to use the newest version.

 

Python Regex findall() Function By Practical Examples (pythontutorial.net)

Using Python in Power BI Power Query Editor - Power BI | Microsoft Learn

 

 

Best Regards

Community Support Team _ Polly

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!

November Carousel

Fabric Community Update - November 2024

Find out what's new and trending in the Fabric Community.

Dec Fabric Community Survey

We want your feedback!

Your insights matter. That’s why we created a quick survey to learn about your experience finding answers to technical questions.