Thanks in advance for your help.
I wrote a regex-based script to extract timestamp information from free text.
# 'dataset' holds the input data for this script
regex = r'(((\d{1})|(\d{2}))/((\d{1})|(\d{2}))/((\d{2})|(\d{4}))\s\d{2}:\d{2}:\d{2}(\s[aAPp][mM])*)'
dataset['Timestamp'] = dataset['description'].str.extract(regex)
The regex is valid, but the script fails with the error messages below. The same code works with simpler regexes that do not contain "|" or "*".
DataSource.Error: ADO.NET: Python script error.
Traceback (most recent call last):
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\indexes\base.py", line 2897, in get_loc
return self._engine.get_loc(key)
File "pandas\_libs\index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Timestamp'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals\managers.py", line 1069, in set
loc = self.items.get_loc(item)
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\indexes\base.py", line 2899, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas\_libs\index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Timestamp'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "PythonScriptWrapper.PY", line 14, in <module>
dataset['Timestamp'] = dataset['description'].str.extract(regex)
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\frame.py", line 3472, in __setitem__
self._set_item(key, value)
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\frame.py", line 3550, in _set_item
NDFrame._set_item(self, key, value)
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\generic.py", line 3381, in _set_item
self._data.set(key, value)
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals\managers.py", line 1072, in set
self.insert(len(self.items), item, value)
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals\managers.py", line 1181, in insert
block = make_block(values=value, ndim=self.ndim, placement=slice(loc, loc + 1))
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals\blocks.py", line 3267, in make_block
return klass(values, ndim=ndim, placement=placement)
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals\blocks.py", line 2775, in __init__
super().__init__(values, ndim=ndim, placement=placement)
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals\blocks.py", line 128, in __init__
"{mgr}".format(val=len(self.values), mgr=len(self.mgr_locs))
ValueError: Wrong number of items passed 11, placement implies 1
Details:
DataSourceKind=Python
DataSourcePath=Python
Message=Python script error.
Traceback (most recent call last):
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\indexes\base.py", line 2897, in get_loc
return self._engine.get_loc(key)
File "pandas\_libs\index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Timestamp'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals\managers.py", line 1069, in set
loc = self.items.get_loc(item)
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\p...
ErrorCode=-2147467259
ExceptionType=Microsoft.PowerBI.Scripting.Python.Exceptions.PythonScriptRuntimeException
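One possible reading of the `ValueError` above, offered as a sketch rather than a confirmed diagnosis: `str.extract` returns one column per capture group, and the pattern in the script contains 11 capture groups, which matches the "11" in "Wrong number of items passed 11, placement implies 1". The check below uses only the standard `re` module to count the groups; the sample string is made up for illustration.

```python
import re

# Pattern copied from the script in the question.
regex = r'(((\d{1})|(\d{2}))/((\d{1})|(\d{2}))/((\d{2})|(\d{4}))\s\d{2}:\d{2}:\d{2}(\s[aAPp][mM])*)'
compiled = re.compile(regex)

# Every pair of plain parentheses is a capture group, including the
# ones that were only meant to scope the "|" and "*" operators.
group_count = compiled.groups
print(group_count)

# Hypothetical sample text, not taken from the original dataset.
sample = 'Ticket opened 3/14/2019 09:26:53 PM by admin'
match = compiled.search(sample)

# Group 1 is the outermost pair of parentheses, i.e. the whole timestamp.
print(match.group(1))
```

If this reading is right, the "|" and "*" operators are not the problem in themselves; the extra parentheses that come with them are.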
Hi @Anonymous ,
I'm not very familiar with Python scripts. Are you sure your Python script is correct?
In my test with the script below in Query Editor, there was no error. So I believe a Python script does work with a regex that contains "|" or "*".
The blog shows how to use "|" and "*" in a Python script.
# 'dataset' holds the input data for this script
import re
pattern = '^a|b..cd*n$'
test_string = 'abycdn'
result = re.match(pattern, test_string)
if result:
    print("Search successful.")
else:
    print("Search unsuccessful.")
Please check your script again. If it is correct and you still have the problem in Power BI, please let me know.
Best Regards,
Cherry
Hi @v-piga-msft Cherry,
Thanks for looking into my issue. The Python script is correct; it works when I use it with a simple regex pattern, like the one below.
# 'dataset' holds the input data for this script
regex = r'(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})'
dataset['Timestamp'] = dataset['description'].str.extract(regex)
So it only fails when the regex becomes more complex. Does Power BI offer a native filter option based on regex? If so, the Python script would not really be necessary.
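A minimal sketch of one way the complex pattern might be made to behave like the simple one, assuming the difference is the number of capture groups: the simple pattern has a single pair of parentheses, so `str.extract` yields a single column, while the complex one has many. Below, only the outer parentheses capture and the inner groups are written as non-capturing `(?:...)`; `\d{1,2}` is used as shorthand for the one-or-two-digit alternation. The `dataset` frame and its rows are hypothetical stand-ins for the data Power BI passes in.

```python
import pandas as pd

# Hypothetical stand-in for the 'dataset' frame that Power BI provides.
dataset = pd.DataFrame({'description': [
    'Ticket opened 3/14/2019 09:26:53 PM by admin',
    'Restarted on 12/01/19 23:59:59',
    'no timestamp here',
]})

# Only the outer parentheses capture; inner groups use (?:...) so they
# scope "|" and "*" without creating additional output columns.
regex = r'(\d{1,2}/\d{1,2}/(?:\d{2}|\d{4})\s\d{2}:\d{2}:\d{2}(?:\s[aAPp][mM])*)'

# expand=False returns a Series, which fits a single-column assignment.
dataset['Timestamp'] = dataset['description'].str.extract(regex, expand=False)
print(dataset['Timestamp'].tolist())
```

Rows without a timestamp come back as missing values, so a later step may need to handle them.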
Hi @Anonymous @v-piga-msft,
I am having the same issue using a Python script with regular expressions in Power BI. Did you find a solution to run the script? Please let me know. You can also reply to my post here. Thank you!