Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and a 50 percent discount on exams.
Get startedEarn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.
Hello all,
I'm pretty new to Power BI and I'm facing some difficulties in running python script.
In fact, I'm getting an error in Power Query when running python script, for which I couldn't find a solution that works.
I have already worked with the latest version of python (3.9), but at this moment I'm working with version 3.6 because I suspect it may be more compatible with Power BI. Moreover, the packages used in the code have also been installed and imported.
My code:
import pandas as pd
pd.crosstab(index=dataset[‘clientID’], columns=dataset[‘productID’])
The error:
DataSource.Error: ADO.NET: Python script error.
Traceback (most recent call last):
File "PythonScriptWrapper.PY", line 15, in <module>
pd.crosstab(index=dataset['clientID'], columns=dataset['productID'])
File "C:\USERS\...\PYTHON36\lib\site-packages\pandas\core\reshape\pivot.py", line 577, in crosstab
**kwargs
File "C:\USERS\...\PYTHON36\lib\site-packages\pandas\core\frame.py", line 6089, in pivot_table
observed=observed,
File "C:\USERS\...\PYTHON36\lib\site-packages\pandas\core\reshape\pivot.py", line 127, in pivot_table
table = agged.unstack(to_unstack)
File "C:\USERS\...\PYTHON36\lib\site-packages\pandas\core\frame.py", line 6393, in unstack
return unstack(self, level, fill_value)
File "C:\USERS\...\PYTHON36\lib\site-packages\pandas\core\reshape\reshape.py", line 412, in unstack
return _unstack_frame(obj, level, fill_value=fill_value)
File "C:\USERS\...\PYTHON36\lib\site-packages\pandas\core\reshape\reshape.py", line 442, in _unstack_frame
constructor=obj._constructor,
File "C:\USERS\...\PYTHON36\lib\site-packages\pandas\core\reshape\reshape.py", line 142, in __init__
self._make_selectors()
File "C:\USERS\...\PYTHON36\lib\site-packages\pandas\core\reshape\reshape.py", line 177, in _make_selectors
mask.put(selector, True)
IndexError: index 1323169739 is out of bounds for axis 0 with size 1322985896
Details:
DataSourceKind=Python
DataSourcePath=Python
Message=Python script error.
Traceback (most recent call last):
File "PythonScriptWrapper.PY", line 15, in <module>
pd.crosstab(index=dataset['clientID'], columns=dataset['productID'])
File "C:\USERS\...\PYTHON36\lib\site-packages\pandas\core\reshape\pivot.py", line 577, in crosstab
**kwargs
File "C:\USERS\...\PYTHON36\lib\site-packages\pandas\core\frame.py", line 6089, in pivot_table
observed=observed,
File "C:\USERS\...\PYTHON36\lib\site-packages\pandas\core\reshape\pivot.py", line 127, in pivot_table
table = agged.unstack(to_unstack)
File "C:\USERS\...\PYTHON36\lib\site-packages\pandas\core\frame.py", line 6393, in unstack
return unstack(self, level, fill_value)
File "C:\USERS\...\PYTHON36\lib\site-packages\pandas\core\reshape\reshape.py", line 412, in un...
ErrorCode=-2147467259
ExceptionType=Microsoft.PowerBI.Scripting.Python.Exceptions.PythonScriptRuntimeException
I would appreciate if someone could help me.
Thank you for your answer.
If pandas crosstab has a limit, I cannot find it...
Both columns have about 1 263 000 rows. ClienteID has almost 40 000 unique values and ProductID has more than 460 000 unique values.
So you should expect a crosstab with 18.4 billion cells. A bit rich.
I have already tried with a sample of the dataset and I think it works. However, when running python script, a decimal place is added to "ProductID". For example, before I had "12345" and now "12345.0". I tried to round it in python but it says it is a string, so I cannot understand where it came from. I think it happens before applying crosstab function since in the input dataset (created after running the script), this change is already there.
Also, when applying the power query changes it returns the error: Fail to save modifications on the server. Error returned: 'The SUM function only accepts column reference as the argument number 1. The '215965.0' column does not exist in the rowset. ' .
What is the cardinality of your clientID and ProductID columns? Looks like you are hitting a pandas crosstab limit. Might want to try with a smaller dataset.