Hi all, I am trying to build a word cloud (using n-grams) from a free-text column using Python. I get the error below in Power BI, but the issue doesn't seem to be in the script we wrote. Can anyone help explain what is going wrong here?
DataSource.Error: ADO.NET: Python script error.
Traceback (most recent call last):
File "PythonScriptWrapper.PY", line 7, in <module>
dataset = pandas.read_csv('input_df_xxxxxxxxx.csv')
File "C:\USERS\XXXXF\APPDATA\LOCAL\CONTINUUM\ANACONDA3\lib\site-packages\pandas\io\parsers.py", line 676, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\USERS\XXXXF\APPDATA\LOCAL\CONTINUUM\ANACONDA3\lib\site-packages\pandas\io\parsers.py", line 454, in _read
data = parser.read(nrows)
File "C:\USERS\XXXXF\APPDATA\LOCAL\CONTINUUM\ANACONDA3\lib\site-packages\pandas\io\parsers.py", line 1133, in read
ret = self._engine.read(nrows)
File "C:\USERS\XXXXF\APPDATA\LOCAL\CONTINUUM\ANACONDA3\lib\site-packages\pandas\io\parsers.py", line 2037, in read
data = self._reader.read(nrows)
File "pandas\_libs\parsers.pyx", line 860, in pandas._libs.parsers.TextReader.read
File "pandas\_libs\parsers.pyx", line 875, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas\_libs\parsers.pyx", line 929, in pandas._libs.parsers.TextReader._read_rows
File "pandas\_libs\parsers.pyx", line 916, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas\_libs\parsers.pyx", line 2071, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.
Details:
DataSourceKind=Python
DataSourcePath=Python
Message=Python script error.
Traceback (most recent call last):
File "PythonScriptWrapper.PY", line 7, in <module>
dataset = pandas.read_csv('input_df_XXXXXXXX.csv')
File "C:\USERS\XXXXF\APPDATA\LOCAL\CONTINUUM\ANACONDA3\lib\site-packages\pandas\io\parsers.py", line 676, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\USERS\XXXXF\APPDATA\LOCAL\CONTINUUM\ANACONDA3\lib\site-packages\pandas\io\parsers.py", line 454, in _read
data = parser.read(nrows)
File "C:\USERS\XXXXF\APPDATA\LOCAL\CONTINUUM\ANACONDA3\lib\site-packages\pandas\io\parsers.py", line 1133, in read
ret = self._engine.read(nrows)
File "C:\USERS\XXXXF\APPDATA\LOCAL\CONTINUUM\ANACONDA3\lib\site-packages\pandas\io\parsers.py", line 2037, in read
data = self._reader.read(nrows)
File "pandas\_libs\parsers.pyx", line 860, in pandas._libs.parsers.TextReader.read
File "pandas\_libs\parsers.pyx", line 875, in pandas._libs.parsers.TextReader._read_low_memory
File...
ErrorCode=-2147467259
ExceptionType=Microsoft.PowerBI.Scripting.Python.Exceptions.PythonScriptRuntimeException
As the error says, you are consuming too much memory. Try reducing the data set, or check whether your Python code has runaway recursion somewhere.
Thank you, Ibendlin. I have actually reduced the data to 3 columns and fewer than 3,000 rows, but I guess it's the description column, which contains free text, that is eating up the memory. Thanks for sharing!
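One way to check that guess is to measure how much memory each column takes once the data is in pandas. This is only a minimal sketch: the column names and sample rows are made up for illustration, and in Power BI the DataFrame would be the `dataset` variable handed to the script rather than `sample`.

```python
import pandas as pd


def column_memory_report(df: pd.DataFrame) -> pd.Series:
    """Return the memory used by each column in bytes, largest first."""
    # deep=True also counts the Python string objects, not just the pointers.
    return df.memory_usage(index=False, deep=True).sort_values(ascending=False)


# Hypothetical stand-in for the 3-column table described above.
sample = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "category": ["a", "b", "c"],
        "description": ["long free text " * 50] * 3,
    }
)

report = column_memory_report(sample)
longest_text = sample["description"].str.len().max()
print(report)
print("longest description (characters):", longest_text)
```

If the free-text column dominates the report, trimming or cleaning it before the word cloud step is probably the first thing to try.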
Power Query can distinguish between line feeds inside quoted fields and the line feeds that end a row in a "regular" CSV file. This is especially important when your free-text column may contain line feeds. Not sure if your pandas importer is equally smart.
So maybe import the CSV in Power Query and then run your Python script against the imported rows?
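For what it's worth, here is a small sketch of how pandas treats line feeds inside quoted fields, assuming the quotes are balanced. The sample CSV text and the helper name `flatten_line_feeds` are made up for illustration. As far as I know, `read_csv` copes with this case, and problems tend to show up when the quoting is broken, so this does not prove what happens with your actual file.

```python
import io

import pandas as pd

# Hypothetical sample: the first description contains a quoted line feed.
raw = 'id,description\n1,"first line\nsecond line"\n2,plain text\n'

# With balanced quotes, the embedded line feed stays inside one field.
df = pd.read_csv(io.StringIO(raw))


def flatten_line_feeds(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy where runs of CR/LF in a text column become one space."""
    out = frame.copy()
    out[column] = out[column].str.replace(r"[\r\n]+", " ", regex=True)
    return out


clean = flatten_line_feeds(df, "description")
print(len(df), "rows")
print(clean["description"].tolist())
```

Doing the equivalent cleanup on the text column in Power Query before the Python step would keep line feeds out of the data that gets handed to the script.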