Fabric Notebooks - Error Trying To Create Pandas D...

Anonymous · ‎10-30-2023

Context: The block of code below works for the smaller Lakehouse table, OrderData2021pt1, but when I try it with a larger table (about 3M rows and roughly 20 columns) I get the errors below while creating the DataFrame. Is this too large for a Pandas DataFrame?

My follow-along source for testing/reference (6:34 minute mark):

https://www.youtube.com/watch?v=8Xu1M-ORbK8&list=PLn1m_aBmgsbH5M_v7aZB_4GT9hecrDrEH&index=16

Error #1: Buffer overflow. Available: 0, required: 1506902. To avoid this, increase spark.kryoserializer.buffer.max value.

Error #2: Job was aborted due to user runtime error. This can be for many reasons, a common cause is: 1. Ensure the files you are loading are of the format. If you're loading data via read.parquet, ensure the format of the data that is being read is indeed parquet. Consider gating wildcard loads with the file type suffix you intend to load to avoid. For example, instead of using a load string like /path/to/my/parquet/files/* Change this to: /path/to/my/parquet/files/*.parquet To avoid loading JSON files that might exist in the directory.

GilbertQ · ‎10-30-2023

Hi @Anonymous

It appears to me (I am not a Fabric/notebook expert) that you are hitting a limitation because the larger table holds too much data to load in one go. Could you reduce the data to a smaller size, or load it in daily batches?
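
To make the "smaller size" idea concrete, here is a minimal sketch that trims a Spark DataFrame before converting it to pandas. It is only an illustration, not a confirmed fix for the errors above. The helper name `to_trimmed_pandas`, the table name `MyLargeTable`, and the column names in the usage comment are all hypothetical; the only Spark calls used are `filter`, `select`, `limit`, and `toPandas`.

```python
def to_trimmed_pandas(sdf, columns=None, condition=None, max_rows=None):
    """Shrink a Spark DataFrame, then convert the result to pandas.

    sdf       -- a pyspark.sql.DataFrame, e.g. from spark.read.table(...)
    columns   -- optional list of column names to keep
    condition -- optional SQL filter string, e.g. "OrderDate >= '2021-01-01'"
    max_rows  -- optional cap on the number of rows returned
    """
    # Filter first, so the condition can still use columns that are
    # dropped by the select step afterwards.
    if condition:
        sdf = sdf.filter(condition)
    if columns:
        sdf = sdf.select(*columns)
    if max_rows is not None:
        sdf = sdf.limit(max_rows)
    # Only the reduced result is collected to the driver as a pandas DataFrame.
    return sdf.toPandas()


# Hypothetical usage in a Fabric notebook, where `spark` is already defined:
# sdf = spark.read.table("MyLargeTable")            # assumed table name
# df = to_trimmed_pandas(
#     sdf,
#     columns=["OrderID", "OrderDate"],             # assumed column names
#     condition="OrderDate >= '2021-01-01'",        # e.g. one day or month at a time
#     max_rows=500_000,
# )
```

Changing the `condition` for each run (for example, one date at a time) would give the daily batches mentioned above, with each batch small enough to convert on its own.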
