Context: The block of code below works for a smaller Lakehouse table (OrderData2021pt1), but when I try it with a larger table (~3M rows, ~20 columns) I get errors when creating the DataFrame. Is this too large for a Pandas DataFrame?
My follow-along source for testing/reference (6:34 mark):
https://www.youtube.com/watch?v=8Xu1M-ORbK8&list=PLn1m_aBmgsbH5M_v7aZB_4GT9hecrDrEH&index=16
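For reference, the pattern I'm following is roughly the one below (a minimal sketch of the tutorial's approach, not my exact cell; `OrderData2021pt1` is the small table that works, and `spark`/`display` are the objects a Fabric notebook provides):

```python
# Read a Lakehouse table with Spark, then convert it to a Pandas DataFrame.
df_spark = spark.read.table("OrderData2021pt1")  # works for this small table
df_pandas = df_spark.toPandas()                  # fails on the ~3M-row table
display(df_pandas)
```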
Error #1: Buffer overflow. Available: 0, required: 1506902. To avoid this, increase spark.kryoserializer.buffer.max value.
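If the problem is only the serializer buffer, the limit can usually be raised per session. A sketch, assuming the notebook supports the `%%configure` session magic (Fabric/Synapse notebooks do) and that 512m gives enough headroom:

```
%%configure -f
{
    "conf": {
        "spark.kryoserializer.buffer.max": "512m"
    }
}
```

Alternatively, enabling Arrow-based conversion often sidesteps the Kryo path for `toPandas()` entirely:

```python
# Arrow transfers the data as columnar batches instead of Kryo-serialized rows.
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
```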
Error #2: Job was aborted due to a user runtime error. This can happen for many reasons; a common cause: ensure the files you are loading are of the expected format. If you're loading data via read.parquet, ensure the data being read is indeed parquet. Consider gating wildcard loads with the file type suffix you intend to load. For example, instead of using a load string like /path/to/my/parquet/files/*, change it to /path/to/my/parquet/files/*.parquet to avoid loading JSON files that might exist in the directory.
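The second error's own suggestion translates to something like this (path taken from the message itself):

```python
# Gate the wildcard with the .parquet suffix so stray JSON or log files
# in the same directory are not picked up by read.parquet.
df = spark.read.parquet("/path/to/my/parquet/files/*.parquet")
```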
Hi @Anonymous
It would appear to me (I am not a Fabric/notebook expert) that this is a memory limitation: too much data for a single conversion. Could you reduce the size of the data, or load it in smaller batches (e.g. daily or monthly)?
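For example, something like the sketch below might work: convert the table in chunks rather than in one big `toPandas()` call. The table and date column names here are placeholders, assuming your data has a date-like column to split on:

```python
import pandas as pd
from pyspark.sql import functions as F

# Convert a large Lakehouse table to pandas in month-sized chunks.
# "OrderData2021" and "OrderDate" are hypothetical names -- substitute yours.
sdf = spark.read.table("OrderData2021")
chunks = []
for month in range(1, 13):
    month_sdf = sdf.filter(F.month("OrderDate") == month)
    chunks.append(month_sdf.toPandas())   # each chunk stays small
df = pd.concat(chunks, ignore_index=True)
```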