Hi,
When native execution is enabled, the notebook fails when it tries to read the Parquet files, reporting the error below.
The Parquet files are not corrupted; the same process runs fine when native execution is disabled.
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: No magic bytes found at end of the Parquet file
Retriable: False
DisplayReason: True
Expression: strncmp(copy.data() + readSize - 4, "PAR1", 4) == 0
Context: Split [Hive: abfss://x@x.dfs.core.windows.net/DDF/x/Config/Query_Config/part-00001-tid-5345263482584726652-dce53c07-931c-4dde-9f84-4a61d53d8282-19329-1-c000.snappy.parquet 0 - 22686] Task Gluten_Stage_866_TID_3611_VTID_295
Additional Context: Operator: TableScan[0] 0
Function: loadFileMetaData
File: /__w/1/s/Velox/velox/dwio/parquet/reader/ParquetReader.cpp
Line: 227
Hello @skb16
I did some research
Native execution uses Apache Gluten and Velox for optimized query processing, which requires valid Parquet files with:
1. The `PAR1` magic bytes at the start and end of the file (identifying it as a valid Parquet file).
2. Complete metadata in the footer.
Make sure the Parquet file has the `PAR1` magic bytes; you can use `parquet-tools` to inspect the file.
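If `parquet-tools` isn't handy, a quick sketch in plain Python can check the framing bytes directly. The file names below are placeholders for illustration, not the paths from the error message:

```python
# Hedged sketch: a valid Parquet file must start AND end with the 4-byte
# magic b"PAR1"; the Velox reader raises "No magic bytes found" when the
# trailing marker is missing (e.g. after an interrupted write).
def has_parquet_magic(path: str) -> bool:
    with open(path, "rb") as f:
        head = f.read(4)    # first 4 bytes
        f.seek(-4, 2)       # seek to 4 bytes before EOF
        tail = f.read(4)    # last 4 bytes
    return head == b"PAR1" and tail == b"PAR1"

# Demo with synthetic files (just the framing bytes, not real Parquet data):
with open("ok.bin", "wb") as f:
    f.write(b"PAR1" + b"\x00" * 16 + b"PAR1")
with open("truncated.bin", "wb") as f:
    f.write(b"PAR1" + b"\x00" * 16)   # missing trailing magic

print(has_parquet_magic("ok.bin"))         # True
print(has_parquet_magic("truncated.bin"))  # False
```

Run this against the failing `part-*.snappy.parquet` file (downloaded locally or via a mounted path); if it returns `False`, the file really is missing its footer and native execution is right to reject it.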
Native execution works best with Fabric's default Lakehouse paths instead of `abfss://`.
Try something like:
df = spark.read.parquet("/lakehouse/default/Files/....")
Hi @skb16
I wanted to follow up since I haven't heard from you in a while. Have you had a chance to try the suggested solutions?
If your issue is resolved, please consider marking the post as solved. However, if you're still facing challenges, feel free to share the details, and we'll be happy to assist you further.
Looking forward to your response!
Best Regards,
Community Support Team _ C Srikanth.
Hi @skb16 ,
Just checking in – did the steps I shared help resolve the issue?
✅ If it’s working now, feel free to mark the response as the Accepted Solution. This helps others who face the same issue find the fix faster.
✨ And of course, a little Kudos would be much appreciated!
If you're still running into trouble, let me know what you've tried so far and I’ll help you dig deeper. We’ll get it sorted!
Hey there,
Just circling back to see if you were able to get this working or still facing the same issue with reading Parquet files using native execution.
If it's still not working, feel free to share the details.
Sometimes it's a path issue or a permissions thing, especially if you're working in a Fabric workspace. Happy to help troubleshoot further if needed!
Hi @skb16
Thank you for being part of the Microsoft Fabric Community.
As highlighted by @burakkaragoz and @nilendraFabric, the proposed approach appears to address your requirements. Could you please confirm whether your issue has been resolved?
If you are still facing any challenges, kindly provide further details, and we will be happy to assist you.
If the above information is helpful, please give us Kudos and mark the response as the Accepted Solution.
Best Regards,
Community Support Team _ C Srikanth.
Hi @skb16
The error you shared relates to reading Parquet files with native execution turned on, and it is quite technical. According to the details, the problem occurs at this point:
“No magic bytes found at the end of the Parquet file”
This indicates that the expected signature “PAR1” was not found at the end of the Parquet file. This signature is used to verify that the file is correct and complete.
Possible Causes and Solutions:
The File May Have Been Written Incompletely
If the write was interrupted (e.g. the job was killed mid-write), the end-of-file "PAR1" signature may be missing.
Native execution is stricter about such omissions, which is why you get an error in native mode but not in normal mode.
Parquet File May Have Incompatible Formatting
The file may have been written with a tool that does not fully comply with the Apache Parquet standard. For example, some specialized libraries (such as fastparquet, especially on the Python side) can cause such incompatibilities.
Solution: Try rewriting the file with a more compatible library such as pyarrow.
Native Execution Mode Version Incompatibility
The Fabric runtime version you are using may have a known bug related to native execution.
Solution: Try upgrading to the latest version or check for known issues with native execution.
Workaround
If native execution is not mandatory for you, disabling this feature may be the fastest solution.
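If you go this route, native execution can typically be toggled with a session-level Spark setting. The property name below is an assumption based on Fabric's Native Execution Engine documentation; verify it against your runtime version, and note that it may need to be set at session startup (e.g. via `%%configure`) rather than mid-session:

```python
# Hedged sketch: disable native execution for the current Spark session
# so the standard (non-Gluten/Velox) Parquet reader is used instead.
# "spark.native.enabled" is assumed from Fabric's NEE docs; the read
# path is a placeholder.
spark.conf.set("spark.native.enabled", "false")
df = spark.read.parquet("your_file.parquet")
```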
Suggestion:
Try rewriting the Parquet file. For example:
import pandas as pd
df = pd.read_parquet("your_file.parquet")
df.to_parquet("rewritten_file.parquet", engine="pyarrow")