j_hoolachan
Regular Visitor

Direct Lake Mode error: "Class: 'ParquetException' Status: 'Unexpected end of stream'"

Hello,

 

I have created a star schema using the TPCH10 benchmark dataset in two separate Power BI datasets: one uses import mode and one uses direct lake mode. The model is below. 

 

star_schema.png

 

Fact_orders has ~60M records and Fact_orders[l_extendedprice] is a decimal column. Two separate thin reports have been created in PBI Desktop, one connecting to each model in the PBI service. When Fact_orders[l_extendedprice] is summed via a card visual in a report based on the import mode dataset, the visual renders successfully. When the same visual is created in a report based on the direct lake dataset, the visual fails to render with the following message:

 

"Unexpected parquet exception occurred. Class: 'ParquetException' Status: 'Unexpected end of stream'"

 

The same error appears in other scenarios when using the direct lake mode dataset. For example, if a card visual is used to count the rows in the fact_orders table, the visual successfully renders the correct value (~60M). However, if a filter is added to the report using dim_customer[c_mktsegment], the visual fails to render with the same error. Below is a screenshot of the error when the filter query is evaluated in TE3.

 

j_hoolachan_0-1685980244544.png

 

Please let me know if you need additional information.

 

Thanks!

 

1 ACCEPTED SOLUTION

This issue has been resolved. My dim and fact tables had skewed data which was causing the issue (somehow). Running OPTIMIZE on each of the tables and then refreshing my PBI dataset allowed the measures to calculate successfully.

 

One thing that is unclear to me: I explicitly enabled OptimizeWrite (and it's enabled by default https://learn.microsoft.com/en-us/fabric/data-engineering/delta-optimization-and-v-order?tabs=sparks...) so why were explicit OPTIMIZE commands required as well? I assumed OptimizeWrite would result in...optimized writes...
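
For anyone wanting to repeat the fix above from a notebook, here is a minimal sketch of running OPTIMIZE on each table. It assumes a Spark session is already available as `spark` (as in a Fabric or Databricks notebook) and that the tables are registered under simple, unqualified names; the helper names `build_optimize_statements` and `run_optimize` are made up for illustration, and the table list is only an example based on the model described in the question.

```python
from typing import Iterable, List


def build_optimize_statements(table_names: Iterable[str]) -> List[str]:
    """Return one Delta Lake OPTIMIZE statement per table name.

    Assumes simple, unqualified table names (no schema prefix).
    """
    statements = []
    for name in table_names:
        # Backticks quote the identifier so unusual table names stay valid SQL.
        statements.append(f"OPTIMIZE `{name}`")
    return statements


def run_optimize(spark, table_names: Iterable[str]) -> None:
    """Run OPTIMIZE on each table using an existing SparkSession."""
    for statement in build_optimize_statements(table_names):
        spark.sql(statement)


# Example table names based on the model in the question; adjust to your lakehouse.
TABLES = ["dim_customer", "fact_orders"]

if __name__ == "__main__":
    # Without a Spark session this only prints the statements that would run.
    for statement in build_optimize_statements(TABLES):
        print(statement)
```

In a notebook you would call `run_optimize(spark, TABLES)` and then refresh the Power BI dataset, as described above.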


12 REPLIES
AkshaiM
Microsoft Employee

There is a known bug for this "Unexpected end of stream" error - it can happen in some relatively uncommon Parquet column layouts. A fix for this should be rolling out around next week (hopefully by Sep 8th). Please give it a try after that...

Thanks for the info. I was able to resolve the issue today by removing two high cardinality comment columns. Neither is involved in the query that fails, but their removal solved the issue. They are both strings.


Do you think they meet the criteria that the bug will fix? 
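
If you want to check which string columns count as high cardinality before dropping any, the following is a rough sketch in plain Python over a small sample of rows. The function name, the 0.9 threshold, and the sample data are all assumptions for illustration; the thread does not say which columns were removed, so `c_comment` here is only an example.

```python
from typing import Any, Dict, List


def high_cardinality_string_columns(
    rows: List[Dict[str, Any]], min_distinct_ratio: float = 0.9
) -> List[str]:
    """Return string columns whose distinct-value ratio meets the threshold."""
    if not rows:
        return []
    distinct: Dict[str, set] = {}
    for row in rows:
        for column, value in row.items():
            # Only string values are considered; numeric columns are skipped.
            if isinstance(value, str):
                distinct.setdefault(column, set()).add(value)
    total = len(rows)
    return sorted(
        column
        for column, values in distinct.items()
        if len(values) / total >= min_distinct_ratio
    )


# Illustrative sample only; real data would come from the table itself.
sample_rows = [
    {"c_custkey": 1, "c_mktsegment": "BUILDING", "c_comment": "a"},
    {"c_custkey": 2, "c_mktsegment": "BUILDING", "c_comment": "b"},
    {"c_custkey": 3, "c_mktsegment": "BUILDING", "c_comment": "c"},
    {"c_custkey": 4, "c_mktsegment": "MACHINERY", "c_comment": "d"},
]

if __name__ == "__main__":
    print(high_cardinality_string_columns(sample_rows))
```

On a 60M-row table you would run this against a sample rather than the full data.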

I wouldn't have expected string columns to cause this - I think it was related to specific encodings. But it may be that removing these two columns has a consequence for other columns.

 

Note that this issue occurs during an analysis phase, so it wouldn't matter which columns the query would access.

Here is the notebook code if you want to try to reproduce...the data is publicly available. Let me know if there's a better way to share the notebook; I just copy/pasted each cell into the file. The bottom of the file also has some info about the custom PBI dataset model.

 

TPCH Notebook/Direct Lake Code - Pastebin.com


mtna-1990
New Member

I fixed the issue by changing the Databricks runtime version to 11.3 LTS, with Spark version 3.3.0, which is the same version that Fabric uses. After that, I deleted the folder with the delta table in the storage account. Finally, I recreated the delta table and the issue was resolved. Hopefully MS comes up with a better solution.

fabricos
Regular Visitor

Got this error as well when testing out Fabric's new direct lake PBI connection mode.

 

I load a table from Azure Data Lake Gen2 with a shortcut to a lakehouse in Microsoft Fabric. I then create a dataset based on this lakehouse. The table I have trouble with is the largest in my model and has 75 000 000 rows.

 

 

j_hoolachan
Regular Visitor

No update from my end. The docs do mention that some column types are not supported but I haven't found further detail. 

kydang
New Member

Hello,

Do you have any update on this topic? I'm having the same issue with direct lake.
Thanks

 

Tomiasp
New Member

We are having the same issue. I've created a table in Databricks and tried to bring it in via OneLake to Fabric, but I'm getting this same error. I can see the table just fine when looking at it from the Lakehouse view, but as soon as I try to build a report on it, this...

Same situation for me. I created a delta table in Databricks, and while the SQL endpoint can read the data properly, pulling it with DirectLake/Power BI gives this exception. It appears to be related to specific data types--either floats/decimals or timestamps.

rcufley-lrdist
Frequent Visitor

I was wondering if you were able to resolve this, j_hoolachan. I am having the same issue, with the reported error appearing for visuals that use decimal columns in some tables. It is occurring on a direct lake mode query against the default dataset for a lakehouse.

 

Thanks
