m_cherriman
Advocate I

Error using data imported through DataFlow when using pyspark

Since Sunday (13-11-2023) I have been getting this error message on data imported through DataFlows. Previously they were working. Has anyone else experienced this issue?

 

SparkRuntimeException: Error while decoding: java.lang.IllegalArgumentException: requirement failed: Mismatched minReaderVersion and readerFeatures. newInstance(class scala.Tuple3).

 

The Data Flows link to an on-prem SQL Server database, and I'm able to query the table in the SQL endpoint, but I cannot read the table into a dataframe. I've tried restricting the data to a couple of rows to see if there is an unusual item in any of the queries, but nothing obvious stands out, and they previously worked. To get around this I've had to extract the data into CSV files (using a Python script in VS Code), upload the CSVs to the Files folder, and then import those files into a new table, which can be read using pyspark.
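For reference, a minimal sketch of the failing read from a Fabric notebook (the lakehouse and table names are hypothetical; `spark` is the session Fabric notebooks provide):

```python
# Minimal repro sketch -- hypothetical table name. The same table queries fine
# through the SQL endpoint, but the Spark read below fails with:
#   SparkRuntimeException: ... Mismatched minReaderVersion and readerFeatures
df = spark.read.table("MyLakehouse.dataflow_table")
df.show(5)
```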

 

I've had a couple of issues with Data Flows over the last week, which makes me wonder how reliable they are.

souldish
Frequent Visitor

Still seeing issues with this. Not connected to a gateway, and using DFG2 to connect to a REST API and copy the data to a table in the lakehouse.

 

Any updates on this issue? It makes DFG2 completely unusable.

I think I got the issue resolved. I'm not sure what exactly fixed it, but I removed a previous connection to an on-prem gateway in my DFG2, even though the current flow wasn't using a gateway at all. Then I re-created the dataflow in a new DFG2. Something about the gateway connection was still being written to the files even though I wasn't using it. Weird.

naanii
Frequent Visitor

Hello,

 

I am still experiencing the same issue.

Any update on this case?

EsteraKot
Microsoft Employee

If you are using the gateway, make sure that you are on the latest version. Dataflows Gen2 has been updated in all regions as of 2/23. Let me know if that works.

Hi naanii, you'll also have to update your Power BI gateway.

Reidy
Regular Visitor

Hi,

 

We are still having this issue. What do you mean by "update your Power BI gateway"? Any help would be greatly appreciated.

Hey! I'd suggest starting a new thread so we can talk about your particular scenario in more detail. If your dataflow is not using a gateway, then the suggestion made by other users in this thread won't apply to your scenario.

 

You can also raise a support ticket so you can engage directly with our support team to troubleshoot the issue.

naanii
Frequent Visitor

Thanks for the solution!

KA78
Advocate I

UPDATE:

 

I also created a ticket for this bug. I just received an email from support telling me it will be fixed this week! 😀

 

"Would like to inform you that I have received an update from engineering team. The update says it was a dataflow bug whose fix will be done and rolled out to production by the end of this week.

For Dataflows Gen2, the fix should reach all production regions by the end of this week."

 

Thanks @Microsoft for the great support!

 

It would be nice if we could see this kind of major issue in the known issues list in the future: Microsoft Fabric Known Issues

@jcvega I'm just tagging you here so you'll get a notice that you don't have to put effort into the workaround. Should be fixed this week.

KA78
Advocate I

So, I found the root of the bug and also a temporary workaround.

 

The bug is that Dataflows Gen2 writes this in the JSON delta log files:

  • {"protocol":{"minReaderVersion":1,"minWriterVersion":2,"readerFeatures":[],"writerFeatures":[]}}

That's the reason for this error message: requirement failed: Mismatched minReaderVersion and readerFeatures. In the Delta protocol, the readerFeatures and writerFeatures fields are only valid when minReaderVersion is 3 (and minWriterVersion is 7), so Spark's Delta reader rejects a protocol entry that pairs minReaderVersion 1 with a readerFeatures array, even an empty one.

 

A temporary fix is to change this line in all of the table's _delta_log JSON files to:

  • {"protocol":{"minReaderVersion":1,"minWriterVersion":2}}

You can change the log files within OneLake in your file browser:

[screenshot: the table's _delta_log JSON files in the OneLake file browser]
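To save some hand-editing, here is a minimal sketch of that workaround in Python, assuming the lakehouse is mounted locally via the OneLake file explorer (the mount path below is a placeholder you'd replace with your own workspace, lakehouse, and table names):

```python
import json
from pathlib import Path

# Placeholder path -- adjust to your OneLake file explorer mount,
# workspace, lakehouse, and table names.
log_dir = Path(r"C:\Users\me\OneLake - Microsoft\MyWorkspace\MyLakehouse.Lakehouse\Tables\my_table\_delta_log")

for log_file in sorted(log_dir.glob("*.json")):
    lines = log_file.read_text().splitlines()
    changed = False
    for i, line in enumerate(lines):
        if not line.strip():
            continue
        entry = json.loads(line)
        protocol = entry.get("protocol")
        # Drop the empty feature arrays that trip up Spark's protocol check.
        if protocol is not None and "readerFeatures" in protocol:
            protocol.pop("readerFeatures", None)
            protocol.pop("writerFeatures", None)
            lines[i] = json.dumps(entry)
            changed = True
    if changed:
        log_file.write_text("\n".join(lines) + "\n")
        print(f"patched {log_file.name}")
```

Because every dataflow refresh writes a new commit file, the sketch re-scans the whole _delta_log, so you can simply re-run it after each refresh.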

 

jcvega
Frequent Visitor

Thanks for the workaround. I had the same issue. How can I edit these files?

You can change the files with the OneLake file explorer plugin that you can install in Windows File Explorer: https://www.microsoft.com/en-us/download/details.aspx?id=105222

 

Keep in mind that every time you execute the dataflow, you'll have to modify the new log file it creates for this workaround. Hopefully they'll fix this bug quickly.

I'm still having this issue. 

 

I load data from my on-prem SQL Server using the `copy data` option, then read it in Fabric Notebooks.

 

I frequently get this error:
SparkRuntimeException: Error while decoding: java.lang.IllegalArgumentException: requirement failed: Mismatched minReaderVersion and readerFeatures.

When running a pipeline like:

copy data (which reads from on-prem and copies to LH Table with overwrite) --> Notebook --> (anything else)

 

This will break when running it the 2nd or 3rd time. Super frustrating, and it feels like this shouldn't really be happening at this point.
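If you want to confirm you're hitting the same protocol bug described earlier in the thread, a quick diagnostic sketch from a notebook with the lakehouse attached as the default (the table name is a placeholder):

```python
import json

# Relative Tables/ paths resolve against the notebook's default lakehouse.
log_glob = "Tables/my_table/_delta_log/*.json"

for row in spark.read.text(log_glob).collect():
    if not row.value.strip():
        continue
    entry = json.loads(row.value)
    if "protocol" in entry:
        # A protocol like {"minReaderVersion": 1, ..., "readerFeatures": []}
        # is the malformed combination described in this thread.
        print(entry["protocol"])
```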

I just started having the same issue.

Hi,

 

We managed to resolve this issue by changing the Spark runtime version. You'll need to restart your notebook after changing the environment.

 

[screenshot: selecting a different Spark runtime version in the environment settings]
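After the restart, you can sanity-check that the session actually picked up the new runtime (a tiny sketch; `spark` is the built-in notebook session):

```python
# Confirm the session restarted onto the expected Spark version.
print(spark.version)
```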

 

jcvega
Frequent Visitor

Thanks very much... I suppose Microsoft is working on this. It doesn't make sense to do this process every day (if the DF runs each day).

KA78
Advocate I

Hi, 

@v-cboorla-msft is there perhaps an update you can give us on this issue?

Also, if anyone has a workaround, we would appreciate it greatly. This is a real showstopper now.

 

Our ERP data resides in an on-premises database that we can only access with ODBC. So the only way to get this data into Fabric (when using only Fabric; we want to stay with SaaS) is by using Dataflow Gen2. And now the tables created by Dataflow Gen2 can't be used in a notebook, so we can't use a notebook / pyspark to transform our data...

 

Does anyone know of a workaround, inside Fabric, so that we can combine ODBC on-prem data + notebooks? Any help is greatly appreciated.

 

@m_cherriman, did you perhaps find a solution?

 

Thanks in advance for any update or insight into this issue.

Hi @KA78 

 

Thanks for using Fabric Community and posting your question.

Can you please create a new post, as the initial ask is different from your issue? We will definitely look into the issue and help.

 

Thanks for understanding.

Thanks @v-cboorla-msft for your reply. I'll create a new post for my question regarding the workaround. 

That said, the root of the problem is the same as the initial ask. Is there perhaps something you can share about the status of the initial ask, "Error using data imported through DataFlow when using pyspark"? Is this a known issue?


(related to this error: "SparkRuntimeException: Error while decoding: java.lang.IllegalArgumentException: requirement failed: Mismatched minReaderVersion and readerFeatures. newInstance(class scala.Tuple3).")
