smeetsh
Responsive Resident

Error ingesting JSON files into a warehouse table

Hi All,

 

We have a pipeline that uses an API to get data from one of our suppliers. It creates a number of JSON files, which we then ingest into a warehouse table so we can ETL, join, upsert etc. ... all the fun stuff 🙂

As of this morning, 5.30 am New Zealand time, we are getting the error below. We have not made any changes, and the array it is pointing at (or seems to be pointing at) has had NULL values in the past, as far back as I can check.

 

ErrorCode=UserErrorWriteFailedFileOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The file operation is failed, upload file failed at path: '9e9fce10-9b68-486f-8d48-b77f907bba71/_system/services/DI/pipelines/a69bad01-bb02-46a4-8b26-f369e5bfe237/MSSQLImportCommand'.,Source=mscorlib,''Type=System.InvalidOperationException,Message=Not able to get enumerator for non-array. Path: databody.daysQuality.testResults,Source=Microsoft.DataTransfer.Common,'

We are not sure what is going on or how we can fix this; is there a Fabric issue perhaps? We have found one setting in the activity that ingests the JSON files into the warehouse table, "Skip incompatible rows", but I am hesitant to tick that, since I don't know what the overall effect is and the JSON consists of multiple (sometimes nested) arrays.

[Screenshot: copy activity settings showing the "Skip incompatible rows" option]

Below is a sample of part of the JSON, with the potential cause of the error, and one that does not have the NULL. As I said, it has never been a problem in the past; this has been stable for about 1.5 years now and runs 3 times a day, 7 days a week.
With NULL:


[Screenshot: JSON sample where the array is NULL]

Without NULL

[Screenshot: JSON sample where the array contains values]

We are stumped; any help would be greatly appreciated.
Cheers

Hans

 

1 ACCEPTED SOLUTION
smeetsh
Responsive Resident

A final update, for now at least: Microsoft has confirmed this is a bug, which they hope to fix in the first quarter of 2026.

 

In the meantime they suggest ingesting it as a string and then using a notebook to parse the JSON array back into its original columns, or changing the source, which for us wasn't an option.

Since it took ages for them to get to this stage, I had already figured out I could use a notebook to read the JSON files in the lakehouse into a dataframe and, when all files were read, write that dataframe to a table, in my case with the "overwrite" option, since this is just a raw staging table.


Final kicker: they want me to give permission to archive the case (no doubt because of their KPIs), but the bug is not registered on MS's known issues list. I want them to document it there, but for some reason they don't want to do that; they want to keep it internal, so now my Dutch stubbornness is playing up.

The Python code was quite simple in the end, even though we had multiple nested arrays. I used Copilot to do the heavy lifting, since I had to alias about 30 columns because the data goes into an existing table and the names needed to match.

These are the libraries I needed:

[Screenshot: libraries/imports used in the notebook]
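(The screenshot is no longer available; assuming a PySpark notebook in Fabric, the imports were presumably along these lines:)

from pyspark.sql import functions as F
from pyspark.sql.functions import col, explode_outer   # explode_outer keeps rows whose array is NULL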
 

Folder path to read the JSON files:

 

[Screenshot: folder path used to read the JSON files]
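(That screenshot is missing as well; reading every JSON file from a Lakehouse folder probably looked roughly like this. The folder path is a placeholder, and this assumes a default lakehouse is attached to the notebook:)

# Placeholder folder under the attached lakehouse; adjust to your own Files path
json_folder = "Files/supplier_api/json/"

# Read all JSON files in the folder into one dataframe; Spark infers the (nested) schema
df = spark.read.option("multiLine", "true").json(json_folder)
df.printSchema()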

Exploding the main body of the JSON (just part of it, as an example):

[Screenshot: code exploding the main body of the JSON]
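(The code itself is lost with the screenshot, but a simplified sketch of that explode-and-write step, using the array path from the error message and placeholder column/table names, would be something like:)

from pyspark.sql.functions import col, explode_outer

# Re-read (or reuse) the dataframe from the step above, then explode one nested array;
# explode_outer keeps rows where the array is NULL
df = spark.read.option("multiLine", "true").json("Files/supplier_api/json/")  # placeholder path

flat = (
    df.withColumn("testResult", explode_outer(col("databody.daysQuality.testResults")))
      .select(
          col("databody.supplierId").alias("supplierId"),    # hypothetical source column
          col("testResult.value").alias("testResultValue"),  # hypothetical field in the array
      )
)

# Overwrite the raw staging table (placeholder name), as described above
flat.write.mode("overwrite").saveAsTable("lh_raw.staging_table_raw")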

 

That's it in a nutshell.


36 REPLIES

smeetsh
Responsive Resident

Good morning All, sadly another update that isn't an update. I tried some suggestions from MS last week, which didn't work. They wanted me to manually edit the JSON of the pipeline to make the column type "Array", which is an option that is not in the GUI, but apparently you can edit the JSON code. The error was still the same.

Does anyone have an escalation email address of some kind? Because (just like with the lakehouse SQL endpoint delay) I am getting the feeling this is not going to get fixed anytime soon. For the SQL endpoint delay, users now have to fix that themselves.

Cheers

Hans

Hi @smeetsh,

Thank you for sharing the latest update. I understand how discouraging it can be when suggested workarounds such as manually modifying the pipeline JSON to set the column type to Array do not resolve the issue.

Given the repeated behavior and lack of resolution, this may require direct review from the engineering team. While there is no public escalation email, opening a support ticket through Fabric → Help & Support → Create a Support Request is the official escalation path. Once the case is created, you can request escalation so it reaches the appropriate product group.

Thank you.

I already created a ticket ages ago, and regularly get "We are still looking into it" and sometimes something to try.

smeetsh
Responsive Resident

Hi All,

Just a further update that is not really an update. Due to the amount of time this is taking, something that wasn't urgent is now becoming urgent, and I may have no option but to tackle one of our biggest datasets and add some form of Python notebook that can take a JSON array, which now has to be loaded into a single column, and flatten it into its respective columns like it used to be.

MS assured me Friday, after I pointed this out and asked for an escalation, that it has already been escalated to the "Product Group Team". However, I won't have time to wait months for them to figure this out.

This change will be massive, since the dataset goes back over 10 years and the data cannot be retrieved again from the source if I get this wrong. So it means backup plans, change requests, risk, etc. 😥

Hi @smeetsh,

Thank you for the update. I understand the urgency and impact this has on your workload. Since the issue has already been escalated to the Product Group, I’d recommend continuing to monitor the escalation progress.

smeetsh
Responsive Resident

Another small update, but sadly not a resolution yet. Yesterday I got another email from them asking for some more patience. A number of Microsoft internal teams are working on it and they are finalizing the next steps.

 

It really feels like a change was made with unintended consequences, well I hope so at least, or else I will have my work cut out for me 😁

Hi @smeetsh,

Thank you so much for the update and for your continued patience. It is good to hear that the internal teams are actively investigating and finalizing the next steps.

It does sound like there may have been a recent change that had some unintended side effects, and I really appreciate you keeping us posted while this is being worked on.

Thanks again for your patience and collaboration while the engineers work toward a fix.

smeetsh
Responsive Resident

Good morning All,

A small update, which is not really an update, but MS came back indicating they need more time and that the issue seems to be more complex than expected.

@v-kpoloju-msft We are already writing the data to a raw JSON file; this allowed us to identify the issue quickly.

 

I will be looking into a Python notebook as soon as I get a chance.

It is the copy activity to the lakehouse that is actually throwing the error, preventing us from loading like we used to (this pipeline has worked fine for at least a year). The only thing we were able to do was change the first copy step to flatten the whole JSON array into one single column and then use a notebook.
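In a notebook, parsing that single column back out would look roughly like this (a minimal sketch, simplified to a handful of fields; staging table and field names as used in our data):

from pyspark.sql.functions import col, explode_outer, from_json

# The adjusted copy step loads the whole JSON array into a single string column (json_text_raw)
schema = "array<struct<pdDate:string, pdcAmount:double, pdcTax:double, pdcTotal:double>>"

parsed = (
    spark.table("lh_raw.supplier_statement_stage_raw")
         .withColumn("row", explode_outer(from_json(col("json_text_raw"), schema)))
         .select("row.*")
)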

 

Cheers

Hans

smeetsh
Responsive Resident

Hi All,

A small update: the data engineering team is now trying to schedule a Teams meeting with me, so it is still in the hands of Microsoft.

In the meantime, I think I have a working notebook, but the business analyst still needs to verify the data.

 

If at all possible, I want the old solution to start working again. The notebook way seems very convoluted, and given that Fabric was great at dealing with JSON in the past, using a notebook feels like a step back. (Granted, my not being very versed in Spark SQL doesn't help either; I had a lot of help from AI to get it to work, and it always bugs me if I don't fully understand what I have written. Spark SQL turns out to be quite different from SQL once you get into more complex functions.)

Some snippets of the code are below. Having to define each data type for each key in particular is a lot of work and prone to error, and as I said we have several nested arrays coming back from the API. I have removed quite a few lines of code from the sample below. Maybe the Python route is easier, I don't know?

 

==============

%%sql
 INSERT INTO lh_raw.supplier_statement_raw (
    pdcAmount,
    pdcTax,
    pdcTotal,
    piSessionStatus,
    psHaulDesc,
    psSessionError,
    `xrAllPayDate.__type`,
    `xrAllPayDate.pdDate`,
    `xrAllPayDate.pdcAmount`,
    `xrAllPayDate.pdcTax`,
    `xrAllPayDate.pdcTotal`,
    `xrAllPayDate.xrAllPayDateMade`,
    `xrAllPayDate.xrAllPayDateReceived.__type`,
    `xrAllPayDate.xrAllPayDateReceived.crAudParamCurrencyConvert`,
    `xrAllPayDate.xrAllPayDateReceived.crCreditor`,
    `xrAllPayDate.xrAllPayDateReceived.crDebtor`,
    `xrAllPayDate.xrAllPayDateReceived.crDocQueueFiDist`,
    `xrAllPayDate.xrAllPayDateReceived.crFormula`,
<snipped code>
)
SELECT
    src.pdcAmount,
    src.pdcTax,
    src.pdcTotal,
    src.piSessionStatus,
  
    src.psSessionError,
    col.__type,
    col.pdDate,
    col.pdcAmount,
    col.pdcTax,
    col.pdcTotal,
    receivedPaySummary.__type AS `xrAllPayDate.xrAllPayDateReceived.__type`,
    receivedPaySummary.crAudParamCurrencyConvert AS `xrAllPayDate.xrAllPayDateReceived.crAudParamCurrencyConvert`,
    receivedPaySummary.crCreditor AS `xrAllPayDate.xrAllPayDateReceived.crCreditor`,
    receivedPaySummary.crDebtor AS `xrAllPayDate.xrAllPayDateReceived.crDebtor`,
    receivedPaySummary.crDocQueueFiDist AS `xrAllPayDate.xrAllPayDateReceived.crDocQueueFiDist`,
 <snipped code>
 
FROM lh_raw.supplier_statement_stage_raw src
LATERAL VIEW explode(from_json(src.json_text_raw, 'array<struct<
    __type:string,
    pdDate:string,
    pdcAmount:double,
    pdcTax:double,
    pdcTotal:double,
    xrAllPayDateMade:array<struct<
        __type:string,
        pdcAmount:double,
        pdcTax:double,
        piMoneyType:int,
        piStatus:int,
        psNote:string,

        piFormula:int,
        piSeqNum:int,
        psAgreement:string,
        psCreditor:string,
        psDebtor:string,
        psDetail:string,
        psFarm:string,
        psPayClass:string,
        psText:string,
        psType:string
    >>,
    xrAllPayDateReceived:array<struct<
        __type:string,
        crAudParamCurrencyConvert:string,
        crCreditor:string,
        crDebtor:string,
<Snipped code>
>>
>>')) payDateWeb AS col
LATERAL VIEW OUTER explode(col.xrAllPayDateMade) madePaySummary AS madePaySummary
LATERAL VIEW OUTER explode(col.xrAllPayDateReceived) receivedPaySummary AS receivedPaySummary
 ===========

Hi @smeetsh,

Thanks a lot for the update. It’s good to hear you have a notebook running, even though it feels a bit complicated right now. I understand what you mean about Spark SQL being tricky compared to the way Fabric used to handle JSON; it can feel like extra work having to define all those data types.

It’s great that your data engineering team is already in touch with Microsoft. Hopefully, that will help bring back the old solution, since that was much simpler for your use case. In the meantime, keeping the notebook as a backup makes sense.

If you want to explore alternatives, using Python in a notebook might make working with nested JSON a bit easier than SparkSQL, since you don’t always need to write out the full schema by hand. But ideally, once the engineering team and Microsoft confirm things, you may not need this workaround long term.

Thank you for using the Microsoft Fabric Community Forum.

Hi @smeetsh,

Just checking in to see if the issue has been resolved on your end. If the earlier suggestions helped, that’s great to hear! And if you’re still facing challenges, feel free to share more details; happy to assist further.

Thank you.

Good morning. No, it has not been resolved, and to be honest my last session with MS was downright painful. The person from the MS data engineering team had a lot of trouble understanding how an API call wrote to a JSON file in the lakehouse (which is about as standard as it gets).

MS went back to do more homework.

Hi @smeetsh,
Thank you for sharing this update.

It is good that the case has been escalated back to the engineering team for deeper review. Since they’re looking into how the API writes JSON into the Lakehouse, we will hopefully get clearer guidance or a fix from their side soon. While waiting for their update, you could try a few alternative approaches that might help:

Use a Python notebook instead of Spark SQL to read the JSON; it can handle nested structures more easily without defining every data type (a small sketch follows after this list).

Ingest the API output to a landing folder first (as raw JSON) and then use a Dataflow Gen2 or Copy activity in Fabric to flatten and load it into your Lakehouse. 

If the schema changes often, let Spark infer the schema when it loads the JSON (rather than supplying a full schema by hand) to reduce manual schema issues.

Hopefully one of these can serve as a temporary workaround until the engineering team provides a fix. 
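As an illustration of the first option, pandas can flatten a nested array without a hand-written schema. This is only a sketch: the file path, the "date" meta field, and the assumption that daysQuality is an object are placeholders to adapt to your data.

import json
import pandas as pd

# Placeholder path to one of the raw JSON files (assumes the default lakehouse is mounted)
with open("/lakehouse/default/Files/landing/sample.json") as f:
    doc = json.load(f)

days = doc["databody"]["daysQuality"]
days["testResults"] = days.get("testResults") or []   # guard against the NULL case from this thread

# Flatten the nested testResults array; the meta field carried along is just an example
flat = pd.json_normalize(
    days,
    record_path="testResults",
    meta=["date"],       # hypothetical parent field
    errors="ignore",     # tolerate missing meta fields
)
print(flat.head())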

Thanks again for keeping us updated. Please let us know if you have any doubts regarding this. We will be happy to help.

 

smeetsh
Responsive Resident

A small update on this. It is not resolved yet. I have tried skipping incompatible rows, since our data is built in such a way that it wouldn't cause an issue, but for some reason it came up with the same error. Flattening all arrays to strings I haven't tried yet, since this would cause havoc with our ETL in the next step.

Does anyone have a good trick to change the NULL arrays to []? I have been looking at doing this with a notebook, but I wasn't able to get that working.

v-kpoloju-msft
Community Support

Hi @smeetsh,

Thank you for reaching out to the Microsoft Fabric Community Forum and providing the details and the JSON examples.

The error you are seeing is happening because the testResults field is sometimes an array (with values) and sometimes null. The pipeline expects it to always be an array, so when it encounters null, it fails with the “Not able to get enumerator for non-array” error. This wasn’t enforced as strictly before, which is why it may have only started breaking recently.

To fix this, the recommended approach is to normalize the JSON before ingestion so that null values are replaced with empty arrays []. This keeps the schema consistent and prevents the pipeline from failing. https://learn.microsoft.com/en-in/azure/data-factory/copy-activity-schema-and-type-mapping 
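For example, a small pre-processing step in a notebook (run before the copy activity) could rewrite the files so a null testResults becomes an empty array. This is only a sketch: the folder path is a placeholder and the code guesses at the shape of databody/daysQuality.

import json
import pathlib

# Placeholder folder under the default lakehouse mount; adjust to your own layout
folder = pathlib.Path("/lakehouse/default/Files/landing/supplier_api")

for path in folder.glob("*.json"):
    doc = json.loads(path.read_text())
    days = doc.get("databody", {}).get("daysQuality")
    # daysQuality may be a single object or a list of them; normalise both shapes
    items = days if isinstance(days, list) else [days] if isinstance(days, dict) else []
    for item in items:
        if item.get("testResults") is None:
            item["testResults"] = []   # replace NULL with an empty array so the schema stays consistent
    path.write_text(json.dumps(doc))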

As a quick workaround, you can also enable Skip incompatible rows under Copy Activity → Settings → Fault tolerance, but do note that this will drop any rows where testResults is null.
https://learn.microsoft.com/en-in/azure/data-factory/copy-activity-fault-tolerance 

For a long-term stable solution, I would suggest going with the normalization option, so you don’t risk losing records when the source sends null.

Hope this clears it up. Let us know if you have any doubts regarding this. We will be happy to help.

Thank you for using the Microsoft Fabric Community Forum.

Hi @smeetsh,

Just checking in to see if the issue has been resolved on your end. If the earlier suggestions helped, that’s great to hear! And if you’re still facing challenges, feel free to share more details; happy to assist further.

Thank you.

I have tried the above solution, but it doesn't work.

I have a case open with MS. According to Copilot, the way Fabric handles JSON has become stricter since they launched the preview of the new Copy job activity, but I can't find any real resource to confirm this. I will be pushing MS for an answer on that.

 

Right now I am trying the solution @tayloramy suggested, which puts the JSON as a string in a column and then parses it. That is going to be a bit of a learning curve for me though. I will keep this thread updated if I find a solution and how I have done it.

Hi @smeetsh,

Thanks for getting back with an update. It sounds like you are already exploring a solid workaround with @tayloramy's suggestion of storing the JSON as a string and then parsing it; totally understandable that it might take a little learning curve.

Good call on keeping the case open with Microsoft as well, since their confirmation on the stricter JSON handling will be the most reliable path forward. Please do share back here once you hear more or have success with your approach; I’m sure it will help others running into the same issue.

Thank you.

Hi @smeetsh,

Just checking in to see if the issue has been resolved on your end. If the earlier suggestions helped, that’s great to hear! And if you’re still facing challenges, feel free to share more details; happy to assist further.

Thank you.
