pwr20
Regular Visitor

Dataflow Gen2 File Destination - JSON and Parquet Format Support Needed for Nested OData Data

I'm integrating a third party OData API hosted on an on-premises server accessible only via on-premises data gateway.

Current Situation:

  • OData source has deeply nested columns
  • Dataflow Gen2 can connect and stream data fine via gateway
  • We need to land raw data as files in Lakehouse before flattening in a Notebook downstream

The Problem: Dataflow Gen2 file destination currently only supports delimited/CSV format. CSV cannot preserve nested column structures from OData - nested columns are either dropped or corrupted.

What I need: JSON or Parquet file format support in Dataflow Gen2 file destination. Both formats natively support nested structures and would solve this completely.

Questions:

  1. Is JSON or Parquet file destination on the Fabric roadmap for Dataflow Gen2?
  2. Is there any current workaround to preserve nested OData columns when writing from Dataflow Gen2?

What I tried so far:

  • Writing to Lakehouse table without expanding nested columns - nested columns are silently dropped, only scalar columns are written.
  • Expanding nested columns fully in Dataflow before writing - works for smaller nested tables, but times out for larger ones due to the high volume of records routed sequentially through the on-premises gateway.
  • Staging destination in Dataflow - same as Lakehouse, drops nested columns.
  • CSV file destination - cannot preserve nested structure.
  • ADLS Gen2 destination - only delimited format available.
  • KQL Database destination - not visible in our Dataflow Gen2 destinations.
  • Pipeline Copy Activity - OData not available as a source connector.
  • Notebook - OData connector not available as a data source.

Root Cause: OData navigation properties mean each nested table requires sequential OData requests per parent record through the gateway - 400 parent records × multiple nested tables = thousands of sequential gateway requests causing timeouts. JSON/Parquet file destination would allow streaming pages directly to file without needing full expansion in memory.

 

Any help appreciated. Thank you.

1 ACCEPTED SOLUTION
v-prasare
Community Support
Community Support

Hi @pwr20 , 

Currently, Microsoft Fabric Dataflow Gen2 supports only CSV/delimited formats for file-based destinations such as Lakehouse Files and ADLS Gen2. These formats do not support nested data structures, which makes them unsuitable for landing raw nested OData data without flattening.

When working with nested OData navigation properties, Dataflow Gen2 treats nested records as unsupported for tabular or file destinations. As a result, nested columns are either dropped or replaced with null values when writing to Lakehouse tables, staging destinations, or files. Fully expanding nested data within Dataflow can work for small volumes but does not scale reliably, especially when using an on‑premises data gateway, due to sequential API calls and timeout limitations.

At this time, JSON or Parquet file destinations are not available in Dataflow Gen2, and there is no built‑in, scalable solution to preserve nested OData structures end‑to‑end. This is a current product limitation, not a configuration issue.

 

I’d encourage you to submit your detailed feedback and ideas via Microsoft's official feedback channels, such as the Microsoft Fabric Ideas.

Feedback submitted here is often reviewed by the product teams and can lead to meaningful improvement.

Thanks,

Prashanth


7 REPLIES 7

Thank you for the detailed explanation. This really clears things up and confirms what I have been experiencing. It is reassuring to know this is a product limitation rather than a configuration issue on our end.

 

Based on your feedback and our own investigation, we are now looking at two alternative approaches for the nested OData entities:

1. Using the Fabric Pipeline Copy Data Activity with the on-premises data gateway to land raw JSON or Parquet files into Lakehouse Files, and then using a Fabric Notebook to flatten the nested structures from there. We are currently investigating whether the Pipeline Copy Data Activity supports our on-premises OData scenario and will update the thread with our findings.

 

2. Azure Data Factory with a self-hosted integration runtime as a fallback option if the Copy Data Activity approach does not work out. For the flat/scalar entities, we will continue using Dataflow Gen2 as it works well for those.

 

I will also submit feedback via Microsoft Fabric Ideas as you suggested, as JSON and Parquet file destinations in Dataflow Gen2 would be a very valuable addition for scenarios like this.

 

Thanks again for taking the time to respond.

pwr20
Regular Visitor

@nilendraFabric 

Thank you for the detailed response!

I tried the Text.FromBinary(Json.FromValue()) approach, but unfortunately it triggers a 404 error in our case. It appears that when Dataflow applies this transformation, the OData connector attempts to resolve the nested table via a navigation URL rather than operating on the already loaded data. The URL it constructs appends the nested entity path and that endpoint returns a 404 from our API.

 

We worked around this by using $select in the OData URL to return the nested data inline, combined with Table.ExpandTableColumn. This avoids the navigation request and works cleanly for smaller loads.
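For reference, in standard OData v4 inline expansion of navigation properties is done with the $expand system query option (optionally combined with $select for the scalar columns), which is what avoids the per-parent navigation request. Below is a minimal, hedged sketch of how such an inline-expansion URL can be built; the service URL, entity, and property names are hypothetical placeholders, not from the actual API in this thread.

```python
from urllib.parse import urlencode

def build_inline_query(base_url, entity, expand, select=None, top=None):
    """Build an OData v4 URL that returns navigation properties inline
    via $expand, avoiding one navigation request per parent record.
    All entity and property names used by callers are hypothetical."""
    params = {"$expand": ",".join(expand)}
    if select:
        params["$select"] = ",".join(select)
    if top is not None:
        params["$top"] = str(top)
    # safe="$," keeps the OData option names and comma lists readable
    return f"{base_url.rstrip('/')}/{entity}?{urlencode(params, safe='$,')}"

url = build_inline_query(
    "https://example.local/odata", "Orders",
    expand=["OrderLines", "Shipments"],
    select=["Id", "CustomerId"], top=500,
)
print(url)
# https://example.local/odata/Orders?$expand=OrderLines,Shipments&$select=Id,CustomerId&$top=500
```

A single expanded request per page replaces many per-parent navigation calls, which is exactly the trade-off that matters when every call crosses the gateway.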

 

However, we are now facing a scale challenge. We have a large number of parent records, each containing thousands of nested records, and the $filter parameter only applies at the parent level, not the nested level. The standalone endpoint for the nested entity is also restricted in our API.

 

Do you have any suggestions for handling incremental or filtered loads of nested OData data at scale within Fabric Dataflow Gen2?

Hi @pwr20 

 

Give this a try:

 

Use Dataflow Gen2 to pull just the parent records (scalar columns only, fast and reliable). Write those to a Lakehouse table. Then trigger a Notebook that reads the parent IDs and calls the OData API directly using requests over HTTP for the nested data. The Notebook can handle retries, parallelism, and write JSON/Parquet directly to Lakehouse Files.
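The suggestion above can be sketched roughly as follows. This is a minimal illustration only, using the standard library (urllib instead of requests) so it stays self-contained; the base URL, entity, and navigation-property names are hypothetical, and as noted later in the thread this direct-HTTP approach only works if the Notebook can actually reach the API.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def nested_url(base, entity, parent_id, nav):
    """OData navigation URL for one parent record, e.g.
    .../Orders(42)/OrderLines (entity/nav names are hypothetical)."""
    return f"{base.rstrip('/')}/{entity}({parent_id})/{nav}"

def fetch_with_retry(url, retries=3, backoff=2.0):
    """GET a URL, retrying transient failures with exponential backoff."""
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=60) as resp:
                return json.load(resp)
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(backoff ** attempt)

def dump_nested(base, entity, nav, parent_ids, out_path, max_workers=8):
    """Fetch one nested table for many parents in parallel and append
    each response as a JSON line, so nothing is flattened in memory."""
    urls = [nested_url(base, entity, pid, nav) for pid in parent_ids]
    with ThreadPoolExecutor(max_workers=max_workers) as pool, \
            open(out_path, "w", encoding="utf-8") as f:
        for page in pool.map(fetch_with_retry, urls):
            f.write(json.dumps(page) + "\n")
```

The JSON-lines file landed in Lakehouse Files can then be read and flattened by a downstream Notebook at leisure.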

Hi @nilendraFabric 

Thank you for the suggestion.

 

Unfortunately this approach won't work in our case because the OData API is hosted on-premises and is only accessible through an on-premises data gateway. Fabric Notebooks cannot route requests through the on-premises gateway, so a direct HTTP call using requests from a Notebook would not be able to reach the API.

This is why we are constrained to Dataflow Gen2 for the data extraction part, as it is the only Fabric component that can connect through the on-premises gateway to our OData source.

 

Let me know your thoughts.

v-prasare
Community Support
Community Support

Hi @pwr20,

We would like to confirm whether our community member's answer resolves your query or if you need further help. If you still have any questions or need more support, please feel free to let us know. We are happy to help you.


@nilendraFabric, thanks for your prompt response.

Thank you for your patience; we look forward to hearing from you.
Best Regards,
Prashanth Are
MS Fabric community support

nilendraFabric
Super User
Super User

Hi @pwr20 

 

The Lakehouse Files destination (currently in Preview) only supports writing CSV files. The ADLS Gen2 destination is also in Preview but limited to delimited format. You should submit this as a feature request at aka.ms/FabricIdeas

 

Your diagnosis is accurate. Dataflows treat nested records in columns as errors and replace them with null when writing to tabular destinations.

There is no clean solution, but try this:

Serialize nested columns to JSON text in M before writing to a table destination. In Power Query, use Text.FromBinary(Json.FromValue([NestedColumn])) to convert each nested column into a JSON string. Write those string columns to a Lakehouse table. Then parse them back in a Notebook downstream. This preserves the structure without needing to expand navigation properties.
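On the Notebook side, rehydrating those JSON string columns is straightforward. Below is a minimal plain-Python sketch (column and field names are hypothetical); in a PySpark Notebook the equivalent would be `pyspark.sql.functions.from_json` with an explicit schema.

```python
import json

def parse_json_columns(rows, json_cols):
    """Rehydrate columns that were serialized to JSON text upstream
    (e.g. via Text.FromBinary(Json.FromValue(...)) in M) back into
    nested Python structures. Column names here are hypothetical."""
    out = []
    for row in rows:
        fixed = dict(row)
        for col in json_cols:
            if fixed.get(col) is not None:
                fixed[col] = json.loads(fixed[col])
        out.append(fixed)
    return out

# One row as it would land in the Lakehouse table: scalars plus a JSON string
rows = [{"OrderId": 1, "Lines": '[{"Sku": "A", "Qty": 2}]'}]
parsed = parse_json_columns(rows, ["Lines"])
# parsed[0]["Lines"] is now a list of nested records again
```

This keeps the Dataflow output strictly tabular (strings and scalars), which is what the table destinations can handle, while losing no nested structure.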
