vitaly
Advocate II

Apache Hudi and OneLake

Has anyone been able to successfully read or write Apache Hudi files via Fabric Notebooks? If so, would you mind please sharing your approach?

 

We've tried a range of code approaches and seem to be running into a compatibility issue between the Hudi package and OneLake.

9 REPLIES
v-kpoloju-msft
Community Support

Hi @vitaly,

Thank you for reaching out to the community and sharing the details of your scenario.

Currently, Apache Hudi is not officially supported in Microsoft Fabric Notebooks, particularly when interacting with OneLake. While Fabric Notebooks are powered by Apache Spark, the current managed runtime does not allow importing external JAR packages (like Hudi bundles), and OneLake does not expose a file system interface compatible with Hudi's timeline and metadata operations.

If your primary goal is to perform upserts, incremental processing, or ACID-compliant writes, I recommend using Delta Lake instead, which is natively supported in Microsoft Fabric and OneLake. Here is an example:

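# Write the DataFrame as a Delta table in the lakehouse (replaces existing data)
df.write.format("delta").mode("overwrite").save("Tables/my_delta_table")
# Read the Delta table back into a DataFrame
df = spark.read.format("delta").load("Tables/my_delta_table")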

You can find the list of supported file formats in OneLake here:

One Lake indexer (preview) - Azure AI Search | Microsoft Learn

While it is technically possible to use Hudi in Apache Spark by including relevant JARs and configuring write options, Microsoft Fabric currently does not support external JAR imports or Spark reconfiguration using “%%configure.”
Apache Spark runtime in Fabric - Microsoft Fabric | Microsoft Learn

If your use case absolutely requires Hudi, you may consider:

  • Running your Hudi workload in a Databricks or Synapse Spark environment
  • Writing to an ADLS Gen2 container, then accessing it in Fabric through shortcuts or ingestion pipelines.

If your scenario requires Apache Hudi and full control over Spark JARs, you may consider using Azure HDInsight or Azure Databricks with OneLake integration:

Use OneLake with Azure HDInsight

Use OneLake with Azure Databricks

These environments support advanced file formats like Apache Hudi and still allow data exchange with OneLake, even though they run outside the Microsoft Fabric runtime.

I hope this helps clarify things. Let me know what you find after trying these steps; I'm happy to help you investigate further.

Thank you for using the Microsoft Fabric Community Forum.

Sorry, what's your source for the statement "Microsoft Fabric currently does not support external JAR imports or Spark reconfiguration using “%%configure”"?

 

According to Manage Apache Spark libraries in Microsoft Fabric, this is officially supported. It also seems to work well in my experiments.

 

For our use case we specifically need Hudi, not Delta Lake. And the compatibility issue seems to be specific to OneLake storage, not Fabric's Spark execution runtime.
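For readers following along, here is a rough sketch of the kind of session configuration under discussion. It only builds the JSON body of a `%%configure` cell; it does not start Spark. The property names come from the Apache Hudi quick start. The bundle coordinates are an assumption and must match the Spark and Scala versions of your Fabric runtime, and whether Fabric honors `spark.jars.packages` in your workspace is also an assumption to verify. The helper names `hudi_session_conf` and `configure_cell` are hypothetical.

```python
import json

# Assumption: illustrative bundle coordinates. Pick the Hudi bundle that
# matches the Spark and Scala versions of your Fabric runtime.
HUDI_BUNDLE = "org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.1"


def hudi_session_conf(bundle: str = HUDI_BUNDLE) -> dict:
    """Return the Spark properties the Hudi quick start sets for a session."""
    return {
        "spark.jars.packages": bundle,
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        "spark.sql.extensions": "org.apache.spark.sql.hudi.HoodieSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.hudi.catalog.HoodieCatalog",
    }


def configure_cell(conf: dict) -> str:
    """Render a %%configure cell whose JSON body carries the Spark properties."""
    return "%%configure\n" + json.dumps({"conf": conf}, indent=2)


# Paste the printed text into the first cell of the notebook.
print(configure_cell(hudi_session_conf()))
```

Generating the cell from a dictionary keeps the settings in one place, which helps when comparing a working configuration against a failing one.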

Hi @vitaly

Thank you for the clarification and you are right to point out the official documentation regarding custom library support in Microsoft Fabric.

You are referring to this article: Manage Libraries in Microsoft Fabric Notebooks

You are correct: as per the latest Fabric updates, custom libraries can now be managed at the workspace and session level, including in-line installation with %pip. This enables more flexibility when working with packages like Apache Hudi in Fabric Notebooks. To work around the OneLake compatibility issue, consider:

Option 1: Use ADLS Gen2 via Shortcuts

Option 2: Run Full Hudi Workload Externally (HDInsight or Databricks)


I hope this helps clarify things. Let me know what you find after trying these steps; I'm happy to help you investigate further.

Thanks again for your insights and for contributing to the community discussion.

Sorry, is this an AI-generated response? I am looking for expert opinion and community insight rather than GenAI.

Hi @vitaly,
Apologies for the delayed response.

Store your Hudi data in ADLS Gen2 (outside OneLake). Use Apache XTable (Incubating) in Fabric to translate the Hudi dataset into Delta Lake format.
https://xtable.apache.org/docs/fabric/

Access the translated Delta dataset via standard Fabric engines (Spark, T-SQL, Power BI), leveraging OneLake or shortcuts. This ensures full compatibility and performance.
https://learn.microsoft.com/en-us/fabric/data-engineering/lakehouse-and-delta-tables
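As a minimal sketch of the translation step described above, the snippet below renders a dataset configuration in the YAML layout that Apache XTable's sync utility reads (`sourceFormat`, `targetFormats`, `datasets`). The helper name `xtable_config`, the storage path, and the table name are hypothetical placeholders, not values from this thread.

```python
def xtable_config(table_base_path: str, table_name: str) -> str:
    """Render an XTable dataset config that syncs a Hudi table to Delta."""
    lines = [
        "sourceFormat: HUDI",
        "targetFormats:",
        "  - DELTA",
        "datasets:",
        "  -",
        f"    tableBasePath: {table_base_path}",
        f"    tableName: {table_name}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical ADLS Gen2 location of the Hudi table; replace the placeholders.
config_text = xtable_config(
    "abfss://<container>@<account>.dfs.core.windows.net/hudi/trips",
    "trips",
)
print(config_text)
```

Save the printed text to a YAML file and pass it to the XTable sync utility as described in the XTable documentation linked above.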

Hope this helps. If you have any doubts regarding this, please feel free to ask here. We will be happy to help.

Thank you for using the Microsoft Community Forum.

Hi @vitaly,

Just checking in to see whether the issue has been resolved on your end. If the earlier suggestions helped, that's great to hear! If you're still facing challenges, feel free to share more details; I'm happy to assist further.

Thank you.

Hi @vitaly,

I hope you had a chance to try the solution shared earlier. Let us know if anything needs further clarification or if there's an update from your side. We're always here to help.

Thank you.

@v-kpoloju-msft 

 

The specific requirement again is to access Hudi files hosted in OneLake from Fabric Notebooks. Native OneLake storage, not shortcuts. Fabric Notebooks with Spark for the runtime, not any other runtimes such as T-SQL, etc. Hudi format, not Delta Lake, Parquet or other formats.

 

I've reviewed the links and explanation you provided and do not see anything that addresses this specific requirement. Please let me know if I missed it.
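To make the requirement concrete, here is a hedged sketch of what such an attempt looks like: it builds a native OneLake `abfss://` path for a lakehouse `Files` folder and the core Hudi write options. It is not a confirmed working solution; the thread reports a compatibility issue at exactly this step. The workspace, lakehouse, table, and column names are hypothetical, and the Spark calls are left as comments because they need a Fabric session with the Hudi bundle loaded.

```python
def onelake_files_path(workspace: str, lakehouse: str, relative_path: str) -> str:
    """Build the abfss:// URI of a folder under a lakehouse's Files area."""
    return (
        f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
        f"{lakehouse}.Lakehouse/Files/{relative_path.strip('/')}"
    )


def hudi_write_options(table_name: str, record_key: str, precombine_field: str) -> dict:
    """Return the core Hudi datasource options for an upsert write."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.operation": "upsert",
    }


# Hypothetical names; substitute your own workspace, lakehouse, and columns.
path = onelake_files_path("MyWorkspace", "MyLakehouse", "hudi/trips")
opts = hudi_write_options("trips", "trip_id", "updated_at")
print(path)

# In a Fabric notebook with the Hudi bundle on the session classpath:
# df.write.format("hudi").options(**opts).mode("append").save(path)
# spark.read.format("hudi").load(path).show()
```

If the write or read fails, the full stack trace from those two calls is the most useful detail to share when reporting the issue.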

Hi @vitaly,

Since you are facing issues while integrating the lakehouse with Apache Hudi, please raise your issue here: Issues - Microsoft Fabric Community

Thank you.
