Digidank
Helper I

Can you call Fabric Spark Jobs from Airflow?

Is it possible to use the AzureSynapseRunSparkBatchOperator in Airflow to trigger jobs in Fabric, similar to how you can for Synapse? I have been scouring for an answer but don't seem to have found one.
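For reference, this is roughly how I submit a Synapse Spark batch job from an Airflow DAG today (the pool name, file path, and connection ID below are placeholders); I'm looking for an equivalent that targets a Fabric workspace:

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.microsoft.azure.operators.synapse import (
        AzureSynapseRunSparkBatchOperator,
    )
    from azure.synapse.spark.models import SparkBatchJobOptions

    with DAG(
        dag_id="synapse_spark_batch",
        start_date=datetime(2024, 1, 1),
        schedule=None,
        catchup=False,
    ) as dag:
        # Submits a Spark batch job to a Synapse Spark pool and, by
        # default, waits for it to finish. All names are placeholders.
        run_spark_job = AzureSynapseRunSparkBatchOperator(
            task_id="run_spark_job",
            azure_synapse_conn_id="azure_synapse_default",
            spark_pool="my-spark-pool",
            payload=SparkBatchJobOptions(
                name="wordcount",
                file="abfss://jobs@mystorage.dfs.core.windows.net/wordcount.py",
                driver_memory="2g",
                driver_cores=2,
                executor_memory="2g",
                executor_cores=2,
                num_executors=2,
            ),
        )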

5 REPLIES
v-cboorla-msft
Community Support

Hi @Digidank 

 

Thanks for using Microsoft Fabric Community.

No, the AzureSynapseRunSparkBatchOperator can't be used directly with Microsoft Fabric.

 

Leveraging Spark Jobs within Microsoft Fabric: While the Apache Airflow Azure provider offers the AzureSynapseRunSparkBatchOperator for triggering Spark jobs in Azure Synapse Analytics, this operator is not directly applicable to Microsoft Fabric.

Microsoft Fabric and Spark Jobs: Microsoft Fabric is a comprehensive data platform that brings together capabilities from services such as Azure Synapse Analytics and Azure Data Factory. While Fabric itself doesn't include an operator similar to AzureSynapseRunSparkBatchOperator, it offers alternative methods for executing Spark jobs.

Spark Job Definition Activity: The preferred approach for running Spark jobs inside Fabric data pipelines is the Spark Job Definition activity, which integrates with the Spark job definitions in your workspace and ensures a smooth execution process.

For more details, please refer to:

  • Compare Fabric Data Engineering and Azure Synapse Spark
  • Microsoft Fabric, explained for existing Synapse users

 

I hope this information helps. Please do let us know if you have any further questions.

 

Thanks.

Thanks. Is the Spark Job Definition activity only in preview in certain regions? My capacity is in US East 2, and in my data pipeline I do not see the Spark Job Definition activity, only the Notebook activity. That was the first place I looked, and not seeing it there is why I started down the Airflow route. Then I saw this documentation (https://learn.microsoft.com/en-us/fabric/data-engineering/run-spark-job-definition#how-to-run-a-spar...), which says the only two ways to run one are manually and via the scheduler. I was hoping that documentation just didn't account for new functionality released recently.

The comparison page says this:

  • Pipeline activity support: Data pipelines in Fabric don't include Spark job definition activity yet. You could use scheduled runs if you want to run your Spark job periodically.

Hi @Digidank 

 

I apologize for the mistake I made.

Yes, you are correct, there are two ways you can run a Spark job definition:

  • Run a Spark job definition manually by selecting Run from the Spark job definition item in the job list.
  • Schedule a Spark job definition by setting up a schedule plan on the Settings tab. Select Settings on the toolbar, then select Schedule.

For details, please refer to: How to create an Apache Spark job definition in Fabric

 

We would appreciate it if you could upvote the existing Managed Apache Airflow support idea on our feedback channel: Microsoft Idea. It is open for the user community to upvote and comment on, which allows our product teams to effectively prioritize your request against the existing feature backlog and gives insight into the potential impact of implementing the suggested feature.

 

I hope this information helps. Please do let us know if you have any further questions.

 

Thanks.

 

Hi @Digidank 

 

We haven't heard from you since the last response and wanted to check whether you have found a resolution yet. If you have, please share it with the community, as it can be helpful to others. Otherwise, reply back and we will try to help with more details.


Thanks.

For now I am just running my multiple jobs on frequent schedules, with a state table to keep track of when each one should fire. It's not pretty, but it's working. I am also exploring using the Web Request activity in pipelines to call the REST API to fire Spark jobs.
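A minimal sketch of that REST call, assuming the Fabric on-demand job scheduler endpoint from the public REST docs (the GUIDs, token, and jobType value are placeholders to verify against the current API reference):

    import requests

    # Placeholders; substitute your own workspace/item GUIDs and a valid
    # Microsoft Entra bearer token scoped to the Fabric API.
    WORKSPACE_ID = "<workspace-guid>"
    ITEM_ID = "<spark-job-definition-guid>"
    TOKEN = "<bearer-token>"

    # Assumed shape of the on-demand job endpoint for Spark job
    # definitions; check the Fabric REST API reference before relying on it.
    url = (
        f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}"
        f"/items/{ITEM_ID}/jobs/instances?jobType=sparkjob"
    )

    resp = requests.post(url, headers={"Authorization": f"Bearer {TOKEN}"})
    resp.raise_for_status()

    # The service replies 202 Accepted; the Location header points at the
    # job instance, which can be polled for status.
    print(resp.status_code, resp.headers.get("Location"))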
