jk-dbt
New Member

DBT Orchestration

Hi,

I have a question on dbt orchestration.

 

I'm not sure how to trigger a production dbt run from a Fabric pipeline, and I'm hoping someone in the community has a suggestion.

The dbt-fabric adapter is working perfectly. The config is good and I can target both dev and prod with a manual dbt run with no issues.

 

However, I would like it to run without me having to be at my desk pressing go every morning. Does Fabric have a way to pull the project from the repo and trigger a dbt run on a schedule?

 

Thanks a mil.


9 REPLIES
sanetaboada
Regular Visitor

If you need an "all Fabric" solution, I think the option there would be to use Fabric Data Factory's Data Workflows (Apache Airflow under the hood), i.e. configure a Data Workflow to point to the git repo where your dbt code is and let it orchestrate your dbt jobs. This will involve learning Apache Airflow though.
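As a rough sketch of what that could look like (the DAG id, schedule, and project path are placeholders; it assumes dbt and the dbt-fabric adapter are added to the Data Workflow's requirements and that the dbt project is synced from the repo alongside the DAG files):

```python
# Sketch of an Airflow DAG for a Fabric Data Workflow (names, schedule and
# paths are placeholders; assumes dbt + dbt-fabric are installed in the
# Airflow environment and the dbt project is synced next to the DAGs).
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

DBT_PROJECT_DIR = "/opt/airflow/dags/my_dbt_project"  # placeholder path

with DAG(
    dag_id="dbt_prod_run",
    schedule="0 6 * * *",          # every morning at 06:00
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command=f"dbt run --target prod --project-dir {DBT_PROJECT_DIR} --profiles-dir {DBT_PROJECT_DIR}",
    )
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command=f"dbt test --target prod --project-dir {DBT_PROJECT_DIR} --profiles-dir {DBT_PROJECT_DIR}",
    )
    dbt_run >> dbt_test
```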

 

I'm leaning towards that path since my team's skills are mostly in dbt, and I want to build a solution on the promise of Microsoft Fabric as a "complete end-to-end data platform" without having to provision VMs, external integrations, etc.

 

Beware though: Data Factory Data Workflows are still in preview and poorly documented. I'm still scratching my head on how to make it work! 😅

There's a dbt Core pipeline activity on the release plan for this quarter. Supposedly you can store your project in the Files section of a lakehouse and it will run it for you. I have heard that storing it in GitHub/DevOps will be supported later.

 

If you're on dbt Cloud, you can trigger a job by calling the jobs API from a pipeline using a Web activity.
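For reference, a rough sketch of the request that Web activity would make, shown here in Python (the account id, job id, and token are placeholders; keep the token in a secret store rather than in the pipeline definition):

```python
# Sketch of triggering a dbt Cloud job via the v2 jobs API
# (account id, job id and token are placeholders).
import requests

ACCOUNT_ID = 12345   # dbt Cloud account id (placeholder)
JOB_ID = 67890       # dbt Cloud job id (placeholder)
API_TOKEN = "<dbt-cloud-service-token>"

resp = requests.post(
    f"https://cloud.getdbt.com/api/v2/accounts/{ACCOUNT_ID}/jobs/{JOB_ID}/run/",
    headers={"Authorization": f"Token {API_TOKEN}"},
    json={"cause": "Triggered from a Fabric pipeline"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["data"]["id"])  # id of the queued dbt Cloud run
```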

Christophe93100
Frequent Visitor

Hi all,

 

Do you know how we can call a dbt Cloud job (for a project) from a Microsoft Fabric Data Factory pipeline task?

Via a call to the dbt Cloud API?

We would like Fabric Data Factory pipelines to be the full-chain orchestrator.

AndyDDC
Super User

Hi @jk-dbt, just a quick question: where are you running dbt Core? Is it on your desktop PC, on a VM, etc.?

Hey @AndyDDC. Appreciate you coming back on this. I'm developing on my local machine.

The basic workflow I'd like to have, if possible, would be:

  • Analytics engineers develop on their local machines.
  • Push to GitHub.
  • A scheduled Fabric pipeline somehow clones the repo.
  • Fabric somehow does a dbt run.
  • Alert messages get emailed ….

One idea I've seen floating around is to have a notebook do the last three steps above. I actually pip installed dbt in the notebook and got all excited, but then got stuck trying to clone the repo.

You can clone your code from git in the notebook and then run dbt in the notebook.
https://www.linkedin.com/pulse/dbt-microsoft-fabric-my-journ-nghia-tran/

 

Note: change the runtime version in the Spark settings from 1.2 to 1.1 (Runtime 1.2 does not include git).
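Roughly, the notebook cell could look like this (the repo URL, paths, and prod target are placeholders; it assumes dbt-fabric has been pip installed in the session and that profiles.yml lives in the repo):

```python
# Sketch of a Fabric notebook cell (assumes runtime 1.1 so git is available,
# and that dbt-fabric was installed first, e.g. %pip install dbt-fabric).
import subprocess

REPO_URL = "https://github.com/your-org/your-dbt-project.git"  # placeholder
PROJECT_DIR = "/tmp/dbt_project"

# Clone the dbt project (for a private repo, supply a PAT in the URL or env).
subprocess.run(["git", "clone", REPO_URL, PROJECT_DIR], check=True)

# Run dbt against the prod target, assuming profiles.yml sits in the repo root.
subprocess.run(
    ["dbt", "run", "--target", "prod",
     "--project-dir", PROJECT_DIR, "--profiles-dir", PROJECT_DIR],
    check=True,
)
```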

That worked.....nice one Nghia!

OK, thanks. At the moment you unfortunately need to run dbt Core (and therefore dbt run) on some non-Fabric compute, like a VM.

This is a great question and one I started to ponder myself about a year ago when I was working at Microsoft.

 

In my case I wanted to use native scheduling functionality in Fabric and avoid an external Python environment to host dbt. 

 

It led me to develop a new dbt adapter that allows the dbt project to be built locally, without a connection to the Fabric workspace, and to generate notebooks instead of directly executing the SQL statements against the Fabric endpoint.

 

I wanted to use the lakehouse rather than the warehouse in Fabric, and I wanted to keep all of the transformation logic in Spark SQL rather than T-SQL.


It's working very well so far and it gives the best of both worlds in my opinion: all of dbt's benefits such as catalog, lineage, model dependency management, tests, etc., but with native Fabric artefacts generated. This also allows you to use native Fabric and git functionality to migrate through dev, test, and prod without having to set up a DevOps runner to host dbt. I am currently open-sourcing the adapter, so if you'd like more info, hit me up on LinkedIn:

 

 https://www.linkedin.com/in/john-rampono-8a73729?utm_source=share&utm_campaign=share_via&utm_content...
