Subhashsiva
Regular Visitor

Need Help in Building a Metadata-Driven Pipeline Using Fabric Notebooks (Code-Based Only)

Hello Community,

I'm building a Metadata-Driven Ingestion Framework using Notebooks in Microsoft Fabric, and I'm looking for guidance on handling a non-ideal file structure in ADLS.

We are ingesting data from an ADLS source, where a shortcut has been created in the Lakehouse. The objective is to populate the Bronze layer with three Delta tables, each corresponding to a specific category:

  • BB_End_User

  • End_User

  • MonthlyMobileActiveUsers

The challenge is that all source files are placed together in a single folder, without any subfolder organization. The files follow a naming pattern like:

  • BB_End_User_YYYYMM

  • End_User_YYYYMM

  • MobileActiveUsers_YYYYMM

Ideally, these files would be stored in category-specific folders, but restructuring is not an option at this time.

What we're trying to achieve:

  1. Read the files and write them into their corresponding Delta tables in the Bronze layer, based on file name.

  2. Use a metadata/config table to drive the ingestion logic for scalability.

  3. Support both full and incremental loads, based on configuration.

  4. Build a semantic model on top of the Bronze layer for reporting.

I’m looking for suggestions or best practices on:

  • Filtering and routing files by category when they are all in a single folder

  • Structuring the metadata/config table for flexible ingestion

  • Implementing full and incremental loads effectively in Fabric notebooks

If you've dealt with a similar scenario, I'd really appreciate your insights. Thank you.

 

1 ACCEPTED SOLUTION
v-sdhruv
Community Support

Hi @Subhashsiva ,

1. Design a metadata/config table with fields such as:

  • source_file_pattern: e.g., BB_End_User_\d{6}
  • target_table: e.g., Bronze.BB_End_User
  • load_type: full or incremental
  • date_key, start_date, last_loaded_date
  • status, rows_inserted, rows_updated

This enables dynamic pipeline orchestration and tracking.
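As an example, here is a minimal PySpark sketch that creates such a config table once in the Lakehouse. The table name ingestion_config, the column set, and the regex patterns below are only suggestions based on the file names in the question, not a Fabric requirement:

from pyspark.sql.types import StructType, StructField, StringType

# Suggested config schema; last_loaded_date acts as the incremental watermark (YYYYMM).
schema = StructType([
    StructField("source_file_pattern", StringType()),
    StructField("target_table", StringType()),
    StructField("load_type", StringType()),
    StructField("date_key", StringType()),
    StructField("last_loaded_date", StringType()),
])

config_rows = [
    (r"BB_End_User_\d{6}", "bb_end_user", "incremental", "YYYYMM", None),
    (r"End_User_\d{6}", "end_user", "incremental", "YYYYMM", None),
    (r"MobileActiveUsers_\d{6}", "monthly_mobile_active_users", "full", "YYYYMM", None),
]

# Save as a Delta table in the Lakehouse so both notebooks and pipelines can read it.
spark.createDataFrame(config_rows, schema) \
    .write.format("delta").mode("overwrite") \
    .saveAsTable("ingestion_config")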

 

2. Use the metadata table to determine load type:

  • Full Load: Drop and recreate the Delta table, or simply overwrite it.
  • Incremental Load: Filter on date_key or last_loaded_date (see the sketch below).
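Here is a minimal notebook sketch of the file routing plus full/incremental logic. It assumes the shortcut folder is Files/adls_shortcut, the files are CSV with a header row, and the watermark is stored as a YYYYMM string; all of these are assumptions to adapt to your setup:

import re
from notebookutils import mssparkutils   # built into Fabric notebooks

landing_path = "Files/adls_shortcut"     # hypothetical shortcut folder holding all files

for cfg in spark.table("ingestion_config").collect():
    # List the single folder and keep files whose name starts with this row's pattern.
    # re.match anchors at the start, so End_User_\d{6} will not pick up BB_End_User_202401.csv.
    files = [f for f in mssparkutils.fs.ls(landing_path)
             if re.match(cfg["source_file_pattern"], f.name)]

    if cfg["load_type"] == "incremental" and cfg["last_loaded_date"]:
        # Keep only periods (the YYYYMM digits in the file name) newer than the watermark.
        files = [f for f in files
                 if re.search(r"\d{6}", f.name).group() > cfg["last_loaded_date"]]

    if not files:
        continue

    df = spark.read.option("header", True).csv([f.path for f in files])
    mode = "overwrite" if cfg["load_type"] == "full" else "append"
    df.write.format("delta").mode(mode).saveAsTable(cfg["target_table"])

After a successful write you would update last_loaded_date for that config row (Delta tables support UPDATE), so the next run only picks up newer periods.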

In Fabric Notebooks, you can trigger notebook execution via:

  • Interactive run
  • Pipeline activity (which can pass parameters into the notebook's parameter cell, sketched below)
  • Scheduled run
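For the pipeline-activity option, the child notebook would expose its settings in a parameter cell so the pipeline (or an interactive run) can set them per config row. A sketch, with example values only:

# Parameter cell -- mark it via "Toggle parameter cell" on the cell in Fabric.
# When the notebook runs as a pipeline Notebook activity, base parameters with the
# same names override these defaults; interactive runs just use the defaults.
source_file_pattern = r"BB_End_User_\d{6}"
target_table = "bb_end_user"
load_type = "full"
last_loaded_date = ""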

 

3. Notebook Execution and Orchestration

Use Fabric Pipelines to orchestrate notebook execution:

  • Lookup activity to fetch config rows.
  • For each row, pass parameters to a child notebook.
  • Run the child notebook as a Notebook activity inside the ForEach for scalable execution (a code-only alternative is sketched below).
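Since the original post asks for a code-based approach, the same orchestration can also be driven from a parent notebook instead of a pipeline. A sketch, where the child notebook name nb_bronze_ingest and its parameters are placeholders for your own:

from notebookutils import mssparkutils

# Parent notebook: read the config table and call a parameterized child notebook per row.
for cfg in spark.table("ingestion_config").collect():
    result = mssparkutils.notebook.run(
        "nb_bronze_ingest",   # child notebook in the same workspace
        600,                  # timeout in seconds
        {
            "source_file_pattern": cfg["source_file_pattern"],
            "target_table": cfg["target_table"],
            "load_type": cfg["load_type"],
            "last_loaded_date": cfg["last_loaded_date"] or "",
        },
    )
    print(cfg["target_table"], "->", result)

If the categories are independent, the notebook utilities also offer runMultiple to execute the child runs in parallel once the config table grows.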

Refer to: Metadata Driven Pipelines for Microsoft Fabric

 

Hope this helps!


4 REPLIES
v-sdhruv
Community Support

Hi @Subhashsiva ,
Since we didn't hear back, we will be closing this thread.
If you need any assistance, feel free to reach out by creating a new post.

Thank you for using the Microsoft Community Forum.

v-sdhruv
Community Support

Hi @Subhashsiva ,

Just wanted to check whether you have had a chance to review the suggestions provided and whether they helped you resolve your query.

v-sdhruv
Community Support

Hi @Subhashsiva ,

Just wanted to check whether you have had a chance to review the suggestions provided and whether they helped you resolve your query.
If the answer has helped you resolve your query, please "Accept it as Solution" so that other members can also benefit from it.

Thank You

