sreb_sreelesh
New Member

Ingesting Data from Flat Files to the Bronze Layer from Different Data Streams (Facebook, Twitter)

I need to ingest files (CSV, Excel, or JSON) from a drop location, enforce schemas, deduplicate records, and maintain file-level auditability. I also need consistent naming conventions, a clear partitioning strategy, and the ability to reprocess data reliably using Microsoft Fabric. Please recommend the best approach to implement this.

1 ACCEPTED SOLUTION
deborshi_nag
Resident Rockstar

Hi @sreb_sreelesh 

 

Here is a production‑ready pattern for bringing CSV, Excel, and JSON files from different data streams (Facebook, Twitter, etc.) into a Fabric‑based Medallion architecture. It aims to ensure clean ingestion, strong governance, consistent structure, and reliable reprocessing without creating unnecessary complexity.


1. Organise Your Lakehouse and Workspaces

Set up separate workspaces or at least clearly separated Lakehouse artifacts for Bronze, Silver, and Gold layers. This fits the Medallion approach recommended for Fabric and keeps raw data isolated from cleaned and curated layers.


2. Use a Clear Landing Zone Structure

Regardless of where files arrive (SharePoint, ADLS, S3, or an application drop), adopt a consistent folder structure such as:

/landing/<source>/<yyyy>/<mm>/<dd>/<original file>

If the files arrive in external storage, avoid copying them unnecessarily. Instead, create OneLake shortcuts into your Lakehouse so Fabric can reference them directly without duplication.
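As a small illustration, the folder convention above can be generated with a helper like this (plain Python; the function name `build_landing_path` is my own, not a Fabric API):

```python
from datetime import date
from pathlib import PurePosixPath

def build_landing_path(source: str, file_name: str, arrival: date) -> str:
    """Build a /landing/<source>/<yyyy>/<mm>/<dd>/<file> path for a dropped file."""
    return str(PurePosixPath(
        "/landing",
        source,
        f"{arrival.year:04d}",
        f"{arrival.month:02d}",
        f"{arrival.day:02d}",
        file_name,
    ))

print(build_landing_path("facebook", "posts_001.csv", date(2026, 3, 1)))
# /landing/facebook/2026/03/01/posts_001.csv
```

Using zero-padded month and day keeps the folders lexically sortable, which makes date-range reprocessing simpler later.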


3. Ingest Files into the Bronze Layer Using Fabric Pipelines

A simple and maintainable approach is:

A. Data Factory (Pipelines) + Copy Activity

  • Connect the pipeline to the landing location or shortcut.
  • Map it to a Bronze Lakehouse table.
  • Apply column mapping and data types during ingestion.
  • Add operational metadata (file name, path, load timestamp).
  • If needed, split ingestion paths for CSV, Excel, and JSON but keep the same table‑level schema.

This ensures consistent ingestion and centralised orchestration.
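The per-format routing idea can be sketched like this (plain Python, not the actual Copy Activity configuration; the branch names are illustrative):

```python
import os

# Map each supported extension to the name of an ingestion branch.
# In a real pipeline these would be separate Copy Activity paths that
# all write into the same Bronze table schema.
INGESTION_BRANCH = {
    ".csv": "copy_csv",
    ".xlsx": "copy_excel",
    ".json": "copy_json",
}

def route_file(path: str) -> str:
    """Pick the ingestion branch for a dropped file, based on its extension."""
    ext = os.path.splitext(path)[1].lower()
    try:
        return INGESTION_BRANCH[ext]
    except KeyError:
        raise ValueError(f"Unsupported file type: {path}")

print(route_file("/landing/twitter/2026/03/01/tweets.json"))  # copy_json
```

Failing fast on unknown extensions keeps surprises out of Bronze; unexpected files should land in an error/quarantine path rather than being silently skipped.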


4. Enforce Schema at the Bronze Stage

Define a schema for each Bronze table. Fabric will enforce this when writing into Delta tables:

  • Reject rows that don’t match expected structure.
  • Allow controlled schema evolution later, not silently in Bronze.
  • Keep the raw file intact in the Files area for audit and reprocessing.

The idea is: Bronze = typed but untouched business content, with full audit traceability.
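A minimal row-level version of this check, in plain Python (Delta schema enforcement does the equivalent on write; the column names here are made up for illustration):

```python
# Expected Bronze schema: column name -> Python type.
# Illustrative only; with Delta tables, Fabric enforces the table
# schema itself when the pipeline writes.
SCHEMA = {"post_id": str, "message": str, "like_count": int}

def validate_row(row: dict) -> bool:
    """Accept a row only if it has exactly the expected columns and types."""
    if set(row) != set(SCHEMA):
        return False
    return all(isinstance(row[col], typ) for col, typ in SCHEMA.items())

rows = [
    {"post_id": "p1", "message": "hello", "like_count": 3},
    {"post_id": "p2", "message": "bad", "like_count": "three"},  # wrong type
]
accepted = [r for r in rows if validate_row(r)]
rejected = [r for r in rows if not validate_row(r)]
print(len(accepted), len(rejected))  # 1 1
```

Rejected rows (or whole files) should be routed to an error location with the same audit metadata, so nothing is lost even when it fails validation.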


5. Add the Minimum Metadata Needed for Auditability

Add columns such as:

  • _source (facebook, twitter, etc.)
  • _file_name
  • _file_path
  • _ingest_time
  • _load_id (pipeline run identifier)

This gives you traceability without complicating the raw dataset.
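Attaching those columns is a one-liner per record; a plain-Python sketch (in a pipeline this is usually done via additional columns in the Copy Activity or a `withColumn` step in Spark):

```python
from datetime import datetime, timezone

def with_audit_columns(record: dict, source: str, file_path: str, load_id: str) -> dict:
    """Return a copy of the record with the audit columns listed above attached."""
    return {
        **record,
        "_source": source,
        "_file_name": file_path.rsplit("/", 1)[-1],
        "_file_path": file_path,
        "_ingest_time": datetime.now(timezone.utc).isoformat(),
        "_load_id": load_id,
    }

row = with_audit_columns(
    {"post_id": "p1"}, "facebook",
    "/landing/facebook/2026/03/01/posts_001.csv", "run-42",
)
print(row["_file_name"])  # posts_001.csv
```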


6. Partition Your Bronze/Silver Tables Intentionally

Use partition columns that match your analytics patterns:

  • Preferably event_date if present in the payload.
  • Otherwise, use ingestion_date.

Fabric Pipelines support writing Lakehouse tables with partition columns, which keeps downstream queries fast and makes reprocessing more targeted.

Avoid over‑partitioning (e.g., by hour or minute) unless you truly need it.
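The fallback rule can be expressed in a couple of lines (illustrative field names):

```python
def partition_date(payload: dict, ingestion_date: str) -> str:
    """Prefer the payload's event_date for partitioning; fall back to ingestion_date."""
    return payload.get("event_date") or ingestion_date

print(partition_date({"event_date": "2026-03-01"}, "2026-03-02"))  # 2026-03-01
print(partition_date({}, "2026-03-02"))                            # 2026-03-02
```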


7. Keep Deduplication Out of Bronze

Bronze should preserve everything as it arrived.

Handle deduplication in Silver using Delta Lake features (MERGE or window functions), for example:

  • Choose a business key such as post_id, ad_id, or composite keys depending on the source.
  • When multiple versions arrive, keep the latest based on _ingest_time or event timestamp.

This keeps your raw history intact but gives you a clean Silver layer for downstream use.
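In Spark you would typically use a window over the business key ordered by `_ingest_time` descending (or a MERGE); this plain-Python sketch shows just the keep-latest rule itself:

```python
def dedupe_latest(rows, key="post_id", order_col="_ingest_time"):
    """Keep one row per business key: the one with the greatest order_col value."""
    latest = {}
    for row in rows:
        k = row[key]
        if k not in latest or row[order_col] > latest[k][order_col]:
            latest[k] = row
    return list(latest.values())

rows = [
    {"post_id": "p1", "likes": 3, "_ingest_time": "2026-03-01T10:00:00"},
    {"post_id": "p1", "likes": 5, "_ingest_time": "2026-03-01T11:00:00"},
    {"post_id": "p2", "likes": 1, "_ingest_time": "2026-03-01T10:30:00"},
]
deduped = dedupe_latest(rows)
print(len(deduped))  # 2
```

ISO-8601 timestamps compare correctly as strings, which is why the simple `>` works here; with mixed formats you would parse to datetimes first.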


8. Enable Safe and Repeatable Reprocessing

Because you will have:

  • Raw files preserved in /Files
  • Well‑structured Bronze tables
  • Partitioned tables
  • Delta Lake time travel

You can reliably re-run loads for specific dates, sources, or files without corrupting your curated layers.

Pipelines should accept parameters such as:

  • date ranges
  • source system
  • file paths
  • reprocess mode
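A sketch of how those parameters narrow a reprocessing run over a file inventory (the inventory shape and parameter names are illustrative, mirroring pipeline parameters rather than any Fabric API):

```python
def select_for_reprocess(files, source=None, start=None, end=None):
    """Filter a file inventory by source system and/or an inclusive date range.

    Each inventory entry is a dict with 'source' and 'date' (YYYY-MM-DD) keys.
    """
    out = []
    for f in files:
        if source and f["source"] != source:
            continue
        if start and f["date"] < start:
            continue
        if end and f["date"] > end:
            continue
        out.append(f)
    return out

inventory = [
    {"source": "facebook", "date": "2026-03-01", "path": "/landing/facebook/2026/03/01/a.csv"},
    {"source": "twitter",  "date": "2026-03-01", "path": "/landing/twitter/2026/03/01/b.json"},
    {"source": "facebook", "date": "2026-03-02", "path": "/landing/facebook/2026/03/02/c.csv"},
]
print(select_for_reprocess(inventory, source="facebook", start="2026-03-02"))
```

Because the Bronze tables are partitioned by date, the same parameters can drive a targeted partition overwrite rather than a full reload.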

 

9. Integrate Dataflows and Notebooks Where Needed

Use Dataflows Gen2 when you want a low‑code transformation step before Silver. Ideal for light reshaping or type casting.

Use Spark notebooks when:

  • JSON is deeply nested
  • You need complex cleanup
  • You are standardising schemas across multiple sources
  • You need heavy deduplication or enrichment

Keep logic minimal in Bronze, heavier in Silver.
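The "deeply nested JSON" case usually comes down to flattening; a minimal recursive sketch of the idea (in a notebook you would more likely use Spark's `explode`/struct access, but the principle is the same):

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into a single level using dot-separated column names."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat

tweet = {"id": "t1", "user": {"name": "alice", "stats": {"followers": 10}}}
print(flatten(tweet))
# {'id': 't1', 'user.name': 'alice', 'user.stats.followers': 10}
```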


10. Maintain Governance and Quality

Implement:

  • Audit logs via Microsoft Purview to track who/what/when inside Fabric
  • Data Quality checks (optional) as early as the Bronze/Silver boundary
  • Version control and promotion through Fabric Git integration or Deployment Pipelines

This avoids surprises and maintains trust in the ingestion layer.


11. Naming Conventions

Adopt naming patterns that make browsing and automation predictable:

Tables
br_<domain>_<entity>
sv_<domain>_<entity>
gd_<domain>_<entity>

Folders
/landing/source/yyyy/mm/dd
/bronze/<entity>

Workspaces
<domain>-bronze, <domain>-silver, <domain>-gold
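Once the convention is fixed, a tiny helper keeps automation consistent with it (illustrative, matching the patterns above):

```python
LAYER_PREFIX = {"bronze": "br", "silver": "sv", "gold": "gd"}

def table_name(layer: str, domain: str, entity: str) -> str:
    """Compose <prefix>_<domain>_<entity> per the convention above."""
    return f"{LAYER_PREFIX[layer]}_{domain}_{entity}"

print(table_name("bronze", "social", "posts"))  # br_social_posts
```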

 

I trust this will be helpful. If you found this guidance useful, you are welcome to acknowledge with a Kudos or by marking it as a Solution.


4 REPLIES
v-prasare
Community Support

Hi @sreb_sreelesh,

We would like to confirm whether our community member's answer resolves your query or if you need further help. If you still have any questions or need more support, please feel free to let us know. We are happy to help you.

 

 

 

Thank you for your patience, and we look forward to hearing from you.
Best Regards,
Prashanth Are
MS Fabric community support

Very well articulated, step-by-step process. Easy to understand the Medallion architecture. Thanks.
