Ingest files (CSV, Excel, or JSON) from the drop location, enforce schemas, deduplicate records, and maintain file-level auditability. Ensure consistent naming conventions, a clear partitioning strategy, and the ability to reprocess data reliably using Microsoft Fabric. Please recommend the best approach to implement this.
Here is a production-ready pattern for bringing CSV, Excel, and JSON files from different data streams (Facebook, Twitter, etc.) into a Fabric-based Medallion architecture. It aims to ensure clean ingestion, strong governance, consistent structure, and reliable reprocessing without creating unnecessary complexity.
Set up separate workspaces or at least clearly separated Lakehouse artifacts for Bronze, Silver, and Gold layers. This fits the Medallion approach recommended for Fabric and keeps raw data isolated from cleaned and curated layers.
Regardless of where files arrive (SharePoint, ADLS, S3, or an application drop), adopt a consistent folder structure such as:
/landing/<source>/<yyyy>/<mm>/<dd>/<original file>
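As a minimal sketch of the folder convention above, the helper below builds a landing path from a source name, an arrival date, and the original file name. The function name `landing_path` and the lowercase normalisation of the source name are assumptions made for illustration, not part of any Fabric API.

```python
from datetime import date
from pathlib import PurePosixPath


def landing_path(source: str, arrival: date, original_file: str) -> str:
    """Build /landing/<source>/<yyyy>/<mm>/<dd>/<original file>."""
    # Assumption: normalise the source name so folder names stay predictable.
    source_slug = source.strip().lower().replace(" ", "_")
    # Keep only the file name in case a full upstream path was passed in.
    file_name = PurePosixPath(original_file).name
    return str(
        PurePosixPath("/landing")
        / source_slug
        / f"{arrival:%Y}"
        / f"{arrival:%m}"
        / f"{arrival:%d}"
        / file_name
    )


print(landing_path("Facebook", date(2025, 3, 7), "exports/posts.csv"))
# prints /landing/facebook/2025/03/07/posts.csv
```

Zero-padded month and day folders keep the paths sortable as plain strings, which makes date-range listing straightforward.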
If the files arrive in external storage, avoid copying them unnecessarily. Instead, create OneLake shortcuts into your Lakehouse so Fabric can reference them directly without duplication.
A simple and maintainable approach is:
This ensures consistent ingestion and centralised orchestration.
Define a schema for each Bronze table. Fabric will enforce this when writing into Delta tables:
The idea is: Bronze = typed but untouched business content, with full audit traceability.
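To make "typed but untouched" concrete, here is a plain-Python sketch of a per-record schema check. In Fabric this would normally be expressed as a table schema on the Delta table itself; the version below only illustrates the idea, and the `POSTS_SCHEMA` columns and the `enforce_schema` helper are hypothetical.

```python
from datetime import datetime

# Hypothetical Bronze schema for a "posts" entity: column name -> caster.
POSTS_SCHEMA = {
    "post_id": str,
    "likes": int,
    "created_at": datetime.fromisoformat,
}


def enforce_schema(record: dict, schema: dict) -> dict:
    """Return a typed copy of record, or raise ValueError on any mismatch."""
    missing = schema.keys() - record.keys()
    extra = record.keys() - schema.keys()
    if missing or extra:
        raise ValueError(
            f"column mismatch: missing={sorted(missing)}, extra={sorted(extra)}"
        )
    typed = {}
    for column, cast in schema.items():
        try:
            # Values are cast to the declared type but not otherwise altered.
            typed[column] = cast(record[column])
        except (TypeError, ValueError) as exc:
            raise ValueError(
                f"bad value for {column!r}: {record[column]!r}"
            ) from exc
    return typed


good = enforce_schema(
    {"post_id": "p1", "likes": "12", "created_at": "2025-03-07T10:15:00"},
    POSTS_SCHEMA,
)
print(good["likes"] + 1)  # prints 13
```

Records that fail the check are better routed to a quarantine location than silently dropped, so the file-level audit trail stays complete.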
Add columns such as:
This gives you traceability without complicating the raw dataset.
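One possible shape for such audit columns is sketched below. Every column name here (`_source_file`, `_file_sha256`, `_ingested_at_utc`, `_pipeline_run_id`) is an illustrative assumption rather than a Fabric convention; the point is that each row carries enough metadata to be traced back to the exact file and run that produced it.

```python
import hashlib
from datetime import datetime, timezone


def with_audit_columns(rows, source_path: str, file_bytes: bytes, run_id: str):
    """Return copies of rows with file-level audit metadata attached."""
    audit = {
        # All column names below are illustrative, not a Fabric convention.
        "_source_file": source_path,
        "_file_sha256": hashlib.sha256(file_bytes).hexdigest(),
        "_ingested_at_utc": datetime.now(timezone.utc).isoformat(),
        "_pipeline_run_id": run_id,
    }
    # Business columns are copied unchanged; audit columns are appended.
    return [{**row, **audit} for row in rows]


content = b"post_id,likes\np1,12\n"
rows = [{"post_id": "p1", "likes": "12"}]
out = with_audit_columns(
    rows, "/landing/facebook/2025/03/07/posts.csv", content, "run-001"
)
print(sorted(out[0]))
```

A content hash is useful because it identifies a re-delivered file even when its name or arrival date changes.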
Use partition columns that match your analytics patterns:
Fabric Pipelines support writing Lakehouse tables with partition columns, which keeps downstream queries fast and makes reprocessing more targeted.
Avoid over-partitioning (e.g., by hour or minute) unless you truly need it.
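The effect of partition granularity can be illustrated with a small plain-Python sketch that counts how many distinct partition directories the same rows would produce at day level versus minute level. The `source` and `period` partition columns and the sample rows are assumptions for illustration only.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows; only the two candidate partition inputs are shown.
events = [
    {"source": "facebook", "event_time": datetime(2025, 3, 7, 10, 15)},
    {"source": "facebook", "event_time": datetime(2025, 3, 7, 10, 16)},
    {"source": "facebook", "event_time": datetime(2025, 3, 7, 18, 40)},
    {"source": "twitter", "event_time": datetime(2025, 3, 7, 9, 5)},
]


def partition_dir(row: dict, time_format: str) -> str:
    """Directory-style partition key: source=<source>/period=<formatted time>."""
    period = row["event_time"].strftime(time_format)
    return f"source={row['source']}/period={period}"


# Count rows per partition at two granularities.
by_day = Counter(partition_dir(r, "%Y-%m-%d") for r in events)
by_minute = Counter(partition_dir(r, "%Y-%m-%dT%H:%M") for r in events)
print(len(by_day), len(by_minute))  # prints 2 4
```

With minute-level keys every row lands in its own partition, which is the many-small-files pattern that coarser partitioning avoids.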
Bronze should preserve everything as it arrived.
Handle deduplication in Silver using Delta Lake features (MERGE or window functions), for example:
This keeps your raw history intact but gives you a clean Silver layer for downstream use.
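The window-function variant amounts to "keep the latest row per business key", that is, filtering on `ROW_NUMBER() OVER (PARTITION BY key ORDER BY ingested_at DESC) = 1`. The sketch below shows the same logic in plain Python so it runs anywhere; the column names `post_id` and `_ingested_at` and the helper `latest_per_key` are hypothetical.

```python
from datetime import datetime

# Hypothetical Bronze rows: p1 was re-delivered with new data, p2 is an exact duplicate.
bronze_rows = [
    {"post_id": "p1", "likes": 10, "_ingested_at": datetime(2025, 3, 7, 10, 0)},
    {"post_id": "p1", "likes": 12, "_ingested_at": datetime(2025, 3, 8, 10, 0)},
    {"post_id": "p2", "likes": 3, "_ingested_at": datetime(2025, 3, 7, 10, 0)},
    {"post_id": "p2", "likes": 3, "_ingested_at": datetime(2025, 3, 7, 10, 0)},
]


def latest_per_key(rows, key: str, order_by: str):
    """Keep one row per key: the one with the greatest order_by value."""
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        # Strict ">" means the first-seen row wins when order_by values tie.
        if current is None or row[order_by] > current[order_by]:
            latest[row[key]] = row
    return list(latest.values())


silver_rows = latest_per_key(bronze_rows, key="post_id", order_by="_ingested_at")
print([(r["post_id"], r["likes"]) for r in silver_rows])
# prints [('p1', 12), ('p2', 3)]
```

Bronze still holds all four rows, so the Silver result can be rebuilt at any time with a different key or ordering rule.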
Because you will have:
You can reliably re-run loads for specific dates, sources, or files without corrupting your curated layers.
Pipelines should accept parameters such as:
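As one hypothetical shape for such parameters, the sketch below models a reprocessing request with optional source, date-range, and file-name filters, and uses it to select which landed files to reload. The class `ReprocessRequest` and its field names are assumptions, not Fabric pipeline parameter names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ReprocessRequest:
    """Hypothetical pipeline parameters; None means "do not filter on this"."""

    source: Optional[str] = None
    start_date: Optional[date] = None
    end_date: Optional[date] = None
    file_name: Optional[str] = None

    def selects(self, source: str, arrival: date, file_name: str) -> bool:
        """Return True if a landed file matches every filter that is set."""
        if self.source is not None and source != self.source:
            return False
        if self.start_date is not None and arrival < self.start_date:
            return False
        if self.end_date is not None and arrival > self.end_date:
            return False
        if self.file_name is not None and file_name != self.file_name:
            return False
        return True


# Hypothetical inventory of landed files: (source, arrival date, file name).
landed = [
    ("facebook", date(2025, 3, 6), "posts.csv"),
    ("facebook", date(2025, 3, 7), "posts.csv"),
    ("twitter", date(2025, 3, 7), "tweets.json"),
]
request = ReprocessRequest(source="facebook", start_date=date(2025, 3, 7))
to_reload = [f for f in landed if request.selects(*f)]
print(len(to_reload))  # prints 1
```

Because every filter defaults to "no restriction", the same request type covers a full reload, a single-day rerun, and a single-file fix.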
Use Dataflows Gen2 when you want a low-code transformation step before Silver. It is ideal for light reshaping or type casting.
Use Spark notebooks when:
Keep logic minimal in Bronze, heavier in Silver.
Implement:
This avoids surprises and maintains trust in the ingestion layer.
Adopt naming patterns that make browsing and automation predictable:
Tables
br_<domain>_<entity>
sv_<domain>_<entity>
gd_<domain>_<entity>
Folders
/landing/<source>/<yyyy>/<mm>/<dd>
/bronze/<entity>
Workspaces
<domain>-bronze, <domain>-silver, <domain>-gold
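Naming rules are easier to keep consistent when they are generated rather than typed by hand. The sketch below builds table and workspace names from the patterns above; the helper names and the restriction to lowercase snake_case for `<domain>` and `<entity>` are assumptions added for illustration.

```python
import re

# Layer prefixes taken from the table naming pattern above.
LAYER_PREFIX = {"bronze": "br", "silver": "sv", "gold": "gd"}
# Assumption: domain and entity are lowercase snake_case identifiers.
_PART = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def table_name(layer: str, domain: str, entity: str) -> str:
    """Build <prefix>_<domain>_<entity>, e.g. br_social_posts."""
    if layer not in LAYER_PREFIX:
        raise ValueError(f"unknown layer: {layer!r}")
    for part in (domain, entity):
        if not _PART.fullmatch(part):
            raise ValueError(f"not lowercase snake_case: {part!r}")
    return f"{LAYER_PREFIX[layer]}_{domain}_{entity}"


def workspace_name(layer: str, domain: str) -> str:
    """Build <domain>-<layer>, e.g. social-bronze."""
    if layer not in LAYER_PREFIX:
        raise ValueError(f"unknown layer: {layer!r}")
    return f"{domain}-{layer}"


print(table_name("silver", "social", "posts"))  # prints sv_social_posts
print(workspace_name("gold", "social"))  # prints social-gold
```

Rejecting names that break the pattern at creation time is cheaper than renaming tables after downstream reports depend on them.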
Hi @sreb_sreelesh,
We would like to confirm whether our community member's answer resolves your query or whether you need further help. If you still have questions or need more support, please let us know. We are happy to help.
Thank you for your patience. We look forward to hearing from you.
Best Regards,
Prashanth Are
MS Fabric community support
The step-by-step process is very well articulated and makes the Medallion architecture easy to understand. Thanks.