daircom
Helper II

Fabric bronze layer: load just the files into bronze, or also transform the files into tables?

Hi all,

 

I am setting up a medallion architecture in Microsoft Fabric. The input so far has been files (in the future it will also include files from shortcuts and possibly data from a database).

 

Do I now need to manually load the file into a table, as is done in the screenshot below? I assume this is wrong, because a manual step cannot be scheduled or added to a pipeline. So should I do this step (file --> table) in a notebook instead?

And a second question. The file is now in the bronze layer, as you can see. If I load the file into a table and leave the structure as is, should that table be in the silver layer? Or should both the file and the table based on it stay in the bronze layer? What is best practice here?

 

daircom_0-1729161162951.png

 

1 ACCEPTED SOLUTION
frithjof_v
Community Champion

Here is a great video from Advancing Analytics about the medallion architecture, which I find very insightful (for a newbie like me) and also fun to watch: https://youtu.be/fz4tax6nKZM?si=ErBKN3msWPZMhMHI

 

And some other threads discussing a similar topic:

 

https://www.reddit.com/r/MicrosoftFabric/s/IDVENJxQDx

 

https://www.reddit.com/r/MicrosoftFabric/s/1MNZAfsHxe


5 REPLIES 5
v-shex-msft
Community Support

Hi @daircom ,

Did the above suggestions help with your scenario? If so, please consider giving Kudos or accepting the helpful suggestions as a solution, to help others who face similar requirements.

Regards,

Xiaoxin Sheng

Community Support Team _ Xiaoxin
If this post helps, please consider accepting it as the solution to help other members find it more quickly.
AndyDDC
Most Valuable Professional

From your setup, it looks like the "Files" area of your lakehouse is the "landing zone" for your raw data. And yes, the "Tables" section can become the tabular version of your raw data in Delta format.

 

What I usually do is set up the Files section with a full pull of the data in one folder and incremental data in other folders. I then use a notebook to load that data into Delta tables in the same lakehouse. This becomes the queryable raw data.
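A minimal sketch of how that full/incremental split could drive the load, assuming a folder layout of `Files/<source>/full/` and `Files/<source>/incremental/...` with Parquet files. The layout and the names `write_mode_for` and `load_folder` are hypothetical and not taken from this thread; `spark` is the session a Fabric notebook provides.

```python
from pathlib import PurePosixPath

# Hypothetical layout (an assumption, not from the thread):
#   Files/<source>/full/...         -> complete extract, replaces the table
#   Files/<source>/incremental/...  -> new rows only, appended to the table


def write_mode_for(path: str) -> str:
    """Return the Delta write mode implied by the folder a path sits in."""
    parts = PurePosixPath(path).parts
    if "full" in parts:
        return "overwrite"
    if "incremental" in parts:
        return "append"
    raise ValueError(f"Cannot tell full from incremental: {path}")


def load_folder(spark, folder: str, table: str) -> None:
    """Load the Parquet files under `folder` into a Delta table.

    `spark` is an existing SparkSession; Parquet input is assumed here.
    """
    df = spark.read.format("parquet").load(folder)
    df.write.format("delta").mode(write_mode_for(folder)).saveAsTable(table)
```

For example, `load_folder(spark, "Files/sales/full", "sales_raw")` would replace the table, while the same call on an `incremental` folder would append to it. Because this is ordinary notebook code, it can be scheduled or called from a pipeline.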

 

I usually have a mixture of CSV, JSON, and Parquet files in the Files area, and I load them into Delta tables using PySpark in a notebook.
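As a rough illustration of handling mixed file types in one notebook, the sketch below picks the Spark reader from the file extension and derives a table name from the file name. The reader options, the naming rule, and the helper names `plan_load` and `load_to_delta` are assumptions for illustration, not something described in this thread.

```python
from pathlib import PurePosixPath

# Reader settings per file type. The header, inferSchema, and multiLine
# choices are assumptions; adjust them to match the actual files.
READERS = {
    ".csv": ("csv", {"header": "true", "inferSchema": "true"}),
    ".json": ("json", {"multiLine": "true"}),
    ".parquet": ("parquet", {}),
}


def plan_load(path: str):
    """Map a file path to (spark_format, reader_options, table_name)."""
    p = PurePosixPath(path)
    suffix = p.suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file type: {path}")
    fmt, options = READERS[suffix]
    # Hypothetical naming rule: file name, lower-cased, with "_" separators.
    table = p.stem.lower().replace("-", "_").replace(" ", "_")
    return fmt, options, table


def load_to_delta(spark, path: str, mode: str = "overwrite") -> str:
    """Read one file with the matching reader and save it as a Delta table."""
    fmt, options, table = plan_load(path)
    df = spark.read.format(fmt).options(**options).load(path)
    df.write.format("delta").mode(mode).saveAsTable(table)
    return table
```

In a notebook, something like `load_to_delta(spark, "Files/raw/Sales-Orders.csv")` would create or replace a table named `sales_orders` in the default lakehouse.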

 

From there, I'll load that into another lakehouse (cleansed), applying the relevant cleaning along the way.
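What "relevant cleaning" means depends on the data. As one possible example, the sketch below normalizes column names and drops exact duplicate rows before writing to the cleansed lakehouse. The helper names and the choice of cleaning steps are assumptions, and it also assumes both lakehouses are attached to the notebook so their tables can be addressed by name.

```python
import re


def sanitize_column(name: str) -> str:
    """Lower-case a column name and replace runs of non-alphanumerics with '_'."""
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", name.strip())
    return cleaned.strip("_").lower()


def cleanse(df):
    """Rename all columns and drop exact duplicate rows of a Spark DataFrame."""
    renamed = df.toDF(*[sanitize_column(c) for c in df.columns])
    return renamed.dropDuplicates()


def raw_to_cleansed(spark, raw_table: str, cleansed_table: str) -> None:
    """Read a raw Delta table, cleanse it, and overwrite the cleansed table.

    `spark` is an existing SparkSession; both table names are supplied by
    the caller and are hypothetical.
    """
    df = spark.read.table(raw_table)
    cleanse(df).write.format("delta").mode("overwrite").saveAsTable(cleansed_table)
```

A call might look like `raw_to_cleansed(spark, "raw_lakehouse.orders", "cleansed_lakehouse.orders")`, where both lakehouse names are placeholders.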

 

Once the schema support in lakehouse becomes stable and moves into GA, I'll likely have both raw and cleansed in the same lakehouse split by schema.

spencer_sa
Resolver IV

You can use several methods to automate the loading of files into tables:

  • Dataflow Gen2 - a lot of custom transformation capabilities; can be slow
  • Copy Data Activity - pretty much a straight equivalent to the 'Load to Tables' option; fast, but limited capabilities
  • Copy Job (Preview) - a new way of doing copies in Pipelines; see the link below
  • Notebooks - by far the most flexible; fast; you need to be able to code in Python, R, or possibly Spark SQL

Copy Job - https://blog.fabric.microsoft.com/en-gb/blog/announcing-public-preview-copy-job-in-microsoft-fabric?... 

 

As for where the files and tables live, we keep the raw files *and* a 'straight-copy-to-table' version in the same layer. We then transform for Silver.

AndyDDC
Most Valuable Professional

I hope Copy Job becomes a way to create a metadata-driven pipeline with multiple tables. That would be great.
