Hi all,
I am setting up a medallion architecture in Microsoft Fabric. The input so far has been files (but it will also include files from shortcuts and possibly data from a database in the future).
Do I now need to manually transform it into a table, as is done in the screenshot below? I assume this is wrong, as it cannot be scheduled or added to a pipeline. So I assume I should do this step (file --> table) in a notebook?
And a second question. The file is now in the bronze layer, as you can see. When keeping the structure of the table as-is while transforming the file into a table: should the table be in the silver layer? Or should both the file and the table based on that file stay in the bronze layer? What is best practice here?
Here is a great video from Advancing Analytics about the medallion architecture, which I find very insightful (for a newbie like me) and also fun to watch: https://youtu.be/fz4tax6nKZM?si=ErBKN3msWPZMhMHI
And some other threads discussing a similar topic:
https://www.reddit.com/r/MicrosoftFabric/s/IDVENJxQDx
Hi @daircom ,
Did the above suggestions help with your scenario? If so, you can consider giving a Kudo or accepting the helpful suggestions to help others who face similar requirements.
Regards,
Xiaoxin Sheng
From your setup it looks like the "Files" area of your lakehouse is the "landing zone" for your raw data. And yes, the Tables section can become the tabular version of your raw data in Delta format.
What I usually do is set up the Files section to hold a full pull of data in one folder and incremental data in other folders. I then use a notebook to load that data into Delta tables in the same lakehouse. This becomes the queryable raw data.
I usually have a mixture of CSV, JSON, and Parquet in the Files area, then load those into Delta tables using PySpark in a notebook.
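For what it's worth, here is a minimal sketch of that file-to-Delta step in a Fabric notebook (where `spark` is predefined). The folder paths and table names are just placeholders for your own layout:

```python
# Read a raw CSV landed in the Files area of the attached lakehouse
# (paths like "Files/..." resolve against the notebook's default lakehouse)
df_csv = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("Files/landing/sales/full/")
)

# Write it to the Tables section as a Delta table (the queryable raw data)
df_csv.write.mode("overwrite").format("delta").saveAsTable("raw_sales")

# The same pattern works for JSON and Parquet sources
df_json = spark.read.json("Files/landing/customers/")
df_json.write.mode("overwrite").format("delta").saveAsTable("raw_customers")
```

You can then schedule this notebook directly or call it from a pipeline, which addresses the "can't schedule a manual load" concern from the original question.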
From there, I'll load that into another lakehouse (cleansed) with the relevant cleaning applied, etc. (see the sketch below).
Once schema support in lakehouses becomes stable and moves into GA, I'll likely keep both raw and cleansed in the same lakehouse, split by schema.
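A hedged sketch of that bronze-to-silver step, assuming both lakehouses are attached to the notebook so tables can be referenced as `LakehouseName.table_name`; the lakehouse, table, and column names are all illustrative:

```python
from pyspark.sql import functions as F

# Read the raw Delta table from the bronze lakehouse
df = spark.read.table("BronzeLakehouse.raw_sales")

# Apply the relevant cleaning: dedup, typing, null filtering, etc.
df_clean = (
    df.dropDuplicates(["order_id"])
      .withColumn("order_date", F.to_date("order_date"))
      .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
      .filter(F.col("order_id").isNotNull())
)

# Write the cleansed result to the silver (cleansed) lakehouse
df_clean.write.mode("overwrite").format("delta").saveAsTable("SilverLakehouse.sales")
```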
You can use several methods to automate loading files into tables: a scheduled notebook, a Data pipeline (for example with a Copy activity), or a Dataflow Gen2.
As far as the locations of files and tables go, we keep the raw files *and* a 'straight-copy-to-table' version in the same layer. We then transform for Silver.
So if I understand correctly, you have a folder, for example "rawbronze", where all the CSVs, Parquets, XMLs, etc. land, and then another area, for example "bronze", where these files are converted and stored as Delta tables (Parquet under the hood)? And is any schema applied in the latter version, or are the files kept as-is, without a schema?
I hope Copy Job becomes a way to create a metadata-driven pipeline with multiple tables; that would be great.
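In the meantime, you can get a simple metadata-driven load with a notebook loop over a hand-maintained config list. A sketch, with illustrative paths, formats, and table names:

```python
# Hypothetical config list; in practice this could live in a Delta table
load_config = [
    {"path": "Files/landing/sales/",     "format": "csv",     "table": "raw_sales"},
    {"path": "Files/landing/customers/", "format": "json",    "table": "raw_customers"},
    {"path": "Files/landing/products/",  "format": "parquet", "table": "raw_products"},
]

for item in load_config:
    reader = spark.read
    if item["format"] == "csv":
        reader = reader.option("header", "true").option("inferSchema", "true")
    # Load the source files and overwrite the matching Delta table
    df = reader.format(item["format"]).load(item["path"])
    df.write.mode("overwrite").format("delta").saveAsTable(item["table"])
```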