Get certified in Microsoft Fabric—for free! For a limited time, the Microsoft Fabric Community team will be offering free DP-600 exam vouchers. Prepare now
I'm getting started with a medallion architecture pattern in Fabric and wondering what the tradeoffs are to ingesting APIs to File vs Table via a pipeline. Is there one destination that's best practice? If not, why would you choose one over the other?
Solved! Go to Solution.
Hi @cpshort7186,
It depends on the JSON file that you are receiving from the API. If it is a flat JSON file that is in tabular format, you should load the JSON directly to the Tables section in the bronze lakehouse. With a table, you have all the advantages of a delta file, such as ACID transactions, time travel and optimized storage. However, when the JSON file has many nested elements, it would not be convenient to store it in a table. You could first copy it to the Files section and do some transformations and JSON expanding afterwards (for example in a notebook).
We tend to write to json first and from there to lakehouse tables. The main reason for us is the fact that if something isn't right in the data, we can have a look at the actual recent json file. This is particularly usefull when using 3rd party API's over which you have no control. Sometimes companies are less then perfect ( 😂 )when it comes down to making changes in jsons, adding or renaming a whole key pairs, without letting us know.
We tend to write to json first and from there to lakehouse tables. The main reason for us is the fact that if something isn't right in the data, we can have a look at the actual recent json file. This is particularly usefull when using 3rd party API's over which you have no control. Sometimes companies are less then perfect ( 😂 )when it comes down to making changes in jsons, adding or renaming a whole key pairs, without letting us know.
Yes - I can see this point as I have been working with Fabric over the last month or so. Additionally, it seems that Fabric Notebook support for all SQL functions is not fully built out yet (ALTER limitations came up for me, but see the whole list here: https://learn.microsoft.com/en-us/fabric/data-warehouse/tsql-surface-area), so it makes working with tables more difficult, whereas JSON has been easier. For those reading back on this thread I would recommend defaulting to storing the raw file where possible.
Hi @cpshort7186 ,
Did FabianSchut reply solve your problem? If so, please mark it as the correct solution, and point out if the problem persists.
Best Regards,
Adamk Kong
Hi @cpshort7186,
It depends on the JSON file that you are receiving from the API. If it is a flat JSON file that is in tabular format, you should load the JSON directly to the Tables section in the bronze lakehouse. With a table, you have all the advantages of a delta file, such as ACID transactions, time travel and optimized storage. However, when the JSON file has many nested elements, it would not be convenient to store it in a table. You could first copy it to the Files section and do some transformations and JSON expanding afterwards (for example in a notebook).
Check out the October 2024 Fabric update to learn about new features.
Learn from experts, get hands-on experience, and win awesome prizes.
User | Count |
---|---|
9 | |
8 | |
7 | |
4 | |
2 |
User | Count |
---|---|
20 | |
16 | |
16 | |
6 | |
6 |