Hi everyone,
I'm a new user of Microsoft Fabric, and I'm also studying and applying Kimball's four-step process for dimensional modelling.
I have a single CSV source file which describes food inspections across Chicago from 2010 up to now, with different columns. I have organized my data source into 5 dimensions, then created a Lakehouse where I loaded this single dataset, and now I'm working on the transformations required for each of them by using Data Flows in a staging-area warehouse pipeline created for this specific purpose, which is one of the requirements of the project.
For each of the dimensions I have quite considerable transformations to carry out, and I intend to do this in separate Data Flows created specifically for each of the dimensions mentioned.
What best practice would you advise?
1. Should I apply a transformation, for instance for the inspection date dimension, through a Data Flow, then stop, clone it, and apply the next transformation for the location dimension, then repeat this process for the rest of the defined dimensions, simply by cloning the previously created Data Flow each time?
2. Is it possible to first create a Data Flow just for splitting the data into different tables in my Lakehouse, and then apply the overall sequence of Data Flows without needing to clone the previous Data Flow?
Note: For this project I cannot split the data manually by separating it into different CSV files and then loading them into the Lakehouse. All the required transformations are supposed to be done inside Fabric.
Thank you in advance and let me know if you have questions.
Hi @filipenovaims20 ,
Thanks for using Fabric Community.
1. Cloning Data Flows:
While cloning existing Data Flows can be convenient, it can lead to code duplication and maintenance challenges. Instead, I recommend these options:
2. Data Splitting:
Yes, splitting the data into separate tables before the individual Data Flow transformations is a good practice. It avoids unnecessary processing of data that is irrelevant to each dimension.
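As a rough illustration of that pre-split step (the column names below are hypothetical placeholders, not the actual Chicago dataset schema), each dimension can be carved out of the single staged table by keeping only its columns and de-duplicating, which mirrors the "delete the other columns" step done in a Data Flow:

```python
import pandas as pd

# Hypothetical subset of the inspection columns; the real dataset has more.
source = pd.DataFrame({
    "inspection_id": [1, 2, 3],
    "inspection_date": ["2010-01-04", "2010-01-04", "2011-06-30"],
    "facility_name": ["A", "B", "A"],
    "address": ["1 Main St", "2 Oak Ave", "1 Main St"],
    "result": ["Pass", "Fail", "Pass"],
})

# Map each dimension to the columns it keeps; everything else is dropped.
dimension_columns = {
    "dim_date": ["inspection_date"],
    "dim_location": ["facility_name", "address"],
    "dim_result": ["result"],
}

dimensions = {}
for name, cols in dimension_columns.items():
    # Keep only the relevant columns and de-duplicate the remaining rows.
    dimensions[name] = source[cols].drop_duplicates().reset_index(drop=True)

for name, df in dimensions.items():
    print(name, len(df))
    # In Fabric this would be written back to the lakehouse, e.g.:
    # df.to_parquet(f"/lakehouse/default/Files/staging/{name}.parquet")
```

The later per-dimension Data Flows then each read their own small table instead of re-scanning the full source.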
Additional Tips:
Remember: There is no one-size-fits-all solution, and the best approach will depend on your specific data, transformations, and preferences.
I hope this helps! Feel free to ask any further questions you may have.
Hi @Anonymous, thank you for your support.
I ended up splitting the data source into several files based on my defined dimension criteria, did some pre-cleaning for duplicates and null values, and it worked when running the Data Flows and loading into the warehouses.
I didn't have time for the partitioning solution you mentioned, but I found it interesting and recommend others give it a try. I will certainly follow it in the near future. If I have doubts I'll let you know.
Thank you for your great tips, and please consider my question closed!
Hi @filipenovaims20 ,
Glad to know that the query got resolved. Please continue using Fabric Community for your further queries.
@Anonymous regarding 2 (pre-split Data Flow): in the staging-area warehouse, I create a Data Flow to split out, for instance, the date dimension by deleting all the columns that aren't part of it and keeping those that are, then do the same for the rest of the dimensions, and at the end of it add a wait activity, then start applying the transformations to each dimension. Am I mistaken?
Below is a picture of the process described:
It may change based on your data and requirements. If the approach you are following works for you, please go ahead with your logic.
Hi @Anonymous, many thanks for the help.
I tried to apply the "Lakehouse partitioning" solution and found a very interesting article about it here: https://techcommunity.microsoft.com/t5/fasttrack-for-azure/efficient-data-partitioning-with-microsoft-fabric-best-practices/ba-p/3890097
Unfortunately I got stuck at part 4, where I got these errors:
Could you help me identify where I went wrong, please?
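For context on what the partitioned layout from that article looks like on disk, here is a minimal sketch (the partition column and output folder are made up for the example) that writes one file per partition value in Hive-style `column=value` folders:

```python
import os
import pandas as pd

def write_partitioned(df, base_dir, partition_col):
    """Write one CSV per value of partition_col, in Hive-style folders
    like base_dir/inspection_year=2010/part.csv."""
    for value, group in df.groupby(partition_col):
        folder = os.path.join(base_dir, f"{partition_col}={value}")
        os.makedirs(folder, exist_ok=True)
        # The partition value lives in the folder name, so drop the column.
        group.drop(columns=[partition_col]).to_csv(
            os.path.join(folder, "part.csv"), index=False
        )

# Tiny hypothetical sample of the inspection data.
df = pd.DataFrame({
    "inspection_year": [2010, 2010, 2011],
    "result": ["Pass", "Fail", "Pass"],
})
write_partitioned(df, "partitioned_demo", "inspection_year")
```

In Fabric itself a Spark `df.write.partitionBy(...)` produces the equivalent layout; the sketch just shows the folder structure a reader should expect to see.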
Hi @filipenovaims20 ,
It looks like the CSV file is not located at the specified path. Could you please verify your path and try again?
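For anyone hitting the same error: in a Fabric notebook the default lakehouse's Files area is mounted at `/lakehouse/default/Files`, so a quick walk of that mount shows where the CSV actually landed. The helper and file name below are hypothetical:

```python
import os

def find_file(base_dir, filename):
    """Walk a directory tree and report where the file actually lives.
    Returns the full path if found, otherwise None."""
    for root, _dirs, files in os.walk(base_dir):
        if filename in files:
            return os.path.join(root, filename)
    return None

# Search the default lakehouse mount for the uploaded CSV (hypothetical name).
path = find_file("/lakehouse/default/Files", "food_inspections.csv")
if path is None:
    print("CSV not found - check the upload location before the partition step")
else:
    print(f"Found at: {path}")
```

Once the real location is known, use that exact path (or the corresponding `abfss://` URI) in the read step from the article.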
Hi @filipenovaims20 ,
We haven't heard from you since the last response and were just checking back to see if you have a resolution yet. If you do, please share it with the community, as it can be helpful to others. Otherwise, reply with more details and we will try to help.