Hello,
I am looking for some guidance on creating code to load into my Power BI dataset. I am trying to write something similar to what I used in Qlik. Basically, I have two data folders. The first holds my base data (old/historical), and this data never changes. The second holds my new data, which gets updated weekly with new and changed records.
How I want it to work: load the old CSV file, then load the new CSV file. The new file brings in new and changed records, which should replace the corresponding records loaded from the old file. This is keyed on the Gift System ID field: any time a Gift System ID has new or changed records in the new data file, all rows for that Gift System ID should be deleted from the old data and replaced with the new data. This process is needed to avoid duplication.
In my example below I have added two data sources: the first is the old data and the second is the new. As you can see, the new data file contains changed data (*** indicates a change). I need the old rows to be deleted and the new rows to populate my dataset.
I have also attached the code I am using in Qlik. If there is an easier way to do this, please let me know. I am open to any options.
https://drive.google.com/drive/folders/13TJBe80sM88x1FmMhZURbN9Ivmn47diR?usp=sharing
Old:

| Gift System ID | Appeal ID | Campaign ID | Fund ID | Package ID | Amount |
| --- | --- | --- | --- | --- | --- |
| 1 | 1 | 1 | 1 | 1 | 200 |
| 1 | 2 | 1 | 1 | 1 | 250 |
| 1 | 3 | 1 | 1 | 1 | 100 |
| 1 | 4 | 1 | 1 | 1 | 100 |
| 1 | 5 | 1 | 1 | 1 | 300 |
| 2 | 1 | 1 | 1 | 1 | 50 |
| 2 | 2 | 1 | 1 | 1 | 20 |
| 2 | 1 | 1 | 1 | 1 | 100 |
New:

| Gift System ID | Appeal ID | Campaign ID | Fund ID | Package ID | Amount |
| --- | --- | --- | --- | --- | --- |
| 1 | 1 | 1 | 1 | 1 | 200 |
| 1 | 2 | 1 | 1 | 1 | 250 |
| 1 | 3 | 1 | ***2 | 1 | 100 |
| 1 | 4 | 1 | 1 | 1 | 100 |
| 1 | 5 | 1 | 1 | 1 | ***10000 |
Load your data via two dataflows, one for historical data and another for fresh data. Only set the fresh one on a refresh schedule. Then load and append these dataflows into your dataset.
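If it helps, the append step with your replace-by-ID rule can be written in Power Query M roughly like this. It's only a sketch: Old and New stand for the two queries loaded from the dataflows, and the column name matches your sample data.

```
let
    // Old and New are assumed to be the queries loaded from the two dataflows.
    // Buffer the distinct IDs present in the weekly file
    NewIDs = List.Buffer(List.Distinct(Table.Column(New, "Gift System ID"))),
    // Keep only the historical rows whose Gift System ID does NOT appear in the new file
    OldKept = Table.SelectRows(Old, each not List.Contains(NewIDs, [#"Gift System ID"])),
    // Append the new/changed rows in place of the removed ones
    Result = Table.Combine({OldKept, New})
in
    Result
```

Because any Gift System ID that appears in the new file wins wholesale, this matches the delete-and-replace behaviour you described.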
Okay, this is the best advice I've gotten so far for what I'm trying to do. One last thing though... so I don't apply a refresh to the old dataflow, I apply a refresh policy to the new one, then combine and append. But how would this solve my duplicate issue? I want to delete the old rows and add the new/changed rows from the new file.
You can deduplicate in Power Query; just make sure to wrap the sorted table that puts the most recent results on top in a Table.Buffer, as explained here:
https://exceleratorbi.com.au/remove-duplicates-keep-last-record-power-query/
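Concretely, the buffered-sort pattern from that article might look like this. Again just a sketch: it assumes queries named Old and New, and that Gift System ID plus Appeal ID form the row key (adjust to your real key).

```
let
    // Tag each source so rows from New sort ahead of rows from Old
    OldTagged = Table.AddColumn(Old, "SourceRank", each 1, Int64.Type),
    NewTagged = Table.AddColumn(New, "SourceRank", each 0, Int64.Type),
    Combined = Table.Combine({NewTagged, OldTagged}),
    // Buffer the sorted table so Table.Distinct reliably keeps the first
    // (i.e. newest) row it meets for each key
    Sorted = Table.Buffer(Table.Sort(Combined, {{"SourceRank", Order.Ascending}})),
    Deduped = Table.Distinct(Sorted, {"Gift System ID", "Appeal ID"}),
    Result = Table.RemoveColumns(Deduped, {"SourceRank"})
in
    Result
```

One caveat: row-level dedupe still keeps an Old row whose key no longer appears in New at all, so if you need strict delete-and-replace per Gift System ID, the filter approach shown earlier is the closer fit.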
Do I need to create a dataflow? Or can I just create a dataset?
If you want to partition historical vs. new, then you'll want to create two dataflows; it's more efficient than trying to do it all within the dataset. Just do the append/dedupe logic in the dataset (or you could do it in a third dataflow, if you don't mind paying for Power BI Premium Per User).
With my shared folders, I have two separate shared folders, one new and one historical. Within each folder I have about 10 datasets that have a one-to-one relationship between folders. The CSV files are completely different, but I can create relationships when I create tables in my dataset. I am just wondering: do I need to create a dataflow for each one of these CSV files within the folder? Or can I simply create two dataflows, one for all files in historical and another for all files in new? Then, when I bring my dataflows into Power BI Desktop to create my reports, can I select which files I want from the dataflows?
Just create two dataflows by folder (historical and new); you don't need to create one for each file (that wouldn't scale at all!). Yes, once you load them into the dataset you can add filtering there if needed.
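Under the hood each dataflow entity is just a Power Query M query, so within the one historical dataflow you can give each of your ~10 files its own entity along these lines (the folder path and file name are placeholders; it assumes the files keep stable names):

```
let
    // List everything in the historical folder
    Files = Folder.Files("\\server\share\Historical"),
    // This entity picks out its own CSV, since the files have different schemas
    GiftsBinary = Table.SelectRows(Files, each [Name] = "Gifts.csv"){0}[Content],
    Gifts = Table.PromoteHeaders(Csv.Document(GiftsBinary, [Delimiter=",", Encoding=65001]))
in
    Gifts
```

Repeat the same pattern per file in the new-data dataflow, and each entity then shows up as its own table when you connect from Power BI Desktop.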
In the dataflow it is showing my 10 files as one table. I know that in my dataset in Power BI I can filter, create queries, and then combine from there, but is there a way I can left join everything in my dataflow?
Thank you! Last question, lol. Once I create the two dataflows, what would be the best option: create multiple tables? Or is there a way I can just create one table and left merge everything onto one main table? Is that even a capability? If so, please share a link or something that I can use.
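For what it's worth, a left merge in Power Query M is Table.NestedJoin followed by an expand; a minimal sketch with made-up table and column names:

```
let
    // Left outer join SecondaryTable onto MainTable (both names are hypothetical)
    Joined = Table.NestedJoin(
        MainTable, {"Gift System ID"},
        SecondaryTable, {"Gift System ID"},
        "Secondary", JoinKind.LeftOuter),
    // Expand only the columns you want to pull across
    Expanded = Table.ExpandTableColumn(Joined, "Secondary", {"Amount"}, {"Secondary Amount"})
in
    Expanded
```

In the Power Query editor this is the Merge Queries button; you would repeat the merge-and-expand pair once for each table you want to fold onto the main one.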
Could you share a high-level overview of how this would work for my data setup above? I'd really appreciate it, as I am used to basic-level datasets and not this level.
Do I need a Premium license to create dataflows?

No, you can create dataflows with Power BI Pro too, with somewhat limited capabilities.