To celebrate FabCon Vienna, we are offering 50% off select exams. Ends October 3rd. Request your discount now.
What's the most-recommended approach for tracking data that changes? Naturally I've searched, but the many results I find all make the assumption that the data to be plotted is all loaded every time the data is refreshed, which is not the case for my scenario.
Here's an example. Say I receive a CSV file that contains the following:
ID | Height/m | Mass/kg |
1 | 1.42 | 52 |
2 | 1.37 | 51 |
I get an updated file every week. The second week, it contains this:
ID | height / m | Mass / kg |
1 | 1.42 | 51 |
2 | 1.38 | 51 |
Suppose I want to plot the mean mass of all the entries in the table week by week. The data that would be plotted might look like the following - but I don't actually have this data stored anywhere; it needs to be computed:
Week | Mean Mass / kg |
1 | 51.5 |
2 | 51 |
How do people generally go about this?
The actual scale is that the scenario will run for maybe 3-5 years, which means the refresh needs to be easy to execute and involve a minimum of human intervention. My actual files have up to 1m rows. In each file I need to import a dozen or so of the available columns. So, that's up to 12m data points. Once the means etc. are computed, I don't need to store the raw historic data; I just need to keep the results of the computations for the purposes of producing the figure, which is a trival amount of data.
I also have data security complications and ideally should keep the raw data local.
Before arriving at Power BI, I would have arranged for the incoming files to be named with the week number. I'd have knocked out a script to read the latest CSV, compute the needed values, and write the results to long-term storage. Then I'd read all the historic results and spit out an updated figure as a JPG or whatever. That would all run on autopilot. I'd hoped that Power BI would do something similar, allowing a user to see the figure in a dashboard with nothing complicated each week to make it happen. But I'm really struggling. I seem to have three choices:
(a) each week's file overwrites the previous week's file when refreshed, which means the mean can't be computed and thus the figure can't be plotted,
(b) each file is loaded into a separate table, but that involves updating all the processing every week and it's quickly going to become unwieldy, or
(c) I can write a script to do the calculations outside of Power BI and just read the results for graphing purposes, but that's effort I was hoping to avoid. I haven't done any scripting of this type for a couple of years now, it complicates the process of refreshing the data each week, and I need to hand over a functioning system that doesn't require a lot of user intervention or create maintenance complications.
Any pointers?
Solved! Go to Solution.
Power BI has no memory. You need to do all your snapshotting before. Having separate CSV files is not such a bad idea - they can be ingested quickly. Recommend putting them on OneDrive and then accessing them via SharePoint.Folder()
I would recommend against using week numbers in the file names. Use proper ISO8601 style date stamps instead.
Power BI has no memory. You need to do all your snapshotting before. Having separate CSV files is not such a bad idea - they can be ingested quickly. Recommend putting them on OneDrive and then accessing them via SharePoint.Folder()
I would recommend against using week numbers in the file names. Use proper ISO8601 style date stamps instead.