To celebrate FabCon Vienna, we are offering 50% off select exams. Ends October 3rd. Request your discount now.
Hello,
I have a time series dataset representing product stock in a store. The columns are:
However, for most combinations of dates and products, there is no change in stock.
So instead of having a row in the data set for every combination of date and product,
there is only a row present if there has been a change in the stock level.
As an example, a dataset of this structure for dates 2022-03-01 to 2022-03-12 might look like:
date | product_id | in_stock |
2022-03-01 | A | 5 |
2022-03-10 | A | 10 |
2022-03-05 | B | 2 |
2022-03-12 | B | 8 |
I would like to:
The problem is that if I for instance filter on product A, the standard visualization interpolates linearily between a stock of 5 products on 2022-03-01 and a stock of 10 products on 2022-03-10. This is not what I want. The stock was 5 until and including 2022-03-09, before it rose to 10 on 2022-03-10.
For the aggregation case, there is also an issue: if I try to aggregate over a set of products to compute the sum, it will only sum when a row is present (when there was a change in stock). In the example above the sum of the date 2022-03-05 will be 2, but in reality the sum is 2 + 5 = 7. For the date 2022-03-11 the sum will be 0, but in reality it is 2 + 10 = 12.
What I have a tried
Help very much appreciated. If you have any questions please ask and I will explain more details.
Hi @Anonymous ,
According to my understanding, you want to get the total In_stock of each date, the date should be continuous.
If so, you need to create a new table with all dates between the min date and the max date from the original table:
New Table =
var _1=CALENDAR(MIN('Table'[date]),MAX('Table'[date]))
var _2=VALUES('Table'[product_id])
return CROSSJOIN(_1,_2)
Then to find the corrcet stock of each day:
In_Stock =
var _date= CALCULATE(LASTNONBLANK('Table'[date],TRUE()), FILTER('Table',[product_id]=EARLIER('New Table'[product_id]) && [Date]<=EARLIER('New Table'[Date])))
return LOOKUPVALUE('Table'[in_stock],[product_id],[product_id],[date],_date)
Best Regards,
Eyelyn Qin
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Thank you so much for your time and effort. Unfortunately, your solution is essentially one of the things that I've already tried (see the "What I've tried" section in the post). It does not solve the problem.
Your solution does the right thing, but the computational complexity of explicitly forming the cross join makes it infeasible. In reality the data set is very spare - most products do not change on most dates. The sparsity factor is about 1000x, so computing the cross join creates a data set with 1000x as many rows as are in the original data set. This is infeasible.
The total stock can be found in O(number_of_rows) time. It's enough to see each row once and keep track of one counter for each product ID, in a language such as Python. But the computational complexity of the cross join is O(number_of_dates * number_of_products), which is 1000x larger than O(number_of_rows) in my actual data set. I wrote 100x in the original post, but 1000x is more accurate. I don't know of any SQL-like way to perform the computation without a cross join though.
Does anyone have any input on this? Any help would be greatly appreciated.
User | Count |
---|---|
77 | |
67 | |
65 | |
50 | |
27 |