Check your eligibility for this 50% exam voucher offer and join us for free live learning sessions to get prepared for Exam DP-700.
Get StartedBe one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now
Ok, so I have c. 750k rows of sales data (and growing). It's way too much to analyse in Excel using pivots and I've found PBI is the PERFECT solution!
However I still have never managed to figure out how to efficiently calculate the 'holy grail' measure of NET sales. We have a HIGH cancellation rate in our business (often with a long lead time from sale to cancellation - don't ask!), so our ability to measure NET sales and canx rates is fundamental.
Illustration:
Dataset
Prod # Sales Yr Canx Yr Status 16Sales 16Canx
1 2014 2016 C - (1)
2 2015 2016 C - (1)
3 2016 L 1
4 2016 2016 C 1 (1)
5 2016 L 1
6 2016 L 1
etc.... (to #10 for illustration) --- ---
8 (3)
Output
2016 Net Sales (Gross less Canx)
Gross 8
Canx (3) [note: 2 are from a prior period!]
---
Net 5
As illustrated, I need to report +1 for all sales with a sales date in the period, then deduct -1 for every product which cancels in the same period (even is sold in a prior period). Counting teh NET sales figure itself is relatively easy.
The difficult part is reporting BY PERIOD the GROSS and the CANX sales (especially when you get down to reporting daily and weekly sales figures). I can't figure how to align the cancellations with the sales; I need a way to efficiently count a SALE and a CANX (count = 2) for those products which have cancelled, but report the sale aligned to the sales date and the cancellation to the canx date....
I could simply create a new dataset for just the cancelled products, but since the dataset is so large it's a painful extra step, plus I don't want to open the dataset in Excel to create it as this crashes my PC. (I suppose I could analyse it in PBI , export it, then re-import it as a new table... but that seems very roundabout!).
In time, I can get IT to produce a 2nd dataset of only the cancellations and their canx dates, but it seems very inefficient to duplicate rows.
Is there a smarter way!???
Many thanks, DP
Solved! Go to Solution.
Import the first 3 columns into the data model
create a lookup table that contains all possible years for your data (sales and Canx) and add to the datamodel
join the data[sales year] column to the lookup table year
join the canx year from the data table to the lookup table year (inactive relationship)
add a table to the canvas, and put year from the lookup table on the values
write the the following measures.
Gross sales = countrows(dataTable)
total cancelations = calculate(countrows(dataTable),userelationship(lookupTable[year],dataTable[year]))
net sales = [Gross Sales] - [Total Cancellations]
add the measures you want to the table.
this doesn't look too hard, but can you please post the table structure you are using in your model (I assume it is. It what you have posted already.
Cheers, Matt, appreciate the offer of assistance. I'm primarily a finance person with LEAN skills and a degree in using Excel... (joke)! I primarily work in continuous improvement, but dabble with data analysis.
Not sure what you mean by table structure...?
Essentially, I have 750k rows with multiple fields. The key fields are as noted in my illustration, but I want to be able to cut and slice the output by other fields (such as type and sales channel), so:
Product #
Sales Date
Cancellation Date
Status (L or C)
Sales Channel
Prod Type
Brand
etc...
The main challenge is reporting a sale, which has subsequently cancelled (probably in a different period, whether that be week, month or even year), consistently across the outputs, as net = 0, gross = 1, canx = -1.
However, if I am reporting 2016, then if the sale is in 2015, but cancels in 2016, I would report -1 (as the sale belongs to 2015 and is already reported there).
However, if sale was in 2016 and cancelled in 2016, my report output for 2016 would be zero.
At a lower level, we report weekly metrics and I would want to be able to track 'net sales' by week. The problem is that the sales in 'week x' are those with a 'sales date' in week 1, however the canx in 'week x' are those with a cancellation date in 'week x' (which is unlikely to be the same week as the sale...!).
In term sof the data table, the product ('sale') is only recorded in the dataset ONCE, but the status (L= Live, C = Cancelled) determines what to count in the total and the 'sales' or 'cancellation' date the period in which it needs counted.
L = 1 (according to sales date)
C = 1 and -1 (assigned to different dates)
Thanks for any help!
By table structure I mean what are the columns you have. If they are like you originally posted, how did they end up looking like that? I would not normally expect such a structure unless some other system or tool had restructured the data. For example, you have a 16Sales and 16Canx column. Does hat mean you have 2 columns for 15, 2 for 14, 2 for 13 etc? Also where has the date information gone? Presumably these sales happened on a specific date, and the detail is stored somewhere.
I realise I don't understand the data, but it seems to me that the last 3 columns are superfluous. I can work out the last 3 from the first 3. Is that correct? I would be seeking a table that has product number, sale date (not year) and cancellation date. If the cancellation date doesn't exist, then it hasn't been cancelled.
Would that work?
Correct!
The last two columns were informational only and do not exist in the actual dataset.
The Status column does exist in the datset, but you are right that (from a data perspective) the C status can be derived where 'cancel date' <> BLANK.
However, the way the (in-house legacy) system works, is that the status is first set to 'cancelled' then the user requires to enter a cancellation date.
I listed the key 'table structure' fields in my last post.
Cheers.
Import the first 3 columns into the data model
create a lookup table that contains all possible years for your data (sales and Canx) and add to the datamodel
join the data[sales year] column to the lookup table year
join the canx year from the data table to the lookup table year (inactive relationship)
add a table to the canvas, and put year from the lookup table on the values
write the the following measures.
Gross sales = countrows(dataTable)
total cancelations = calculate(countrows(dataTable),userelationship(lookupTable[year],dataTable[year]))
net sales = [Gross Sales] - [Total Cancellations]
add the measures you want to the table.
Genius, Matt! Thanks a lot, mate. I owe you a beer.
My datasets were a bit more complicated than illustrated, but I worked out how to apply your formulae to link specific dates and report out 'net sales' easily for the very first time by various slices, including product type, sales channel, 'week #' and accounting periods.
An exercise that would have required HOURS of effort beforehand, using Excel and adding gross and cancels toegther manually...
One very happy Finance team!
Cheers.
Glad you worked it out from here. Once you get started, there is no stopping 🙂
User | Count |
---|---|
119 | |
78 | |
58 | |
52 | |
46 |
User | Count |
---|---|
170 | |
117 | |
63 | |
58 | |
51 |