Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.
Hi everyone
I have a database with millions of rows, and I've been asked a specific query about. I was about to go back and say that I dont think its possible, but thought I'd throw it out here.
Here is a snippet of what I have
Customer ID Type of Shop Rank of shops (calculated in SQL)
12345 Online 4
12345 Instore 5
45678 Instore 8
98765 Instore 10
98765 Online 11
09090 Online 2
11000 Instore 3
11000 Instore 4
11000 Instore 5
11000 Online 6
What I am being asked for is if it is possible to identify how many customer id's made a transaction instore, and then their most recent transaction (highest rank number) was online?
So in the example above the answer is 2, ids 98765 and 11000 both have an earlier instore but their highest ranked shop was online.
I suspect this in impossible, or if it is would be impracticle looking through millions of rows, but can anyone assist?
Many thanks
Solved! Go to Solution.
@powerbiuser9999 Sure, here it is with comments:
Measure =
// Get a distinct single column table of customers that have bought Instore
// Since we need to filter the table for Instore, we need to use SELECTCOLUMNS
// to return a single column from the FILTER statement and then apply DISTINCT
// to only get the unique values. We set this table variable aside until the end
VAR __Table = DISTINCT(SELECTCOLUMNS(FILTER('Table',[Type of Shop] = "Instore"),"Customer ID",[Customer ID]))
// This is the tricky one. We start with a distinct list of customers and then add
// two columns that use the iterator function, MAXX to return the highest rank for
// Instore and Online. The columns are called "Instore" and "Online" and the only
// difference is the filter for Type of Shop. Each column calculation starts by
// getting the "current" value of [Customer ID] for the row and then uses that as
// part of the filter clause to ensure we are getting the rank for that customer.
VAR __Table1 =
ADDCOLUMNS(
// This is the start of the table.
DISTINCT('Table'[Customer ID]),
// Adding Instore column.
"Instore",
// This is the calculation for the Instore column, nested VAR and RETURN
VAR __Customer = [Customer ID]
RETURN
MAXX(FILTER('Table',[Customer ID] = __Customer && [Type of Shop] = "Instore"),[Rank of shops]),
// Adding Online column
"Online",
// This is the calculation for the Online column, nested VAR and RETURN
VAR __Customer = [Customer ID]
RETURN
MAXX(FILTER('Table',[Customer ID] = __Customer && [Type of Shop] = "Online"),[Rank of shops])
)
// Now we use Table1 to create a new table var that simply filters our __Table1 where
// the Online rank value is greater than the Instore rank value. We also only want the
// customer IDs so we use SELECTCOLUMNS again to just grab that column.
VAR __Table2 = SELECTCOLUMNS(FILTER(__Table1,[Online] > [Instore]),"Customer ID",[Customer ID])
RETURN
// We now have two table vars, __Table and __Table2 that each have single column of
// unique customer values. __Table lists customer that have bought Instore and __Table2
// has a list of customers that have an Online rank that is higher than the Instore rank
// so, since we want customers with an Online rank higher than Instore rank that have
// also bought Instore, then we want the INTERSECT of the two tables (basically the values
// that exist in both tables). We then simply use COUNTROWS to count the number of rows
// in the intersection of the two tables.
COUNTROWS(INTERSECT(__Table2, __Table))
@powerbiuser9999 Maybe:
Measure =
VAR __Table = DISTINCT(SELECTCOLUMNS(FILTER('Table',[Type of Shop] = "Instore"),"Customer ID",[Customer ID]))
VAR __Table1 =
ADDCOLUMNS(
DISTINCT('Table'[Customer ID]),
"Instore",
VAR __Customer = [Customer ID]
RETURN
MAXX(FILTER('Table',[Customer ID] = __Customer && [Type of Shop] = "Instore"),[Rank of shops]),
"Online",
VAR __Customer = [Customer ID]
RETURN
MAXX(FILTER('Table',[Customer ID] = __Customer && [Type of Shop] = "Online"),[Rank of shops])
)
VAR __Table2 = SELECTCOLUMNS(FILTER(__Table1,[Online] > [Instore]),"Customer ID",[Customer ID])
RETURN
COUNTROWS(INTERSECT(__Table2, __Table))
@Greg_Deckler
If I could give more than 1 Kudos I would..
Will give you full credit it my work, that DAX is incredible.
Would you mind giving me a quick walk through how you've constructed that please?
@powerbiuser9999 Sure, here it is with comments:
Measure =
// Get a distinct single column table of customers that have bought Instore
// Since we need to filter the table for Instore, we need to use SELECTCOLUMNS
// to return a single column from the FILTER statement and then apply DISTINCT
// to only get the unique values. We set this table variable aside until the end
VAR __Table = DISTINCT(SELECTCOLUMNS(FILTER('Table',[Type of Shop] = "Instore"),"Customer ID",[Customer ID]))
// This is the tricky one. We start with a distinct list of customers and then add
// two columns that use the iterator function, MAXX to return the highest rank for
// Instore and Online. The columns are called "Instore" and "Online" and the only
// difference is the filter for Type of Shop. Each column calculation starts by
// getting the "current" value of [Customer ID] for the row and then uses that as
// part of the filter clause to ensure we are getting the rank for that customer.
VAR __Table1 =
ADDCOLUMNS(
// This is the start of the table.
DISTINCT('Table'[Customer ID]),
// Adding Instore column.
"Instore",
// This is the calculation for the Instore column, nested VAR and RETURN
VAR __Customer = [Customer ID]
RETURN
MAXX(FILTER('Table',[Customer ID] = __Customer && [Type of Shop] = "Instore"),[Rank of shops]),
// Adding Online column
"Online",
// This is the calculation for the Online column, nested VAR and RETURN
VAR __Customer = [Customer ID]
RETURN
MAXX(FILTER('Table',[Customer ID] = __Customer && [Type of Shop] = "Online"),[Rank of shops])
)
// Now we use Table1 to create a new table var that simply filters our __Table1 where
// the Online rank value is greater than the Instore rank value. We also only want the
// customer IDs so we use SELECTCOLUMNS again to just grab that column.
VAR __Table2 = SELECTCOLUMNS(FILTER(__Table1,[Online] > [Instore]),"Customer ID",[Customer ID])
RETURN
// We now have two table vars, __Table and __Table2 that each have single column of
// unique customer values. __Table lists customer that have bought Instore and __Table2
// has a list of customers that have an Online rank that is higher than the Instore rank
// so, since we want customers with an Online rank higher than Instore rank that have
// also bought Instore, then we want the INTERSECT of the two tables (basically the values
// that exist in both tables). We then simply use COUNTROWS to count the number of rows
// in the intersection of the two tables.
COUNTROWS(INTERSECT(__Table2, __Table))
User | Count |
---|---|
97 | |
87 | |
77 | |
67 | |
63 |
User | Count |
---|---|
112 | |
96 | |
95 | |
67 | |
65 |