Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply

Impossible?

Hi everyone

 

I have a database with millions of rows, and I've been asked a specific query about. I was about to go back and say that I dont think its possible, but thought I'd throw it out here.

Here is a snippet of what I have

Customer ID        Type of Shop      Rank of shops (calculated in SQL)

12345                  Online                 4

12345                  Instore                 5

45678                  Instore                8

98765                  Instore                10

98765                  Online                 11

09090                  Online                 2

11000                  Instore                3

11000                  Instore                4

11000                  Instore                5

11000                  Online                6

 

What I am being asked for is if it is possible to identify how many customer id's made a transaction instore, and then their most recent transaction (highest rank number) was online?

 

So in the example above the answer is 2, ids 98765 and 11000 both have an earlier instore but their highest ranked shop was online.

 

I suspect this in impossible, or if it is would be impracticle looking through millions of rows, but can anyone assist?

Many thanks

1 ACCEPTED SOLUTION

@powerbiuser9999 Sure, here it is with comments:

Measure =
  // Get a distinct single column table of customers that have bought Instore
  // Since we need to filter the table for Instore, we need to use SELECTCOLUMNS
  // to return a single column from the FILTER statement and then apply DISTINCT
  // to only get the unique values. We set this table variable aside until the end
  VAR __Table = DISTINCT(SELECTCOLUMNS(FILTER('Table',[Type of Shop] = "Instore"),"Customer ID",[Customer ID]))
  // This is the tricky one. We start with a distinct list of customers and then add 
  // two columns that use the iterator function, MAXX to return the highest rank for
  // Instore and Online. The columns are called "Instore" and "Online" and the only
  // difference is the filter for Type of Shop. Each column calculation starts by 
  // getting the "current" value of [Customer ID] for the row and then uses that as
  // part of the filter clause to ensure we are getting the rank for that customer.
  VAR __Table1 =
    ADDCOLUMNS(
      // This is the start of the table.
      DISTINCT('Table'[Customer ID]),
      // Adding Instore column.
      "Instore",
        // This is the calculation for the Instore column, nested VAR and RETURN
          VAR __Customer = [Customer ID]
        RETURN 
          MAXX(FILTER('Table',[Customer ID] = __Customer && [Type of Shop] = "Instore"),[Rank of shops]),
      // Adding Online column
      "Online",
        // This is the calculation for the Online column, nested VAR and RETURN
          VAR __Customer = [Customer ID]
        RETURN 
          MAXX(FILTER('Table',[Customer ID] = __Customer && [Type of Shop] = "Online"),[Rank of shops])
    )
  // Now we use Table1 to create a new table var that simply filters our __Table1 where
  // the Online rank value is greater than the Instore rank value. We also only want the
  // customer IDs so we use SELECTCOLUMNS again to just grab that column.
  VAR __Table2 = SELECTCOLUMNS(FILTER(__Table1,[Online] > [Instore]),"Customer ID",[Customer ID])
RETURN
  // We now have two table vars, __Table and __Table2 that each have single column of 
  // unique customer values. __Table lists customer that have bought Instore and __Table2
  // has a list of customers that have an Online rank that is higher than the Instore rank
  // so, since we want customers with an Online rank higher than Instore rank that have
  // also bought Instore, then we want the INTERSECT of the two tables (basically the values
  // that exist in both tables). We then simply use COUNTROWS to count the number of rows
  // in the intersection of the two tables.
  COUNTROWS(INTERSECT(__Table2, __Table))

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

View solution in original post

4 REPLIES 4
Greg_Deckler
Super User
Super User

@powerbiuser9999 Maybe:

Measure =
  VAR __Table = DISTINCT(SELECTCOLUMNS(FILTER('Table',[Type of Shop] = "Instore"),"Customer ID",[Customer ID]))
  VAR __Table1 =
    ADDCOLUMNS(
      DISTINCT('Table'[Customer ID]),
      "Instore",
          VAR __Customer = [Customer ID]
        RETURN 
          MAXX(FILTER('Table',[Customer ID] = __Customer && [Type of Shop] = "Instore"),[Rank of shops]),
      "Online",
          VAR __Customer = [Customer ID]
        RETURN 
          MAXX(FILTER('Table',[Customer ID] = __Customer && [Type of Shop] = "Online"),[Rank of shops])
    )
  VAR __Table2 = SELECTCOLUMNS(FILTER(__Table1,[Online] > [Instore]),"Customer ID",[Customer ID])
RETURN
  COUNTROWS(INTERSECT(__Table2, __Table))

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

@Greg_Deckler 

If I could give more than 1 Kudos I would..

 

Will give you full credit it my work, that DAX is incredible.

 

Would you mind giving me a quick walk through how you've constructed that please?

@powerbiuser9999 Sure, here it is with comments:

Measure =
  // Get a distinct single column table of customers that have bought Instore
  // Since we need to filter the table for Instore, we need to use SELECTCOLUMNS
  // to return a single column from the FILTER statement and then apply DISTINCT
  // to only get the unique values. We set this table variable aside until the end
  VAR __Table = DISTINCT(SELECTCOLUMNS(FILTER('Table',[Type of Shop] = "Instore"),"Customer ID",[Customer ID]))
  // This is the tricky one. We start with a distinct list of customers and then add 
  // two columns that use the iterator function, MAXX to return the highest rank for
  // Instore and Online. The columns are called "Instore" and "Online" and the only
  // difference is the filter for Type of Shop. Each column calculation starts by 
  // getting the "current" value of [Customer ID] for the row and then uses that as
  // part of the filter clause to ensure we are getting the rank for that customer.
  VAR __Table1 =
    ADDCOLUMNS(
      // This is the start of the table.
      DISTINCT('Table'[Customer ID]),
      // Adding Instore column.
      "Instore",
        // This is the calculation for the Instore column, nested VAR and RETURN
          VAR __Customer = [Customer ID]
        RETURN 
          MAXX(FILTER('Table',[Customer ID] = __Customer && [Type of Shop] = "Instore"),[Rank of shops]),
      // Adding Online column
      "Online",
        // This is the calculation for the Online column, nested VAR and RETURN
          VAR __Customer = [Customer ID]
        RETURN 
          MAXX(FILTER('Table',[Customer ID] = __Customer && [Type of Shop] = "Online"),[Rank of shops])
    )
  // Now we use Table1 to create a new table var that simply filters our __Table1 where
  // the Online rank value is greater than the Instore rank value. We also only want the
  // customer IDs so we use SELECTCOLUMNS again to just grab that column.
  VAR __Table2 = SELECTCOLUMNS(FILTER(__Table1,[Online] > [Instore]),"Customer ID",[Customer ID])
RETURN
  // We now have two table vars, __Table and __Table2 that each have single column of 
  // unique customer values. __Table lists customer that have bought Instore and __Table2
  // has a list of customers that have an Online rank that is higher than the Instore rank
  // so, since we want customers with an Online rank higher than Instore rank that have
  // also bought Instore, then we want the INTERSECT of the two tables (basically the values
  // that exist in both tables). We then simply use COUNTROWS to count the number of rows
  // in the intersection of the two tables.
  COUNTROWS(INTERSECT(__Table2, __Table))

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

Thank you enormously @Greg_Deckler 

Helpful resources

Announcements
LearnSurvey

Fabric certifications survey

Certification feedback opportunity for the community.

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.