Solved: ¿Imposible?

Syndicate_Admin · ‎03-23-2022

Hola a todos

Tengo una base de datos con millones de filas y me han hecho una consulta específica al respecto. Estaba a punto de regresar y decir que no creo que sea posible, pero pensé que lo tiraría aquí.

Aquí hay un fragmento de lo que tengo

ID de cliente Tipo de tienda Rango de tiendas (calculado en SQL)

12345 En línea 4

12345 En la tienda 5

45678 Tienda 8

98765 En tienda 10

98765 en línea 11

09090 En línea 2

11000 En la tienda 3

11000 En la tienda 4

11000 En la tienda 5

11000 En línea 6

Lo que se me pregunta es si es posible identificar cuántas identificaciones de clientes realizaron una transacción en la tienda y luego su transacción más reciente (número de rango más alto) fue en línea.

Entonces, en el ejemplo anterior, la respuesta es 2, las identificaciones 98765 y 11000 tienen una tienda anterior, pero su tienda mejor clasificada estaba en línea.

Sospecho que esto es imposible, o si lo es sería impracticle mirar a través de millones de filas, pero ¿alguien puede ayudar?

Muchas gracias

Syndicate_Admin · ‎03-24-2022

@powerbiuser9999 Claro, aquí está con comentarios:

Measure =
  // Get a distinct single column table of customers that have bought Instore
  // Since we need to filter the table for Instore, we need to use SELECTCOLUMNS
  // to return a single column from the FILTER statement and then apply DISTINCT
  // to only get the unique values. We set this table variable aside until the end
  VAR __Table = DISTINCT(SELECTCOLUMNS(FILTER('Table',[Type of Shop] = "Instore"),"Customer ID",[Customer ID]))
  // This is the tricky one. We start with a distinct list of customers and then add 
  // two columns that use the iterator function, MAXX to return the highest rank for
  // Instore and Online. The columns are called "Instore" and "Online" and the only
  // difference is the filter for Type of Shop. Each column calculation starts by 
  // getting the "current" value of [Customer ID] for the row and then uses that as
  // part of the filter clause to ensure we are getting the rank for that customer.
  VAR __Table1 =
    ADDCOLUMNS(
      // This is the start of the table.
      DISTINCT('Table'[Customer ID]),
      // Adding Instore column.
      "Instore",
        // This is the calculation for the Instore column, nested VAR and RETURN
          VAR __Customer = [Customer ID]
        RETURN 
          MAXX(FILTER('Table',[Customer ID] = __Customer && [Type of Shop] = "Instore"),[Rank of shops]),
      // Adding Online column
      "Online",
        // This is the calculation for the Online column, nested VAR and RETURN
          VAR __Customer = [Customer ID]
        RETURN 
          MAXX(FILTER('Table',[Customer ID] = __Customer && [Type of Shop] = "Online"),[Rank of shops])
    )
  // Now we use Table1 to create a new table var that simply filters our __Table1 where
  // the Online rank value is greater than the Instore rank value. We also only want the
  // customer IDs so we use SELECTCOLUMNS again to just grab that column.
  VAR __Table2 = SELECTCOLUMNS(FILTER(__Table1,[Online] > [Instore]),"Customer ID",[Customer ID])
RETURN
  // We now have two table vars, __Table and __Table2 that each have single column of 
  // unique customer values. __Table lists customer that have bought Instore and __Table2
  // has a list of customers that have an Online rank that is higher than the Instore rank
  // so, since we want customers with an Online rank higher than Instore rank that have
  // also bought Instore, then we want the INTERSECT of the two tables (basically the values
  // that exist in both tables). We then simply use COUNTROWS to count the number of rows
  // in the intersection of the two tables.
  COUNTROWS(INTERSECT(__Table2, __Table))

View solution in original post

Syndicate_Admin · ‎03-23-2022

@powerbiuser9999 Quizás:

Measure =
  VAR __Table = DISTINCT(SELECTCOLUMNS(FILTER('Table',[Type of Shop] = "Instore"),"Customer ID",[Customer ID]))
  VAR __Table1 =
    ADDCOLUMNS(
      DISTINCT('Table'[Customer ID]),
      "Instore",
          VAR __Customer = [Customer ID]
        RETURN 
          MAXX(FILTER('Table',[Customer ID] = __Customer && [Type of Shop] = "Instore"),[Rank of shops]),
      "Online",
          VAR __Customer = [Customer ID]
        RETURN 
          MAXX(FILTER('Table',[Customer ID] = __Customer && [Type of Shop] = "Online"),[Rank of shops])
    )
  VAR __Table2 = SELECTCOLUMNS(FILTER(__Table1,[Online] > [Instore]),"Customer ID",[Customer ID])
RETURN
  COUNTROWS(INTERSECT(__Table2, __Table))

Syndicate_Admin · ‎03-23-2022

@Greg_Deckler

Si pudiera dar más de 1 Kudos lo haría..

Te daré todo el crédito de mi trabajo, que DAX es increíble.

¿Te importaría darme un paseo rápido por cómo has construido eso, por favor?

Syndicate_Admin · ‎03-24-2022

@powerbiuser9999 Claro, aquí está con comentarios:

Measure =
  // Get a distinct single column table of customers that have bought Instore
  // Since we need to filter the table for Instore, we need to use SELECTCOLUMNS
  // to return a single column from the FILTER statement and then apply DISTINCT
  // to only get the unique values. We set this table variable aside until the end
  VAR __Table = DISTINCT(SELECTCOLUMNS(FILTER('Table',[Type of Shop] = "Instore"),"Customer ID",[Customer ID]))
  // This is the tricky one. We start with a distinct list of customers and then add 
  // two columns that use the iterator function, MAXX to return the highest rank for
  // Instore and Online. The columns are called "Instore" and "Online" and the only
  // difference is the filter for Type of Shop. Each column calculation starts by 
  // getting the "current" value of [Customer ID] for the row and then uses that as
  // part of the filter clause to ensure we are getting the rank for that customer.
  VAR __Table1 =
    ADDCOLUMNS(
      // This is the start of the table.
      DISTINCT('Table'[Customer ID]),
      // Adding Instore column.
      "Instore",
        // This is the calculation for the Instore column, nested VAR and RETURN
          VAR __Customer = [Customer ID]
        RETURN 
          MAXX(FILTER('Table',[Customer ID] = __Customer && [Type of Shop] = "Instore"),[Rank of shops]),
      // Adding Online column
      "Online",
        // This is the calculation for the Online column, nested VAR and RETURN
          VAR __Customer = [Customer ID]
        RETURN 
          MAXX(FILTER('Table',[Customer ID] = __Customer && [Type of Shop] = "Online"),[Rank of shops])
    )
  // Now we use Table1 to create a new table var that simply filters our __Table1 where
  // the Online rank value is greater than the Instore rank value. We also only want the
  // customer IDs so we use SELECTCOLUMNS again to just grab that column.
  VAR __Table2 = SELECTCOLUMNS(FILTER(__Table1,[Online] > [Instore]),"Customer ID",[Customer ID])
RETURN
  // We now have two table vars, __Table and __Table2 that each have single column of 
  // unique customer values. __Table lists customer that have bought Instore and __Table2
  // has a list of customers that have an Online rank that is higher than the Instore rank
  // so, since we want customers with an Online rank higher than Instore rank that have
  // also bought Instore, then we want the INTERSECT of the two tables (basically the values
  // that exist in both tables). We then simply use COUNTROWS to count the number of rows
  // in the intersection of the two tables.
  COUNTROWS(INTERSECT(__Table2, __Table))

Syndicate_Admin · ‎03-25-2022

Muchas gracias @Greg_Deckler

¿Imposible?

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024