Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
Anonymous
Not applicable

Custom column to create a new customer flag using group by function.

Hi all,

I am an apprentice to M language, but I am aware of ETL using Power query. I need help to add a custom column using group by in order to flag each transaction if the customer is new or not.

I have a reference date - "1/1/2021". 
The logic I am looking for is to add an IF ELSE statement within the table.group function to achieve a "Y" flag for new customer if the helper column( unique identifier ) has been occured after the reference date, else "N".

Please see my sample below, I have been deriving the flag in excel using formulas.
Open to all suggestion and thanks in advance. Appreciate your help to cross this bridge and be able to build this logic in PQ using excel sample as reference.  

 

https://docs.google.com/spreadsheets/d/1JyjCdR2uUYxWJU2ARYvPvvrNrUNyffea/edit?usp=sharing&ouid=11042... 


5 REPLIES 5
Anonymous
Not applicable

@Alex_T 

I wrote this logic for Sku Count based on your solution for New Customer Flag above. Please see below query:

= Table.Group(Source,"Account-Zip-Sku",{{"Sku Count", each if List.Dates[Invoice Dates] <= List.Dates[Start of Month] then "0" else "1"}})

 

But it throws me this error :
Expression.Error: We cannot apply field access to the type Function.
Details:
Value=[Function]
Key=Invoice Dates

Notes: I referenced the orginal source and created a reference query to test and derive Sku Count using the above logic.
If I remove the List. Dates function then it throws this error:

Expression.Error: We cannot apply operator <= to types List and List.
Details:
Operator=<=
Left=[List]
Right=[List]


Could you please help me write the correct query?

 

Thank you!

Hiya @Anonymous ,

Table.Group requires aggregations. [Invoice Dates] is referrring to the column as a column in the first error and as a list in the second error, in both cases it is trying to make sense of the expression, which unfortunately doesn't follow the right syntax

List.Dates(start as date, count as number, step as duration) as list

List.Dates creates a sequential list of dates, at best using [Invoice Dates], if the syntax was right, would offer a start date but this would fail straight away as it's a list and not a scalar value, your syntax is List.Dates[Invoice Dates], not List.Dates(<parameters>)

TBH, I have never used Table.Groups before but it appeared to produce the right result for you for New Customer, in this case I think you want a count and Table.Groups may again be the right approach (in which case I think you may want to use List.Min again with the dates, but I need to be sure I understand what you are trying to achieve). In the example PQ_Test, all the Invoice Dates are >= Start of Month so it looks like that is poor test data as it contradicts your <= comparison (or at least will work poorly with your test).

So this is a bit confusing (I also have no previous experience of what SKUs are! I work with hospital data!), can you clarify, is this right...

  • for each Account-Zip-Sku, count the number of Invoice Dates that are the same or earlier than the Start of Month

I'm not sure it is and if not could you re-write it and clarify what you need and we can have another go 🙂 

 

Cheers

 

PS this is perhaps closer to what you appeared to be trying to do so may be what you want...

 

Table.Group(Source,"Unique ID",{ "SKU_count", each 
if List.Min([Invoice Date]) <= List.Min([Start of month]) then 1 else 0})

 

Also, these are good references for learning PowerQuery...

Power Query M function reference - PowerQuery M | Microsoft Docs

Power Query M Primer (part 1): Introduction, Simple Expressions & let | Ben Gribaudo

 

 

 

 

 

Anonymous
Not applicable

Hi @Alex_T 

 

I appreciate you taking time to help.

I am attaching a new sample data close to my orginal file (due to confidentiality can't share the OG file).I build pivot table based reports using csv and excel files for CPG industry, SKU is item in terms of supply chain.

Here's what I am trying to achieve. I get data sources from different databases and I am trying to perform all the ETL using power query and automate the refresh as each week fresh transactions are posted.
I have been in a fix to derive these two columns:

  1. Sku Count - based on the unique ID (Account-Zip-Sku) I need to count the number of times this combination is repeated in my data set keeping the logic that if the combination has repeated more than once when the invoice date is less than the start of the month. If true ,0 else 1.
    Why its designed this way is becuase :
  2. New Account :client wants to filter the data based on month number and see how many new accounts are derived each month with reference date in connection.
    If the Sku count is >0 then "Y" else"N" --> Logic set for New Account.

The samples I am able to share here doesn't contain previous two years data but my original file does. that's why we are considering, reference date.
We are using the logic Invoice date<= Start of the month  - becuase the client needs to set filters at month level.

 

I hope there is an agile way I can derive these in the same table as two new columns?
Can you please look into this and let me know?

Note: I tried the logic

Table.Group(Source,"Unique ID",{ "SKU_count", each 
if List.Min([Invoice Date]) <= List.Min([Start of month]) then 1 else 0})


But I got this error

Nerd_V_91_0-1645119364869.png

 


Cheers,
V

Alex_T
Frequent Visitor

Hi @Anonymous ,

 

Hopefully this will help...

 

New_customer_table

let

_reference_date = #date(2021,01,01),

Source = PQ_Test,

Custom1 = Table.Group(Source,"Unique ID",{

{"New Customer", each if List.Min([Sku Count]) = 0 and List.Min([Invoice Date]) >= _reference_date then "Yes" else "No"}

})

in

Custom1

 

Using Table.Groups you will obviously be creating a separate reference table of customer status. You can create a new column with a join.

 

New_PQ_Test_table

let

Source = Table.NestedJoin(PQ_Test, {"Unique ID"}, New_customer_table,

{"Unique ID"}, "Join_table", JoinKind.LeftOuter),

Expanded_Join_table = Table.ExpandTableColumn(Source, "Join_table",

{"New Customer"}, {"Join_table.New Customer"})

in

Expanded_Join_table

 

You need a new table for this as joining onto the existing PQ_Test will create a circular reference, to avoid seeing both original and new tables in the report, deselect "Enable load"

Alex_T_0-1645003036980.png

 

Let me know if this works for you.

 

Best wishes, Alex

 

 

Anonymous
Not applicable

Hi @Alex_T 

Thank you, I appreciate your help.
I guess I am one step closer to achieving what I need to.


I forgot to mention a prerequisite step in my orginal post. Kindly help me on this.
In order to derive the New Customer flag, I need to first derive the sku count in PQ as a new custom coulmn.
The Sku count is dependent on the Unique ID that I have already derived.

Request/Questions

1) Can you please guide me - how to derive the Sku count in PQ? logic for SKu count is in the sample file.
2)The table in the sample file is already loaded into PQ in my computer for the weekly tasks. Must I still include the line "Source = PQ_Test," in the query editor?


Thank you!

Helpful resources

Announcements
LearnSurvey

Fabric certifications survey

Certification feedback opportunity for the community.

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors