Custom Index Column

HassanMoosa · ‎09-07-2023

I'm currently working on a Power BI project with a dataset called "CRM Data Set." In this dataset, I have columns for "email," "mobile," and "course." I'm facing a specific challenge where I need to create a custom index column based on the following criteria:

Main Objectives:

Identify Repeated Inquiries: I want to identify repeated inquiries, which can be determined by comparing the "mobile" and "email" columns. If either the mobile number or email address matches with any other entry in the dataset, I need to mark it as a repeated entry.
Course Inquiry Identification: I also need to track which course each inquiry is about. This is crucial for understanding which programs each individual inquired about.
Create a Unique Identifier: Since my dataset lacks a primary or foreign key column, I need to generate a unique identifier for each record. This identifier should allow me to reference records in other tables. It's important to note that different types of students may appear in the dataset, and their phone numbers or email addresses might duplicate in other inquiries. Additionally, I can't solely rely on the date, email, or mobile for differentiation, as data import can result in identical date and time values.

Attached screenshot is of my sample columns for your clarity I want to make unique indentifier which should represent (Unique record Identifier, Track individual record e.g. I am not clear but maybe combination of created on (date), email, mobile, course)

Sample Data Columns

Specific Requirements for the Index Column:

If the "email" or "mobile" and course value matches any other record in the entire dataset, the index should be set to 0 for that row.
If the "email" or "mobile" and course values are unique among all other rows, the index should be incremented by 1 for each unique combination of "email" and "mobile."

Thanks in advance

jennratten · ‎09-07-2023

Hello - here you are.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("rdDPC4IwFAfw/8Wzpc0ij1HRDyGQspN4GDbcaPOJC8L/vpFzNLND1Pew994OH3gvTZ25Nwk85KPAcZ0IaKnLohCY8XEOQn1MTNQQ00ayXKpuA5zDfXSunMz9IxR64VfOjskb1I3qjkRWUEqiGeR3zgnXmHbVkpCJGvbJAII65MCuRBeLCEzUsGTAoWgGlnp3cooJ/4mKmGhfS5maWFfuL9YhMRbtayEzEzWsKBHqys8jr2uoKnLpKwmUjS6fnS0BQVrmZafsAQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Created = _t, Lead = _t, Email = _t, Mobile = _t, Course = _t, Stage = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Created", type date}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type",{{"Course", Order.Ascending}, {"Created", Order.Ascending}, {"Email", Order.Ascending}, {"Mobile", Order.Ascending}}),
    #"Grouped Rows" = Table.Group(#"Sorted Rows", {"Course", "Email", "Mobile"}, {{"Count", each Table.RowCount(_), Int64.Type}, {"Data", each Table.AddIndexColumn ( _, "Id", 0, 1, Int64.Type ), type table}}, 0, (x,y) => Number.From ( x[Course] <> y[Course] and x[Email] <> y[Email] or x[Mobile] <> y[Mobile] ) ),
    #"Removed Other Columns" = Table.SelectColumns(#"Grouped Rows",{"Data"}),
    #"Expanded Data" = Table.ExpandTableColumn(#"Removed Other Columns", "Data", {"Created", "Lead", "Email", "Mobile", "Course", "Stage", "Id"}, {"Created", "Lead", "Email", "Mobile", "Course", "Stage", "Id"})
in
    #"Expanded Data"

Here is a preview from the grouped rows step...

If this post helps to answer your questions, please consider marking it as a solution so others can find it more quickly when faced with a similar challenge.

Proud to be a Microsoft Fabric Super User

Custom Index Column

Helpful resources

Join us at the Microsoft Fabric Community Conference

A Year in Review - December 2024

New Offer! Become a Certified Fabric Data Engineer

Custom Index Column

Helpful resources

Join us at the Microsoft Fabric Community Conference

A Year in Review - December 2024