Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get certified in Microsoft Fabric—for free! For a limited time, get a free DP-600 exam voucher to use by the end of 2024. Register now

Reply
Anonymous
Not applicable

Distinct count based on previous dates and grouped by other variables

Hi,

I have a table with several columns including Date, ID and other variables.

I want to get a grouped table with a distinct count of the IDs grouped by 2 variables. Until here the group by option solves this quickly.

The added difficulty is that the IDs distinct count has to be done for the previous 3 months (e.g. for Feb-20 I will count the distinct IDs in Jan-20, Dec-19 and Nov-19)

I am working with the advanced editor but I don't find a way to add a "between dates" or any other option to add to the query that will provide all the IDs for a past period of time.

 

The data source looks like this:
DataSource.png


And I want a table like the following:

 

ResultTable.png


It means that, for the month of Feb-20 I am counting how many distinct ID+Var+Var2 existed in the previous 3 months (so Date between Nov-2019 and Jan-2020)

The result table has to contain all the months, not only Feb-20.

 

**I'm adding a link to a sample file containing fake data: https://1drv.ms/u/s!AiEedMWX9yXiayCL3IzMb4-Shy0?e=SXa9FP

 

Anyone knows how can I do that from the advanced editor?

Thanks

1 ACCEPTED SOLUTION
Anonymous
Not applicable

change the column [all] in the expressione where

 

 

 

 

 

 

    #"Added Custom 1" = List.Transform(#"Added Custom"[all],

 

 

 

 

 

to [rollingQ]

 

 

 

 

 

    #"Added Custom 1" = List.Transform(#"Added Custom"[rollingQ],

 

 

 

 

 

 

but, please PLEASE, be carefull to the space in the text values. For instance Blue is not "Blue" but "Blue " and this cause problem.

 

don't have now enough time to analyze and understand the real reason for these problems, but, for safety, find and remove the extra spaces

 

 

#########edited#############

 

use this code to try

 

 

 

let
 
   s1 = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("nZTNbsIwEIRfpcqZSKU3jqCqhwh6IBcqxCGA+0MtQEhpVZ6+Voyn9o4Xou4hsfAnz2ZnzHJZDMvnw1c5HBWDovHlVhPbmjv3rovVIEHWXbnF3Gzdcyb3/454MdYevt1iemEezcYzm0tFOguCtl1BqJb7xlcsFJqpmn35cO9+eA0VKU0l9NYVhBYEZD6plgzGEmSoFzG4PircyntXGX80GWrjA5WzKNMNOfRk1pJJLQKAXsghILtQGYcAffpii2bNSVOqJYI8SYeYSB3CfhRbtmh8PHHm0rkAQeJoLkB4/iQknOZO9KFckeFjkDnlayj6kT0/mdClx4DR0pQBaGxgqBdqJQ4cpUlXom4wlX80U7X72+kHpKYfhG40EDX9TIj/p7B/Nf2g9PRXrb0hZHsJ2d5C+jUDol8zINo1y5yh65x9CWb1Cw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Date = _t, ID = _t, Var1 = _t, Var2 = _t]),
    s2=Table.ReplaceValue(s1,"Blue ","Blue",Replacer.ReplaceText,{"Var1"}),
    #"Changed Type" = Table.TransformColumnTypes(s2,{{"Date", type date}},"en-US"),
    #"Filtered Rows" = Table.SelectRows(#"Changed Type", each [Date] <> null and [Date] <> ""),
    #"Grouped Rows" = Table.Group(#"Filtered Rows", {"Date"}, {"all", each _},GroupKind.Local,(x,y)=>Number.From(y[Date]>=Date.AddMonths(x[Date],1))),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "rollingQ", each Table.Combine(List.LastN(List.FirstN(#"Grouped Rows"[all], List.PositionOf(#"Grouped Rows"[all],[all])),3))),


    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcsopTVXSUQpWitWBc3yROT7InAgILyg1Ba4HwvZFYiOrgWqITM3JyS+H64FzfVG5aIpBmmMB", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Column1 = _t, Column2 = _t]),
    #"Changed Type 1" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}}),

    #"Added Custom 1" = List.Transform(#"Added Custom"[rollingQ], (t)=> Table.AddColumn(#"Changed Type 1", "dist", each try Table.RowCount(Table.Distinct(Table.SelectRows(t, (r)=>  r[Var1]=[Column1] and r[Var2]=[Column2] ),"ID" ))otherwise 0)),

    q1=Table.FromColumns({#"Added Custom"[Date], #"Added Custom 1"}),
    #"Expanded Column2" = Table.ExpandTableColumn(q1, "Column2", {"Column1", "Column2", "dist"}, {"Column2.Column1", "Column2.Column2", "Column2.dist"}),
    #"Added Custom 2" = Table.AddColumn(#"Expanded Column2", "mmm", each Date.MonthName([Column1],"it-IT")),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom 2",{"Column1"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"mmm", "Column1"}}),
    #"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Renamed Columns", {{"Column1", type text}}, "it-IT"), List.Distinct(Table.TransformColumnTypes(#"Renamed Columns", {{"Column1", type text}}, "it-IT")[Column1]), "Column1", "Column2.dist")
in
    #"Pivoted Column"

 

 

 

The problem about space ending Blue was that I hand writed an auxiliary table for the colours and sizes and, of course, I couldn't realize there was a space after Blue in the table you linked

 

 

 

 

View solution in original post

11 REPLIES 11
ziying35
Impactful Individual
Impactful Individual

Hi, @Anonymous 

What if the data before the current month is less than three months? and could you upload a complete spreadsheet of simulated data

Anonymous
Not applicable

Hi, @ziying35

If there is any ID only existing in the current month, it won't appear in the ID distinct count until the next 3 months.

The goal is to get the distinct IDs in groups of 3 months. If I have an ID only appearing in Jan-20, it will be counted in the lines for Feb-20, Mar-20 and Apr-20. It won't appear in May-20 if it doesn't exist in any of the previous 3 months. It won't appear either in Jan-20.

 

If the current data is less than 3 months - imagine we started logging info starting on Jun-20 - I'd like to get all the IDs distinct count for the month on June assigned to the July row in the result table.

 

I am not finding a way to upload a sample data file, my apologies.

ziying35
Impactful Individual
Impactful Individual

@Anonymous 

You can upload the file to onedrive or Google or Media Fire and post the link, File data as much as possible to simulate, the desired effect is also simulated, Because my English is not good, but I can understand what the spreadsheet says

Anonymous
Not applicable

Here it is the link to the file, I hope you can open it

https://1drv.ms/u/s!AiEedMWX9yXiayCL3IzMb4-Shy0?e=SXa9FP

 

I'm putting the link to the file in my first post.

Thank you 🙂

Anonymous
Not applicable

is this the expected result?

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("nZTNbsIwEIRfpcqZSKU3jqCqhwh6IBcqxCGA+0MtQEhpVZ6+Voyn9o4Xou4hsfAnz2ZnzHJZDMvnw1c5HBWDovHlVhPbmjv3rovVIEHWXbnF3Gzdcyb3/454MdYevt1iemEezcYzm0tFOguCtl1BqJb7xlcsFJqpmn35cO9+eA0VKU0l9NYVhBYEZD6plgzGEmSoFzG4PircyntXGX80GWrjA5WzKNMNOfRk1pJJLQKAXsghILtQGYcAffpii2bNSVOqJYI8SYeYSB3CfhRbtmh8PHHm0rkAQeJoLkB4/iQknOZO9KFckeFjkDnlayj6kT0/mdClx4DR0pQBaGxgqBdqJQ4cpUlXom4wlX80U7X72+kHpKYfhG40EDX9TIj/p7B/Nf2g9PRXrb0hZHsJ2d5C+jUDol8zINo1y5yh65x9CWb1Cw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Date = _t, ID = _t, Var1 = _t, Var2 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Date", type date}},"en-US"),
    #"Filtered Rows" = Table.SelectRows(#"Changed Type", each [Date] <> null and [Date] <> ""),
    #"Added Custom" = Table.AddColumn(#"Filtered Rows", "ddd", each Table.RowCount(Table.Distinct(Table.SelectRows(#"Filtered Rows",
    
    (d)=> Date.AddMonths(d[Date],+3) >= [Date] and d[Date]<[Date]),{"ID","Var1","Var2"})))
    
in
    #"Added Custom"
Anonymous
Not applicable

peraphs (but I'm not really sure ) now I understood the request 😀

 

 

 

this is the query quarter, which group the rows of the orginal table in table containings rows which date is in tree months

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("nZTNbsIwEIRfpcqZSKU3jqCqhwh6IBcqxCGA+0MtQEhpVZ6+Voyn9o4Xou4hsfAnz2ZnzHJZDMvnw1c5HBWDovHlVhPbmjv3rovVIEHWXbnF3Gzdcyb3/454MdYevt1iemEezcYzm0tFOguCtl1BqJb7xlcsFJqpmn35cO9+eA0VKU0l9NYVhBYEZD6plgzGEmSoFzG4PircyntXGX80GWrjA5WzKNMNOfRk1pJJLQKAXsghILtQGYcAffpii2bNSVOqJYI8SYeYSB3CfhRbtmh8PHHm0rkAQeJoLkB4/iQknOZO9KFckeFjkDnlayj6kT0/mdClx4DR0pQBaGxgqBdqJQ4cpUlXom4wlX80U7X72+kHpKYfhG40EDX9TIj/p7B/Nf2g9PRXrb0hZHsJ2d5C+jUDol8zINo1y5yh65x9CWb1Cw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Date = _t, ID = _t, Var1 = _t, Var2 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Date", type date}},"en-US"),
    #"Filtered Rows" = Table.SelectRows(#"Changed Type", each [Date] <> null and [Date] <> ""),
    #"Grouped Rows" = Table.Group(#"Filtered Rows", {"Date"}, {"all", each _},GroupKind.Local,(x,y)=>Number.From(y[Date]>=Date.AddMonths(x[Date],3)))
in
    #"Grouped Rows"

 

 

this is the query quarteDistinctID, which count for each type of product the distinc ID

 

 

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcsopTVXSUQpWitWBc3yROT7InAgILyg1Ba4HwvZFYiOrgWqITM3JyS+H64FzfVG5aIpBmmMB", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Column1 = _t, Column2 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}}),
    #"Added Custom" = List.Transform(quarter[all], (t)=> Table.AddColumn(#"Changed Type", "dist", each Table.RowCount(Table.Distinct(Table.SelectRows(t, (r)=> r[Var1]=[Column1] and r[Var2]=[Column2]),"ID"))))
in
    #"Added Custom"

 

 

Anonymous
Not applicable

if you expect (is not clear to me) a sort of rolling info  on previuos tree months:

 

let

   s1 = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("nZTNbsIwEIRfpcqZSKU3jqCqhwh6IBcqxCGA+0MtQEhpVZ6+Voyn9o4Xou4hsfAnz2ZnzHJZDMvnw1c5HBWDovHlVhPbmjv3rovVIEHWXbnF3Gzdcyb3/454MdYevt1iemEezcYzm0tFOguCtl1BqJb7xlcsFJqpmn35cO9+eA0VKU0l9NYVhBYEZD6plgzGEmSoFzG4PircyntXGX80GWrjA5WzKNMNOfRk1pJJLQKAXsghILtQGYcAffpii2bNSVOqJYI8SYeYSB3CfhRbtmh8PHHm0rkAQeJoLkB4/iQknOZO9KFckeFjkDnlayj6kT0/mdClx4DR0pQBaGxgqBdqJQ4cpUlXom4wlX80U7X72+kHpKYfhG40EDX9TIj/p7B/Nf2g9PRXrb0hZHsJ2d5C+jUDol8zINo1y5yh65x9CWb1Cw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Date = _t, ID = _t, Var1 = _t, Var2 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(s1,{{"Date", type date}},"en-US"),
    #"Filtered Rows" = Table.SelectRows(#"Changed Type", each [Date] <> null and [Date] <> ""),
    #"Grouped Rows" = Table.Group(#"Filtered Rows", {"Date"}, {"all", each _},GroupKind.Local,(x,y)=>Number.From(y[Date]>=Date.AddMonths(x[Date],1))),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "rollingQ", each List.LastN(List.FirstN(#"Grouped Rows"[all], List.PositionOf(#"Grouped Rows"[all],[all])),3)),


    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcsopTVXSUQpWitWBc3yROT7InAgILyg1Ba4HwvZFYiOrgWqITM3JyS+H64FzfVG5aIpBmmMB", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Column1 = _t, Column2 = _t]),
    #"Changed Type 1" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}}),
    #"Added Custom 1" = List.Transform(#"Added Custom"[all], (t)=> Table.AddColumn(#"Changed Type 1", "dist", each Table.RowCount(Table.Distinct(Table.SelectRows(t, (r)=> r[Var1]=[Column1] and r[Var2]=[Column2]),"ID"))))

in
    Table.FromColumns({#"Added Custom"[Date], #"Added Custom 1"})

 

Anonymous
Not applicable

Hi, @Anonymous 

I am trying the different solutions you've provided me. Thank you so much, they look much better than what I did 🙂

 

Comparing the last query with a manual calculation I've done in the excel, this how I'd like to get the month of July 2020:

The results are not exactly the same, the differences are in red color.

I'll try to understand why some totals are different. 

As I need the last 3 months, to fill the rows for Jul-20 I am taking all the rows in Jun-20, May-20 and Apr-20.

Then I am filtering by the VAR1 and VAR2, and with the results I count the distinct IDs.

I have highlighted a case I think it's relevant: if I count all the IDs for Yellow+S, the total is 3 but, as far as there is an ID repeated (bbbbbb) it will be only counted once. So, in the row 01-Jul-20 | Yellow | S, the total I'd like to see is 2 instead of 3

 

ComparingQueryExcel.png

Maybe I did something wrong when copying your query, I'll double check just in case.

Thanks again!! I really appreciate your help 🙂

 

Anonymous
Not applicable

change the column [all] in the expressione where

 

 

 

 

 

 

    #"Added Custom 1" = List.Transform(#"Added Custom"[all],

 

 

 

 

 

to [rollingQ]

 

 

 

 

 

    #"Added Custom 1" = List.Transform(#"Added Custom"[rollingQ],

 

 

 

 

 

 

but, please PLEASE, be carefull to the space in the text values. For instance Blue is not "Blue" but "Blue " and this cause problem.

 

don't have now enough time to analyze and understand the real reason for these problems, but, for safety, find and remove the extra spaces

 

 

#########edited#############

 

use this code to try

 

 

 

let
 
   s1 = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("nZTNbsIwEIRfpcqZSKU3jqCqhwh6IBcqxCGA+0MtQEhpVZ6+Voyn9o4Xou4hsfAnz2ZnzHJZDMvnw1c5HBWDovHlVhPbmjv3rovVIEHWXbnF3Gzdcyb3/454MdYevt1iemEezcYzm0tFOguCtl1BqJb7xlcsFJqpmn35cO9+eA0VKU0l9NYVhBYEZD6plgzGEmSoFzG4PircyntXGX80GWrjA5WzKNMNOfRk1pJJLQKAXsghILtQGYcAffpii2bNSVOqJYI8SYeYSB3CfhRbtmh8PHHm0rkAQeJoLkB4/iQknOZO9KFckeFjkDnlayj6kT0/mdClx4DR0pQBaGxgqBdqJQ4cpUlXom4wlX80U7X72+kHpKYfhG40EDX9TIj/p7B/Nf2g9PRXrb0hZHsJ2d5C+jUDol8zINo1y5yh65x9CWb1Cw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Date = _t, ID = _t, Var1 = _t, Var2 = _t]),
    s2=Table.ReplaceValue(s1,"Blue ","Blue",Replacer.ReplaceText,{"Var1"}),
    #"Changed Type" = Table.TransformColumnTypes(s2,{{"Date", type date}},"en-US"),
    #"Filtered Rows" = Table.SelectRows(#"Changed Type", each [Date] <> null and [Date] <> ""),
    #"Grouped Rows" = Table.Group(#"Filtered Rows", {"Date"}, {"all", each _},GroupKind.Local,(x,y)=>Number.From(y[Date]>=Date.AddMonths(x[Date],1))),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "rollingQ", each Table.Combine(List.LastN(List.FirstN(#"Grouped Rows"[all], List.PositionOf(#"Grouped Rows"[all],[all])),3))),


    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcsopTVXSUQpWitWBc3yROT7InAgILyg1Ba4HwvZFYiOrgWqITM3JyS+H64FzfVG5aIpBmmMB", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Column1 = _t, Column2 = _t]),
    #"Changed Type 1" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}}),

    #"Added Custom 1" = List.Transform(#"Added Custom"[rollingQ], (t)=> Table.AddColumn(#"Changed Type 1", "dist", each try Table.RowCount(Table.Distinct(Table.SelectRows(t, (r)=>  r[Var1]=[Column1] and r[Var2]=[Column2] ),"ID" ))otherwise 0)),

    q1=Table.FromColumns({#"Added Custom"[Date], #"Added Custom 1"}),
    #"Expanded Column2" = Table.ExpandTableColumn(q1, "Column2", {"Column1", "Column2", "dist"}, {"Column2.Column1", "Column2.Column2", "Column2.dist"}),
    #"Added Custom 2" = Table.AddColumn(#"Expanded Column2", "mmm", each Date.MonthName([Column1],"it-IT")),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom 2",{"Column1"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"mmm", "Column1"}}),
    #"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Renamed Columns", {{"Column1", type text}}, "it-IT"), List.Distinct(Table.TransformColumnTypes(#"Renamed Columns", {{"Column1", type text}}, "it-IT")[Column1]), "Column1", "Column2.dist")
in
    #"Pivoted Column"

 

 

 

The problem about space ending Blue was that I hand writed an auxiliary table for the colours and sizes and, of course, I couldn't realize there was a space after Blue in the table you linked

 

 

 

 

Anonymous
Not applicable

Hi @Anonymous 

That's fantastic!

It is exactly what I needed. I will adapt your query to my data, as I have many other columns and data processing needed but yes, the point I couldn't solve is this.

You're a star, thank you so much for your help 🙂

Anonymous
Not applicable

adapt this to you data  and see if it works

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("bdAxDoAgDAXQu3QmgXbzLIQBCLDoIove3iYWDdahXV5+W/Ae0CJacriAgRgj98TVIZjZUkpim7I7d3Ktw2hYzllyh8Zaq6C2UooMfRYykfuPDWqtcd+ndcPeM/uX9OtoTsmnhAs=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [when = _t, id = _t, v1 = _t, v2 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"when", type date}, {"id", type text}, {"v1", type text}, {"v2", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "ddd", each Table.RowCount(Table.Distinct(Table.SelectRows(#"Changed Type",
    
    (d)=> Date.AddMonths(d[when],+3) >= [when] and d[when]<[when]),{"id","v1","v2"})))
    
in
    #"Added Custom"

 

Helpful resources

Announcements
November Carousel

Fabric Community Update - November 2024

Find out what's new and trending in the Fabric Community.

Live Sessions with Fabric DB

Be one of the first to start using Fabric Databases

Starting December 3, join live sessions with database experts and the Fabric product team to learn just how easy it is to get started.

Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early Bird pricing ends December 9th.

Nov PBI Update Carousel

Power BI Monthly Update - November 2024

Check out the November 2024 Power BI update to learn about new features.