
russell80
Helper III

Help With Counting Occurrences in Date Period

I have a table similar to the one below, but with tens of thousands of rows. In Power Query, I want to count, for each row, the number of rows in which the Name appears in the previous 12 months, and add this count as a new column.

 

e.g. for the last entry, 'Jane' occurs twice in the preceding 12-month period, 01-10-2022 to 30-09-2023.

 

I tried this by creating two new columns for the previous 12 months' start and finish dates, then adding a column with a custom function to calculate the number of occurrences in the last 12 months. However, this takes a long time to run, so I'm hoping there is a faster way to do it.

 

Custom Function

 

 

(nametable as table, start_date as date, finish_date as date, name as text) =>
let
    Source = nametable,
    // keep only this name's rows
    #"Filtered Rows1" = Table.SelectRows(Source, each [Name] = name),
    // keep only rows inside the 12-month window
    #"Filtered Rows2" = Table.SelectRows(#"Filtered Rows1", each [Date] >= start_date and [Date] <= finish_date),
    #"Row Count" = Table.RowCount(#"Filtered Rows2")
in
    #"Row Count"
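For reference, the function above is equivalent to the following Python sketch (hypothetical `rows` data shaped like the sample table below). It also shows why the approach is slow: every row rescans the whole table, so the total cost grows quadratically with the row count.

```python
from datetime import date

# Hypothetical stand-in for the Name table: (Date, Name) pairs.
rows = [(date(2023, 10, 20), "Jane"),
        (date(2023, 7, 22), "Jane"),
        (date(2023, 5, 23), "Jane"),
        (date(2023, 9, 20), "John")]

def count_occurrences(rows, start, finish, name):
    """Mirror of the M function: filter by name, then by the date window, then count."""
    return sum(1 for d, n in rows if n == name and start <= d <= finish)

# Same example as in the post: 'Jane' in 01-10-2022 .. 30-09-2023.
print(count_occurrences(rows, date(2022, 10, 1), date(2023, 9, 30), "Jane"))  # prints 2
```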

 

 

 

 

Name table

Date        Name
01/08/2021  John
31/08/2021  Jane
30/09/2021  John
30/10/2021  Jane
29/11/2021  Jim
29/12/2021  John
28/01/2022  John
27/02/2022  Jim
29/03/2022  John
28/04/2022  Jim
28/05/2022  John
27/06/2022  John
27/07/2022  Jim
26/08/2022  John
25/09/2022  John
25/10/2022  Jim
24/11/2022  John
24/12/2022  John
23/01/2023  Jim
22/02/2023  John
24/03/2023  John
23/04/2023  John
23/05/2023  Jane
22/06/2023  Jim
22/07/2023  Jane
21/08/2023  Jim
20/09/2023  John
20/10/2023  Jane
1 ACCEPTED SOLUTION
jbwtp
Memorable Member

Hi @russell80,

 

Sorry, I re-wrote my post as I found a much better solution.

 

This is a very interesting problem. The way to improve it is to reduce the number of iterations and the size of the data being manipulated.

 

The code below runs against 20k lines in about 3 seconds on my laptop, which is quite new, so I guess this is somewhere near in-cloud performance. It goes more or less in the same direction as @wdx223_Daniel's answer, but uses grouping to reduce the number of rows we have to deal with in each particular period of time.

 

Ignore anything before f; those steps only set up sample data for testing.

 

 

 

let
    // Everything down to f just builds a sample table for testing.
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("HcVJDQAwCARAL7yrgjuAA4J/G012PrNLzEz3lkQEqyo2M+zuOCJwZuKqwt2NZ4buPg==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Name", type text}}),
    Dates = List.Dates(#date(2018,1,1), 2000, #duration(1, 0,0,0)),
    Custom1 = Table.AddColumn(#"Changed Type", "Date", each Dates),
    #"Expanded Date" = Table.ExpandListColumn(Custom1, "Date"),
    #"Reordered Columns" = Table.ReorderColumns(#"Expanded Date",{"Date", "Name"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Reordered Columns",{{"Date", type date}}),

    // For one name's sub-table: for each date, count that name's earlier
    // dates within the previous 12 months (the row itself contributes the 1).
    f = (t as table) as table =>
        let
            mList = List.Buffer(t[Date]),
            out = List.Accumulate(mList, {}, (a, n)=>
                let
                    limit = Date.AddMonths(n, -12),
                    out = a & {[Date = n, Count = 1 + List.Count(List.Select(mList, each _ < n and _ > limit))]}
                in out)
        in Table.FromRecords(out),

    // Group by Name so f only ever scans one name's dates.
    #"Grouped Rows" = Table.Group(#"Changed Type1", {"Name"}, {{"Data",  f}}),
    #"Expanded Data" = Table.ExpandTableColumn(#"Grouped Rows", "Data", {"Date", "Count"}, {"Date", "Count"}),
    #"Changed Type2" = Table.TransformColumnTypes(#"Expanded Data",{{"Date", type date}})
in
    #"Changed Type2"
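The grouping idea can be sketched in Python as follows (hypothetical data; like `f` above, each row counts itself plus that name's earlier dates within the previous 12 months, and the `months_back` helper is an assumed stand-in for `Date.AddMonths(d, -12)` with day clamping):

```python
from collections import defaultdict
from datetime import date

def months_back(d: date, months: int = 12) -> date:
    """Assumed equivalent of Date.AddMonths(d, -months), clamping the day."""
    m = d.month - 1 - months
    y = d.year + m // 12
    m = m % 12 + 1
    last = [31, 29 if y % 4 == 0 and (y % 100 != 0 or y % 400 == 0) else 28,
            31, 30, 31, 30, 31, 31, 30, 31, 30, 31][m - 1]
    return date(y, m, min(d.day, last))

def rolling_counts(rows):
    """rows: (date, name) pairs. Returns (date, name, count) in input order."""
    by_name = defaultdict(list)
    for d, n in rows:
        by_name[n].append(d)          # grouping: each name's dates only
    out = []
    for d, n in rows:
        limit = months_back(d)
        # 1 for the row itself, plus earlier same-name dates inside the window
        c = 1 + sum(1 for e in by_name[n] if limit < e < d)
        out.append((d, n, c))
    return out

sample = [(date(2023, 5, 23), "Jane"),
          (date(2023, 7, 22), "Jane"),
          (date(2023, 10, 20), "Jane")]
print(rolling_counts(sample))  # counts climb 1, 2, 3 for Jane's three rows
```

Grouping helps because each inner scan only touches one name's dates instead of the whole table.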

 

Cheers,

John

 

 

 

 


6 REPLIES

Thanks @jbwtp, I've managed to get that running on my side. One thing I'd also like to know: my source table actually has 26 columns. How can I keep all of those columns so they are returned at the end, when the table is expanded in #"Expanded Data"? Is that possible?

Hi @russell80,

 

Yes, this is possible. Sorry, I should have thought of this beforehand.

This is the code that does it - a slight modification of the previous version:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("HcVJDQAwCARAL7yrgjuAA4J/G012PrNLzEz3lkQEqyo2M+zuOCJwZuKqwt2NZ4buPg==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Name", type text}}),
    Dates = List.Dates(#date(2018,1,1), 2000, #duration(1, 0,0,0)),
    Custom1 = Table.AddColumn(#"Changed Type", "Date", each Dates),
    #"Expanded Date" = Table.ExpandListColumn(Custom1, "Date"),
    #"Reordered Columns" = Table.ReorderColumns(#"Expanded Date",{"Date", "Name"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Reordered Columns",{{"Date", type date}}),
    
    // Same idea as before, but carry whole records so all source columns survive.
    f = (t as table) as table =>
        let
            mList = Table.ToRecords(t),
            out = List.Accumulate(mList, {}, (a, n)=>
                let
                    limit = Date.AddMonths(n[Date], -12),
                    out = a & {n & [Count = 1 + List.Count(List.Select(mList, each _[Date] < n[Date] and _[Date] > limit))]}
                in out)
        in Table.FromRecords(out),

    #"Grouped Rows" = Table.Group(#"Changed Type1", {"Name"}, {{"Data",  f}}),
    // Combine using the original table's type (plus a Count column) so column types are kept.
    Custom2 = Table.Combine(#"Grouped Rows"[Data], Value.Type(Table.AddColumn(#"Changed Type1", "Count", each null, type number)))
in
    Custom2

Kind regards,

John

wdx223_Daniel
Super User

Hope this can help.

= let
    a = List.Buffer(Table.ToRows(Table.Sort(Source, "Date")))
in
    #table(
        Table.ColumnNames(Source) & {"Count"},
        List.Accumulate(
            a,
            {{}, []},   // state: {output rows so far, per-name lists of recent dates}
            (x, y) =>
                let
                    // append this row's date to its name's list,
                    // dropping dates older than 12 months
                    aa = Record.TransformFields(
                        x{1},
                        {y{1}, each List.Skip((_ ?? {}) & {y{0}}, each _ < Date.AddMonths(y{0}, -12))},
                        2)
                in
                    {x{0} & {y & {List.Count(Record.Field(aa, y{1}))}}, aa}
        ){0}
    )
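The accumulator above walks the date-sorted rows once, keeping per name only the dates still inside the trailing window and pruning the rest as it goes. A rough Python equivalent of that single pass (hedged: it approximates `Date.AddMonths(d, -12)` with 365 days, and the sample data is hypothetical):

```python
from collections import defaultdict
from datetime import date, timedelta

def rolling_counts_one_pass(rows):
    """rows: (date, name) pairs. Single pass over date-sorted rows, keeping
    per name only the dates in the trailing window (365-day approximation
    of the 12-month limit), as the accumulator state does."""
    recent = defaultdict(list)
    out = []
    for d, n in sorted(rows):
        # append the current date, then prune entries that fell out of the window
        recent[n] = [e for e in recent[n] + [d] if e > d - timedelta(days=365)]
        out.append((d, n, len(recent[n])))   # count includes the current row
    return out

sample = [(date(2023, 5, 23), "Jane"),
          (date(2023, 7, 22), "Jane"),
          (date(2023, 10, 20), "Jane")]
print(rolling_counts_one_pass(sample))  # counts 1, 2, 3
```

Because old dates are discarded as soon as they expire, each row is appended and removed at most once, so the pass stays close to linear.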

talbot
Regular Visitor

This should work. No need for a function.

 

let
    Source = NameTable,
    #"Filtered Rows" = Table.SelectRows(Source, each Date.IsInPreviousNMonths([Date], 12)),
    #"Grouped Rows" = Table.Group(#"Filtered Rows", {"Name"}, {{"Count", each Table.RowCount(_), Int64.Type}}),
    #"Merged Queries" = Table.NestedJoin(Source, {"Name"}, #"Grouped Rows", {"Name"}, "Count", JoinKind.LeftOuter),
    #"Expanded Count" = Table.ExpandTableColumn(#"Merged Queries", "Count", {"Count"}, {"Count"})
in
    #"Expanded Count"
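A hedged aside on why this differs from the requested result: `Date.IsInPreviousNMonths` is evaluated relative to the current system date, so the query produces one fixed count per name for the 12 months before the refresh time, joined back onto every row of that name. A Python sketch of that behaviour (365-day approximation, hypothetical data and `today`):

```python
from collections import Counter
from datetime import date, timedelta

def counts_previous_12_months(rows, today):
    """One count per name over the ~12 months before `today`,
    then joined back to every row of that name."""
    window = Counter(n for d, n in rows if today - timedelta(days=365) <= d < today)
    return [(d, n, window[n]) for d, n in rows]

rows = [(date(2023, 5, 23), "Jane"), (date(2023, 10, 20), "Jane"),
        (date(2021, 8, 31), "Jane")]
# Every Jane row gets the same count, however old the row is.
print(counts_previous_12_months(rows, date(2023, 11, 1)))
```

That is why the follow-up below asks for a per-row rolling count instead.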

Thanks @talbot, but that's not quite giving me the result I need. Here's a smaller table showing how it should look.
To get the result, I used the following COUNTIFS formula (dates in column B, names in column C of my sheet) in the first row of the Countifs column and dragged it down:

 

=COUNTIFS($C$2:$C$10,"="&C2,$B$2:$B$10,"<"&EOMONTH(B2,-1)+1,$B$2:$B$10,">"&EOMONTH(B2,-13))

 

 

Date        Name   Countifs
28/09/2022  John   2
28/09/2022  Jane   1
01/09/2022  John   2
01/09/2022  Jane   1
01/08/2022  John   2
01/09/2021  John   1
01/09/2021  Jane   1
31/08/2021  John   0
31/08/2021  Jane   0
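The COUNTIFS window can be restated as: count same-name rows whose date falls in the 12 full calendar months before the row's own month (strictly before `EOMONTH(B2,-1)+1`, strictly after `EOMONTH(B2,-13)`). A Python sketch of that semantics, checked against the table above (helper names are hypothetical):

```python
from datetime import date

def month_start(d: date) -> date:
    """First day of d's month - the COUNTIFS upper bound EOMONTH(d,-1)+1."""
    return d.replace(day=1)

def shift_month_start(d: date, months: int) -> date:
    """First day of the month `months` away - the day after EOMONTH(d, months-1)."""
    m = d.month - 1 + months
    return date(d.year + m // 12, m % 12 + 1, 1)

def countifs_counts(rows):
    """Per row: same-name rows dated in the 12 calendar months before the row's month."""
    out = []
    for d, n in rows:
        hi = month_start(d)
        lo = shift_month_start(d, -12)
        out.append(sum(1 for e, m in rows if m == n and lo <= e < hi))
    return out

rows = [(date(2022, 9, 28), "John"), (date(2022, 9, 28), "Jane"),
        (date(2022, 9, 1), "John"), (date(2022, 9, 1), "Jane"),
        (date(2022, 8, 1), "John"), (date(2021, 9, 1), "John"),
        (date(2021, 9, 1), "Jane"), (date(2021, 8, 31), "John"),
        (date(2021, 8, 31), "Jane")]
print(countifs_counts(rows))  # [2, 1, 2, 1, 2, 1, 1, 0, 0]
```

Note this window excludes the current row and uses calendar-month bounds, whereas the accepted solution counts a rolling 12-month window including the current row, so the two can legitimately differ by small amounts.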
