Solved: Re: Create rows for missing dates within groups

orionjtaylor · ‎07-16-2021

I have data with gaps by state-county, like:

State	County	First_Seen	Last_Seen	Date	Population
NY	New York	2020-01-01	2020-06-30	2020-01-01	17
NY	New York	2020-01-01	2020-06-30	2020-01-02	17
NY	New York	2020-01-01	2020-06-30	2020-01-03	19
NY	New York	2020-01-01	2020-06-30	2020-01-05	31
NY	New York	2020-01-01	2020-06-30	2020-01-25	50
NY	New York	2020-01-01	2020-06-30	2020-01-26	33
...
NY	New York	2020-01-01	2020-06-30	2020-06-30	22
CA	Los Angeles	2020-02-15	2020-03-15	2020-02-15	886
CA	Los Angeles	2020-02-15	2020-03-15	2020-02-16	889
CA	Los Angeles	2020-02-15	2020-03-15	2020-03-14	855
CA	Los Angeles	2020-02-15	2020-03-15	2020-03-15	849

I want to create rows for missing dates within each State-County group in my table, and then linearly interpolate values of population over these. For instance, since NY-New York has data on 2020-01-03 and 2020-01-05 but not 2020-01-04, I would want to insert a row with linearly interpolated population for that date (19+31)/[length of gap = 2]=25 like:

...
NY	New York	2020-01-01	2020-06-30	2020-01-03	19
NY	New York	2020-01-01	2020-06-30	2020-01-04	25
NY	New York	2020-01-01	2020-06-30	2020-01-05	31
...

Is there a way to easily create rows for these missing dates within a group on (State, County), and/or to then linearly interpolate Population (or multiple columns) over these new rows?

Anonymous · ‎07-17-2021

@orionjtaylor

Here's PQ code in M that does linear interpolation. It's an example of how to do it natively, so to speak. It could be refined further to use a function to return the interpolated value. This code works fine if the following assumptions about the data set are met:

1) For each point/row that has a missing value there is a point that does have a value, its order is before the current point and it belongs to the same category and subcategory as the current point.

2) For each point/row that has a missing value there is a point that does have a value, its order is after the current point and it belongs to the same category and subcategory as the current point.

These are needed in order to be able to interpolate because you can't do it when there are no points with values on either side of a point with a missing value.

// f_InterpolateLinearly
(pointX as nullable number, 
    lowerPointX as nullable number, 
    lowerPointY as nullable number,
    upperPointX as nullable number,
    upperPointY as nullable number ) as nullable number =>
let
    slope = (upperPointY - lowerPointY) / (upperPointX - lowerPointX),
    value = slope * (pointX - lowerPointX) + lowerPointY
in
    value

// T
let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("bdA9CsMwDAXgu3hOIZJjxxnbHKBDoIvxkP5snUJ7/9pyCXlYg4z4eCCeYzTX7fnaTGfm9ZPf5Xt/yHJb3yZ10VDez2VOZSMxBhOyQFZsaGMOyIn53Xi/MIIJBaB6YWpj1IPVE0SAvmJZL2UO1cgCVhsUc4p5xUaw/+WgBCewUD+6b4NMirFiShNWmjA2yWVFsUuum9IP", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Cat = _t, Subcat = _t, Val = _t, Column1 = _t]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Cat", type text}, {"Subcat", type text}, {"Val", Int64.Type}, {"Order", Int64.Type}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type",{{"Cat", Order.Ascending}, {"Subcat", Order.Ascending}, {"Order", Order.Ascending}}),
    #"Added Custom" = Table.AddColumn(#"Sorted Rows", "Info", 
        each 
            if _[Val] <> null then null
            else
                Table.Last(
                    Table.SelectRows(#"Sorted Rows",
                        (r) => r[Order] < _[Order] 
                                and r[Cat] = _[Cat] 
                                and r[Subcat] = _[Subcat]
                                and r[Val] <> null
                    )[[Order], [Val]]
                )
    ),
    #"Expanded Info" = Table.ExpandRecordColumn(#"Added Custom", "Info", {"Order", "Val"}, {"Info.Order", "Info.Val"}),
    #"Renamed Columns" = Table.RenameColumns(#"Expanded Info",{{"Info.Order", "LowerOrder"}, {"Info.Val", "LowerVal"}}),
    #"Added Info2" = Table.AddColumn(#"Renamed Columns", "Info2", 
        each 
            if _[Val] <> null then null
            else
                Table.First(
                    Table.SelectRows(#"Sorted Rows",
                        (r) => r[Order] > _[Order] 
                                and r[Cat] = _[Cat] 
                                and r[Subcat] = _[Subcat]
                                and r[Val] <> null
                    )[[Order], [Val]]
                )
    ),
    #"Expanded Info2" = Table.ExpandRecordColumn(#"Added Info2", "Info2", {"Order", "Val"}, {"Info2.Order", "Info2.Val"}),
    #"Renamed Columns1" = Table.RenameColumns(#"Expanded Info2",{{"Info2.Order", "UpperOrder"}, {"Info2.Val", "UpperVal"}}),
    #"Added Custom1" = Table.AddColumn(#"Renamed Columns1", "Interpolation", each f_InterpolateLinearly(
    [Order],
    [LowerOrder],
    [LowerVal],
    [UpperOrder],
    [UpperVal]
)),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"LowerOrder", "LowerVal", "UpperOrder", "UpperVal"}),
    #"Added Custom2" = Table.AddColumn(#"Removed Columns", "FinalVal", each if [Val] = null then [Interpolation] else [Val]),
    #"Removed Columns1" = Table.RemoveColumns(#"Added Custom2",{"Interpolation"})
in
    #"Removed Columns1"

Just paste this code into Advanced Editor (as is) and watch this being evaluated step by step. Use the Applied Steps pane for that.

View solution in original post

Greg_Deckler · ‎07-16-2021

@orionjtaylor There might be:

Linear Interpolation - Microsoft Power BI Community

Mind the Gap (Irregular Time Series) - Microsoft Power BI Community

Also: Linear Interpolation with Power BI - Microsoft Power BI Community

Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: DAX For Humans

DAX is easy, CALCULATE makes DAX hard...

orionjtaylor · ‎07-16-2021

Thanks @Greg_Deckler,

I completed the first portion of my task on my own luckily. Now, I'm stuck at the linear interpolations.

My data looks like this now:

State	County	Date	Population
NY	New York	2020-01-01	105
NY	New York	2020-01-02	106
NY	New York	2020-01-03	null
NY	New York	2020-01-04	null
NY	New York	2020-01-05	null
NY	New York	2020-01-06	114

I want to create a simple time trend interpolation over Population for each State/County group in Power Query M, such that the filled data looks like:

State	County	Date	Population
NY	New York	2020-01-01	105
NY	New York	2020-01-02	106
NY	New York	2020-01-03	108
NY	New York	2020-01-04	110
NY	New York	2020-01-05	112
NY	New York	2020-01-06	114

This seems like a simple task, but most of the information I've seen online makes this look like an incredibly convoluted task where one must create their own interpolation function. Do you know if there's a simple way to do this?

orionjtaylor · ‎07-16-2021

@Greg_Deckler also looks like perhaps both of those posts use DAX, but correct me if I'm wrong! (I am very new to Power BI in general. This kind of interpolation would be trivial in Python...)

Anonymous · ‎07-16-2021

@orionjtaylor

Then do it in Python. You can use Python in Power Query as far as I know...

orionjtaylor · ‎07-16-2021

@Anonymous didn't realize that was an option. That does the trick as anticipated, although I maintain they should have a native solution for this! Anyway, thanks!

Anonymous · ‎07-17-2021

@orionjtaylor

Here's PQ code in M that does linear interpolation. It's an example of how to do it natively, so to speak. It could be refined further to use a function to return the interpolated value. This code works fine if the following assumptions about the data set are met:

1) For each point/row that has a missing value there is a point that does have a value, its order is before the current point and it belongs to the same category and subcategory as the current point.

2) For each point/row that has a missing value there is a point that does have a value, its order is after the current point and it belongs to the same category and subcategory as the current point.

These are needed in order to be able to interpolate because you can't do it when there are no points with values on either side of a point with a missing value.

// f_InterpolateLinearly
(pointX as nullable number, 
    lowerPointX as nullable number, 
    lowerPointY as nullable number,
    upperPointX as nullable number,
    upperPointY as nullable number ) as nullable number =>
let
    slope = (upperPointY - lowerPointY) / (upperPointX - lowerPointX),
    value = slope * (pointX - lowerPointX) + lowerPointY
in
    value

// T
let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("bdA9CsMwDAXgu3hOIZJjxxnbHKBDoIvxkP5snUJ7/9pyCXlYg4z4eCCeYzTX7fnaTGfm9ZPf5Xt/yHJb3yZ10VDez2VOZSMxBhOyQFZsaGMOyIn53Xi/MIIJBaB6YWpj1IPVE0SAvmJZL2UO1cgCVhsUc4p5xUaw/+WgBCewUD+6b4NMirFiShNWmjA2yWVFsUuum9IP", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Cat = _t, Subcat = _t, Val = _t, Column1 = _t]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Cat", type text}, {"Subcat", type text}, {"Val", Int64.Type}, {"Order", Int64.Type}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type",{{"Cat", Order.Ascending}, {"Subcat", Order.Ascending}, {"Order", Order.Ascending}}),
    #"Added Custom" = Table.AddColumn(#"Sorted Rows", "Info", 
        each 
            if _[Val] <> null then null
            else
                Table.Last(
                    Table.SelectRows(#"Sorted Rows",
                        (r) => r[Order] < _[Order] 
                                and r[Cat] = _[Cat] 
                                and r[Subcat] = _[Subcat]
                                and r[Val] <> null
                    )[[Order], [Val]]
                )
    ),
    #"Expanded Info" = Table.ExpandRecordColumn(#"Added Custom", "Info", {"Order", "Val"}, {"Info.Order", "Info.Val"}),
    #"Renamed Columns" = Table.RenameColumns(#"Expanded Info",{{"Info.Order", "LowerOrder"}, {"Info.Val", "LowerVal"}}),
    #"Added Info2" = Table.AddColumn(#"Renamed Columns", "Info2", 
        each 
            if _[Val] <> null then null
            else
                Table.First(
                    Table.SelectRows(#"Sorted Rows",
                        (r) => r[Order] > _[Order] 
                                and r[Cat] = _[Cat] 
                                and r[Subcat] = _[Subcat]
                                and r[Val] <> null
                    )[[Order], [Val]]
                )
    ),
    #"Expanded Info2" = Table.ExpandRecordColumn(#"Added Info2", "Info2", {"Order", "Val"}, {"Info2.Order", "Info2.Val"}),
    #"Renamed Columns1" = Table.RenameColumns(#"Expanded Info2",{{"Info2.Order", "UpperOrder"}, {"Info2.Val", "UpperVal"}}),
    #"Added Custom1" = Table.AddColumn(#"Renamed Columns1", "Interpolation", each f_InterpolateLinearly(
    [Order],
    [LowerOrder],
    [LowerVal],
    [UpperOrder],
    [UpperVal]
)),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"LowerOrder", "LowerVal", "UpperOrder", "UpperVal"}),
    #"Added Custom2" = Table.AddColumn(#"Removed Columns", "FinalVal", each if [Val] = null then [Interpolation] else [Val]),
    #"Removed Columns1" = Table.RemoveColumns(#"Added Custom2",{"Interpolation"})
in
    #"Removed Columns1"

Just paste this code into Advanced Editor (as is) and watch this being evaluated step by step. Use the Applied Steps pane for that.