Join us for an expert-led overview of the tools and concepts you'll need to pass exam PL-300. The first session starts on June 11th. See you there!
Get registeredPower BI is turning 10! Let’s celebrate together with dataviz contests, interactive sessions, and giveaways. Register now.
This is a three parter, but they are related. I am looking for a more efficient way to remove bad data, but it's all relative.
Part 1. I want to only keep vaccine dates (sorted chronologically oldest -->newest) that were "Given" or vaccine header contains "Measles" that were given on or after the first birthday and are at least 24 days after the previous vaccine. (Yes, my data does include null DOB and vaccine doses).
Part 2. I then have to merge with another table (that doesn't include Status but still has 6 possible vaccine dates), and repeat the process of removing duplicates, given before the first birthday or given < 24 days after the previous dose.
Part 3. Using the original dates to see if the person had a monovalent vaccine (1963-1967), or likely had disease (date <1963) or any combination therein mixed with the good dates.
Table 1:
People Soft ID | DOB | Date of MMR vaccine 1 | MMR Status 1 | Date of MMR vaccine 2 | MMR Status 2 | Date of MMR vaccine 3 | MMR Status 3 | Date of MMR vaccine 4 | MMR Status 4 | Date of Measles vaccine 1 | Date of Measles vaccine 2 |
1 | 7/12/1968 | 12/27/1989 | Given | Given | 10/28/1969 | 12/29/1969 | |||||
2 | 3/29/1970 | Given | Given | 3/1/1971 | 11/1/1971 | ||||||
3 | 11/17/1971 | 11/15/1972 | Given | 10/29/1973 | Given | 11/15/1972 | 1/16/1982 | ||||
4 | 10/22/1972 | 8/1/1991 | Given | Given | 1/1/1974 | 8/1/1989 | |||||
5 | 8/11/1995 | 4/17/2000 | Given | 10/17/2000 | Given | 10/17/2000 | Given | 6/15/1996 | 2/21/1998 | ||
6 | 1/20/1998 | 5/28/1999 | Given | 4/1/2003 | Deleted | 4/21/1998 | 5/21/1998 | ||||
7 | Given | Given | 3/1/1982 | 9/1/1998 | |||||||
8 | Given | Deleted | Deleted | 10/13/2000 | 10/13/2000 | ||||||
9 | 3/11/1999 | 3/24/2000 | Declined | 3/24/2000 | Deleted | 6/26/2003 | Given | 6/26/2003 | Deferred | ||
10 | 11/14/1995 | 11/21/1996 | Given | 11/2/2000 | Given | 12/26/2018 | Given | 12/26/2018 | Deleted | ||
11 | 2/21/2002 | 7/2/2003 | Deleted | 1/22/2007 | Declined | 7/2/2008 | Given | 7/2/2008 | Deleted | ||
12 | 5/8/1988 | 8/21/1989 | Given | 4/13/1993 | Given | 8/3/2018 | Given | 9/4/2018 | Given |
End Result:
People Soft ID | DOB | MMR 1 | MMR 2 | MMR 3 | MMR 4 |
1 | 7/12/1968 | 10/28/1969 | 12/29/1969 | 12/27/1989 | |
2 | 3/29/1970 | 3/1/1971 | |||
3 | 11/17/1971 | 1/16/1982 | |||
4 | 10/22/1972 | 1/1/1974 | 8/1/1989 | 8/1/1991 | |
5 | 8/11/1995 | 2/21/1998 | 4/17/2000 | 10/17/2000 | |
6 | 1/20/1998 | 5/28/1999 | |||
7 | 3/1/1982 | 9/1/1998 | |||
8 | 10/13/2000 | ||||
9 | 3/11/1999 | ||||
10 | 11/14/1995 | 11/21/1996 | 11/2/2000 | 12/26/2018 | |
11 | 2/21/2002 | 7/2/2008 | |||
12 | 5/8/1988 | 8/21/1989 | 4/13/1993 | 8/3/2018 | 9/4/2018 |
My current solution is creating new MMR Columns, then merging using the solution from this post then creating new MMR columns to check against birthdate and time between doses, and then comparing for stage 3. Below is my current Mcode. (I do apologize, as I have several of these workbooks going as I try to figure this out, and the code does include the merged table, but it should give an idea.
#"Added Custom1" = Table.AddColumn(#"Reordered Columns1", "Dose 1", each if [Date of Measles vaccine 1]=null or [Birthdate]=null then null
else if [Date of Measles vaccine 1]<#date(1963,1,1) then null
else if [Date of Measles vaccine 1]<#date(1968,1,1) and [Date of Measles vaccine 1]>=Date.AddYears([Birthdate],1) then null
else if [Date of Measles vaccine 1]>=Date.AddYears([Birthdate],1) then [Date of Measles vaccine 1] else null),
#"Added Custom2" = Table.AddColumn(#"Added Custom1", "Dose 2", each if [Date of Measles vaccine 2]=null or [Birthdate]=null then null
else if [Date of Measles vaccine 2]<#date(1968,1,1) then null
else if [Dose 1]=null and [Date of Measles vaccine 2]>=Date.AddYears([Birthdate],1) then [Date of Measles vaccine 2]
else if [Dose 1]<>null and [Date of Measles vaccine 2]>=Date.AddDays([Dose 1],24) then [Date of Measles vaccine 2] else null),
#"Added Custom3" = Table.AddColumn(#"Added Custom2", "Dose 3", each if [MMR 1]=null or [Birthdate]=null then null
else if [MMR 1]<#date(1968,1,1) then null
else if [Dose 1]=null and [Dose 2]=null and [MMR 1]>=Date.AddYears([Birthdate],1) then [MMR 1]
else if [Dose 2]<>null and [MMR 1]>=Date.AddDays([Dose 2],24) then [MMR 1]
else if [Dose 2]=null and [Dose 1]<>null and [MMR 1]>=Date.AddDays([Dose 1],24) then [MMR 1] else null),
#"Sorted Rows" = Table.Sort(#"Added Custom3",{{"MMR 4", Order.Descending}}),
#"Added Custom4" = Table.AddColumn(#"Sorted Rows", "Dose 4", each if [MMR 2]=null or [Birthdate]=null then null
else if [MMR 2]<#date(1968,1,1) then null
else if [Dose 3]<>null and [MMR 2]>=Date.AddDays([Dose 3],24) then [MMR 2]
else if [Dose 3]=null and [Dose 2]<>null and [MMR 2]>=Date.AddDays([Dose 2],24) then [MMR 2]
else if [Dose 3]=null and [Dose 2]=null and [Dose 1]<>null and [MMR 2]>=Date.AddDays([Dose 1],24) then [MMR 2] else null),
#"Added Custom5" = Table.AddColumn(#"Added Custom4", "Dose 5", each if [MMR 3]=null or [Birthdate]=null then null
else if [MMR 3]<#date(1968,1,1) then null
else if [Dose 4]<>null and [MMR 3]>=Date.AddDays([Dose 4],24) then [MMR 3]
else if [Dose 4]=null and [Dose 3]<> null and [MMR 3]>=Date.AddDays([Dose 3],24) then [MMR 3]
else if [Dose 4]=null and [Dose 3]=null and [Dose 2]<>null and [MMR 3]>=Date.AddDays([Dose 2],24) then [MMR 3]
else if [Dose 4]=null and [Dose 3]=null and [Dose 2]=null and [Dose 1]<>null and [MMR 3]>=Date.AddDays([Dose 1],24) then [MMR 3] else null),
#"Added Custom6" = Table.AddColumn(#"Added Custom5", "Dose 6", each if [MMR 4]=null or [Birthdate]=null then null
else if [MMR 4]<#date(1968,1,1) then null
else if [Dose 5]<>null and [MMR 4]>=Date.AddDays([Dose 5],24) then [MMR 4]
else if [Dose 5]=null and [Dose 4]<>null and [MMR 4]>=Date.AddDays([Dose 4],24) then [MMR 4]
else if [Dose 5]=null and [Dose 4]=null and [Dose 3]<> null and [MMR 4]>=Date.AddDays([Dose 3],24) then [MMR 4]
else if [Dose 5]=null and [Dose 4]=null and [Dose 3]=null and [Dose 2]<>null and [MMR 4]>=Date.AddDays([Dose 2],24) then [MMR 4]
else if [Dose 5]=null and [Dose 4]=null and [Dose 3]=null and [Dose 2]=null and [Dose 1]<>null and [MMR 4]>=Date.AddDays([Dose 1],24) then [MMR 4] else null),
#"Added Custom7" = Table.AddColumn(#"Added Custom6", "Valid Vaccine Count", each List.NonNullCount({[Dose 1],[Dose 2],[Dose 3],[Dose 4],[Dose 5],[Dose 6]})),
#"Changed Type1" = Table.TransformColumnTypes(#"Added Custom7",{{"Dose 1", type date}, {"Dose 2", type date}, {"Dose 3", type date}, {"Dose 4", type date}, {"Dose 5", type date}, {"Dose 6", type date}}),
#"Added Custom8" = Table.AddColumn(#"Changed Type1", "#Monovalent", each List.Count(List.Select({[Date of Measles vaccine 1],[Date of Measles vaccine 2],[MMR 1],[MMR 2],[MMR 3],[MMR 4]}, each _>=#date(1963,1,1) and _<#date(1968,1,1)))),
#"Added Custom9" = Table.AddColumn(#"Added Custom8", "Vacc Status", each if [Birthdate]=null then "No DOB"
else if [Date of Measles vaccine 1]=null then "Not Immune"
else if [Valid Vaccine Count]>=2 then "Immune by Vacc"
else if [Date of Measles vaccine 1]<#date(1963,1,1) and [Valid Vaccine Count]=1 then "Dz and 1 dose"
else if [Date of Measles vaccine 1]<#date(1963,1,1) and [#"#Monovalent"]=1 then "Dz and 1 Monovalent"
else if [Date of Measles vaccine 1]<#date(1963,1,1) then "Dz"
else if [Valid Vaccine Count]=1 and [#"#Monovalent"]=1 then "1 Valid and 1 Monovalent"
else if [#"#Monovalent"]=1 then "1 Monovalent"
else if [Valid Vaccine Count]=1 then "1 dose"
else if [Valid Vaccine Count]=0 and [#"#Monovalent"]=0 then "Not Immune"
else null),
#"Removed Columns1" = Table.RemoveColumns(#"Added Custom9",{"Date of Measles vaccine 1", "Date of Measles vaccine 2", "MMR 1", "MMR 2", "MMR 3", "MMR 4", "Valid Vaccine Count", "#Monovalent"})
in
#"Removed Columns1"
Thank you!
Solved! Go to Solution.
After I mentioned it, thought I'd try out the merge + actual unpivot and it was actually more straightforward than I thought, so including here. Basically, we still figure out the MMR Status / MMR Date pairings and iterate through them with another List.Accumulate to dynamically do all the merges. And with our merge function, we extract dates when the status is "Given". Steps are the same as before once we get the unpivoted table.
let
Source = #"Table 1",
//Column names that start with "Date of MMR"
MmrDateCols = List.Select(Table.ColumnNames(Source), each Text.StartsWith(Text.Lower(_), "date of mmr")),
//Column names that start with "MMR Status"
MmrStatCols = List.Select(Table.ColumnNames(Source), each Text.StartsWith(Text.Lower(_), "mmr status")),
//Pairing the MMR Status and Date columns names together together.
//We assume that the MMR Status and Date columns are in the same order.
MmrPairedCols = List.Zip({MmrStatCols, MmrDateCols}),
//Go through our MMR Pairs and merge them and extract the dates where the MMR Status is "Given".
MergePairsExtractGivenDates = List.Accumulate(
//Iterating through the MMR Pairs
MmrPairedCols,
//Seed is the original table
Source,
//Defining accumulator function. See here for more info: https://powerquery.how/list-accumulate/
(inputTable,currentPair)=>
Table.CombineColumns(
inputTable,
currentPair,
(combinedVals) as nullable date =>
//First element is the MMR Status; check if it is "Given"
if combinedVals{0} = "Given"
//If yes, return the second element (the MMR date)
then combinedVals{1}
//If no, return null
else null,
//Name the new column "MMR #" where # is the current pair's index + 1
"Merge " & Text.From( List.PositionOf( MmrPairedCols, currentPair ) + 1 )
)
),
//Unpivot all the date columns we want to filter on. Nulls are automatically dropped
UnpivotDates = Table.UnpivotOtherColumns(
MergePairsExtractGivenDates,
{"People Soft ID", "DOB"},
"Old Columns", "Good Date"
),
// Sort the table by ID and date.
// This is necessary for our use of GroupKind.Local in the next step
SortDates = Table.Sort(UnpivotDates, {{"People Soft ID", Order.Ascending}, {"Good Date", Order.Ascending}}),
//Filter out any dates that are before the first birthday.
FilterBeforeFirstBD = Table.SelectRows(
SortDates,
each [DOB] = null or [Good Date] >= Date.AddYears( [DOB], 1 )
),
//Group by ID and DOB to set the scope for the List.Accumulate function
GroupValidDates = Table.Group(
FilterBeforeFirstBD,
{"People Soft ID", "DOB"},
{{
"ValidDates",
each List.Accumulate(
//iterate over the dates in the current group
[Good Date],
//Seed is empty list
{},
//Defining accumulator function. See here for more info: https://powerquery.how/list-accumulate/
(state, current) =>
//If the current Good Date is greater than or equal to last accepted date + 24 days
//OR there are no dates in the list yet (state is empty)
if List.IsEmpty(state) or current >= Date.AddDays( List.Last(state), 24 )
//If yes: add the current Good Date to the list of accepted dates
then state & {current}
//IF no: we don't do anything and move on to next Good Date to check
else state
),
type { date }
}},
//The order of dates that we iterate through with List.Accumulate needs to be in ascending order.
//We can ensure this by using GroupKind.Local. The default, GroupKind.Global, does not guarantee this.
GroupKind.Local
),
//Calc the maximum number of valid dates in any group.
MaxValidDates = List.Max(List.Transform(GroupValidDates[ValidDates], List.Count)),
//Create a list of all possible field names we may need
ValidDateFieldNames = List.Generate(
() => 0,
each _ < MaxValidDates,
each _ + 1,
each "MMR " & Text.From(_ + 1)
),
//Create a record type with the field names we generated above.
ExtractRecord = Type.ForRecord(
Record.FromList(
List.Repeat({[Type = type date, Optional = false]}, MaxValidDates),
ValidDateFieldNames
),
false
),
//Transform the ValidDates list of dates into a record with the record type we generated above.
TransformDatesToRecord = Table.TransformColumns(
GroupValidDates,
{
"ValidDates",
each Record.FromList(
//If we have less dates than max, add nulls to the end of the list to fill it out
_ & List.Repeat({null}, MaxValidDates - List.Count(_)),
//Our record type provides Record.FromList with the field names and types
ExtractRecord
),
//Using record type again so the column type is set correctly
//(ensures properly typed columns after expansion)
ExtractRecord
}
),
//Expand the records into separate columns by referring to the field names we generated above
ExpandValidDateRecords = Table.ExpandRecordColumn(TransformDatesToRecord, "ValidDates", ValidDateFieldNames)
in
ExpandValidDateRecords
Similar to what some others have already suggested. This is still going off your original table and handling the "Given" filtering you want.
This is somewhat manually doing an unpivot action, with a separate treamtent on your two "date of measles" columns vs. the "mmr status" and "date of mmor" pairs. All steps below from Source to SortDates are performing this faux unpivot + filtering. Another way to do this (perhaps with better performance) would be to merge your status/date columns and then unpivot everything except ID and DOB, do your {birthday<, is "Given", not null} filtering, and then do a Group+Accumulate to filter our dates per the 24 day rule.
Source (note: added another row #13 to more directly test 24 day rule)
SortDates
The next steps, RemoveNullDates to ExpandValidDateRecords, then filter out dates before first birthday, group and filter our the dates that don't adhere to the 24 day rule, and then dynamically figure out the number of columns to expand.
Final table:
M code for adanced editor:
let
Source = #"Table 1",
//Manually set the column names for the ID and DOB columns
IdCols = {"People Soft ID", "DOB"},
//Column names that start with "Date of Measles"
MeaslesCols = List.Select(Table.ColumnNames(Source), each Text.StartsWith(Text.Lower(_), "date of measles")),
//Column names that start with "Date of MMR"
MmrDateCols = List.Select(Table.ColumnNames(Source), each Text.StartsWith(Text.Lower(_), "date of mmr")),
//Column names that start with "MMR Status"
MmrStatCols = List.Select(Table.ColumnNames(Source), each Text.StartsWith(Text.Lower(_), "mmr status")),
//Pairing the MMR Status and Date columns names together together.
//We assume that the MMR Status and Date columns are in the same order.
MmrPairedCols = List.Zip({MmrStatCols, MmrDateCols}),
//For each column name in MeaslesCols, create a table with the ID columns and dates
//from the current Measles column put into consolidation column, "Good Date"
MeaslesTables = List.Transform(
MeaslesCols,
//For each column name in MeaslesCols...
(thisMeasleCol) =>
let
//Select the ID columns and the current Measles column
idAndMeasleCol = Table.SelectColumns(Source, IdCols & {thisMeasleCol}),
//Rename the current Measles column to "Good Date"
renameMeasleCol = Table.RenameColumns(idAndMeasleCol, {thisMeasleCol, "Good Date"})
in
renameMeasleCol
),
//For each column name pair in MmrPairedCols, create a table with the ID columns and dates
//from the current MmrDate column name where corresponding status is "Given".
//Again, putting all dates into consolidation column, "Good Date"
GivenMmrTables = List.Transform(
MmrPairedCols,
//For each Status/Date pair of MMR column names...
(MmrPair) =>
let
//Per previous Zip step, status is first element of the pair (list)
mmrStatus = MmrPair{0},
//Per previous Zip step, date is second element of the pair (list)
mmrDate = MmrPair{1},
//Select the ID columns and the current MMR column pair
idAndMmrPair = Table.SelectColumns(Source, IdCols & MmrPair),
//Filter the table to only include rows where the MMR status is "Given"
filterToGiven = Table.SelectRows(idAndMmrPair, each Record.Field(_, mmrStatus) = "Given"),
//Remove the status column as no longer needed
removeStatCol = Table.RemoveColumns(filterToGiven, {mmrStatus}),
//Rename the current MMR date column to "Good Date"
renameMmrDateCol = Table.RenameColumns(removeStatCol, {mmrDate, "Good Date"})
in
renameMmrDateCol
),
//Combine all Id columns + "Good Date"s from previous steps
CombineAllDates = Table.Combine(MeaslesTables & GivenMmrTables),
// Sort the table by ID and date.
// This is necessary for our use of GroupKind.Local in the next step
SortDates = Table.Sort(CombineAllDates, {{"People Soft ID", Order.Ascending}, {"Good Date", Order.Ascending}}),
//Remove any null dates from the "Good Date" column
RemoveNullDates = Table.SelectRows(SortDates, each ([Good Date] <> null)),
//Filter out any dates that are before the first birthday.
FilterBeforeFirstBD = Table.SelectRows(RemoveNullDates, each [DOB] = null or [Good Date] >= Date.AddYears( [DOB], 1 ) ),
GroupValidDates = Table.Group(
FilterBeforeFirstBD,
{"People Soft ID", "DOB"},
{{
"ValidDates",
each List.Accumulate(
//iterate over the dates in the current group
[Good Date],
//Seed is empty list
{},
//Defining accumulator function. See here for more info: https://powerquery.how/list-accumulate/
(state, current) =>
//If the current Good Date is greater than or equal to last accepted date + 24 days
//OR there are no dates in the list yet (state is empty)
if List.IsEmpty(state) or current >= Date.AddDays( List.Last(state), 24 )
//If yes: add the current Good Date to the list of accepted dates
then state & {current}
//IF no: we don't do anything and move on to next Good Date to check
else state
),
type { date }
}},
//The order of dates that we iterate through with List.Accumulate needs to be in ascending order.
//We can ensure this by using GroupKind.Local. The default, GroupKind.Global, does not guarantee this.
GroupKind.Local
),
//Calc the maximum number of valid dates in any group.
MaxValidDates = List.Max(List.Transform(GroupValidDates[ValidDates], List.Count)),
//Create a list of all possible field names we may need
ValidDateFieldNames = List.Generate(() => 0, each _ < MaxValidDates, each _ + 1, each "MMR " & Text.From(_ + 1)),
//Create a record type with the field names we generated above.
ExtractRecord = Type.ForRecord(
Record.FromList(List.Repeat({[Type = type date, Optional = false]}, MaxValidDates), ValidDateFieldNames),
false
),
//Transform the ValidDates list of dates into a record with the record type we generated above.
TransformDatesToRecord = Table.TransformColumns(
GroupValidDates,
{
"ValidDates",
each Record.FromList(
//If we have less dates than max, add nulls to the end of the list to fill it out
_ & List.Repeat({null}, MaxValidDates - List.Count(_)),
//Our record type provides Record.FromList with the field names and types
ExtractRecord
),
//Using record type again so the column type is set correctly
//(ensures properly typed columns after expansion)
ExtractRecord
}
),
//Expand the records into separate columns by referring to the field names we generated above
ExpandValidDateRecords = Table.ExpandRecordColumn(TransformDatesToRecord, "ValidDates", ValidDateFieldNames)
in
ExpandValidDateRecords
After I mentioned it, thought I'd try out the merge + actual unpivot and it was actually more straightforward than I thought, so including here. Basically, we still figure out the MMR Status / MMR Date pairings and iterate through them with another List.Accumulate to dynamically do all the merges. And with our merge function, we extract dates when the status is "Given". Steps are the same as before once we get the unpivoted table.
let
Source = #"Table 1",
//Column names that start with "Date of MMR"
MmrDateCols = List.Select(Table.ColumnNames(Source), each Text.StartsWith(Text.Lower(_), "date of mmr")),
//Column names that start with "MMR Status"
MmrStatCols = List.Select(Table.ColumnNames(Source), each Text.StartsWith(Text.Lower(_), "mmr status")),
//Pairing the MMR Status and Date columns names together together.
//We assume that the MMR Status and Date columns are in the same order.
MmrPairedCols = List.Zip({MmrStatCols, MmrDateCols}),
//Go through our MMR Pairs and merge them and extract the dates where the MMR Status is "Given".
MergePairsExtractGivenDates = List.Accumulate(
//Iterating through the MMR Pairs
MmrPairedCols,
//Seed is the original table
Source,
//Defining accumulator function. See here for more info: https://powerquery.how/list-accumulate/
(inputTable,currentPair)=>
Table.CombineColumns(
inputTable,
currentPair,
(combinedVals) as nullable date =>
//First element is the MMR Status; check if it is "Given"
if combinedVals{0} = "Given"
//If yes, return the second element (the MMR date)
then combinedVals{1}
//If no, return null
else null,
//Name the new column "MMR #" where # is the current pair's index + 1
"Merge " & Text.From( List.PositionOf( MmrPairedCols, currentPair ) + 1 )
)
),
//Unpivot all the date columns we want to filter on. Nulls are automatically dropped
UnpivotDates = Table.UnpivotOtherColumns(
MergePairsExtractGivenDates,
{"People Soft ID", "DOB"},
"Old Columns", "Good Date"
),
// Sort the table by ID and date.
// This is necessary for our use of GroupKind.Local in the next step
SortDates = Table.Sort(UnpivotDates, {{"People Soft ID", Order.Ascending}, {"Good Date", Order.Ascending}}),
//Filter out any dates that are before the first birthday.
FilterBeforeFirstBD = Table.SelectRows(
SortDates,
each [DOB] = null or [Good Date] >= Date.AddYears( [DOB], 1 )
),
//Group by ID and DOB to set the scope for the List.Accumulate function
GroupValidDates = Table.Group(
FilterBeforeFirstBD,
{"People Soft ID", "DOB"},
{{
"ValidDates",
each List.Accumulate(
//iterate over the dates in the current group
[Good Date],
//Seed is empty list
{},
//Defining accumulator function. See here for more info: https://powerquery.how/list-accumulate/
(state, current) =>
//If the current Good Date is greater than or equal to last accepted date + 24 days
//OR there are no dates in the list yet (state is empty)
if List.IsEmpty(state) or current >= Date.AddDays( List.Last(state), 24 )
//If yes: add the current Good Date to the list of accepted dates
then state & {current}
//IF no: we don't do anything and move on to next Good Date to check
else state
),
type { date }
}},
//The order of dates that we iterate through with List.Accumulate needs to be in ascending order.
//We can ensure this by using GroupKind.Local. The default, GroupKind.Global, does not guarantee this.
GroupKind.Local
),
//Calc the maximum number of valid dates in any group.
MaxValidDates = List.Max(List.Transform(GroupValidDates[ValidDates], List.Count)),
//Create a list of all possible field names we may need
ValidDateFieldNames = List.Generate(
() => 0,
each _ < MaxValidDates,
each _ + 1,
each "MMR " & Text.From(_ + 1)
),
//Create a record type with the field names we generated above.
ExtractRecord = Type.ForRecord(
Record.FromList(
List.Repeat({[Type = type date, Optional = false]}, MaxValidDates),
ValidDateFieldNames
),
false
),
//Transform the ValidDates list of dates into a record with the record type we generated above.
TransformDatesToRecord = Table.TransformColumns(
GroupValidDates,
{
"ValidDates",
each Record.FromList(
//If we have less dates than max, add nulls to the end of the list to fill it out
_ & List.Repeat({null}, MaxValidDates - List.Count(_)),
//Our record type provides Record.FromList with the field names and types
ExtractRecord
),
//Using record type again so the column type is set correctly
//(ensures properly typed columns after expansion)
ExtractRecord
}
),
//Expand the records into separate columns by referring to the field names we generated above
ExpandValidDateRecords = Table.ExpandRecordColumn(TransformDatesToRecord, "ValidDates", ValidDateFieldNames)
in
ExpandValidDateRecords
It is very difficult for someone to read , i suggest putting up the query to co-pilot etc. to get the answers you want
It does looks like you are looking for a more efficient way to do what you are doing , here are my key takes on that:
In short : you need to be familiar with M to implement more efficient approaches
1. You can use the advanced editor to create a function
A function is of the following format, if you have used javascript it is similar to a arrow function
let
FunctionName = (column_name, table_name) => your function logic
2. You can use a conditional merge in power query but this option is not there in the merge query UI, you need to edit the M code for that.
3. You can pre aggregate data to optimize performance.
Thank you @Bhavnish,
Depending on my knowledge of the function, I will go into Advanced Editor, but sadly that knowledge is limited. I will see if I have access to Co-Pilot, but I doubt it.
I can however simplify my question for this post: I think what I need is a nested List.Select, however I don't know how to write the "selection as function". Is there a concise way for me to ignore nulls, sort chronologically oldest -> newest, then systematically remove dates that are less than 24 days from the previous date?
Barring that, how do I ignore nulls and sort chronologically to Table 1 so I don't have any blanks between dates in the same row?
Thank you!
Table 1:
People Soft ID | Date of MMR vaccine 1 | Date of MMR vaccine 2 | Date of MMR vaccine 3 | Date of MMR vaccine 4 | Date of Measles vaccine 1 | Date of Measles vaccine 2 |
1 | 12/27/1989 | 10/28/1969 | 12/29/1969 | |||
2 | 3/1/1971 | 11/1/1971 | ||||
3 | 11/15/1972 | 10/29/1973 | 11/15/1972 | 1/16/1982 | ||
4 | 8/1/1991 | 1/1/1974 | 8/1/1989 | |||
5 | 4/17/2000 | 10/17/2000 | 10/17/2000 | 6/15/1996 | 2/21/1998 | |
6 | 5/28/1999 | 4/1/2003 | 4/21/1998 | 5/21/1998 | ||
7 | 3/1/1982 | 9/1/1998 | ||||
8 | 10/13/2000 | 10/13/2000 | ||||
9 | 3/24/2000 | 3/24/2000 | 6/26/2003 | 6/26/2003 | ||
10 | 11/21/1996 | 11/2/2000 | 12/26/2018 | 12/26/2018 | ||
11 | 7/2/2003 | 1/22/2007 | 7/2/2008 | 7/2/2008 | ||
12 | 8/21/1989 | 4/13/1993 | 8/3/2018 | 9/4/2018 |
End Result:
People Soft ID | MMR 1 | MMR 2 | MMR 3 | MMR 4 |
1 | 10/28/1969 | 12/29/1969 | 12/27/1989 | |
2 | 3/1/1971 | 11/1/71 | ||
3 | 11/15/1972 | 10/29/1973 | 1/16/1982 | |
4 | 1/1/1974 | 8/1/1989 | 8/1/1991 | |
5 | 6/15/1996 | 2/21/1998 | 4/17/2000 | 10/17/2000 |
4/21/98 | 5/28/1998 | 5/28/1999 | 4/1/2003 | |
7 | 3/1/1982 | 9/1/1998 | ||
8 | 10/13/2000 | |||
9 | 3/24/2000 | 6/26/2003 | ||
10 | 11/21/1996 | 11/2/2000 | 12/26/2018 | |
11 | 1/22/2007 | 7/2/2008 | ||
12 | 8/21/1989 | 4/13/1993 | 8/3/2018 | 9/4/2018 |
I believe you are looking for custom sort in a particular group i.e., remove nulls and sort the dates within a group.
If this is true, try this:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("dVJbDsQgCLzKxu9NELQKZ2l6/2usgLa02X6YwDjMjI99T5i+CQmoAwrLaD5hYQbisdFkssSb47snenB1FcBB6KaJq1ZymcimEE1lFesl2t0YgE0zkQnUgbApCj5DutHFYA+4DaACdqCcs1u+NKrS3FraqMdBzYlNR5HN70HENXUwBq/ngFHDcH+9JdZDCgQu/+FqzhJDz0b5YlpU13asG1BbKWPtujqN2W/c07bZnFbkQ8jPJijoS3SfsQcGsrpfMN/LMEv2XITr21U92ghSDC/LTaA+jI8f", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"People Soft ID" = _t, #"Date of MMR vaccine 1" = _t, #"Date of MMR vaccine 2" = _t, #"Date of MMR vaccine 3" = _t, #"Date of MMR vaccine 4" = _t, #"Date of Measles vaccine 1" = _t, #"Date of Measles vaccine 2" = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"People Soft ID", Int64.Type}, {"Date of MMR vaccine 1", type date}, {"Date of MMR vaccine 2", type date}, {"Date of MMR vaccine 3", type date}, {"Date of MMR vaccine 4", type date}, {"Date of Measles vaccine 1", type date}, {"Date of Measles vaccine 2", type date}}),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type", {"People Soft ID"}, "Attribute", "Value"),
#"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Attribute"}),
#"Sorted Rows" = Table.Sort(#"Removed Columns",{{"People Soft ID", Order.Ascending}, {"Value", Order.Ascending}}),
#"Grouped Rows" = Table.Group(#"Sorted Rows", {"People Soft ID"}, {{"aalr", each _, type table [People Soft ID=nullable number, Value=date]}}),
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.AddIndexColumn([aalr], "Index", 1, 1, Int64.Type)),
#"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"Custom"}),
#"Expanded Custom" = Table.ExpandTableColumn(#"Removed Other Columns", "Custom", {"People Soft ID", "Value", "Index"}, {"Custom.People Soft ID", "Custom.Value", "Custom.Index"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Expanded Custom",{{"Custom.People Soft ID", Int64.Type}, {"Custom.Value", type date}, {"Custom.Index", type text}}),
#"Added Prefix" = Table.TransformColumns(#"Changed Type1", {{"Custom.Index", each "MMR " & Text.From(_, "en-US"), type text}}),
#"Pivoted Column" = Table.Pivot(#"Added Prefix", List.Distinct(#"Added Prefix"[Custom.Index]), "Custom.Index", "Custom.Value"),
#"Renamed Columns" = Table.RenameColumns(#"Pivoted Column",{{"Custom.People Soft ID", "People Soft ID"}})
in
#"Renamed Columns"
Note: If this is what you want, then you can find the optimizing ways based on your data. Like doing the pivot inside the group itself or some other ways. Let us get the basic output you need first!
Output:
To implement a custom sort within a group, such as by PeopleSoft ID, and organize the dates to display them in grid format, I used the following method in Power Query:
Import the data.
Unpivot the columns.
Sort the data.
Group the data by the desired field "PeopleSoft ID"
Create an index within each group.
Expand the grouped data and add a prefix text "MMR".
Pivot the data back into its original format.
Rename the columns as needed.
Finally, adjust the process to suit your specific requirements.
Hope this helps!
If you dont want duplicates, add the step in between Sorted Rows and Grouped Rows
#"Sorted Rows" = Table.Sort(#"Removed Columns",{{"People Soft ID", Order.Ascending}, {"Value", Order.Ascending}}),
#"Removed Duplicates" = Table.Distinct(#"Sorted Rows"),
#"Grouped Rows" = Table.Group(#"Removed Duplicates", {"People Soft ID"}, {{"aalr", each _, type table [People Soft ID=nullable number, Value=date]}}),
Then the outputput will be
Thank you @sevenhills ,
I added in the step for removing duplicates, and while that worked, the sort didn't work regardless if the removing duplicate step was included, or was before or after the sort step.
The co pilot option is there right beside the search bar in microsoft edge, it is available to everyone:
I have added my own comments after the # in the steps
let
// Sample list of dates
Dates = {#date(2023, 1, 1), null, #date(2023, 1, 15), #date(2023, 2, 1), #date(2023, 2, 10), #date(2023, 3, 5)},
// Step 1: Remove nulls
NonNullDates = List.RemoveNulls(Dates), #make sure that using this step doesn't affects the applied steps later
// Step 2: Sort chronologically
SortedDates = List.Sort(NonNullDates # since the last step name in this case is NonNullDates, Order.Ascending),
// Step 3: Filter dates with a nested List.Select
Result = List.Accumulate(SortedDates # you need to ensure that over here you put the last applied step name and not sorted Dates, {}, (accumulator, currentDate) =>
if List.IsEmpty(accumulator) or Duration.Days(currentDate - List.Last(accumulator)) >= 24
then accumulator & {currentDate}
else accumulator
)
in
Result # this is the name of the last applied step
This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.
Check out the June 2025 Power BI update to learn about new features.
User | Count |
---|---|
16 | |
10 | |
10 | |
8 | |
7 |