Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
kblommer
Frequent Visitor

Consolidate Rows based on Dates within range of each other

Hello,

I have a table with two columns: ID and Date.

 

The ID column is a 20-digit whole number

The Date column is in the format MM/DD/YYYY HH:MM:SS.

 

There are some rows with matching IDs. Of the rows with matching IDs, some have dates within X days of each other, say 3 days. I am trying to consolidate the rows with matching IDs AND dates within 3 days of each other. Is there any easy way to do this?

 

Ideally, the finished product would be a table with the following three columns:

  1. Column with unique IDs
  2. (Probably multiple) column(s) with each date that is not within three days of another for each ID
  3. The total number of unique dates per ID (sum of columns in previous bullet point)

Thanks in advance,

Kyle

1 ACCEPTED SOLUTION
edhans
Super User
Super User

Ok @kblommer - see if this helps.

 

Original Data I used as a test (the comment is to show you what I will recognize as a repeat repair after the query. I didn't cheat and put that in my source queries 😉 )

 

ID Date Comment
1 1/1/2020  
1 1/3/2020 Repeat
2 2/1/2020  
3 1/13/2020  
4 2/20/2020  
5 1/30/2020  
5 1/31/2020 Repeat
6 1/21/2020  
6 2/21/2020  
6 3/4/2020  
7 1/18/2020  
7 3/3/2020  
8 1/1/2020  
9 1/28/2020  
10 3/31/2020  
10 4/1/2020 Repeat
10 5/1/2020  

 

In the file below, I did three queries:

  1. Repairs - this is the original table and I did nothing to it really other than the formatting and get rid of my cheating comment field.
  2. Repairs Near Date
    1. This is using Repairs as the source
    2. Added an index to keep track of the original data.
    3. Added a list of the next two dates. So Jan 1 in the date column would generate a list of Jan 2 and Jan 3. You could adjust that logic in the "Added Near Dates" step.
    4. Expanded the Near Dates column. I now have massive duplication.
  3. Matching Date Range
    1. Used Repairs Near Date query above as the source
    2. Did a left join to the original Repairs query matching the date and repair ID number, then expanded the query. You'll note now most are null. Those that are not null are the culprits.
    3. I grouped the ID, Date, and Index column and threw everything else into an All Rows nested table. You'll note I'm back to my original 17 rows in the source data. All duplications are in the nested table.
    4. I added a new column called Matching Row Record that uses Table.Max to get the maximum value for the repeated Reapair.ID (not the original ID) column. I just wanted to get rows with data and nulls where there were no matches. 
    5. Expanded that maximum record. Now I know what is matching but I need to find a way to filter that out from the source. For example, I now know that the ID#1 for Jan 1 has a match for ID#1 for Jan 3, but don't yet have a way to get rid of the ID#1 Jan 3 record in the original ID/Date columns.
    6. Filled the expanded records down, filling in all of the nulls.
    7. Added a column that returns TRUE/FALSE if the ID and Date match. It returns 3 trues, my repeat repairs. I then filter out all TRUE values.
    8. Now down to 14 records - 17 original - 3 repeats.
    9. Used the "Matching Repair Date" function below to find those that are the repeats in the 3 day range. If they are, keep the repair date, otherwise return null.
    10. Kept the ID, Date, and Matching Repair date columns.
    11. The end.

 

 

Matching Repair Date Formula

if ([ID] = [Repairs.ID]) and (Number.Abs(Number.From([Repairs.Date]) - Number.From([Date])) < 3)
then [Repairs.Date]
else null

 

 

So my end data looks like this:

2020-02-15 17_06_08-20200213 - Grouping Dates - Excel.png

 

I did all of this in Excel, but it is the exact same logic in Power BI's Power Query tool. Here is my Excel file.

 

Let me know if that works for you.

 



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

View solution in original post

11 REPLIES 11
edhans
Super User
Super User

Ok @kblommer - see if this helps.

 

Original Data I used as a test (the comment is to show you what I will recognize as a repeat repair after the query. I didn't cheat and put that in my source queries 😉 )

 

ID Date Comment
1 1/1/2020  
1 1/3/2020 Repeat
2 2/1/2020  
3 1/13/2020  
4 2/20/2020  
5 1/30/2020  
5 1/31/2020 Repeat
6 1/21/2020  
6 2/21/2020  
6 3/4/2020  
7 1/18/2020  
7 3/3/2020  
8 1/1/2020  
9 1/28/2020  
10 3/31/2020  
10 4/1/2020 Repeat
10 5/1/2020  

 

In the file below, I did three queries:

  1. Repairs - this is the original table and I did nothing to it really other than the formatting and get rid of my cheating comment field.
  2. Repairs Near Date
    1. This is using Repairs as the source
    2. Added an index to keep track of the original data.
    3. Added a list of the next two dates. So Jan 1 in the date column would generate a list of Jan 2 and Jan 3. You could adjust that logic in the "Added Near Dates" step.
    4. Expanded the Near Dates column. I now have massive duplication.
  3. Matching Date Range
    1. Used Repairs Near Date query above as the source
    2. Did a left join to the original Repairs query matching the date and repair ID number, then expanded the query. You'll note now most are null. Those that are not null are the culprits.
    3. I grouped the ID, Date, and Index column and threw everything else into an All Rows nested table. You'll note I'm back to my original 17 rows in the source data. All duplications are in the nested table.
    4. I added a new column called Matching Row Record that uses Table.Max to get the maximum value for the repeated Reapair.ID (not the original ID) column. I just wanted to get rows with data and nulls where there were no matches. 
    5. Expanded that maximum record. Now I know what is matching but I need to find a way to filter that out from the source. For example, I now know that the ID#1 for Jan 1 has a match for ID#1 for Jan 3, but don't yet have a way to get rid of the ID#1 Jan 3 record in the original ID/Date columns.
    6. Filled the expanded records down, filling in all of the nulls.
    7. Added a column that returns TRUE/FALSE if the ID and Date match. It returns 3 trues, my repeat repairs. I then filter out all TRUE values.
    8. Now down to 14 records - 17 original - 3 repeats.
    9. Used the "Matching Repair Date" function below to find those that are the repeats in the 3 day range. If they are, keep the repair date, otherwise return null.
    10. Kept the ID, Date, and Matching Repair date columns.
    11. The end.

 

 

Matching Repair Date Formula

if ([ID] = [Repairs.ID]) and (Number.Abs(Number.From([Repairs.Date]) - Number.From([Date])) < 3)
then [Repairs.Date]
else null

 

 

So my end data looks like this:

2020-02-15 17_06_08-20200213 - Grouping Dates - Excel.png

 

I did all of this in Excel, but it is the exact same logic in Power BI's Power Query tool. Here is my Excel file.

 

Let me know if that works for you.

 



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

@edhans thank you! I'll probably end up doing this in Power Query as we want it to be a live table updating as the excel sheet gets updated.

 

I really appreciate the help, this is exactly what I was looking for!

Great @kblommer . Hope the rest of your project is successful.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting
Jimmy801
Community Champion
Community Champion

Hello @kblommer 

 

what do you think of such an approach?

let
	Source = #table
	(
		{"ID","Date"},
		{
			{"1","Jan 1, 2020"},	{"1","Jan 2, 2020"},	{"1","Jan 3, 2020"},	{"1","Jan 4, 2020"},	{"1","Jan 5, 2020"},	{"2","Jan 1, 2020"},	{"2","Jan 2, 2020"},	{"2","Jan 3, 2020"},	
			{"2","Jan 4, 2020"},	{"2","Jan 5, 2020"}
		}
	),
	ToDate = Table.TransformColumns
	(
		Source,
		{
			{
				"Date",
				each Date.FromText(_,"en-US"),
				type date
			}
		}
	),
    Sort =Table.Buffer( Table.Sort
	(
		ToDate,
		{{"ID", Order.Ascending}, {"Date", Order.Ascending}}
	)),
    Group = Table.Group
	(
		Sort, 
		{"ID", "Date"}, 
		{{"AllRows", each _, type table [ID=text, Date=date]}},
		GroupKind.Local,
		(group,current) => if Value.Compare(current[ID],group[ID])=0 and current[Date]-group[Date]<#duration(3,0,0,0) then 0 else 1
		

	),
    DeleteColumn = Table.RemoveColumns(Group,{"Date"}),
    Group2 = Table.Group(DeleteColumn, {"ID"}, {{"AllRows", each try Table.AddIndexColumn(Table.FromList({_[AllRows]{0}, Table.Combine(List.Range(_[AllRows],1))},Splitter.SplitByNothing()),"Index",1) otherwise Table.AddIndexColumn(Table.FromList( {_[AllRows]},Splitter.SplitByNothing()),"Index", 1)}}),
    ExpandAllRows = Table.ExpandTableColumn(Group2, "AllRows", {"Column1","Index"}, {"Dates","Index"}),
    PivotColumn = Table.Pivot(Table.TransformColumnTypes(ExpandAllRows, {{"Index", type text}}, "de-DE"), List.Distinct(Table.TransformColumnTypes(ExpandAllRows, {{"Index", type text}}, "de-DE")[Index]), "Index", "Dates"),
    RenameColumn = Table.RenameColumns(PivotColumn,{{"1", "Within date Range"}, {"2", "Out of date Range"}})
in
    RenameColumn

 

Copy paste this code to the advanced editor in a new blank query to see how the solution works. If this solution fits your need, copy and past a part of it and implement it in your query.

If this post helps or solves your problem, please mark it as solution (to help other users find useful content and to acknowledge the work of users that helped you)
Kudoes are nice too

Have fun

Jimmy

Thanks Jimmy,

 

So it looks like with your approach we'd end up with embedded tables with all the dates. Do you know if there's a way I could pull out the dates from the embedded tables and add them into the main table adjacent to their respective IDs?

 

Thanks,

Kyle

 

edhans
Super User
Super User

Not sure I understand your logic requirements. For example, how would you match the following:

 

IDDate
1Jan 1, 2020
1Jan 2, 2020
1Jan 3, 2020
1Jan 4, 2020
1Jan 5, 2020

 

would those get matched to one record? The Jan 1 and Jan 5 dates are more than 3 days apart, but Jan 5 is 1 day from Jan 4, which is 3 days from Jan 1.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

There won't be a case where there is that many dates that closely spaced.

 

The table that I have is a repair log. The table gets updated by those who do the repairs and they fill in the time of the repair. Some repairs require multiple people or a long time (~1 day). Thus, sometimes multiple entries are done into the table either because more information needed to be logged as the repair progressed or another person came in and made an addional entry. No repairs are going to take longer than 3 days, so there won't be a log like your example of 5 successive days. If there are two entries separated by more than 3 days, they are separate repairs.

 

Does that help clarify?

Yes, but if there are more than 3 days separating and are separate repairs, won't the IDs be unique? Or are the IDs like a customer number, not a repair number?

 

I need to work up some fake data here to play with.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

@edhans Sorry, didn't make that clear. The IDs are fiber numbers and we are repairing fibers. They are not Repair numbers.

 

The data we have covers a 3 year period, so there are multiple repairs for each ID over that time frame.

Ok. Maybe someone else will jump in first. Otherwise, I need to think about this. I think the key is to join the table to itself and do some comparisons.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

@edhans sorry forgot to link you on my reply above. Thanks for your help!

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors