Join us at FabCon Atlanta from March 16 - 20, 2026, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.
Register now!The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now! Learn more
Hello, I hope you can help me with the following:
For context, I have a database of trips, so I have columns for bus ID, dates, times, and so on, as shown below (I'll trim the number of columns for practicality):
Bus_ID | Date | Scheduled_TimeDep | Calculate_TimeDep | Index |
AYM229 | 27/04/2023 | 08:00:00 | 08:05:00 | 0 |
FQC453 | 27/04/2023 | 08:21:00 | - | 1 |
FQC453 | 27/04/2023 | 08:24:00 | 08:30:00 | 2 |
LNE817 | 27/04/2023 | 08:54:00 | 08:59:00 | 3 |
As can be seen, some trips have their time adjusted with respect to the scheduled time (whether for earlier or later), which in theory generates duplicates of the same trip, as can be seen in rows 1 and 2 of the Index.
What I would like is to be able to compare the rows with the same ID and remove one of these based on criteria, resulting in a table like this:
Bus_ID | Date | Scheduled_TimeDep | Calculate_TimeDep | Index |
AYM229 | 27/04/2023 | 08:00:00 | 08:05:00 | 0 |
FQC453 | 27/04/2023 | 08:24:00 | 08:30:00 | 2 |
LNE817 | 27/04/2023 | 08:54:00 | 08:59:00 | 3 |
Initially I was thinking of comparing the rows in a column and then raising an error on the row to be removed and then using "Table.RemoveRowsWithErrors".
To compare the rows, I have tried using what some call "Shift" to compare rows with respect to their position using the index column (the idea is to compare it with the previous row), but due to the number of rows and columns it takes too long.
The codebase of what I thought to call the value of the previous row is this:
let
Source = BBDD,
#"Add Index" = Table.AddIndexColumn(BBDD, "Index", 0, 1, Int64.Type),
#"Add ID2" = Table.AddColumn(#"Add Index", "ID_Previus", each if [Index]>0 then #"Add Index"[Bus_ID]{[Index]-1} else null)
in
#"Add ID2"
I appreciate it if someone has already faced a similar problem can help me.
Hi @Anonymous,
Try adding a boolean filter with something like this:
let
BusID = List.Buffer( BBDD[Bus_ID] ),
Source = BBDD,
t = Table.FromColumns(
Table.ToColumns( BBDD ) &
{
List.Transform(
List.Zip( {BusID, List.Skip( BusID, 1)} ),
each _{0} = _{1}
)
}, Table.ColumnNames( BBDD ) & { "Check" }
)
in
t
Ps. If this helps solve your query please mark this post as Solution, thanks!
Hi, @m_dekorte
Thank you very much for your solution
Sorry to reply so late
As I see that solution works in that example and I guess with little data. The drawback I have is that the database has about 63000 data per month. And when carrying out the process that you propose, it takes a long time (normally each step takes a maximum of 2 minutes, and in that case it took almost 2 hours and did not finish), and since I hope to carry out the process for the data of a complete year, the amount of data will be much higher.
I have noticed that in the lower right part, where the process load size is shown (I don't know if I am mentioning it correctly), it usually ends up at 33.9MB, but when performing the step of comparing the two rows it can go up to 1.3GB.
I appreciate if with these details you know of an alternative solution
Hi @Anonymous
Sorry to hear that didn't work for you, see if this does a better job
let
Source = BBDD,
GroupLocal = Table.Group( Source, {"Bus_ID"},
{
{"t", each Table.LastN( _, 1), type table [Bus_ID=nullable text, Date=nullable text, Scheduled_TimeDep=nullable text, Calculate_TimeDep=nullable text, Index=nullable text]}
}, GroupKind.Local
),
ExpandTbl = Table.ExpandTableColumn(GroupLocal, "t", List.Skip( Table.ColumnNames( Source), 1 ))
in
ExpandTbl
Returns the same result, but doesn't require an Index or list operation.
Ps. If this helps solve your query please mark this post as Solution, thanks!
The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now!
| User | Count |
|---|---|
| 19 | |
| 10 | |
| 9 | |
| 7 | |
| 6 |