Solved: Re: filtering on multiple columns

wynhodgkiss · ‎05-19-2025

Hi,

I have a really simple table of: FileName, Index (from 1 with 1 being the latest received file) and WeekRef (6 digit text that contains the weeks in each file).

WeekRef will duplicate as each file contains 12 weeks of data so what I need to do is filter the table to display only the FileName and WeekRef with the lowest index.

The end result is a list of files ensuring all WeekRefs are present, most recent & not duplicated.

For example WeekRef 202507 will appear 12 times with an Index between 1-13 and I only want 1, WeekRef 202347 will appear 12 times with an Index between 65-77 and I only want 65.

**Needs to be in power query, sources are multiple csvs on OneDrive

Thanks

v-pgoloju · ‎05-20-2025

Hi @wynhodgkiss,

Thank you for reaching out to the Microsoft Fabric Forum Community.

You can try the following Power Query code to achieve your goal(Modify according to your requirement).

Source = Folder.Files("C:\Users\YourName\OneDrive\YourFolderPath"),
FilteredCSVs = Table.SelectRows(Source, each Text.EndsWith([Extension], ".csv")),
Imported = Table.AddColumn(FilteredCSVs, "Content", each Csv.Document(File.Contents([Folder Path] & [Name]), [Delimiter=",", Columns=3, Encoding=65001, QuoteStyle=QuoteStyle.None])),
Expanded = Table.ExpandTableColumn(Imported, "Content", {"Column1", "Column2", "Column3"}, {"WeekRef", "Index", "Name"}),

// Change data types
ChangedTypes = Table.TransformColumnTypes(Expanded, {{"WeekRef", type text}, {"Index", Int64.Type}, {"Name", type text}}),

// Sort by Index (ascending)
Sorted = Table.Sort(ChangedTypes, {{"Index", Order.Ascending}}),

// Remove duplicates based on WeekRef
Deduplicated = Table.Distinct(Sorted, {"WeekRef"}),

// Select final columns
Final = Table.SelectColumns(Deduplicated, {"WeekRef", "Index", "Name"})
in
Final

As a side note, is there any possibility that a newer file could have a higher Index due to sync delays or upload timing? If so, how would you suggest handling that edge case?

If this solution helped, please consider marking the response as accepted and giving it a thumbs-up so others can benefit as well.

Best regards,
Prasanna Kumar

View solution in original post

wynhodgkiss · ‎05-20-2025

Thanks to @burakkaragoz & @v-pgoloju , a combination of both with a coupe of tweaks got me exactly what I needed😁

burakkaragoz · ‎05-20-2025

Hi @wynhodgkiss ,

If I got you right, you're trying to filter your table so that only the latest file (Index = 1) is shown, and from that, just the unique WeekRef values?

You can try this approach in Power Query:

First, filter your table where Index = 1.
Then, remove duplicates based on the WeekRef column.
Finally, keep only the columns you need (like FileName and WeekRef).

That should give you a clean list of WeekRefs from the latest file only.

Let me know if you need help with the exact steps in Power Query!

If my response resolved your query, kindly mark it as the Accepted Solution to assist others. Additionally, I would be grateful for a 'Kudos' if you found my response helpful.
translation and formatting supported by AI

wynhodgkiss · ‎05-20-2025

No, I will always want the lines of data where Index = 1 because that is my latest data but this is only 12 weeks of data and as it is a rolling 12 weeks each week will appear in 12 files. For each WeekRef I need to find the latest file it appears in. WeekRef 202507 for example will appear in every weekly file from 17th Feb 2025 to 12 May 2025 (12 weeks) but I only want the latest. Week 202507 is not in the file from 12th May as it is over 12 weeks but is in the file from 5th May (and every file for the 12 weeks previous to that) so I want only the latest.

From the file with an index of 1 (latest file) I want all data.

From the file with an index of 2, I want only the lines where the WeekRef is not in the file with an index of 1.

From the file with an index of 3, I want only the lines where the WeekRef is not in 1 or 2.

From the file with an index of 4, I want only the lines where the WeekRef is not in 1, 2 or 3 etc etc

burakkaragoz · ‎05-20-2025

Hey, thanks for the clarification — that makes perfect sense now!

What you're looking for is:
For each WeekRef, keep only the row where it appears with the lowest Index value.

You can do this in Power Query like this:

Sort your table by Index ascending.
Group by WeekRef, and for each group, keep the first row (which will have the lowest Index).
Expand the grouped table back out.

This way, you'll always get the latest appearance of each WeekRef based on your rolling logic.
translation and formatting supported by AI

wynhodgkiss · ‎05-20-2025

Yes, that's exactly what I need but what do you mean by step 3?

So far: #"Sorted Rows1" = Table.Sort(#"Removed Duplicates",{{"Index", Order.Ascending}, {"WeekRef", Order.Descending}}),
#"Grouped Rows" = Table.Group(#"Sorted Rows1", {"WeekRef"}, {{"Latest", each List.Min([Index]), type number}}),

v-pgoloju · ‎05-20-2025