Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

A new Data Days event is coming soon! This time we’re going bigger than ever. Fabric, Power BI, SQL, AI and more. Don't miss out.

Reply
JPorterValmetal
Frequent Visitor

Filter Old Draft Folders

Good day everyone. I am having quite a problem trying to figure out how to accomplish my goal using Power Query, if it's even possible.

 

I keep a running index of all of the projects that my team works on at work. I am trying to figure out if there is a way to use Power Query to get the most recent file (date modified) from a group of subfolders, and filter out the subfolders that do not contain that file. I'll do my best to explain the file structure below, as I believe that is what is causing problems.

 

Root Folder is Projects/2026. Then within 2026 there are project number folders labeled PR-26001 etc. Then within PR-26001 there are folders labeled Draft 01, Draft 02, Draft 03, etc. Finally, within that "Draft" folders there is a folder labeled "Equipment Proposal". I want to be able to get the most recent file from all of the "Equipment Proposal" folders of all of the "PR-26***" folders, and get rid of the "Draft" folders that do not contain that file. The final output should be the folder path to the most recent file, as well as it's date modified.

 

One thing to mention that is also giving me problems, is the name of the file I'm trying to get does not stay consistant.

 

Please let me know if you have any questions that may help you understand what I'm trying to accomplish.

 

1 ACCEPTED SOLUTION

The grouping is currently set at the Draft folder level, which is why you’re seeing one file per Draft instead of one per PR folder.

 

You should group at the PR-26*** level so all Draft folders are considered together. Then, use the maximum modified date within each group and filter to that row. This will give you only the most recent file across all Draft folders for each PR.

If you still get multiple rows, sort within each group and keep just the top row to ensure a single result.

 

Please try this approach and let us know how it works for you.

View solution in original post

19 REPLIES 19
ralf_anton
Resolver I
Resolver I


@JPorterValmetal wrote:

Wie erwähnt, brauche ich nur die neueste Datei aus dem neuesten "Draft"-Ordner jedes "PR-26***"-Ordners. Deshalb probiere ich den zweiten Codesatz aus. Ich habe die Fehler behoben, aber ich habe etwa 20 Minuten gewartet und es lädt immer noch alle Schritte.

*****************************************************************************************************************

In diesem Fall brauchst Du tatsächlich den unteren Code. Aber wenn Du 20 Minuten brauchst, dann stimmt da was nicht. Da hat sich sicher ein Kopier- oder Übersetzungsfehler eingeschlichen.

 

Hab Dir mal einen Code gemacht, bei dem zumindest Übersetzungsfehler ausgeschlossen sein dürften. Probier mal den:

let 
    Quelle = Folder.Files("C:\Users\User\OneDrive\Dokumente\All_PQ\Beispiele\Mehrere Daten aus einem Quellordner auslesen\CSV Files Mit Unterordner"),
    LowerKeys = Table.TransformColumns(Quelle,{{"Extension", Text.Lower, type text}}),
    NurCSV = Table.SelectRows(LowerKeys, each ([Extension] = ".csv")),
    FolderGroups = Table.Group(NurCSV, {"Folder Path"}, {{"Foldergroup", each _, type table [Content=binary, Name=text, Extension=text, Date accessed=nullable datetime, Date modified=nullable datetime, Date created=nullable datetime, Attributes=nullable record, Folder Path=text]}}),
    AddGroupIndex = Table.AddIndexColumn(FolderGroups, "GroupIndex", 0, 1, Int64.Type),
    ExpandGI = Table.ExpandTableColumn(AddGroupIndex, "Foldergroup", {"Content", "Date modified", "Name"}, {"Content", "Date modified", "Name"}),
    SortRows = Table.Sort(ExpandGI,{{"GroupIndex", Order.Ascending}, {"Date modified", Order.Descending}}),

    Filter = Table.AddColumn(SortRows, "LastDat", 
                            each 
                                let                                     
                                    GrInd = [GroupIndex] 
                                in
                                    List.Max(Table.Column(Table.SelectRows(SortRows, 
                                                                            each  [GroupIndex] = GrInd  ),"Date modified"))),
                                
    AddCheckCol = Table.AddColumn(Filter, "Check", each [Date modified] = [LastDat]),
    LatestFiles = Table.SelectRows(AddCheckCol, each ([Check] = true)),
    RemoveOtherCols = Table.SelectColumns(LatestFiles,{"Folder Path", "Name", "Date modified", "Content"})
in
    RemoveOtherCols

 

Was mir noch einfällt:

Wenn die Quelldaten in Excelfiles stehen, muss der Code natürlich auf Excelfiles angepasst werden! 

Im Schritt:

 NurCSV = Table.SelectRows(LowerKeys, each ([Extension] = ".csv")),

muss der Filter von .csv auf .xlsx geändert werden.

 

Falls aber auch Makro- oder Vorlagendateien in den Ordnern stehen und ausgewertet werden sollen, kannst Du den Schritt aber auch so abändern:

NurCSV = Table.SelectRows(LowerKeys, each Text.Contains([Extension],".xl")),

...oder nach welchen Dateityp (doc, txt, jpg, mpeg,....) Du auch immer suchst...

ralf_anton
Resolver I
Resolver I

Du nutzt offenbar den unteren Code. Da Du aber immer nur EINE Datei sehen willst, musst Du den oberen nutzen. Ich füge ihn Dir hier nochmal ein:

let
    //Den Ordner auswählen, in welchem sich die Dateien befinden - Pfad anpassen
    //evtl. vorhandene Unterordner werden berücksichtigt
    Quelle = Folder.Files("C:\Users\User\OneDrive\Dokumente\All_PQ\Beispiele\Mehrere Daten aus einem Quellordner auslesen\CSV Files Mit Unterordner"),
    LowExtension = Table.TransformColumns(Quelle,{{"Extension", Text.Lower, type text}}),
    //Nur csv-Dateien anzeigen
    NurCSV = Table.SelectRows(LowExtension, each ([Extension] = ".csv")),
    //Nach Datei(en) mit neuestem Bearbeitungsdatum filtern
    LastModified = Table.SelectRows(NurCSV, let latest = List.Max(NurCSV[Date modified]) in each [Date modified] = latest),
    SpalteUmbenennen = Table.RenameColumns(LastModified, {{"Name", "Datei"}}),
    DelColumns = Table.SelectColumns(SpalteUmbenennen, { "Content", "Datei"}),
    #"Importierte CSV" = Csv.Document(DelColumns[Content]{0},[Delimiter=";", Columns=4, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    HeaderHoch = Table.PromoteHeaders(#"Importierte CSV", [PromoteAllScalars=true])
in
    HeaderHoch

Die Fehlermeldung ist tatsächlich der Übersetzung geschuldet, denn die vermisste Spalte FolderGroup heißt im Original:

"Ordnergruppe".

War aber, wie gesagt, sowieso der falsche Code, den Du benutzt hast.

 

Den Code sicherheitshalber nicht übersetzen lassen... 🙂 Zumindest die in meinem ersten Post grün dargestellten Texte nicht. Oder wird hier auch der Code generell übersetzt? Wär ja blöd...

Falls nicht, dann einfach den Code von hier kopieren und in Deinem Editor einfügen.

I got the the code you put here to work, however it's not what I want. It only shows one single file that is the newest from the entire source data.

 

As I mentioned I need the newest file from only the newest "Draft" folder of each "PR-26***" folder. That's why I'm trying the second code set. I got the errors fixed, but I've waited about 20 minutes and it's still loading all of the steps.

ralf_anton
Resolver I
Resolver I

Hi,

 

da es sehr unwahrscheinlich ist, dass mehrere User mehrere Dateien sekundengenau gleichzeitig ändern, kommt eigentlich immer nur eine Datei in Frage. Ich hab Dir mal ein Beispiel für mehrere csv Dateien in einem Ordner inklusive Unterordner gemacht. Im ersten Beispiel wird nur die neueste Datei aller Ordner ausgegeben.

Um die neuesten Dateien aller Ordner und Unterorder zu bestimmen, müsste noch nach den Ordnern gruppiert werden (im Beispiel 2).

Für Exceldateien ist das Vorgehen ein wenig anders, da die Daten sowohl in normalen als auch in formatierten Tabellen oder Namensbereichen liegen können. Es müsste deshalb noch nach der Art der Datenquelle gefiltert werden. Aus Unkenntnis Deiner Struktur beschränke ich mich deshalb auf csv Dateien.

 

let
    //Den Ordner auswählen, in welchem sich die Dateien befinden - Pfad anpassen
    //evtl. vorhandene Unterordner werden berücksichtigt
    Quelle = Folder.Files("C:\Users\User\OneDrive\Dokumente\All_PQ\Beispiele\Mehrere Daten aus einem Quellordner auslesen\CSV Files Mit Unterordner"),
    LowExtension = Table.TransformColumns(Quelle,{{"Extension", Text.Lower, type text}}),
    //Nur csv-Dateien anzeigen
    NurCSV = Table.SelectRows(LowExtension, each ([Extension] = ".csv")),
    //Nach Datei(en) mit neuestem Bearbeitungsdatum filtern
    LastModified = Table.SelectRows(NurCSV, let latest = List.Max(NurCSV[Date modified]) in each [Date modified] = latest),
    SpalteUmbenennen = Table.RenameColumns(LastModified, {{"Name", "Datei"}}),
    DelColumns = Table.SelectColumns(SpalteUmbenennen, { "Content", "Datei"}),
    #"Importierte CSV" = Csv.Document(DelColumns[Content]{0},[Delimiter=";", Columns=4, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    HeaderHoch = Table.PromoteHeaders(#"Importierte CSV", [PromoteAllScalars=true])
in
    HeaderHoch

Im folgenden Beispiel wird die jeweils zuletzt geänderte Datei auch aller Unterordner angezeigt: Also für jeden Ordner die jeweils letzte....

Das Ergebnis sähe in etwa so aus:
A Bild fürs Forum.jpg

 

let 
    Quelle = Folder.Files("C:\Users\User\OneDrive\Dokumente\All_PQ\Beispiele\Mehrere Daten aus einem Quellordner auslesen\CSV Files Mit Unterordner"),
    Kleinbuchstaben = Table.TransformColumns(Quelle,{{"Extension", Text.Lower, type text}}),
    NurCSV = Table.SelectRows(Kleinbuchstaben, each ([Extension] = ".csv")),
    Pfadgruppen = Table.Group(NurCSV, {"Folder Path"}, {{"Ordnergruppe", each _, type table [Content=binary, Name=text, Extension=text, Date accessed=nullable datetime, Date modified=nullable datetime, Date created=nullable datetime, Attributes=nullable record, Folder Path=text]}}),
    AddGruppenindex = Table.AddIndexColumn(Pfadgruppen, "Gruppenindex", 0, 1, Int64.Type),
    ExpandGI = Table.ExpandTableColumn(AddGruppenindex, "Ordnergruppe", {"Content", "Date modified", "Name"}, {"Content", "Date modified", "Name"}),
    #"Sortierte Zeilen" = Table.Sort(ExpandGI,{{"Gruppenindex", Order.Ascending}, {"Date modified", Order.Descending}}),
    fkt = (DatListe as list, GrIndex as number) =>
        let 
            letzte = Table.SelectRows(#"Sortierte Zeilen",each [Gruppenindex] = GrIndex)[Date modified]{0} 
        in letzte,     
    Liste = Table.AddColumn(#"Sortierte Zeilen", "Liste", each fkt(#"Sortierte Zeilen"[Date modified],[Gruppenindex])),
    Checken = Table.AddColumn(Liste, "Behalten", each [Date modified] = [Liste]),
    JustFirstFile = Table.SelectRows(Checken, each ([Behalten] = true)),
    #"Andere entfernte Spalten" = Table.SelectColumns(JustFirstFile,{"Content","Folder Path", "Name",  "Date modified"})
in
    #"Andere entfernte Spalten"

 

Da ich die Ordnerstruktur jetzt nicht nachstellen will, musst Du aus der Quelle in der Spalte Folder Path noch alle unerwünschten Ordner herausfiltern.

 

In der Abfrage befindet sich noch die Spalte Content, die Du brauchst, wenn Du die Dateien weiter bearbeiten willst. Aber das war ja hier nicht gefragt.

I'm trying to follow your second example code, but it's a little difficult with the language barrier. I've used Google Translate to change the code into English. The copy pasted it.

 

I'm getting an error in the code on the "let last = table.selectrows" section. It says "Token ',' expected". I don't know how to fix this.

Alright. I was able to get the syntax errors fixed. Now I am running into a different error.

JPorterValmetal_0-1776113419211.png

 

lbendlin
Super User
Super User

Should be possible with Power Query but it may not be fast.  How many files overall are in that folder structure?

 

Also note that "Last Modified"  is a slippery slope.  Very easy to get false positives on that.

The amount of files is really dependent on how many drafts we end up with. On average I would say propably 5 files counting everything from the PR-26*** folder down. I don't really need it to be fast. 

Connect to the Sharepoint site via "Sharepoint Folders", click "Transform data"  *

Filter for the desired file name patterns (and folder names if needed)

Sort by Last Modified Date descending

Keep the top 1 row.

 

*) This will give you all the files in all the folders on that sharepoint site.

I think maybe explanation of what the final result should be was confusing. Keep top row 1 gives me only the most recent file from the entire PR-26*** list of files. I want the most recent file from the group of "Draft" folders in each PR-26*** folder with the file path to each file. Also I should mention there is no SharePoint involved, so this would all be "from file - from folder". 

Connect to the parent folder, transform data

Group by Folder Path, aggregate by Max Last Modified Date but also include "All Rows"

Expand the "Rows" column

filter all rows where the Last Modified Date equals the Max Last Modified Date

I'll give that a shot on Monday when I'm at work and let you know how it goes. 

Just following up did you have a chance to test this on your side? Please let us know how it went or if you’re still experiencing the same issue.

I'm a little confused on the last step. When I add a date filter by equals, I have to choose a specific date. Do I need to make a custom collumn comparing the Last Modified Date, and the Max Modified Date. Then filter out everything that is false from the custom column?

I used the custom column that I described above and I'm still having the same issue. It's not getting rid of the "draft" folders that have older files in them. It's giving me the latest file that is in every "draft" folder.

 

Here is a screenshot of the folder structure when there are multiple draft folders.

JPorterValmetal_1-1776096291281.png

 

I want the final result to show the most recent file path out of the "draft" folders. If draft 4 has the most recent file, show only that file path. If draft 02 has the most recent file show only the draft 02 file path.

 

The current result I am getting is the file path for the most recent file in ALL of the draft folders.

 

 

 

The grouping is currently set at the Draft folder level, which is why you’re seeing one file per Draft instead of one per PR folder.

 

You should group at the PR-26*** level so all Draft folders are considered together. Then, use the maximum modified date within each group and filter to that row. This will give you only the most recent file across all Draft folders for each PR.

If you still get multiple rows, sort within each group and keep just the top row to ensure a single result.

 

Please try this approach and let us know how it works for you.

Ok. I I got it. I'm not sure if I did it the best way, but I made it work. I split my file path column using "\" as a delimeter. Grouped the PR-26*** folders based on max date modified. I made a new query using the same source folder, and did an inner merge on the two queries using max modified date and last modified date.

I'm very new to Power Query as you can probably tell. How do change the level at which the grouping is done? I've always used the "Group By" button.



@JPorterValmetal wrote:

I want the final result to show the most recent file path out of the "draft" folders. If draft 4 has the most recent file, show only that file path. If draft 02 has the most recent file show only the draft 02 file path.

 

Dann verwende den oberen Beispielcode. Und wie ich schon sagte: 

... solltest Du aus der Quelle in der Spalte Folder Path noch alle unerwünschten Ordner herausfiltern.

 


Helpful resources

Announcements
May Power BI Update Carousel

Power BI Monthly Update - May 2026

Check out the May 2026 Power BI update to learn about new features.

Fabric SQL PBI Data Days

Data Days 2026 coming soon!

Sign up to receive a private message when registration opens and key events begin.

New to Fabric survey Carousel

New to Fabric Survey

If you have recently started exploring Fabric, we'd love to hear how it's going. Your feedback can help with product improvements.

Power BI DataViz World Championships carousel

Power BI DataViz World Championships - June 2026

A new Power BI DataViz World Championship is coming this June! Don't miss out on submitting your entry.