Dear Pro-Users, most likely an easy question:
What is a performant PQ best practice for querying Folder.Contents (directories only) across different servers/directories,
starting from a table with all directories to query, (maybe) using the function Folder.Contents()
in combination with each or (_) =>, or better with nested functions (filtering on Attributes[Directory] = true)?
let
    // table of all directories to query; [path] extracts the column as a list
    dirs = Table.FromRecords({
        [dirID = 001, path = "\\server01\MainFolder01"],
        [dirID = 002, path = "\\server01\MainFolder02"],
        [dirID = ..., path = "\\server..\MainFolder.."],
        [dirID = 00N, path = "\\server0M\MainFolder0K"] })[path],
    dirsAsTxt = Lines.ToText(dirs), // probably not needed
    // call the function below (a separate query, e.g. named GetSubDirs) once per path and append the results
    CombTbls = Table.Combine(List.Transform(dirs, each GetSubDirs(_)))
in
    CombTbls

// GetSubDirs: applied to each item in dirs
(dir as text) as table =>
let
    SubFolders = Folder.Contents(dir),
    ListSubFolders = Do.SomeTransforms(...) // placeholder, e.g. keep only rows with Attributes[Directory] = true
in
    ListSubFolders
Kind regards (first question, please advise if non-compliant)
Possible value relevance: most companies are SMEs (future value/cash flow), and most SMEs are not fully digitalized.
Many SMEs are MS clients still running file servers; many run RDBs, fewer RDFs or Kafka, i.e. data is stored in files.
The wish is to have a stackable select-ETL (until full digitalization, if that ever happens) to select/access file data via PBI (PQ/PP).
Aim: enable any user to browse & combi-select multiple subdirectories and query the file content with separate ETLs.
Regards
You may want to continue exploring other tools. While Power Query is very powerful and flexible, it is not a "proper" ETL tool. Its sole aim is to prepare data so that it can be used by VertiPaq. There was never a design goal to load anything elsewhere.
A note on the concrete task to be accomplished, which appears to be a standard BI use case.
Task: load data from distributed XLS files (of the same kind), sitting in changing (renamed/moved) folders,
via PBI (PQ) into a flattened MS cube, to consolidate the file data for comparative pivot analysis.
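For reference, a minimal sketch of that standard combine pattern, assuming all files sit somewhere below one root folder and share a hypothetical sheet named "Data" (path, extension filter and sheet name are placeholders):
let
    // all files below the (assumed) root folder, regardless of subfolder renames/moves
    Source     = Folder.Files("\\server01\MainFolder01"),
    XlsOnly    = Table.SelectRows(Source, each Text.Lower([Extension]) = ".xlsx"),
    // parse each workbook; "Data" is a hypothetical sheet name shared by all files
    Parsed     = Table.AddColumn(XlsOnly, "Workbook", each Excel.Workbook([Content], true)),
    Sheets     = Table.ExpandTableColumn(Parsed, "Workbook", {"Name", "Data"}, {"SheetName", "SheetData"}),
    DataSheets = Table.SelectRows(Sheets, each [SheetName] = "Data"),
    // one flat table to load into the model for pivot analysis
    Combined   = Table.Combine(DataSheets[SheetData])
in
    Combined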
This is from when? 2013?
Thank you for the VertiPaq hint, indeed helpful/new.
"You may want to continue exploring other tools"?
Does that mean Power Platform apps/pages/Automate/VA, Event Hubs/Dataverse...?
Coming from py/R/SQL/git, MS strongly advised no/low-code Power Platform.
No, you need to bring out the big guns. AD extracts, Purview, Scanner API. Low code/no code tools tend to be overwhelmed when confronted with massive amounts of data.
Thanks again. Issue closed. Please note: the process is designed precisely to avoid massive data.
The select process serves the purpose of selecting only those few files (data) of current interest.
There is just no single point of truth (SPOT), since the files float around on the file servers.
This fact requires an iterative, and therefore performant, folder search (folder browsing).
Hypothesis: many SMEs should face a similar situation (i.e. money on the street?).
Hopefully this points to a working solution, to be further generalized by a recursive call, i.e. flexible folder levels.
Idea: each user folder-select returns parent/child folders, which is like browsing Windows Explorer (the standard behavior).
(Pretty sure there is a conformant & better standard off-the-shelf solution, but it was ignored or overlooked here.)
// fct named GetSubDirs (apologies if too trivial, it mimics only one step of Explorer browsing)
= (MainDir as text) as table =>
let
    // returns only the next-level subdirectories of MainDir (called once per row of a table)
    Source = Folder.Contents(MainDir),
    SubDir = Table.SelectColumns(
                 Table.SelectRows(Source, each [Attributes][Directory] = true),
                 {"Folder Path", "Name"})
in
    SubDir
The next query calls the custom fct (probably better & faster if done nested, see the sketch further below):
let
    // Sub2Dirs: Excel table holding the user-selected second-level folders (column "Sub2Folder")
    Sub2Dir = Table.TransformColumnTypes(Table.SelectColumns(Excel.CurrentWorkbook(){[Name = "Sub2Dirs"]}[Content], {"Sub2Folder"}), {{"Sub2Folder", type text}}),
    // for each row, call GetSubDirs to list the next (third) folder level
    CustFct = Table.TransformColumnTypes(Table.ExpandTableColumn(Table.AddColumn(Sub2Dir, "SubDir", each GetSubDirs([Sub2Folder]), type table), "SubDir", {"Name"}, {"Sub3Dir"}), {{"Sub3Dir", type text}}),
    // build the full third-level path and keep only non-empty results
    Result = Table.SelectColumns(Table.SelectRows(Table.AddColumn(CustFct, "Sub3Folder", each Text.Combine({[Sub2Folder], [Sub3Dir]}, ""), type text), each [Sub3Folder] <> ""), "Sub3Folder")
in
    Result
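For comparison, a sketch of the nested variant without the separate function query; it is intended to behave the same, and whether it is actually faster is untested:
let
    Sub2Dir  = Table.TransformColumnTypes(Table.SelectColumns(Excel.CurrentWorkbook(){[Name = "Sub2Dirs"]}[Content], {"Sub2Folder"}), {{"Sub2Folder", type text}}),
    // inline version of GetSubDirs: list the immediate subfolders of each Sub2Folder
    Added    = Table.AddColumn(Sub2Dir, "SubDir",
                   each Table.SelectRows(Folder.Contents([Sub2Folder]), each [Attributes][Directory] = true),
                   type table),
    Expanded = Table.ExpandTableColumn(Added, "SubDir", {"Name"}, {"Sub3Dir"}),
    Result   = Table.SelectColumns(
                   Table.SelectRows(
                       Table.AddColumn(Expanded, "Sub3Folder", each Text.Combine({[Sub2Folder], [Sub3Dir]}, ""), type text),
                       each [Sub3Folder] <> ""),
                   "Sub3Folder")
in
    Result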
This can be "simplified" by using recursive functions. But will it be faster? Not likely.
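For completeness, a minimal sketch of such a recursive variant, assuming the GetSubDirs function above; the query name GetAllSubDirs and the Depth parameter are hypothetical, and the query must be saved under the name GetAllSubDirs for the self-reference to resolve:
= (MainDir as text, Depth as number) as table =>
let
    ThisLevel = GetSubDirs(MainDir),
    // recurse into each returned subfolder until the requested depth is reached
    Children  = if Depth <= 1
                then #table({"Folder Path", "Name"}, {})
                else Table.Combine(List.Transform(Table.ToRecords(ThisLevel),
                         each @GetAllSubDirs([#"Folder Path"] & [Name], Depth - 1))),
    Result    = Table.Combine({ThisLevel, Children})
in
    Result
Performance-wise it still issues one Folder.Contents call per folder, so it is unlikely to beat the level-by-level selection.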
Thank you for your response and hint. The limiting externality is the number of files and the resulting response time.
Folder.Files() would be nice, but each file server contains millions of files in constantly changing directories.
Table.Combine({Folder.Files("\\server01\???"), Folder.Files("\\server02\???"), ...}) takes hours to answer.
The (common?) SME use case is a dynamic, multi-folder select starting from different main directories.
Loop: for fast response, the user selects folders level by level (a rough sketch of one step follows below). An easier approach would be highly appreciated.
Step 1: user selects server(s) & calls Folder.Contents( "get only the (next) 1st-level directories for selection" )
Step 2: user selects 1st-level folders & calls Folder.Contents( "get only the (next) 2nd-level directories for selection" )
Step 3: user selects 2nd-level folders & calls Folder.Contents( "get only the (next) 3rd-level directories for selection" )
Stop if there is no further user select, else continue the loop down to the last (deepest) level.
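A rough sketch of what one such step could look like (the selected paths are hypothetical, hard-coded stand-ins for the user's picks):
let
    // folders the user picked in the previous step (hypothetical example values)
    SelectedLevel1 = {"\\server01\MainFolder01", "\\server02\MainFolder07"},
    // for each selected folder, list only its immediate subdirectories (Folder.Contents does not recurse)
    NextLevel  = Table.Combine(List.Transform(SelectedLevel1,
                     (dir) => Table.SelectRows(Folder.Contents(dir), each [Attributes][Directory] = true))),
    // offer these as the candidates for the next selection step
    Candidates = Table.SelectColumns(NextLevel, {"Folder Path", "Name"})
in
    Candidates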
A more PQ-conformant approach would be a great help.
Regards
Remember that the Folder connector gives you all folder names for free.
let
Source = Folder.Files("\\192.168.3.8\pi"),
#"Removed Other Columns" = Table.SelectColumns(Source,{"Folder Path"}),
#"Removed Duplicates" = Table.Distinct(#"Removed Other Columns")
in
#"Removed Duplicates"
So no need to loop through anything again.
Thanks a lot for your hint and response.
Folder.Files won't work because the response time exceeds common user acceptance (explanation see above).
Regards
Unless your data sources (the file servers) can help with periodic/fast metadata exports, I don't think you'll have much luck making this any faster.
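If such an export existed, e.g. a nightly folder listing written to a flat file, consuming it from PQ could be as simple as the following sketch (path and column names are hypothetical):
let
    // hypothetical nightly folder listing exported by the file server to a flat file
    Source   = Csv.Document(File.Contents("\\server01\exports\folderlist.csv"), [Delimiter = ";", Encoding = 65001]),
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    Typed    = Table.TransformColumnTypes(Promoted, {{"FolderPath", type text}, {"LastChanged", type datetime}})
in
    Typed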