
Frequent Visitor

Get all combined subdirectories with Folder.Contents() from multiple servers (main directories)

Dear pro users, most likely an easy question:
What is a performant Power Query best practice for querying Folder.Contents (directories only) from different servers/directories?
Starting from a table of all directories to query, should Folder.Contents() be combined
with each or (_) =>, or better with nested functions (filtering on Attributes[Directory] = true)?

// sketch: combine the next-level subdirectories of several main directories
let
    dirs = Table.FromRecords({
        [dirID = 001, path = "\\server01\MainFolder01"],
        [dirID = 002, path = "\\server01\MainFolder02"],
        [dirID = ..., path = "\\server..\MainFolder.."],
        [dirID = 00N, path = "\\server0M\MainFolder0K"]
    })[path],
    // for each path, keep only the directories
    GetSubDirs = (dir as text) as table =>
        Table.SelectRows(Folder.Contents(dir), each [Attributes][Directory] = true),
    CombTbls = Table.Combine(List.Transform(dirs, GetSubDirs))
in
    CombTbls

Kind regards (first question, please advise if non-compliant)



Frequent Visitor

Possible value relevance: most companies are SMEs (future value/cash flow), and most SMEs are not fully digitalized.
Many SMEs are Microsoft clients still running file servers; many run relational databases, fewer RDF stores or Kafka, i.e. their data is stored in files.


The wish is a stackable select-ETL (until full digitalization, if that ever happens) to select and access file data via Power BI (Power Query/Power Pivot).

Aim: enable any user to browse and multi-select multiple subdirectories in order to query the file content with separate ETLs.




You may want to continue exploring other tools. While Power Query is very powerful and flexible, it is not a "proper" ETL tool. Its sole aim is to prepare data so that it can be used by VertiPaq; loading anything elsewhere was never a design goal.

A note on the concrete task to be accomplished, which appears to be a standard BI use case:
Task: load data from distributed XLS files (of the same kind), located in changing (renamed/moved) folders,
via Power BI (Power Query) into a flattened MS cube, to consolidate the file data for comparative pivot analysis.
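For the load part, the standard combine-files pattern may serve as a starting point. A minimal sketch, assuming all workbooks expose a sheet with one shared name (here "Data", a hypothetical name) and the usual .xlsx extension:

    // minimal sketch: combine same-kind Excel files from one folder
    // assumes every workbook has a sheet named "Data" (hypothetical)
    let
        Source = Folder.Files("\\server01\MainFolder01"),
        Xls = Table.SelectRows(Source, each Text.EndsWith([Name], ".xlsx")),
        // pull the "Data" sheet out of each file's binary content
        AddData = Table.AddColumn(Xls, "Data",
            each Excel.Workbook([Content]){[Item = "Data", Kind = "Sheet"]}[Data]),
        Combined = Table.Combine(AddData[Data])
    in
        Combined

The "changing folders" part of the task is exactly what this pattern does not solve on its own, which is what the rest of the thread is about.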


Task: load data from distributed XLS files (of the same kind), located in changing (renamed/moved) folders,
via Power BI (Power Query) into a flattened MS cube, to consolidate the file data for comparative pivot analysis.

This is from when? 2013?

Thank you for the VertiPaq hint; indeed helpful and new to me.
"You may want to continue exploring other tools"?
Does that mean Power Platform apps/pages/Automate/Virtual Agents, Event Hubs, Dataverse...?
Coming from Python/R/SQL/Git, Microsoft strongly advised no/low-code Power Platform.

No, you need to bring out the big guns.  AD extracts, Purview, Scanner API.  Low code/no code tools tend to be overwhelmed when confronted with massive amounts of data.

Thanks again. Issue closed. Please note: the process is designed precisely to avoid massive data.
The select process serves to pick only those few files (data) of current interest.
There is just no single point of truth (SPOT), since the files float around on the file servers.

This fact requires an iterative (and therefore performant) folder search (folder browsing).

Hypothesis: many SMEs should face a similar situation (i.e. money on the street?).

Frequent Visitor

Hopefully pointing to a working solution, to be further generalized by a recursive call, i.e. flexible folder levels.

Idea: each user folder-select returns parent/child folders, which is like browsing Windows Explorer (standard).
(Pretty sure there is a conformant and better off-the-shelf solution, but it was ignored or overlooked here.)



// fct named GetSubDirs (apologies if too trivial; mimics only one step of Explorer browsing)
= (MainDir as text) as table =>
    let
        // GetSubDirs queries only the next-level subdirectories of MainDir
        Source = Folder.Contents(MainDir),
        SubDir = Table.SelectColumns(
            Table.SelectRows(Source, each [Attributes][Directory] = true),
            {"Folder Path", "Name"})
    in
        SubDir



calls the custom function (probably better and faster if nested):

let
    // Sub2Dirs: for each row (a level-2 folder), get all next-level Sub3Dirs
    Sub2Dir = Table.TransformColumnTypes(
        Table.SelectColumns(Excel.CurrentWorkbook(){[Name = "Sub2Dirs"]}[Content], {"Sub2Folder"}),
        {{"Sub2Folder", type text}}),
    CustFct = Table.TransformColumnTypes(
        Table.ExpandTableColumn(
            Table.AddColumn(Sub2Dir, "SubDir", each GetSubDirs([Sub2Folder]), type text),
            "SubDir", {"Name"}, {"Sub3Dir"}),
        {{"Sub3Dir", type text}}),
    Result = Table.SelectColumns(
        Table.SelectRows(
            Table.AddColumn(CustFct, "Sub3Folder", each Text.Combine({[Sub2Folder], [Sub3Dir]}, ""), type text),
            each [Sub3Folder] <> ""),
        "Sub3Folder")
in
    Result

This can be "simplified" by using recursive functions. But will it be faster? Not likely.
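For reference, the recursive variant mentioned here could look like the following sketch, which reuses the GetSubDirs idea from above (the name GetAllSubDirs is illustrative, and performance against a slow file server is exactly the open question):

    // recursive sketch: collect subdirectory paths of all levels below MainDir
    GetAllSubDirs = (MainDir as text) as list =>
        let
            Source = Folder.Contents(MainDir),
            Dirs = Table.SelectRows(Source, each [Attributes][Directory] = true),
            // full path = parent "Folder Path" plus the directory's own "Name"
            Paths = List.Transform(Table.ToRecords(Dirs), each [Folder Path] & [Name]),
            // @ references the function itself for the recursive descent
            Deeper = List.Combine(List.Transform(Paths, each @GetAllSubDirs(_)))
        in
            Paths & Deeper

Each recursion level still issues one Folder.Contents call per folder, so this trades code size for the same number of round trips, which is why it is unlikely to be faster.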

Frequent Visitor

Thank you for your response and hint. The limiting externality is the number of files and the resulting response time.
Folder.Files() would be nice, but each file server contains millions of files in constantly changing directories.

Table.Combine({Folder.Files("\\server01\???"), Folder.Files("\\server02\???"), ...}) takes hours to answer.


The (common?) SME use case is a dynamic multi-folder select starting from different main directories.

For fast response, the user selects folders level by level. An easier approach would be highly appreciated:

Step 1: user selects server(s) and calls Folder.Contents( "get only the (next) 1st-level directories for selection" )
Step 2: user selects 1st-level folders and calls Folder.Contents( "get only the (next) 2nd-level directories for selection" )
Step 3: user selects 2nd-level folders and calls Folder.Contents( "get only the (next) 3rd-level directories for selection" )
Stop if there is no further user select, else continue the loop down to the deepest level.

A more PQ-conformant approach would be a great help.
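The per-level step described above can be sketched as one reusable function that expands a user-selected table of folders by exactly one level (the table shape and the column names "Folder"/"NextFolder" are illustrative assumptions):

    // expand a table of selected folders (column "Folder") by exactly one level
    NextLevel = (Selected as table) as table =>
        let
            // for each selected folder, list only its direct subdirectories
            Added = Table.AddColumn(Selected, "Sub",
                each Table.SelectRows(Folder.Contents([Folder]),
                         each [Attributes][Directory] = true)),
            Expanded = Table.ExpandTableColumn(Added, "Sub", {"Folder Path", "Name"}),
            Paths = Table.AddColumn(Expanded, "NextFolder",
                each [Folder Path] & [Name], type text)
        in
            Table.SelectColumns(Paths, {"NextFolder"})

Feeding the user's picks from one level back into NextLevel gives the Step 1 / Step 2 / Step 3 loop, with only one Folder.Contents call per selected folder per level.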


Super User

Remember that the Folder connector gives you all folder names for free.


let
    Source = Folder.Files("\\\pi"),
    #"Removed Other Columns" = Table.SelectColumns(Source, {"Folder Path"}),
    #"Removed Duplicates" = Table.Distinct(#"Removed Other Columns")
in
    #"Removed Duplicates"

So no need to loop through anything again.

Thanks a lot for your hint and response.
Folder.Files won't work because the response time exceeds common user acceptance (explanation above).


Unless your data sources (the file servers) can provide periodic/fast metadata exports, I don't think you'll have much luck making this any faster.
