Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
tmul
Frequent Visitor

DataFlow does not save due to dynamic data source

Hi, 
I want to connect to several sharepoints and get one Excel file and load it into a dataflow. 
The idea is to use a function, which gets the Excel file from the different sharepoints and preprocesses the data.
I have all the URLs of the different sharepoints and idealy, I would like to loop over the sharepoint names to minimize the workload. 


This is my function to get the sharepoints and preprocess the data:

 

(sharePointURL)=>
 let
  source = SharePoint.Files("https://xxxx.sharepoint.com/sites/" & sharePointURL, [ApiVersion = 15]),
  workbook_binary = source{[Name = "ExcelFile.xlsx"]}[Content],
  workbook = Excel.Workbook(workbook_binary),
  input = workbook{[Item = "Sheet1", Kind = "Sheet"]}[Data],
  headers = Table.PromoteHeaders(input, [PromoteAllScalars = true]),
  noerrors = Table.RemoveRowsWithErrors(headers, {"Key"}),
  notnull = Table.SelectRows(noerrors, each [Column1] <> null)
in
notnull

 

 Then I have a table of sharepoint names which would then specify the sharePointURL in the function above

sharePointURL
Name1
Name2


Then to get my result, I can simply add a new column to the table above which calls the function above and I get the table in the preview of the dataflow and in PowerBI desktop.
When I want to save the dataflow, I get the error message that the dataflow cannot be saved since it contains a dynamic dataset.
When I hardcode the URL in the function the dataflow will be saved, e.g. when I use:  SharePoint.Files("https://xxxx.sharepoint.com/sites/Name1", [ApiVersion = 15])
Can I also use something like the RelativePath in the Web.Content function? 
I have a premium workspace and a pro license. 

Thanks for any help!


1 ACCEPTED SOLUTION

Hello - yes, I understand.  It is the different folders that are causing the dynamic data source error.  I realize the solution in the link provided is much more complex and does more than you are asking.  Here is another option...

 

This is the line that is your problem:

source = SharePoint.Files("https://xxxx.sharepoint.com/sites/" & sharePointURL, [ApiVersion = 15]),

 

In order to avoid the dynamic data source error that occurs in dataflows, you will need to first query the SP root site, then handle the subfolders in different steps.  Dataflows does require that all of the files you wish to query be contained in the same root site - the only way around that one is to have separate queries for each root site and then append them.  

 

Add three text parameters to your dataflow:

  • paramSharepointRootUrl
  • paramSharepointDocsDirectory
    • This is the text that appears between your SP site root URL and the next forward slash.  It is typically "Documents" (for private sites) and "Shared Documents" (for sites that are not private).
  •  paramSharepointFileName
    • If all of the files have the same file name, add it as a parameter.

Add a new query that connects to the root Sharepoint Url.  This query's loading should be disabled (becuase it will contain complex column types).  In my dataflow my query is named source_RootFolderContent.

let
  Source = SharePoint.Files(paramSharepointRootUrl, [ApiVersion = 15])
in
  Source

 

Next, add another query like the example below.  We may need to tweak this further based on your needs.  I recommend adding it to a query as it is and work through each step.  After you have confirmed that the correct set of files are being returned, we can convert it into a function and remove any steps that are not needed for your scenario. 

 

let
  // START: Declare variables
  //
  varDocsDirectory = Text.Lower ( paramSharepointDocsDirectory),  
  varFileName = Text.Lower ( paramSharepointFileName ),
  // If you want to filter the files for a specific date range, you can create additional parameters and use them like this.
  // If the latest date is before the earliest date, use the earliest date.
  // varMaxDateForBinaries = if LatestFileModificationDate < EarliestFileModificationDate then EarliestFileModificationDate else LatestFileModificationDate,                                          

  // If you'd like to only look for files in a specific set of subfolders, you can specify another parameter and use it like in these next few steps:
  // Remove the specified character from the start of the string.
  // ParentFolders_TrimCharFromStart = 
  //   if Text.StartsWith ( SharepointParentFolders, "/" ) 
  //   then Text.TrimStart ( SharepointParentFolders, 1 )
  //  else SharepointParentFolders,

  // Remove the specified character from the end of the string.
  // ParentFolders_TrimCharFromEnd = 
  //  if Text.EndsWith ( ParentFolders_TrimCharFromStart, "/" ) 
  //  then ParentFolders_TrimCharFromStart
  //  else ParentFolders_TrimCharFromStart & "/",

  //Final variable
  // varParentFolders = ParentFolders_TrimCharFromEnd,
  // 
  // END: Declare Variables

  source = source_RootFolderContent,
  // Add a column for the subfolder paths (extracted from the file paths that are returned in the table.
  // This and some of the other steps below may not be required in all scenarios.  It is primarily used as a helper column to check a subset of subfolders, in the event the target name appears in more than one subfolder level, like this - /first folder name/target folder name/another subfolder/target folder name
  add_subfolders = Table.AddColumn(Source, "Subfolders", each Text.BetweenDelimiters([Folder Path], "/sites/", "/", 0, 0), type text),
  // Add a column for the Sharepoint documents directory.
  add_dir = Table.AddColumn(add_subfolders , "Directory", each Text.BetweenDelimiters([Folder Path], [Subfolders] & "/", "/", 0, 0), type text),
  // Count the number path separators that exist between the directory and the file name.
  // This is only needed if you are wanting to perform validation on the number of subfolders.
  add_folder_delimiter_count = Table.AddColumn(
      add_dir, 
      "Delimiters Btwn Dir and End", 
      each List.Count ( Text.PositionOfAny ( [Folder Path], {"/"}, Occurrence.All ) ) + 1,
      Int64.Type
    ),
  // Extract the remaining subfolders in the path.
  add_parent_folders = Table.AddColumn (
      add_folder_delimiter_count, 
      "Parent Folders", 
      each Text.BetweenDelimiters(
            [Folder Path], 
            [Directory] & "/", 
            "/", 
            0, 
            [Delimiters Btwn Dir and End]
        ), type text
    ),
  remove_folder_path = Table.RemoveColumns(add_parent_folders, {"Folder Path", "Delimiters Btwn Dir and End"}),
  // Keep only files that are in the specified directory with the specified parent folders.
  // When selecting rows in a Sharepoint directory in dataflows, the filter must begin
  // after the directory in order to avoid the dynamic datasource error message.
  // If you don't need to look for specific files or folders then this is not needed.
  filter_files = Table.SelectRows(
    remove_folder_path, 
    each [Attributes]?[Hidden]? <> true
    and Text.Contains ( [Directory], varDocsDirectory, Comparer.OrdinalIgnoreCase )
    // and Text.Contains ( [Parent Folders], varParentFolders, Comparer.OrdinalIgnoreCase )
    and Text.Contains ( [Name], varFileName, Comparer.OrdinalIgnoreCase )
  ),
  // Filter the files by the Date Modified. (optional)
  // select_files_by_date_modified = Table.SelectRows (
  //  filter_files,
  //  each [Date modified] >= EarliestFileModificationDate and [Date modified] <= varMaxDateForBinaries
  ),
  // Optionally limit the results to the first N files.
  // first_n_files = if FirstNFiles = null then select_files_by_date_modified else Table.FirstN ( select_files_by_date_modified, FirstNFiles ),
  // Add a unique ID (integer) for each file. (optional)
  add_id = Table.AddIndexColumn(filter_files, "FileId", 1, 1, Int64.Type)
in
  add_id

 

View solution in original post

5 REPLIES 5
jennratten
Super User
Super User

Hello!  Please check out this post where I explain how to do this efficiently.

https://community.powerbi.com/t5/Power-Query/SharePoint-Folder-not-combining-files/m-p/2032890/highl... 

Hi @jennratten, thanks for the message. I am not sure if your suggestion will actually help. I do have all my files in different sharepoint folders but all have the same name. And what I need to do is basically loop over the different sharepoint folders and get the data. The combination of the files was so far not a problem. I am only looking for a solution to be able to use this approach in the Power BI dataflow such that I do not need to use it in Power BI Desktop. 

Hello - yes, I understand.  It is the different folders that are causing the dynamic data source error.  I realize the solution in the link provided is much more complex and does more than you are asking.  Here is another option...

 

This is the line that is your problem:

source = SharePoint.Files("https://xxxx.sharepoint.com/sites/" & sharePointURL, [ApiVersion = 15]),

 

In order to avoid the dynamic data source error that occurs in dataflows, you will need to first query the SP root site, then handle the subfolders in different steps.  Dataflows does require that all of the files you wish to query be contained in the same root site - the only way around that one is to have separate queries for each root site and then append them.  

 

Add three text parameters to your dataflow:

  • paramSharepointRootUrl
  • paramSharepointDocsDirectory
    • This is the text that appears between your SP site root URL and the next forward slash.  It is typically "Documents" (for private sites) and "Shared Documents" (for sites that are not private).
  •  paramSharepointFileName
    • If all of the files have the same file name, add it as a parameter.

Add a new query that connects to the root Sharepoint Url.  This query's loading should be disabled (becuase it will contain complex column types).  In my dataflow my query is named source_RootFolderContent.

let
  Source = SharePoint.Files(paramSharepointRootUrl, [ApiVersion = 15])
in
  Source

 

Next, add another query like the example below.  We may need to tweak this further based on your needs.  I recommend adding it to a query as it is and work through each step.  After you have confirmed that the correct set of files are being returned, we can convert it into a function and remove any steps that are not needed for your scenario. 

 

let
  // START: Declare variables
  //
  varDocsDirectory = Text.Lower ( paramSharepointDocsDirectory),  
  varFileName = Text.Lower ( paramSharepointFileName ),
  // If you want to filter the files for a specific date range, you can create additional parameters and use them like this.
  // If the latest date is before the earliest date, use the earliest date.
  // varMaxDateForBinaries = if LatestFileModificationDate < EarliestFileModificationDate then EarliestFileModificationDate else LatestFileModificationDate,                                          

  // If you'd like to only look for files in a specific set of subfolders, you can specify another parameter and use it like in these next few steps:
  // Remove the specified character from the start of the string.
  // ParentFolders_TrimCharFromStart = 
  //   if Text.StartsWith ( SharepointParentFolders, "/" ) 
  //   then Text.TrimStart ( SharepointParentFolders, 1 )
  //  else SharepointParentFolders,

  // Remove the specified character from the end of the string.
  // ParentFolders_TrimCharFromEnd = 
  //  if Text.EndsWith ( ParentFolders_TrimCharFromStart, "/" ) 
  //  then ParentFolders_TrimCharFromStart
  //  else ParentFolders_TrimCharFromStart & "/",

  //Final variable
  // varParentFolders = ParentFolders_TrimCharFromEnd,
  // 
  // END: Declare Variables

  source = source_RootFolderContent,
  // Add a column for the subfolder paths (extracted from the file paths that are returned in the table.
  // This and some of the other steps below may not be required in all scenarios.  It is primarily used as a helper column to check a subset of subfolders, in the event the target name appears in more than one subfolder level, like this - /first folder name/target folder name/another subfolder/target folder name
  add_subfolders = Table.AddColumn(Source, "Subfolders", each Text.BetweenDelimiters([Folder Path], "/sites/", "/", 0, 0), type text),
  // Add a column for the Sharepoint documents directory.
  add_dir = Table.AddColumn(add_subfolders , "Directory", each Text.BetweenDelimiters([Folder Path], [Subfolders] & "/", "/", 0, 0), type text),
  // Count the number path separators that exist between the directory and the file name.
  // This is only needed if you are wanting to perform validation on the number of subfolders.
  add_folder_delimiter_count = Table.AddColumn(
      add_dir, 
      "Delimiters Btwn Dir and End", 
      each List.Count ( Text.PositionOfAny ( [Folder Path], {"/"}, Occurrence.All ) ) + 1,
      Int64.Type
    ),
  // Extract the remaining subfolders in the path.
  add_parent_folders = Table.AddColumn (
      add_folder_delimiter_count, 
      "Parent Folders", 
      each Text.BetweenDelimiters(
            [Folder Path], 
            [Directory] & "/", 
            "/", 
            0, 
            [Delimiters Btwn Dir and End]
        ), type text
    ),
  remove_folder_path = Table.RemoveColumns(add_parent_folders, {"Folder Path", "Delimiters Btwn Dir and End"}),
  // Keep only files that are in the specified directory with the specified parent folders.
  // When selecting rows in a Sharepoint directory in dataflows, the filter must begin
  // after the directory in order to avoid the dynamic datasource error message.
  // If you don't need to look for specific files or folders then this is not needed.
  filter_files = Table.SelectRows(
    remove_folder_path, 
    each [Attributes]?[Hidden]? <> true
    and Text.Contains ( [Directory], varDocsDirectory, Comparer.OrdinalIgnoreCase )
    // and Text.Contains ( [Parent Folders], varParentFolders, Comparer.OrdinalIgnoreCase )
    and Text.Contains ( [Name], varFileName, Comparer.OrdinalIgnoreCase )
  ),
  // Filter the files by the Date Modified. (optional)
  // select_files_by_date_modified = Table.SelectRows (
  //  filter_files,
  //  each [Date modified] >= EarliestFileModificationDate and [Date modified] <= varMaxDateForBinaries
  ),
  // Optionally limit the results to the first N files.
  // first_n_files = if FirstNFiles = null then select_files_by_date_modified else Table.FirstN ( select_files_by_date_modified, FirstNFiles ),
  // Add a unique ID (integer) for each file. (optional)
  add_id = Table.AddIndexColumn(filter_files, "FileId", 1, 1, Int64.Type)
in
  add_id

 

Hi @jennratten , thanks for the help so far. 
I was able to implement the solution you suggested. How would I continue to loop then over the different sharepoint root URLs? 
So far, I see that the query takes the paramSharepointRootUrl which is only one root URL but I do have several root URL and I cannot connect only to https://xxx.sharepoint.com/sites.
I could change the second query to a function to apply it to the different root URLs I have. Or how would you continue?

Hi @tmul ,

 

You can create a table with all item number which you wanted, then add a custom column with web connector function to get data from each row items and use expand feature expand all records.

How to connect Power-BI with multiple web data sources or URLs

 

 

Best Regards,

Stephen Tao

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors