Hi,
I want to connect to several SharePoint sites, get one Excel file from each, and load it into a dataflow.
The idea is to use a function that gets the Excel file from the different SharePoint sites and preprocesses the data.
I have all the URLs of the different sites and, ideally, I would like to loop over the site names to minimize the workload.
This is my function to get the file from a site and preprocess the data:
(sharePointURL) =>
let
    source = SharePoint.Files("https://xxxx.sharepoint.com/sites/" & sharePointURL, [ApiVersion = 15]),
    workbook_binary = source{[Name = "ExcelFile.xlsx"]}[Content],
    workbook = Excel.Workbook(workbook_binary),
    input = workbook{[Item = "Sheet1", Kind = "Sheet"]}[Data],
    headers = Table.PromoteHeaders(input, [PromoteAllScalars = true]),
    noerrors = Table.RemoveRowsWithErrors(headers, {"Key"}),
    notnull = Table.SelectRows(noerrors, each [Column1] <> null)
in
    notnull
Then I have a table of SharePoint site names, which specifies the sharePointURL parameter for the function above:

| sharePointURL |
| --- |
| Name1 |
| Name2 |
Then, to get my result, I can simply add a new column to the table above which calls the function, and I get the table in the preview of the dataflow and in Power BI Desktop.
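For illustration, that invocation step might look like the sketch below (assuming the function above is saved as a query named fnGetSharePointData; the expanded column names are placeholders):

```m
// Hypothetical driver query: one row per site name, invoking the function per row.
let
    siteNames = #table(type table [sharePointURL = text], {{"Name1"}, {"Name2"}}),
    // Call the function for each site; this is the step that later triggers
    // the dynamic data source error when saving the dataflow.
    addData = Table.AddColumn(siteNames, "Data", each fnGetSharePointData([sharePointURL])),
    expanded = Table.ExpandTableColumn(addData, "Data", {"Key", "Column1"})
in
    expanded
```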
However, when I want to save the dataflow, I get an error message saying the dataflow cannot be saved because it contains a dynamic data source.
When I hardcode the URL in the function, the dataflow can be saved, e.g. when I use: SharePoint.Files("https://xxxx.sharepoint.com/sites/Name1", [ApiVersion = 15])
Can I use something like the RelativePath option of the Web.Contents function here?
I have a Premium workspace and a Pro license.
Thanks for any help!
Hello! Please check out this post where I explain how to do this efficiently.
Hi @jennratten, thanks for the message. I am not sure your suggestion will actually help. I do have all my files in different SharePoint folders, but they all have the same name. What I need to do is basically loop over the different SharePoint folders and get the data. Combining the files has not been a problem so far. I am only looking for a way to use this approach in a Power BI dataflow so that I do not need to do it in Power BI Desktop.
Hello - yes, I understand. It is the different folders that are causing the dynamic data source error. I realize the solution in the link provided is much more complex and does more than you are asking. Here is another option...
This is the line that is your problem:
source = SharePoint.Files("https://xxxx.sharepoint.com/sites/" & sharePointURL, [ApiVersion = 15]),
In order to avoid the dynamic data source error that occurs in dataflows, you will need to first query the SharePoint root site, then handle the subfolders in separate steps. Dataflows do require that all of the files you wish to query be contained in the same root site; the only way around that is to have separate queries for each root site and then append them.
Add three text parameters to your dataflow (as used in the code below): paramSharepointRootUrl, paramSharepointDocsDirectory, and paramSharepointFileName.
Add a new query that connects to the root SharePoint URL. This query's loading should be disabled (because it will contain complex column types). In my dataflow this query is named source_RootFolderContent.
let
    Source = SharePoint.Files(paramSharepointRootUrl, [ApiVersion = 15])
in
    Source
Next, add another query like the example below. We may need to tweak this further based on your needs. I recommend adding it to a query as-is and working through each step. After you have confirmed that the correct set of files is being returned, we can convert it into a function and remove any steps that are not needed for your scenario.
let
    // START: Declare variables
    varDocsDirectory = Text.Lower ( paramSharepointDocsDirectory ),
    varFileName = Text.Lower ( paramSharepointFileName ),
    // If you want to filter the files for a specific date range, you can create additional parameters and use them like this.
    // If the latest date is before the earliest date, use the earliest date.
    // varMaxDateForBinaries = if LatestFileModificationDate < EarliestFileModificationDate then EarliestFileModificationDate else LatestFileModificationDate,
    // If you'd like to only look for files in a specific set of subfolders, you can specify another parameter and use it like in these next few steps:
    // Remove the specified character from the start of the string.
    // ParentFolders_TrimCharFromStart =
    //     if Text.StartsWith ( SharepointParentFolders, "/" )
    //     then Text.TrimStart ( SharepointParentFolders, "/" )
    //     else SharepointParentFolders,
    // Ensure the string ends with the specified character.
    // ParentFolders_TrimCharFromEnd =
    //     if Text.EndsWith ( ParentFolders_TrimCharFromStart, "/" )
    //     then ParentFolders_TrimCharFromStart
    //     else ParentFolders_TrimCharFromStart & "/",
    // Final variable
    // varParentFolders = ParentFolders_TrimCharFromEnd,
    // END: Declare variables
    source = source_RootFolderContent,
    // Add a column for the subfolder paths (extracted from the file paths that are returned in the table).
    // This and some of the other steps below may not be required in all scenarios. It is primarily used as a helper
    // column to check a subset of subfolders, in the event the target name appears in more than one subfolder level,
    // like this: /first folder name/target folder name/another subfolder/target folder name
    add_subfolders = Table.AddColumn(source, "Subfolders", each Text.BetweenDelimiters([Folder Path], "/sites/", "/", 0, 0), type text),
    // Add a column for the SharePoint documents directory.
    add_dir = Table.AddColumn(add_subfolders, "Directory", each Text.BetweenDelimiters([Folder Path], [Subfolders] & "/", "/", 0, 0), type text),
    // Count the number of path separators that exist between the directory and the file name.
    // This is only needed if you want to perform validation on the number of subfolders.
    add_folder_delimiter_count = Table.AddColumn(
        add_dir,
        "Delimiters Btwn Dir and End",
        each List.Count ( Text.PositionOfAny ( [Folder Path], {"/"}, Occurrence.All ) ) + 1,
        Int64.Type
    ),
    // Extract the remaining subfolders in the path.
    add_parent_folders = Table.AddColumn (
        add_folder_delimiter_count,
        "Parent Folders",
        each Text.BetweenDelimiters(
            [Folder Path],
            [Directory] & "/",
            "/",
            0,
            [Delimiters Btwn Dir and End]
        ),
        type text
    ),
    remove_folder_path = Table.RemoveColumns(add_parent_folders, {"Folder Path", "Delimiters Btwn Dir and End"}),
    // Keep only files that are in the specified directory with the specified parent folders.
    // When selecting rows in a SharePoint directory in dataflows, the filter must begin
    // after the directory in order to avoid the dynamic data source error message.
    // If you don't need to look for specific files or folders then this is not needed.
    filter_files = Table.SelectRows(
        remove_folder_path,
        each [Attributes]?[Hidden]? <> true
            and Text.Contains ( [Directory], varDocsDirectory, Comparer.OrdinalIgnoreCase )
            // and Text.Contains ( [Parent Folders], varParentFolders, Comparer.OrdinalIgnoreCase )
            and Text.Contains ( [Name], varFileName, Comparer.OrdinalIgnoreCase )
    ),
    // Filter the files by the Date Modified. (optional)
    // select_files_by_date_modified = Table.SelectRows (
    //     filter_files,
    //     each [Date modified] >= EarliestFileModificationDate and [Date modified] <= varMaxDateForBinaries
    // ),
    // Optionally limit the results to the first N files.
    // first_n_files = if FirstNFiles = null then select_files_by_date_modified else Table.FirstN ( select_files_by_date_modified, FirstNFiles ),
    // Add a unique ID (integer) for each file. (optional)
    add_id = Table.AddIndexColumn(filter_files, "FileId", 1, 1, Int64.Type)
in
    add_id
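Once the query above returns the expected set of files, a follow-up query can combine the binaries with the preprocessing from the original function. A minimal sketch, assuming the query above is saved under the name source_FilteredFiles (a name chosen here for illustration) and the target sheet is Sheet1 as in the original post:

```m
let
    // The filtered file table built by the query above (assumed name).
    source = source_FilteredFiles,
    // Parse each binary into a workbook and pull out the target sheet.
    add_sheet = Table.AddColumn(source, "SheetData",
        each Excel.Workbook([Content]){[Item = "Sheet1", Kind = "Sheet"]}[Data]),
    // Apply the same header promotion as the original function.
    promote = Table.TransformColumns(add_sheet,
        {"SheetData", each Table.PromoteHeaders(_, [PromoteAllScalars = true])}),
    // Keep only the columns needed downstream.
    keep = Table.SelectColumns(promote, {"Name", "FileId", "SheetData"})
in
    keep
```

From here, the SheetData column can be expanded to produce one combined table of rows from all files.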
Hi @jennratten, thanks for the help so far.
I was able to implement the solution you suggested. How would I then continue to loop over the different SharePoint root URLs?
So far, I see that the query takes paramSharepointRootUrl, which is only one root URL, but I have several root URLs, and I cannot connect to just https://xxx.sharepoint.com/sites.
I could change the second query into a function and apply it to the different root URLs I have. Or how would you continue?
Hi @tmul ,
You can create a table with all the items you want, then add a custom column that uses the web connector function to get data from each row, and finally use the expand feature to expand all the records.
How to connect Power-BI with multiple web data sources or URLs
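Applied to this thread, that pattern might look like the sketch below (fnGetRootContent stands for the second query from the accepted solution converted into a function that takes a root site URL; all names and expanded columns are illustrative):

```m
let
    // One row per SharePoint root site.
    rootUrls = #table(type table [RootUrl = text],
        {{"https://xxxx.sharepoint.com/sites/Name1"},
         {"https://xxxx.sharepoint.com/sites/Name2"}}),
    // Invoke the function for each root URL...
    add_content = Table.AddColumn(rootUrls, "Content", each fnGetRootContent([RootUrl])),
    // ...and expand the nested tables into one combined table.
    expanded = Table.ExpandTableColumn(add_content, "Content", {"Name", "FileId"})
in
    expanded
```

Note that, per the earlier reply, a dataflow may still reject this as a dynamic data source; in that case each root site needs its own static query, appended afterwards with Table.Combine.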
Best Regards,
Stephen Tao
If this post helps, then please consider accepting it as the solution to help other members find it more quickly.