Reply
DebbieE
Community Champion
Community Champion

Changing a Pipleine to work with wildcarded folders and not just wildcard the files

I had a pipeline that dealt with loading json files to SQL DW with a delta load. Unprocessed Files only. The data lake structure was for example project/rawdata/202304054364.json

 

All the parameters were fed from a SQL table so I could use the same pipeline for multiple files.

 

Now I need to change the process. I have a Data lake containing JSON files and the folder structure is as follows project/toProcess/filetypeA/20231007/1210073536169.json

So the date folders will fill with files and then a new day folder will be created. I need to change my original Pipeline to deal with this.

This is the original solution

 

I have a Pipeline that uses parameters from PIPELINE_PARAMETERS table and usp_PIPELINE_PARAMETERS gets the Parametertype and Parameter for each instance.

 

I have a lookup to get the parameters for the pipeline.using the above Stored procedure

For example rootfolder project,  filepath toProcess/fileTypeA, filename *.json (Note at present the filepath has no wildcard)

 

A Get MetaData to get the list of JSON files and this is working. it uses a json_dataset and I have parameterised rootFolder, filePath and fileName

 

Parameters rootfolder @activity('LookupGetParameters').output.firstRow.rootFolder

Field List Child Items, this only gives you the file names. The parameters are set:.

rootFolder @activity('LookupGetParameters').output.firstRow.rootFolder

filePath @activity('LookupGetParameters').output.firstRow.filePath

fileName @activity('LookupGetParameters').output.firstRow.fileName

 

Again, the output is filename only

Then a Lookup to look processed files in our table that logs at the end of the process table SELECT FileName FROM [metadatafwk].[PROCESSED_FILE_LOG] WHERE Source = '@{activity('LookupGetParameters').output.firstRow.filePath}'

 

Then a filter to filter out processed files from unprocessed

Items @activity('GetJsonFiles').output.childItems.

Condition @IF(empty(activity('LookupProcessedFiles').output.value), true, not(contains(string(activity('LookupProcessedFiles').output.value), item().name)))

So this just gives you the filenames of the unprocessed files

 

Then a Foreach. Items: @activity('FilterProcessedFiles').output.value which gets the filename only. Inside the Foreach is a copy activity , Source, the Json rootfolder filePath and filename and the Sink sqlSchema and sqlTable (Parameters set in the SQL data set)

Last a SQL pool Stored Procedure that sets the date, filename, rowcount and source (the folder) into the PROCESSED_FILE. 

 

I have reset the Lookup to get the folder and filename SELECT Source+'/'+FileName FROM [metadatafwk].[PROCESSED_FILE_LOG] WHERE Source = '@{activity('LookupGetParameters').output.firstRow.filePath}' 

 

But I simply cannot get me a Get Data that brings out the wildcarded filepath and the file name. Its always either File type or Folder type with Child items.

 

Im really struggling to get help with this. has anyone got any tips on changing my pipeline so I can also wildcard the folder and pull that through to the for each

2 ACCEPTED SOLUTIONS
spencer_sa
Solution Supplier
Solution Supplier

I have a solution to your problem - nested pipelines using Invoke Pipeline (Legacy) activity and Parameters.

  • Parent pipeline has a Get Metadata activity and ForEach to cycle through the first folder layer.
  • Inside the ForEach is an Invoke Pipeline (Legacy) activity that calls a child pipeline with a Parameter called Path
  • Child Pipeline has a Parameter called Path that is fed into a Get Metadata activity to get the file list and a ForEach to then process each found file one at a time
  • The rest of your processing is inside the ForEach (or you can invoke another pipeline with the now complete Path/Filename.

You can nest this solution to any depth, and if you play with If Conditions on the file type can probably make it recursive.  I've only tested it to one layer of folders/files.

View solution in original post

DebbieE
Community Champion
Community Champion

All sorted. I have the foreach which executes a pipeline. that pipeline takes the folder from the top level

View solution in original post

7 REPLIES 7
DebbieE
Community Champion
Community Champion

All sorted. I have the foreach which executes a pipeline. that pipeline takes the folder from the top level

spencer_sa
Solution Supplier
Solution Supplier

I have a solution to your problem - nested pipelines using Invoke Pipeline (Legacy) activity and Parameters.

  • Parent pipeline has a Get Metadata activity and ForEach to cycle through the first folder layer.
  • Inside the ForEach is an Invoke Pipeline (Legacy) activity that calls a child pipeline with a Parameter called Path
  • Child Pipeline has a Parameter called Path that is fed into a Get Metadata activity to get the file list and a ForEach to then process each found file one at a time
  • The rest of your processing is inside the ForEach (or you can invoke another pipeline with the now complete Path/Filename.

You can nest this solution to any depth, and if you play with If Conditions on the file type can probably make it recursive.  I've only tested it to one layer of folders/files.

spencer_sa
Solution Supplier
Solution Supplier

Doesn't solve your problem using the get data activity in a pipeline, but a Notebook would solve this problem for you - traverse the lakehouse file structure using mssparkutils and output (as JSON) the list of filenames and file paths.

Unfortunatly i can't use a notebook.

 

Im getting there by having a new data set with just the root folder set. Then a Get Meta data that gets the folder. Then a for each and another get Meta Data this time pointing to the file name with the folder dynamically set. Im some way off byt the idea is beginning to form

 

NOPE: Im not getting there I can't add another for each into the for each getting the dynamic folders. I really need some advice. And it has to be a pipeline.

 

So this time Im using a top level pipeline to get the folders which Im then feeding into a child pipeline.  that iterates through each file in the folder. Fingers crossed this works OK

[Edit - I've found a proper solution posted in another reply - leaving this one here for reference]

You've discovered you can't have nested ForEach activities.  Even if you're able to create them, when you try to execute them you get the following error message;
"Operation on target ForEach1 failed: Container activity cannot include another container activitynull"
So if you either already know the filenames in the subfolders you want to process *or* there's only 1 file to process in the subfolder you'll be golden.

You'll probably end up with something like the following in order to get the filename and path - this is what I got working.

@concat(activity('Get First Level').output.itemName,'/',item().name,'/',activity('Get Second Level').output.childItems[0].name)



I agree with @spencer_sa 

 

I just made some test and found that in a ForEach activity, I can't create a nested ForEach activity. The ForEach option is greyed out here. 

vjingzhanmsft_0-1728639112323.png

 

So we are able to iterate each date folder to get the file name list in it, but we cannot create another ForEach to copy the data. You have to invoke a child pipeline to copy the data of each file just like you've achieved it now. 

 

Best Regards,
Jing
Community Support Team

I discovered I could cut and paste the ForEach to get it nested, but then it failed on execution.
 
avatar user

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!

ArunFabCon

Microsoft Fabric Community Conference 2025

Arun Ulag shares exciting details about the Microsoft Fabric Conference 2025, which will be held in Las Vegas, NV.

December 2024

A Year in Review - December 2024

Find out what content was popular in the Fabric community during 2024.

Top Solution Authors (Last Month)
Top Kudoed Authors (Last Month)