March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
Register NowBe one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now
I had a pipeline that dealt with loading json files to SQL DW with a delta load. Unprocessed Files only. The data lake structure was for example project/rawdata/202304054364.json
All the parameters were fed from a SQL table so I could use the same pipeline for multiple files.
Now I need to change the process. I have a Data lake containing JSON files and the folder structure is as follows project/toProcess/filetypeA/20231007/1210073536169.json
So the date folders will fill with files and then a new day folder will be created. I need to change my original Pipeline to deal with this.
This is the original solution
I have a Pipeline that uses parameters from PIPELINE_PARAMETERS table and usp_PIPELINE_PARAMETERS gets the Parametertype and Parameter for each instance.
I have a lookup to get the parameters for the pipeline.using the above Stored procedure
For example rootfolder project, filepath toProcess/fileTypeA, filename *.json (Note at present the filepath has no wildcard)
A Get MetaData to get the list of JSON files and this is working. it uses a json_dataset and I have parameterised rootFolder, filePath and fileName
Parameters rootfolder @activity('LookupGetParameters').output.firstRow.rootFolder
Field List Child Items, this only gives you the file names. The parameters are set:.
rootFolder @activity('LookupGetParameters').output.firstRow.rootFolder
filePath @activity('LookupGetParameters').output.firstRow.filePath
fileName @activity('LookupGetParameters').output.firstRow.fileName
Again, the output is filename only
Then a Lookup to look processed files in our table that logs at the end of the process table SELECT FileName FROM [metadatafwk].[PROCESSED_FILE_LOG] WHERE Source = '@{activity('LookupGetParameters').output.firstRow.filePath}'
Then a filter to filter out processed files from unprocessed
Items @activity('GetJsonFiles').output.childItems.
Condition @IF(empty(activity('LookupProcessedFiles').output.value), true, not(contains(string(activity('LookupProcessedFiles').output.value), item().name)))
So this just gives you the filenames of the unprocessed files
Then a Foreach. Items: @activity('FilterProcessedFiles').output.value which gets the filename only. Inside the Foreach is a copy activity , Source, the Json rootfolder filePath and filename and the Sink sqlSchema and sqlTable (Parameters set in the SQL data set)
Last a SQL pool Stored Procedure that sets the date, filename, rowcount and source (the folder) into the PROCESSED_FILE.
I have reset the Lookup to get the folder and filename SELECT Source+'/'+FileName FROM [metadatafwk].[PROCESSED_FILE_LOG] WHERE Source = '@{activity('LookupGetParameters').output.firstRow.filePath}'
But I simply cannot get me a Get Data that brings out the wildcarded filepath and the file name. Its always either File type or Folder type with Child items.
Im really struggling to get help with this. has anyone got any tips on changing my pipeline so I can also wildcard the folder and pull that through to the for each
Solved! Go to Solution.
I have a solution to your problem - nested pipelines using Invoke Pipeline (Legacy) activity and Parameters.
You can nest this solution to any depth, and if you play with If Conditions on the file type can probably make it recursive. I've only tested it to one layer of folders/files.
All sorted. I have the foreach which executes a pipeline. that pipeline takes the folder from the top level
All sorted. I have the foreach which executes a pipeline. that pipeline takes the folder from the top level
I have a solution to your problem - nested pipelines using Invoke Pipeline (Legacy) activity and Parameters.
You can nest this solution to any depth, and if you play with If Conditions on the file type can probably make it recursive. I've only tested it to one layer of folders/files.
Doesn't solve your problem using the get data activity in a pipeline, but a Notebook would solve this problem for you - traverse the lakehouse file structure using mssparkutils and output (as JSON) the list of filenames and file paths.
Unfortunatly i can't use a notebook.
Im getting there by having a new data set with just the root folder set. Then a Get Meta data that gets the folder. Then a for each and another get Meta Data this time pointing to the file name with the folder dynamically set. Im some way off byt the idea is beginning to form
NOPE: Im not getting there I can't add another for each into the for each getting the dynamic folders. I really need some advice. And it has to be a pipeline.
So this time Im using a top level pipeline to get the folders which Im then feeding into a child pipeline. that iterates through each file in the folder. Fingers crossed this works OK
[Edit - I've found a proper solution posted in another reply - leaving this one here for reference]
You've discovered you can't have nested ForEach activities. Even if you're able to create them, when you try to execute them you get the following error message;
"Operation on target ForEach1 failed: Container activity cannot include another container activitynull"
So if you either already know the filenames in the subfolders you want to process *or* there's only 1 file to process in the subfolder you'll be golden.
You'll probably end up with something like the following in order to get the filename and path - this is what I got working.
I agree with @spencer_sa
I just made some test and found that in a ForEach activity, I can't create a nested ForEach activity. The ForEach option is greyed out here.
So we are able to iterate each date folder to get the file name list in it, but we cannot create another ForEach to copy the data. You have to invoke a child pipeline to copy the data of each file just like you've achieved it now.
Best Regards,
Jing
Community Support Team
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!
Your insights matter. That’s why we created a quick survey to learn about your experience finding answers to technical questions.
Arun Ulag shares exciting details about the Microsoft Fabric Conference 2025, which will be held in Las Vegas, NV.
User | Count |
---|---|
7 | |
3 | |
3 | |
3 | |
2 |