<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Changing a Pipeline to work with wildcarded folders and not just wildcard the files in Data Engineering</title>
    <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Changing-a-Pipleine-to-work-with-wildcarded-folders-and-not-just/m-p/4236870#M4540</link>
    <pubDate>Thu, 10 Oct 2024 11:15:49 GMT</pubDate>
    <dc:creator>DebbieE</dc:creator>
    <dc:date>2024-10-10T11:15:49Z</dc:date>
    <item>
      <title>Changing a Pipeline to work with wildcarded folders and not just wildcard the files</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Changing-a-Pipleine-to-work-with-wildcarded-folders-and-not-just/m-p/4236870#M4540</link>
      <description>&lt;P&gt;I had a pipeline that loaded JSON files into SQL DW as a delta load (unprocessed files only). The data lake structure was, for example, project/rawdata/202304054364.json&lt;/P&gt;
&lt;P&gt;All the parameters were fed from a SQL table so I could use the same pipeline for multiple files.&lt;/P&gt;
&lt;P&gt;Now I need to change the process. I have a data lake containing JSON files, and the folder structure is as follows: project/toProcess/filetypeA/20231007/1210073536169.json&lt;/P&gt;
&lt;P&gt;So each date folder will fill with files and then a new day folder will be created. I need to change my original pipeline to deal with this.&lt;/P&gt;
&lt;P&gt;This is the original solution:&lt;/P&gt;
&lt;P&gt;I have a pipeline that uses parameters from the PIPELINE_PARAMETERS table; usp_PIPELINE_PARAMETERS gets the Parametertype and Parameter for each instance.&lt;/P&gt;
&lt;P&gt;I have a Lookup that gets the parameters for the pipeline using the above stored procedure.&lt;/P&gt;
&lt;P&gt;For example: rootFolder project, filePath toProcess/fileTypeA, fileName *.json (note that at present the filePath has no wildcard).&lt;/P&gt;
&lt;P&gt;A Get Metadata activity gets the list of JSON files, and this is working. It uses a json_dataset in which I have parameterised rootFolder, filePath and fileName.&lt;/P&gt;
&lt;P&gt;The Field List is Child Items, which only gives you the file names. The dataset parameters are set as follows:&lt;/P&gt;
&lt;P&gt;rootFolder @activity('LookupGetParameters').output.firstRow.rootFolder&lt;/P&gt;
&lt;P&gt;filePath @activity('LookupGetParameters').output.firstRow.filePath&lt;/P&gt;
&lt;P&gt;fileName @activity('LookupGetParameters').output.firstRow.fileName&lt;/P&gt;
&lt;P&gt;Again, the output is the file name only.&lt;/P&gt;
&lt;P&gt;Then a Lookup finds the processed files in the table that is logged to at the end of the process: SELECT FileName FROM [metadatafwk].[PROCESSED_FILE_LOG] WHERE Source = '@{activity('LookupGetParameters').output.firstRow.filePath}'&lt;/P&gt;
&lt;P&gt;Then a Filter separates the unprocessed files from the processed ones.&lt;/P&gt;
&lt;P&gt;Items: @activity('GetJsonFiles').output.childItems&lt;/P&gt;
&lt;P&gt;Condition: @if(empty(activity('LookupProcessedFiles').output.value), true, not(contains(string(activity('LookupProcessedFiles').output.value), item().name)))&lt;/P&gt;
&lt;P&gt;So this just gives you the file names of the unprocessed files.&lt;/P&gt;
&lt;P&gt;Then a ForEach. Items: @activity('FilterProcessedFiles').output.value, which gets the file name only. Inside the ForEach is a Copy activity: the source is the JSON dataset (rootFolder, filePath and fileName) and the sink is sqlSchema and sqlTable (parameters set in the SQL dataset).&lt;/P&gt;
&lt;P&gt;Last, a SQL pool stored procedure writes the date, file name, row count and source (the folder) into the PROCESSED_FILE.&lt;/P&gt;
&lt;P&gt;I have changed the Lookup to return the folder and file name: SELECT Source+'/'+FileName FROM [metadatafwk].[PROCESSED_FILE_LOG] WHERE Source = '@{activity('LookupGetParameters').output.firstRow.filePath}'&lt;/P&gt;
&lt;P&gt;But I simply cannot get a Get Metadata activity to return both the wildcarded file path and the file name. It's always either File type, or Folder type with Child Items.&lt;/P&gt;
&lt;P&gt;I'm really struggling to get help with this. Has anyone got any tips on changing my pipeline so I can also wildcard the folder and pull that through to the ForEach?&lt;/P&gt;</description>
      <pubDate>Thu, 10 Oct 2024 11:15:49 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Changing-a-Pipleine-to-work-with-wildcarded-folders-and-not-just/m-p/4236870#M4540</guid>
      <dc:creator>DebbieE</dc:creator>
      <dc:date>2024-10-10T11:15:49Z</dc:date>
    </item>
    <item>
      <title>Re: Changing a Pipeline to work with wildcarded folders and not just wildcard the files</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Changing-a-Pipleine-to-work-with-wildcarded-folders-and-not-just/m-p/4237059#M4541</link>
      <description>&lt;P&gt;This doesn't solve your problem with the Get Metadata activity in a pipeline, but a notebook would solve it for you: traverse the lakehouse file structure using mssparkutils and output (as JSON) the list of file names and file paths.&lt;/P&gt;</description>
      <pubDate>Thu, 10 Oct 2024 13:26:45 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Changing-a-Pipleine-to-work-with-wildcarded-folders-and-not-just/m-p/4237059#M4541</guid>
      <dc:creator>spencer_sa</dc:creator>
      <dc:date>2024-10-10T13:26:45Z</dc:date>
    </item>
    <item>
      <title>Re: Changing a Pipeline to work with wildcarded folders and not just wildcard the files</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Changing-a-Pipleine-to-work-with-wildcarded-folders-and-not-just/m-p/4237062#M4542</link>
      <description>&lt;P&gt;Unfortunately I can't use a notebook.&lt;/P&gt;&lt;P&gt;I'm getting there by having a new dataset with just the root folder set. Then a Get Metadata that gets the folders. Then a ForEach and another Get Metadata, this time pointing to the file name with the folder set dynamically. I'm some way off, but the idea is beginning to form.&lt;/P&gt;&lt;P&gt;NOPE: I'm not getting there. I can't add another ForEach inside the ForEach that gets the dynamic folders. I really need some advice, and it has to be a pipeline.&lt;/P&gt;&lt;P&gt;So this time I'm using a top-level pipeline to get the folders, which I'm then feeding into a child pipeline that iterates through each file in the folder. Fingers crossed this works OK.&lt;/P&gt;</description>
      <pubDate>Thu, 10 Oct 2024 14:45:45 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Changing-a-Pipleine-to-work-with-wildcarded-folders-and-not-just/m-p/4237062#M4542</guid>
      <dc:creator>DebbieE</dc:creator>
      <dc:date>2024-10-10T14:45:45Z</dc:date>
    </item>
    <item>
      <title>Re: Changing a Pipeline to work with wildcarded folders and not just wildcard the files</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Changing-a-Pipleine-to-work-with-wildcarded-folders-and-not-just/m-p/4238603#M4554</link>
      <description>&lt;P&gt;[Edit: I've found a proper solution, posted in another reply. Leaving this one here for reference.]&lt;/P&gt;&lt;P&gt;You've discovered you can't have nested ForEach activities. Even if you're able to create them, when you try to execute them you get the following error message:&lt;BR /&gt;"Operation on target ForEach1 failed: Container activity cannot include another container activitynull"&lt;/P&gt;&lt;P&gt;So if you either already know the filenames in the subfolders you want to process *or* there's only 1 file to process in the subfolder, you'll be golden.&lt;/P&gt;&lt;P&gt;You'll probably end up with something like the following in order to get the filename and path. This is what I got working:&lt;/P&gt;&lt;DIV&gt;@concat(activity('Get First Level').output.itemName,'/',item().name,'/',activity('Get Second Level').output.childItems[0].name)&lt;/DIV&gt;</description>
      <pubDate>Fri, 11 Oct 2024 09:52:30 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Changing-a-Pipleine-to-work-with-wildcarded-folders-and-not-just/m-p/4238603#M4554</guid>
      <dc:creator>spencer_sa</dc:creator>
      <dc:date>2024-10-11T09:52:30Z</dc:date>
    </item>
    <item>
      <title>Re: Changing a Pipeline to work with wildcarded folders and not just wildcard the files</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Changing-a-Pipleine-to-work-with-wildcarded-folders-and-not-just/m-p/4238612#M4555</link>
      <description>&lt;P&gt;I agree with&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/679603"&gt;@spencer_sa&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I just ran some tests and found that inside a ForEach activity I can't create a nested ForEach activity. The ForEach option is greyed out here.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="vjingzhanmsft_0-1728639112323.png" style="width: 400px;"&gt;&lt;img src="https://community.fabric.microsoft.com/t5/image/serverpage/image-id/1181690i3FDA80BCAE2156FC/image-size/medium?v=v2&amp;amp;px=400" role="button" title="vjingzhanmsft_0-1728639112323.png" alt="vjingzhanmsft_0-1728639112323.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So we are able to iterate over each date folder to get the list of file names in it, but we cannot create another ForEach to copy the data. You have to invoke a child pipeline to copy the data of each file, just as you have done now.&lt;/P&gt;
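&lt;P&gt;As a rough illustration of the parent and child split described above, here is a minimal sketch in plain Python. It is not a Fabric API: the LAKE dictionary, get_child_items, child_pipeline and parent_pipeline are hypothetical stand-ins for the data lake, a Get Metadata activity with Child Items, and the two pipelines. The processed-file check assumes the log stores folder/filename strings.&lt;/P&gt;

```python
# Plain-Python model of the parent/child pipeline split; nothing here is a Fabric API.
# The dictionary stands in for the data lake, and all names are hypothetical.
LAKE = {
    "toProcess/filetypeA/20231007": ["1210073536169.json", "1210073536170.json"],
    "toProcess/filetypeA/20231008": ["1310073536171.json"],
}


def get_child_items(path):
    """Stand-in for a Get Metadata activity that returns Child Items."""
    if path in LAKE:
        # The path is a date folder: return the file names it holds.
        return list(LAKE[path])
    prefix = path + "/"
    # Otherwise return the names of the immediate subfolders.
    return sorted({key[len(prefix):].split("/")[0] for key in LAKE if key.startswith(prefix)})


def child_pipeline(folder_path, processed):
    """Child pipeline: a single ForEach over the files in one date folder."""
    copied = []
    for file_name in get_child_items(folder_path):
        full_path = folder_path + "/" + file_name
        # Skip anything already recorded in the processed-file log.
        if full_path not in processed:
            copied.append(full_path)
    return copied


def parent_pipeline(file_path, processed):
    """Parent pipeline: a single ForEach over date folders, invoking the child for each."""
    copied = []
    for folder in get_child_items(file_path):
        copied.extend(child_pipeline(file_path + "/" + folder, processed))
    return copied


# Assumed log content: one file from the first date folder was already loaded.
processed_log = {"toProcess/filetypeA/20231007/1210073536169.json"}
result = parent_pipeline("toProcess/filetypeA", processed_log)
print(result)
```

&lt;P&gt;Each pipeline function contains only one loop, which mirrors why the child pipeline is needed: the inner iteration over files lives in its own pipeline rather than in a second ForEach.&lt;/P&gt;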
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Best Regards,&lt;BR /&gt;Jing&lt;BR /&gt;Community Support Team&lt;/P&gt;</description>
      <pubDate>Fri, 11 Oct 2024 09:39:40 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Changing-a-Pipleine-to-work-with-wildcarded-folders-and-not-just/m-p/4238612#M4555</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-10-11T09:39:40Z</dc:date>
    </item>
    <item>
      <title>Re: Changing a Pipeline to work with wildcarded folders and not just wildcard the files</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Changing-a-Pipleine-to-work-with-wildcarded-folders-and-not-just/m-p/4238628#M4556</link>
      <description>&lt;P&gt;I have a solution to your problem: nested pipelines using the Invoke Pipeline (Legacy) activity and Parameters.&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;The parent pipeline has a Get Metadata activity and a ForEach to cycle through the first folder layer.&lt;/LI&gt;&lt;LI&gt;Inside the ForEach is an Invoke Pipeline (Legacy) activity that calls a child pipeline with a Parameter called Path.&lt;/LI&gt;&lt;LI&gt;The child pipeline has a Parameter called Path that is fed into a Get Metadata activity to get the file list, and a ForEach to then process each found file one at a time.&lt;/LI&gt;&lt;LI&gt;The rest of your processing is inside the ForEach (or you can invoke another pipeline with the now complete Path/Filename).&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;You can nest this solution to any depth, and if you play with If Conditions on the file type you can probably make it recursive.&amp;nbsp; I've only tested it to one layer of folders/files.&lt;/P&gt;</description>
      <pubDate>Fri, 11 Oct 2024 09:50:51 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Changing-a-Pipleine-to-work-with-wildcarded-folders-and-not-just/m-p/4238628#M4556</guid>
      <dc:creator>spencer_sa</dc:creator>
      <dc:date>2024-10-11T09:50:51Z</dc:date>
    </item>
    <item>
      <title>Re: Changing a Pipeline to work with wildcarded folders and not just wildcard the files</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Changing-a-Pipleine-to-work-with-wildcarded-folders-and-not-just/m-p/4238676#M4557</link>
      <description>&lt;P&gt;@Anonymous I discovered I could cut and paste the ForEach to get it nested, but then it failed on execution.&lt;/P&gt;</description>
      <pubDate>Fri, 11 Oct 2024 10:27:22 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Changing-a-Pipleine-to-work-with-wildcarded-folders-and-not-just/m-p/4238676#M4557</guid>
      <dc:creator>spencer_sa</dc:creator>
      <dc:date>2024-10-11T10:27:22Z</dc:date>
    </item>
    <item>
      <title>Re: Changing a Pipeline to work with wildcarded folders and not just wildcard the files</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Changing-a-Pipleine-to-work-with-wildcarded-folders-and-not-just/m-p/4238734#M4558</link>
      <description>&lt;P&gt;All sorted. I have the ForEach, which executes a pipeline. That pipeline takes the folder from the top level.&lt;/P&gt;</description>
      <pubDate>Fri, 11 Oct 2024 11:34:29 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Changing-a-Pipleine-to-work-with-wildcarded-folders-and-not-just/m-p/4238734#M4558</guid>
      <dc:creator>DebbieE</dc:creator>
      <dc:date>2024-10-11T11:34:29Z</dc:date>
    </item>
  </channel>
</rss>

