Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Join us at FabCon Vienna from September 15-18, 2025, for the ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM. Get registered

Reply
DebbieE
Community Champion
Community Champion

Fabric Pipeline. Data flow. Removing items from a Filter with GetMetaData and Lookup

OK so I have a Bronze Datalake and a Silver Data Lake

In Silver I have a Parquet File of processed file names e.g.

Proja.csv

Projb.csv

Projc.csv

Projd.csv

 

And in the dataflow I have a Get MetaData activity conntected to the childitems in my Bronze datalake. So its finding files

lookup.csv

Proja.csv

Projb.csv

Projc.csv

Projd.csv

Proje.csv (Which is the new file)

 

I then have a filter to remove the Lookup.csv file 

@and(equals(item().type,'File'),startswith(item().name,'Proj'))
 
And now I want to get a list of everything in GetMetadata that doesnt exist in the lookup
Which would leave me with Proje.csv 
The hope is that I can use this to run a notebook so it only uses these files (Not sure how to do that yet but Im concentrating on the first bit)
 
I thought I could add another lookup. Connect it to the Filter (Source Files) and the Lookup (Processed Files) But Im really stuck at this point.
 
Should I be using a Lookup and what code should I use to
 
Get All Items from Filter where Not in Lookup?

 

1 ACCEPTED SOLUTION
frithjof_v
Super User
Super User

Perhaps you can do similar like below:

 

I have a Bronze Lakehouse and a Silver Lakehouse.

 

The files in my Bronze Lakehouse are as follows:

frithjof_v_0-1721915655333.png

 

 

The files in my Silver Lakehouse are as follows:

frithjof_v_1-1721915907699.png

 

I made a pipeline like this:

frithjof_v_2-1721915943800.png

 

 

The Get Metadata activities get the Child items metadata from the File folder in Bronze lakehouse and Silver lakehouse, respectively.

frithjof_v_3-1721916050920.png

 

The Filter activity removes the lookup.csv file from the output of the metadata activity from Bronze lakehouse:

frithjof_v_4-1721916166056.png

Items: @activity('Get Metadata Bronze').output.childItems

Condition: @not(equals(item().name, 'lookup.csv'))

 

 

The Items in the ForEach activity is the output from the Filter activity:

frithjof_v_5-1721916215603.png

Items: @activity('Filter Away Lookup file').output.Value

 

 

The If Condition inside the ForEach activity:

frithjof_v_6-1721916273266.png

Expression: @contains(activity('Get Metadata Silver').output.childItems, item())

 

 

The Copy activity if the If Condition is False:

frithjof_v_7-1721916333614.png

frithjof_v_8-1721916358371.png

After I run the pipeline, the Proje.csv file has been copied to Silver:

frithjof_v_9-1721916448145.png

 

 

 

I don't know if Fabric Data Pipeline has any limits (like output size, number of items in collection, number of items in foreach activity, etc.) which needs to be taken into consideration or it can result in pipeline failure or unexpected results if the number of files in any of the folders grow above the limits.

View solution in original post

7 REPLIES 7
frithjof_v
Super User
Super User

Perhaps you can do similar like below:

 

I have a Bronze Lakehouse and a Silver Lakehouse.

 

The files in my Bronze Lakehouse are as follows:

frithjof_v_0-1721915655333.png

 

 

The files in my Silver Lakehouse are as follows:

frithjof_v_1-1721915907699.png

 

I made a pipeline like this:

frithjof_v_2-1721915943800.png

 

 

The Get Metadata activities get the Child items metadata from the File folder in Bronze lakehouse and Silver lakehouse, respectively.

frithjof_v_3-1721916050920.png

 

The Filter activity removes the lookup.csv file from the output of the metadata activity from Bronze lakehouse:

frithjof_v_4-1721916166056.png

Items: @activity('Get Metadata Bronze').output.childItems

Condition: @not(equals(item().name, 'lookup.csv'))

 

 

The Items in the ForEach activity is the output from the Filter activity:

frithjof_v_5-1721916215603.png

Items: @activity('Filter Away Lookup file').output.Value

 

 

The If Condition inside the ForEach activity:

frithjof_v_6-1721916273266.png

Expression: @contains(activity('Get Metadata Silver').output.childItems, item())

 

 

The Copy activity if the If Condition is False:

frithjof_v_7-1721916333614.png

frithjof_v_8-1721916358371.png

After I run the pipeline, the Proje.csv file has been copied to Silver:

frithjof_v_9-1721916448145.png

 

 

 

I don't know if Fabric Data Pipeline has any limits (like output size, number of items in collection, number of items in foreach activity, etc.) which needs to be taken into consideration or it can result in pipeline failure or unexpected results if the number of files in any of the folders grow above the limits.

If there is a more efficient way to compare the two collections of child items from Get Metadata Silver and Get Metadata Bronze and return the items which only exist in the Get Metadata Bronze, then I would like to know.

 

(I am thinking if there exists some kind of anti join functionality, or similar?
Perhaps some way to do one array minus another array, which keeps only the items which are only in the first array?)

 

 

In my solution, I am using the ForEach activity with an IF condition inside to achieve a similar effect.

If you want to use the lookup.csv file to lookup which files don't need to be processed again (instead of using the file names in the Silver lakehouse directory for this purpose):

 

frithjof_v_1-1721920576261.png

 

In my case, the lookup.csv file has the following content:

frithjof_v_2-1721920667584.png

 

The 'ForEach LookupFileRow' activity:

Items: @activity('Get Lookup File Content').output.value

 

The 'Append varLookupFileNames' activity inside the 'ForEach LookupFileRow' activity:

frithjof_v_3-1721920804950.png

 

The 'IF Condition' inside the 'ForEach' activity:

Expression: @contains(variables('varLookupFileNames'), item().name)

 

 

Otherwise similar like the previous example pipeline.

 

 

I don't know if Fabric Data Pipeline has any size limits (like output size, number of items in collection, number of items in foreach activity, result size in lookup activity, etc.) which needs to be taken into consideration or it can result in pipeline failure or unexpected results if the number of files in any of the folders grow above the limits.

For example, the Lookup activity has some limitations:

Lookup activity - Microsoft Fabric | Microsoft Learn

Anonymous
Not applicable

HI @DebbieE,

I think you need a template list or query result that used to compare with current items, or you can't define which not exist and use to filter.

Regards,

Xiaoxin Sheng

I would need some specific information to work with here for how I would go about that. This is all in a fabric pipeline

Anonymous
Not applicable

Hi @DebbieE,

Here is the document link about use dataflow in data pipeline, you can use M query editor to operation with query table records:

Use a dataflow in a pipeline - Microsoft Fabric | Microsoft Learn

Regards,

Xiaoxin Sheng

Helpful resources

Announcements
Join our Fabric User Panel

Join our Fabric User Panel

This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.

May FBC25 Carousel

Fabric Monthly Update - May 2025

Check out the May 2025 Fabric update to learn about new features.

June 2025 community update carousel

Fabric Community Update - June 2025

Find out what's new and trending in the Fabric community.