Hi,
I need to retrieve several historical CSV files from an on-premises folder.
The name of each file consists of a prefix, a middle string, and a final date in yyyymmdd format, plus the .csv extension.
For example, a file could be named sourceprefixfile_middlestring_20251231.csv.
The remote folder contains one file per day: sourceprefixfile_middlestring_20251201.csv, sourceprefixfile_middlestring_20251202.csv, sourceprefixfile_middlestring_20251203.csv, and so on; but I only need the end-of-month files, such as sourceprefixfile_middlestring_20251031.csv, sourceprefixfile_middlestring_20251130.csv, sourceprefixfile_middlestring_20251231.csv, etc., not all of them.
So, I've implemented a SQL view in a warehouse that returns the end-of-month dates for each source prefix file, starting from 2022-01-01. This view exposes the source prefix file and the month-end date (in yyyymmdd format).
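For reference, the month-end list such a view exposes can be sketched in Python (a minimal illustration of the expected output, not the warehouse SQL itself; the prefix name is just the example from above):

```python
from datetime import date, timedelta

def month_ends(start: date, end: date):
    """Yield the last day of each month between start and end, as yyyymmdd strings."""
    d = start.replace(day=1)
    while d <= end:
        # Jump to the first day of the next month, then step back one day.
        next_month = (d.replace(day=28) + timedelta(days=4)).replace(day=1)
        month_end = next_month - timedelta(days=1)
        if start <= month_end <= end:
            yield month_end.strftime("%Y%m%d")
        d = next_month

# The first few rows the view would return for one prefix:
rows = [("sourceprefixfile_middlestring", me)
        for me in month_ends(date(2022, 1, 1), date(2022, 3, 31))]
```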
Then I tried to filter the files against the SQL view in a pipeline: a Lookup activity reads the SQL view, and a Get Metadata activity returns all file names (childItems). I'd like to match the two output arrays, ideally without reading all the files multiple times. I cannot implement an exact-name match; I can only filter a file by the source prefix file and the month-end date returned by the Lookup activity.
Then I'd use a ForEach activity to iterate over the filtered files and copy them into a lakehouse.
Now, any suggestions on how to implement the right filter condition for the Filter activity? Many thanks
Hi @pmscorca ,
Here are a few additional best practices and enhancements you can consider to make your pipeline more reliable, maintainable, and scalable when filtering and copying historical end-of-month CSV files from on-prem folders using a SQL view + Data Pipeline approach in Microsoft Fabric:
Make filtering case-insensitive (if needed) -- if filenames might have inconsistent casing (e.g., SourcePrefix_20240131.csv), apply toLower() to both sides of the comparison.
Trim extra whitespace in the Lookup output -- trim the prefix and month_end columns to prevent false negatives from stray spaces.
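In plain logic, these two hardening steps amount to something like the following Python sketch of the comparison only (the Fabric expression equivalents are toLower() and trim(); the function name here is hypothetical):

```python
def matches(filename: str, prefix: str, month_end: str) -> bool:
    """Case-insensitive, whitespace-tolerant version of the filter condition."""
    name = filename.strip().lower()
    return name.startswith(prefix.strip().lower()) and month_end.strip() in name

# A file with odd casing and a lookup row with stray spaces still match:
matches("SourcePrefix_20240131.csv", "  sourceprefix ", "20240131")  # → True
```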
If this post helps, please give a Kudos or accept it as a Solution to help other members find it more quickly.
If I've misunderstood your needs or you still have problems, please feel free to let us know. Thanks a lot!
Hi @pmscorca
Following up to confirm if the earlier responses addressed your query. If not, please share your questions and we’ll assist further.
Ok, about this issue: I think the ForEach should be sequential, shouldn't it?
Thanks
Hi @pmscorca
Yes, ForEach is the right approach here, and it should run sequentially: you loop through each SQL row, and because the Filter condition reads pipeline variables that are set once per iteration, parallel iterations would overwrite each other's values.
Thank You.
You already have everything you need in the Lookup. The simplest approach would be:
Lookup returns rows: (prefix, month_end_yyyymmdd)
Get Metadata returns all filenames (childItems)
A Filter activity keeps only the files where:
filename startswith(prefix) AND filename contains(month_end)
In Fabric pipelines, note that inside a Filter activity item() refers to the file being tested, so the current Lookup row's values must first be captured in pipeline variables (via Set Variable activities inside the ForEach). The filter condition then looks like this:
@and(
    startswith(item().name, variables('SourcePrefixFile')),
    contains(item().name, variables('MonthEndDate'))
)
Since your Lookup returns an array of objects, wrap the Filter inside a ForEach over the Lookup output, but do NOT run multiple Get Metadata calls:
1 Get Metadata (before the ForEach)
1 ForEach over the Lookup rows
Inside each iteration: Set Variable activities for the row's prefix and month-end, then a Filter over the same childItems list
This prevents repeated enumeration of the folder: you produce a single list of files and apply the logical filter once per prefix/date combination.
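The logic of that single-enumeration pattern can be sketched as follows (Python, with hypothetical sample data standing in for the Get Metadata and Lookup outputs):

```python
# Simulated Get Metadata childItems: ONE enumeration of the folder.
child_items = [
    "sourceprefixfile_middlestring_20251201.csv",
    "sourceprefixfile_middlestring_20251231.csv",
    "sourceprefixfile_middlestring_20251130.csv",
    "otherprefix_middlestring_20251231.csv",
]

# Simulated Lookup rows: (prefix, month_end_yyyymmdd).
lookup_rows = [
    ("sourceprefixfile", "20251130"),
    ("sourceprefixfile", "20251231"),
]

# One ForEach over the Lookup rows; each iteration filters the SAME list,
# so the folder is never re-enumerated.
selected = []
for prefix, month_end in lookup_rows:
    selected += [f for f in child_items
                 if f.startswith(prefix) and month_end in f]
```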
Hi, thanks for your reply.
I'm implementing this solution:
The filter condition used is:
@and(
    startswith(item().name, variables('SourcePrefixFile')),
    contains(item().name, variables('MonthEndDate'))
)
I think that the ForEach should be sequential.
Thanks