Hi Community,
I’m working on a requirement in Microsoft Fabric where I need to validate CSV files (comma or pipe-delimited) stored in a Lakehouse. The goal is to identify blank lines in two specific scenarios:
Blank Lines Before the Header
Blank Lines After the Header
Could you please guide me on how to achieve this using Fabric Data Pipelines? Your suggestions and best practices would be greatly appreciated—they keep learners like me motivated!
Thank you!
Hi @Mamatha77 ,
This is a really good quality-check requirement, but with the constraints you listed it’s important to be clear about what Fabric can and can’t do today.
Fabric Data Pipelines by themselves (Copy, Lookup, Get Metadata, etc.) can see files and basic properties (name, size, modified time), but they cannot read file content line-by-line. That means they can’t look inside a CSV and say “this line is blank”, “this is the header”, or “this row is empty after the header”.
Those kinds of checks need something that can actually parse the text — in Fabric that’s:
a Notebook (Spark / Python), or
Dataflow Gen2 (Power Query),
and you’ve already ruled out Dataflow Gen2.
So, staying fully inside Fabric and keeping the files in place, the realistic pattern is:
use a Data Pipeline to orchestrate,
call a Notebook activity that does the actual validation,
have the notebook write a small log table (e.g. csv_validation_log) in the Lakehouse with columns like file_path, is_valid, reason.
The notebook would just read each CSV as text, find the first non-blank line as the header, then check for blank lines before or after it and record the result for each file. The CSVs themselves stay where they are; only the log is written.
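As a rough sketch, that per-file check could look something like this in Python (the function name and the wording of the reasons are just illustrative, and the delimiter does not matter here because the check only cares whether a line is blank):

```python
def validate_csv_text(text: str) -> tuple[bool, str]:
    """Flag blank lines before the first non-blank line (the header) and after it."""
    lines = text.splitlines()

    # The header is the first non-blank line in the file.
    header_idx = next((i for i, line in enumerate(lines) if line.strip()), None)
    if header_idx is None:
        return False, "file is empty or contains only blank lines"

    # Scenario 1: blank lines before the header.
    if header_idx > 0:
        return False, f"{header_idx} blank line(s) before the header"

    # Scenario 2: blank lines after the header (anywhere in the data rows).
    blank_rows = [i + 1 for i, line in enumerate(lines) if i > header_idx and not line.strip()]
    if blank_rows:
        return False, f"blank line(s) after the header at line(s) {blank_rows}"

    return True, "ok"
```

The notebook would call this for every file and append one (file_path, is_valid, reason) row per file to the csv_validation_log table.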
If you are not allowed to use notebooks either, and the rule is “pipelines only, no Dataflow Gen2, no ADF, no external services”, then the honest answer is:
With the current features, Fabric Data Pipelines alone cannot implement this exact validation, because there is no built-in activity that can inspect CSV content at the line level.
In that case, the design would need to be relaxed slightly (for example, allow a small notebook just for validation) to be achievable purely in Fabric.
So just to summarise in plain terms:
Yes, you can do this inside Fabric with a pipeline + notebook and a validation log.
No, you cannot do it with pipeline activities only today, because they can’t look inside the file content.
– Gopi Krishna
Hi @Mamatha77,
Data quality validation is the "silent guardian" of any data platform.
You have a very specific "physical" validation requirement (detecting blank lines) rather than a "logical" one (checking column types). Because of this, Fabric Data Pipelines alone (using Get Metadata or Lookup activities) are not the right tool for the actual validation logic. They cannot easily "see" a blank line before a header; they will simply try to parse the next line as the header or fail silently.
The Solution: Use a Fabric Pipeline to orchestrate the process, but offload the logic to a Fabric Notebook (using Python).
You'll build a pipeline that:
Lists all CSV files in a Lakehouse folder (using a Notebook).
For each file:
Reads it as raw text (not as a table).
Detects:
Blank lines before the header
Blank lines in the data rows
Logs each file as Valid or Invalid into a Lakehouse table.
Pipeline orchestrates the notebook and refreshes the validation table (a minimal sketch of the notebook side is shown below).
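As a rough illustration of that flow, assuming the CSVs live under Files/incoming in the default Lakehouse attached to the notebook (adjust the folder and table name to your setup), and reusing a line-level check function like the one sketched earlier in the thread:

```python
import os

# In a Fabric notebook, the default Lakehouse attached to it is mounted under /lakehouse/default.
# "Files/incoming" is an assumed folder name - replace it with your actual folder.
folder = "/lakehouse/default/Files/incoming"

results = []
for name in os.listdir(folder):
    if not name.lower().endswith(".csv"):
        continue
    path = os.path.join(folder, name)
    with open(path, encoding="utf-8") as f:
        # validate_csv_text is the blank-line check sketched earlier in this thread.
        is_valid, reason = validate_csv_text(f.read())
    # Record the Lakehouse-relative path rather than the local mount path.
    results.append((f"Files/incoming/{name}", is_valid, reason))

# Append this run's outcome to a small validation log table in the Lakehouse
# ("spark" is the session that Fabric notebooks provide by default).
log_df = spark.createDataFrame(results, schema="file_path string, is_valid boolean, reason string")
log_df.write.mode("append").saveAsTable("csv_validation_log")
```

The pipeline's Notebook activity then just runs this notebook on a schedule or after new files arrive, and the CSVs themselves are never modified or moved.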
Hope this helps!
Best regards,
Antoine