Mamatha77
New Member

How to Validate Blank Lines in CSV Files Using Microsoft Fabric Data Pipelines

Hi Community,

I’m working on a requirement in Microsoft Fabric where I need to validate CSV files (comma or pipe-delimited) stored in a Lakehouse. The goal is to identify blank lines in two specific scenarios:

 

Validation Rules

  1. Blank Lines Before the Header

    • The first non-blank line should be treated as the header.
    • If any blank lines exist above the header, the file should be marked as invalid.
  2. Blank Lines After the Header

    • Any blank rows in the data section should also make the file invalid.

Additional Constraints

  • The solution must be implemented entirely within Microsoft Fabric (no Azure Data Factory).
  • Dataflow Gen2 should not be used.
  • Files should remain in the same folder; we only need an output response or log indicating which files are valid or invalid.

Could you please guide me on how to achieve this using Fabric Data Pipelines? Your suggestions and best practices would be greatly appreciated—they keep learners like me motivated!

Thank you!

1 ACCEPTED SOLUTION
Ugk161610
Continued Contributor

Hi @Mamatha77 ,

 

This is a really good quality-check requirement, but with the constraints you listed it’s important to be clear about what Fabric can and can’t do today.

 

Fabric Data Pipelines by themselves (Copy, Lookup, Get Metadata, etc.) can see files and basic properties (name, size, modified time), but they cannot read file content line-by-line. That means they can’t look inside a CSV and say “this line is blank”, “this is the header”, or “this row is empty after the header”.

 

Those kinds of checks need something that can actually parse the text — in Fabric that’s:

 

  • a Notebook (Spark / Python), or

  • Dataflow Gen2 (Power Query),

and you’ve already ruled out Dataflow Gen2.

 

So, staying fully inside Fabric and keeping the files in place, the realistic pattern is:

 

 

  • use a Data Pipeline to orchestrate,

  • call a Notebook activity that does the actual validation,

  • have the notebook write a small log table (e.g. csv_validation_log) in the Lakehouse with columns like file_path, is_valid, reason.

The notebook would just read each CSV as text, find the first non-blank line as the header, then check for blank lines before or after it and record the result for each file. The CSVs themselves stay where they are; only the log is written.
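To make that concrete, here is a minimal sketch of such a notebook. It assumes the Lakehouse is attached as the notebook's default Lakehouse (so its Files section is mounted at /lakehouse/default/Files), uses a hypothetical Files/incoming folder name, and relies on the spark session that Fabric notebooks provide:

```python
import os

# Hypothetical folder; the default Lakehouse is mounted at /lakehouse/default.
FOLDER = "/lakehouse/default/Files/incoming"

def validate_file(path: str) -> tuple[bool, str]:
    """Return (is_valid, reason). Blank-line checks don't depend on the
    delimiter, so comma- and pipe-delimited files are handled the same way."""
    with open(path, "r", encoding="utf-8") as f:
        lines = f.read().splitlines()
    # Rule 1: the first non-blank line is the header; any blank line above it
    # makes the file invalid.
    header_idx = next((i for i, ln in enumerate(lines) if ln.strip()), None)
    if header_idx is None:
        return False, "file is empty"
    if header_idx > 0:
        return False, f"{header_idx} blank line(s) before header"
    # Rule 2: any blank line after the header makes the file invalid.
    blank_rows = [i + 1 for i, ln in enumerate(lines) if i > 0 and not ln.strip()]
    if blank_rows:
        return False, f"blank line(s) after header at {blank_rows}"
    return True, "ok"

results = []
for name in sorted(os.listdir(FOLDER)):
    if name.lower().endswith(".csv"):
        path = os.path.join(FOLDER, name)
        is_valid, reason = validate_file(path)
        results.append((path, is_valid, reason))

# Files stay in place; only the log table is written.
# `spark` is predefined in a Fabric notebook session.
spark.createDataFrame(
    results, "file_path string, is_valid boolean, reason string"
).write.mode("overwrite").saveAsTable("csv_validation_log")
```

The pipeline then only needs a Notebook activity pointing at this notebook, and you can query csv_validation_log afterwards to see which files passed.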

 

If you are not allowed to use notebooks either, and the rule is “pipelines only, no Dataflow Gen2, no ADF, no external services”, then the honest answer is:

 

With the current features, Fabric Data Pipelines alone cannot implement this exact validation, because there is no built-in activity that can inspect CSV content at the line level.

 

In that case, the design would need to be relaxed slightly (for example, allow a small notebook just for validation) to be achievable purely in Fabric.

 

So just to summarise in plain terms:

  • Yes, you can do this inside Fabric with a pipeline + notebook and a validation log.

  • No, you cannot do it with pipeline activities only today, because they can’t look inside the file content.

– Gopi Krishna

2 REPLIES
AntoineW
Memorable Member

Hi @Mamatha77,

 

Data quality validation is the "silent guardian" of any data platform.

You have a very specific "physical" validation requirement (detecting blank lines) rather than a "logical" one (checking column types). Because of this, Fabric Data Pipelines alone (using Get Metadata or Lookup activities) are not the right tool for the actual validation logic. They cannot easily "see" a blank line before a header; they will simply try to parse the next line as the header or fail silently.

The Solution: Use a Fabric Pipeline to orchestrate the process, but offload the logic to a Fabric Notebook (using Python).

 

You'll build a pipeline that:

  1. Lists all CSV files in a Lakehouse folder (using a Notebook; see the sketch after this list).

  2. For each file:

    • Reads it as raw text (not as a table).

    • Detects:

      • Blank lines before the header

      • Blank lines in the data rows

    • Logs each file as Valid or Invalid into a Lakehouse table.

  3. The pipeline orchestrates the notebook run and refreshes the validation table.
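As a minimal sketch of step 1, assuming the Lakehouse is attached to the notebook (mssparkutils.fs.ls resolves relative paths against the default Lakehouse) and using a hypothetical Files/incoming folder name:

```python
from notebookutils import mssparkutils

# List everything in the (hypothetical) folder and keep only CSVs;
# each returned entry exposes .name, .path and .size.
files = mssparkutils.fs.ls("Files/incoming")
csv_paths = [f.path for f in files if f.name.lower().endswith(".csv")]

for p in csv_paths:
    print(p)  # hand each path to the raw-text validation step
```

The raw-text detection and logging steps then follow the same pattern as in the accepted solution above.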

 

Hope this helps!

Best regards,

Antoine
