
Join us at FabCon Atlanta from March 16 - 20, 2026, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM. Register now.

joshua1990
Post Prodigy

Ingesting CSV files and appending them with ODBC Data

I am building a table in our Lakehouse that consolidates data from both CSV files and an ODBC connection. However, uploading the CSV files directly through the Lakehouse GUI is causing problems: the column types are not defined, which leads to issues downstream.

To address this, I have created a dataflow that consolidates the CSV files into a single table. Now, I would like to extend this table by appending data retrieved via the ODBC connection. The data sourced from the ODBC connection mirrors the data contained in the CSV files, which we save quarterly before deleting it from its original source.

 

Is there a way to verify whether the information from the CSV files is already present before appending the ODBC data to the Lakehouse table? Essentially, I would like to perform a check for existing records prior to the appending process.

Your insights on this process would be greatly appreciated! Maybe a dataflow is not the best approach here.

1 ACCEPTED SOLUTION
v-lgarikapat
Community Support

Hi @joshua1990,

Thanks for reaching out to the Microsoft Fabric community forum.

@lbendlin, thanks for your prompt response.

Step-by-Step Approach to Append ODBC Data to Lakehouse after Checking CSV History

1. Standardize Schema Early

You’ve likely done this in your dataflow, but just to be safe: define explicit column types and ensure both the CSV and ODBC sources match. You can do this in Power Query inside your Dataflow by setting each column's data type manually; this avoids schema drift or type mismatches downstream.
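To make the idea of an explicit schema concrete outside of Power Query, here is a minimal Python sketch. The column names (order_date, amount) and their types are assumptions for illustration only; primary_key is the placeholder key name used later in this reply. It is not how Dataflow Gen2 does it internally, just the same principle: every column gets a declared type, and a file that lacks an expected column is rejected instead of being loaded with guessed types.

```python
import csv
import io
from datetime import date
from decimal import Decimal

# Hypothetical schema: these column names and types are assumptions,
# not taken from the original poster's data.
SCHEMA = {
    "primary_key": int,
    "order_date": date.fromisoformat,
    "amount": Decimal,
}


def read_typed_rows(text_stream, schema=SCHEMA):
    """Parse CSV text and convert each column with its declared converter."""
    reader = csv.DictReader(text_stream)
    missing = set(schema) - set(reader.fieldnames or [])
    if missing:
        # Fail early instead of silently loading untyped or partial data.
        raise ValueError(f"CSV is missing expected columns: {sorted(missing)}")
    return [
        {name: convert(row[name]) for name, convert in schema.items()}
        for row in reader
    ]


# Small in-memory sample standing in for one of the quarterly CSV files.
sample_csv = io.StringIO(
    "primary_key,order_date,amount\n"
    "1,2024-01-15,19.99\n"
    "2,2024-02-01,5.00\n"
)
typed_rows = read_typed_rows(sample_csv)
```

Applying the same schema to the rows coming from the ODBC source keeps both inputs comparable when they are later matched on the key.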

2. Stage ODBC Data in a Temporary Table

Before appending to your main table, load the ODBC data into a staging table in the Lakehouse (e.g., stg_odbc_quarterly). You can ingest it using Dataflow Gen2, a pipeline, or a Fabric notebook.

3. Deduplicate with a SQL Statement

Once you have both datasets available in the Lakehouse, use a MERGE or INSERT ... SELECT SQL statement with a NOT EXISTS or LEFT JOIN condition to append only new records:

INSERT INTO final_consolidated_table
SELECT *
FROM stg_odbc_quarterly AS o
WHERE NOT EXISTS (
    SELECT 1
    FROM final_consolidated_table AS f
    WHERE f.primary_key = o.primary_key
);

Make sure the primary_key (or composite key) you're comparing is consistent between both CSV and ODBC sources.
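To see how the statement above behaves, here is a small self-contained sketch that uses Python's built-in sqlite3 module purely as a local stand-in for the Lakehouse. The table names come from this reply; the quarter and amount columns and the sample rows are made up for illustration. In Fabric itself the statement would more likely be run as Spark SQL from a notebook, so treat this only as a demonstration of the NOT EXISTS logic.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the Lakehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE final_consolidated_table (primary_key INTEGER, quarter TEXT, amount REAL);
    CREATE TABLE stg_odbc_quarterly (primary_key INTEGER, quarter TEXT, amount REAL);
    -- Rows already loaded from the CSV history (sample data).
    INSERT INTO final_consolidated_table VALUES (1, '2024Q1', 10.0), (2, '2024Q1', 20.0);
    -- Fresh ODBC extract: key 2 overlaps with the CSV history, key 3 is new.
    INSERT INTO stg_odbc_quarterly VALUES (2, '2024Q1', 20.0), (3, '2024Q2', 30.0);
""")

APPEND_NEW_ROWS = """
    INSERT INTO final_consolidated_table
    SELECT o.*
    FROM stg_odbc_quarterly AS o
    WHERE NOT EXISTS (
        SELECT 1
        FROM final_consolidated_table AS f
        WHERE f.primary_key = o.primary_key
    )
"""


def append_new_rows(connection):
    """Run the anti-join insert and return how many rows were appended."""
    count_sql = "SELECT COUNT(*) FROM final_consolidated_table"
    before = connection.execute(count_sql).fetchone()[0]
    connection.execute(APPEND_NEW_ROWS)
    connection.commit()
    after = connection.execute(count_sql).fetchone()[0]
    return after - before


first_run = append_new_rows(conn)   # only the row with key 3 is new
second_run = append_new_rows(conn)  # nothing left to append
keys = [
    row[0]
    for row in conn.execute(
        "SELECT primary_key FROM final_consolidated_table ORDER BY primary_key"
    )
]
```

Running the insert a second time appends nothing, which is what makes this pattern safe to schedule repeatedly. With a composite key, the inner WHERE clause would compare every key column instead of just one.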

If this post helped resolve your issue, please consider marking it as the Accepted Solution. This not only acknowledges the support provided but also helps other community members find relevant solutions more easily.

We appreciate your engagement and thank you for being an active part of the community.

Best regards,
LakshmiNarayana


7 REPLIES
v-lgarikapat
Community Support

Hi @joshua1990 ,

If your issue has been resolved, please consider marking the most helpful reply as the accepted solution. This helps other community members who may encounter the same issue to find answers more efficiently.

If you're still facing challenges, feel free to let us know; we’ll be glad to assist you further.

Looking forward to your response.

Best regards,
LakshmiNarayana.

Hi @joshua1990 ,

If your question has been answered, kindly mark the appropriate response as the Accepted Solution. This small step goes a long way in helping others with similar issues.

We appreciate your collaboration and support!

Best regards,
LakshmiNarayana

Hi @joshua1990 ,

As we haven't heard back from you, we are closing this thread. If you are still experiencing the issue, please feel free to create a new thread; we’ll be happy to assist you further. Thank you for your patience and support.

If you found our response helpful, please mark it as Accepted Solution and consider giving a Kudos, so others with similar queries can find it easily.

 

Best Regards,

Lakshmi Narayana

lbendlin
Super User

Two things to keep in mind:

1. CSV data sources by definition do not fold.

2. The smallest granularity you can achieve with non-folding data sources is the partition level. The smallest partition you can create via normal means is a daily partition.

Ideally you would load all that data into a SQL database (in Fabric or elsewhere) and then do the deduplication there.
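Once both sets of data sit in a SQL database, the "is it already there?" check from the original question can be a plain query that runs before anything is appended. The sketch below again uses Python's sqlite3 module as a local stand-in; the table names are borrowed from the accepted solution, and the amount column and sample rows are invented for illustration. It also assumes primary_key is unique in the consolidated table, otherwise the join would count some staging rows more than once.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real SQL database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE final_consolidated_table (primary_key INTEGER, amount REAL);
    CREATE TABLE stg_odbc_quarterly (primary_key INTEGER, amount REAL);
    INSERT INTO final_consolidated_table VALUES (1, 10.0), (2, 20.0);
    INSERT INTO stg_odbc_quarterly VALUES (2, 20.0), (3, 30.0), (4, 40.0);
""")

# LEFT JOIN from staging to the consolidated table: a NULL on the right side
# means the staging row's key has not been loaded yet.
OVERLAP_REPORT = """
    SELECT
        SUM(CASE WHEN f.primary_key IS NOT NULL THEN 1 ELSE 0 END) AS already_present,
        SUM(CASE WHEN f.primary_key IS NULL THEN 1 ELSE 0 END) AS new_rows
    FROM stg_odbc_quarterly AS o
    LEFT JOIN final_consolidated_table AS f
        ON f.primary_key = o.primary_key
"""

already_present, new_rows = conn.execute(OVERLAP_REPORT).fetchone()
print(f"already present: {already_present}, new: {new_rows}")
```

A report like this can be logged before each load so that an unexpectedly large overlap, or none at all, is noticed before the append happens.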

@lbendlin: Why not just use a dataflow instead of creating a full SQL database for the deduplication?

A dataflow is just a bunch of CSV files in Azure Blob storage. Duplication of storage for no functional gain.
