I recently stumbled upon an odd issue with the copy data pipeline activity. The activity is loading the source data into the wrong destination columns. My pipeline is very simple: I execute a PySpark notebook that creates a table in a Lakehouse, and when the notebook finishes, I execute a copy data activity to copy the data out of the Lakehouse into a Warehouse. When comparing the two data sets, the first several columns are correct, but others have their values shifted into the wrong columns. Performing a similar load with a Dataflow Gen2 works fine, but my objective requires the use of the copy data activity.
Notice when selecting from the Lakehouse (top) and Warehouse (bottom) how the data has shifted columns:
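For context, the notebook step is essentially just writing a Delta table, roughly like the sketch below (table and column names are simplified placeholders, not my real schema):

```python
# Minimal sketch of the notebook step (illustrative names only).
# Each run pulls rows from the source and writes them to a Lakehouse Delta table.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

rows = [
    (1, "2024-01-01", 10.0),
    (2, "2024-01-02", 12.5),
]
df = spark.createDataFrame(rows, ["lineid", "orderdate", "amount"])

# Write the table in the Lakehouse; the copy data activity then reads
# this table and loads it into the Warehouse.
df.write.format("delta").mode("overwrite").saveAsTable("my_lakehouse_table")
```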
Thank you very much lbendlin for your prompt reply.
I have tested your scenario and here are my results:
I recommend that you try configuring the mapping manually.
Regards,
Nono Chen
If this post helps, please consider accepting it as the solution to help other members find it more quickly.
Thank you, Nono & lbendlin.
When using mapped columns, I receive the following error:
ErrorCode=ParquetColumnNotFound,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Column lineid does not exist in Parquet file.,Source=Microsoft.DataTransfer.Richfile.ParquetTransferPlugin,'
I won't pretend to fully understand how the data is stored or organized in the Lakehouse, but I'm guessing there are sequential Parquet or Delta files, and the first likely has fewer columns than a later file...
When loading the Lakehouse, I'm selecting * from the source system via a REST API that returns paged results in 1,000-row increments.
I could potentially define every column in my source system query, but I was really hoping to make this entire flow dynamic and abstract the entire process, to avoid creating notebooks & copy data pipeline activities for hundreds of tables.
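To illustrate what I suspect is happening, the load looks roughly like the sketch below (the endpoint, response shape, and names are made up). Each page is appended as it arrives, so an early page that omits a column produces Parquet files with fewer columns than the files written for later pages:

```python
# Hypothetical sketch of the dynamic load (URL, response shape, and names are made up).
# Each 1,000-row page is converted to a DataFrame and appended to the table.
import requests
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

page = 1
while True:
    resp = requests.get(
        "https://source.example.com/api/orders",       # placeholder endpoint
        params={"page": page, "pageSize": 1000},
    )
    records = resp.json().get("value", [])
    if not records:
        break

    # Schema is inferred per page, so it can drift between pages:
    # a page missing a field yields a Parquet file without that column.
    page_df = spark.createDataFrame(records)
    page_df.write.format("delta").mode("append") \
        .option("mergeSchema", "true") \
        .saveAsTable("my_lakehouse_table")
    page += 1
```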
If your metadata changes halfway through then you cannot realistically expect this to be automated and robust. Either start over (flush and fill the parquets) or maybe implement some error handling for missing columns.
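For example, a flush-and-fill run could align every page to a fixed column list and overwrite the table in a single write, so every Parquet file ends up with the same schema (sample data and names below are purely illustrative):

```python
# Rough "flush and fill" sketch: align each page to one fixed column list,
# union the pages, and overwrite the table in a single write.
from functools import reduce
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

EXPECTED_COLUMNS = ["lineid", "orderdate", "amount"]   # assumed target schema

# Two "pages" with drifting schemas: the first one is missing 'amount'.
page1 = spark.createDataFrame([(1, "2024-01-01")], ["lineid", "orderdate"])
page2 = spark.createDataFrame([(2, "2024-01-02", 12.5)], EXPECTED_COLUMNS)

def align(df):
    # Add any missing expected columns as nulls, then project a stable order.
    for col in EXPECTED_COLUMNS:
        if col not in df.columns:
            df = df.withColumn(col, F.lit(None))
    return df.select(EXPECTED_COLUMNS)

full_df = reduce(lambda a, b: a.unionByName(b), [align(page1), align(page2)])

# Overwrite instead of append, so stale files with the old schema are replaced.
full_df.write.format("delta").mode("overwrite") \
    .option("overwriteSchema", "true") \
    .saveAsTable("my_lakehouse_table")
```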
Weird indeed. What if you use forced mapping?