I recently stumbled upon an odd issue with the copy data pipeline activity. The activity is loading the source data into the wrong destination columns. My pipeline is very simple: I execute a PySpark notebook that creates a table in a Lakehouse, and when the notebook finishes, I execute a copy data activity to copy the data out of the Lakehouse into a Warehouse. When comparing the two data sets, the first several columns are correct, but others have their values shifted into the wrong columns. Performing a similar load with a Dataflow Gen2 works fine, but my objective requires the use of the copy data activity.
Notice when selecting from the Lakehouse (top) and Warehouse (bottom) how the data has shifted columns:
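For context, the notebook step is essentially just writing a Delta table, roughly like the sketch below (table and column names are simplified placeholders, not my real schema):

```python
# Minimal sketch of the notebook step (illustrative names only).
# Each run pulls rows from the source and writes them to a Lakehouse Delta table.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

rows = [
    (1, "2024-01-01", 10.0),
    (2, "2024-01-02", 12.5),
]
df = spark.createDataFrame(rows, ["lineid", "orderdate", "amount"])

# Write the table in the Lakehouse; the copy data activity then reads
# this table and loads it into the Warehouse.
df.write.format("delta").mode("overwrite").saveAsTable("my_lakehouse_table")
```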
Thank you very much lbendlin for your prompt reply.
I have tested your scenario and here are my results:
I recommend that you try configuring the mapping manually.
Regards,
Nono Chen
If this post helps, please consider accepting it as the solution to help other members find it more quickly.
Thank you, Nono & lbendlin.
When using mapped columns, I receive the following error:
ErrorCode=ParquetColumnNotFound,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Column lineid does not exist in Parquet file.,Source=Microsoft.DataTransfer.Richfile.ParquetTransferPlugin,'
I won't pretend to fully understand how the data is stored or organized in the Lakehouse, but I'm guessing there are sequential Parquet or Delta files, and the first likely has fewer columns than a later file...
When loading the Lakehouse, I'm selecting * from the source system via a REST API that returns paged results in 1,000-row increments.
I could potentially define every column in my source system query, but I was really hoping to make this entire flow dynamic and abstract the entire process, to avoid creating notebooks & copy data pipeline activities for hundreds of tables.
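To illustrate what I suspect is happening, the load looks roughly like the sketch below (the endpoint, response shape, and names are made up). Each page is appended as it arrives, so an early page that omits a column produces Parquet files with fewer columns than the files written for later pages:

```python
# Hypothetical sketch of the dynamic load (URL, response shape, and names are made up).
# Each 1,000-row page is converted to a DataFrame and appended to the table.
import requests
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

page = 1
while True:
    resp = requests.get(
        "https://source.example.com/api/orders",       # placeholder endpoint
        params={"page": page, "pageSize": 1000},
    )
    records = resp.json().get("value", [])
    if not records:
        break

    # Schema is inferred per page, so it can drift between pages:
    # a page missing a field yields a Parquet file without that column.
    page_df = spark.createDataFrame(records)
    page_df.write.format("delta").mode("append") \
        .option("mergeSchema", "true") \
        .saveAsTable("my_lakehouse_table")
    page += 1
```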
If your metadata changes halfway through then you cannot realistically expect this to be automated and robust. Either start over (flush and fill the parquets) or maybe implement some error handling for missing columns.
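For example, a flush-and-fill run could align every page to a fixed column list and overwrite the table in a single write, so every Parquet file ends up with the same schema (sample data and names below are purely illustrative):

```python
# Rough "flush and fill" sketch: align each page to one fixed column list,
# union the pages, and overwrite the table in a single write.
from functools import reduce
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

EXPECTED_COLUMNS = ["lineid", "orderdate", "amount"]   # assumed target schema

# Two "pages" with drifting schemas: the first one is missing 'amount'.
page1 = spark.createDataFrame([(1, "2024-01-01")], ["lineid", "orderdate"])
page2 = spark.createDataFrame([(2, "2024-01-02", 12.5)], EXPECTED_COLUMNS)

def align(df):
    # Add any missing expected columns as nulls, then project a stable order.
    for col in EXPECTED_COLUMNS:
        if col not in df.columns:
            df = df.withColumn(col, F.lit(None))
    return df.select(EXPECTED_COLUMNS)

full_df = reduce(lambda a, b: a.unionByName(b), [align(page1), align(page2)])

# Overwrite instead of append, so stale files with the old schema are replaced.
full_df.write.format("delta").mode("overwrite") \
    .option("overwriteSchema", "true") \
    .saveAsTable("my_lakehouse_table")
```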
Weird indeed. What if you use forced mapping?