Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Don't miss out! 2025 Microsoft Fabric Community Conference, March 31 - April 2, Las Vegas, Nevada. Use code MSCUST for a $150 discount. Prices go up February 11th. Register now.

Reply
heastham
New Member

Exclude All Null Column

Hi There,

 

I am building a pipeline that copies some tables from one Database and plonks it into an Azurew SQL Database, in which I can then save the tables into a Lakehouse as Parquet files. My pipeline is currently failing, with the following error code at the Copy Each Table point: 


Details: Activity failed because an inner activity failed; Inner activity name: Save Table to SQL Database, Error: Failure happened on 'destination' side. ErrorCode=UserErrorInvalidColumnName,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The column LandscapePreviewImageTitle is not found in target side,Source=Microsoft.DataTransfer.ClientLibrary,'

After investigating, it turns out that a new column has been added to a table and the column contains all Null values. I have set the pipeline up to be dynamic (in the sense that regardless of table changes, it should still work).How would I go about excluding columns that have all null values? 

 

Any help would be really appreciated! 

 

heastham_0-1698310585522.png

 

 

7 REPLIES 7
AndyDDC
Super User
Super User

Just to confirm,

  • Is this a Fabric Data Pipeline? (e.g. not a data factory pipeline)
  • The first Copy Data task populates a list of available tables for import?
  • The For Each contains the copy task to move the data from the source database to the azure sql database, and then from the Azure SQL db to the Lakehouse?
  • Has the new column been added to the source database that you're loading from?
  • The For Each is failing because the new column doesn't exist downstream in the Azure SQL database?
  • Is there a reason why you're not writing directly to the Lakehouse from the source database (eg cut out the azure sql db)?

Hi @AndyDDC , thanks for responding! 

In answer to your bullet points:

  • The first Copy Data task populates a list of available tables for import?
    Yes, it runs through and lists out all of the tables for import. The table in question shows up in this step. 
  • The For Each contains the copy task to move the data from the source database to the azure sql database, and then from the Azure SQL db to the Lakehouse?
    Yes, it sends a copy of the data to the Azure database and then from there to the Lakehouse as Parquet files. 
  • Has the new column been added to the source database that you're loading from?
    Yes, the new column was added to the source database last week, but it wasn't communicated that it was being added so we only found out about it through the pipeline breaking.
  • The For Each is failing because the new column doesn't exist downstream in the Azure SQL database?
    I believe so, but I'm not sure how to go about fixing this step as there is no explicit mapping to the source to be able to define the column or data type. I think this is the step that I'm just not getting my head around. I'm still getting to grasps with it all as I'm an analyst by trade but having to fill in until we get a proper engineer so trying my best until then 😊
  • Is there a reason why you're not writing directly to the Lakehouse from the source database (eg cut out the azure sql db)?
    We're building to a medallion architecture so the parquet files will give us some flexibility when applying transformations for the 'Silver Layer'. Definitely more than happy to listen to any suggestions if there are better ways of doing it. 

Thanks.

 

So this is just my opinion on this, but you could write from the source database directly to the Lakehouse Files section (your raw/bronze layer).  That would mean that any new columns/altered columns in the source will just get automatically written to Parquet files in the lakehouse files section.

 

AndyDDC_0-1698318429813.png

 

 

The issue as you've discovered is that there is no auto-update on the azure sql database table schema, you'd have to either:

  • drop and recreate the table in the azure sql database using a pre-copy script in the copy data task and set the auto-create table option on (this will of course delete the data from the azure sql db)
  • AndyDDC_2-1698318492577.png

     

 

  • for each table, pass in a sql statement that only included the columns you want into the copy data task
  • AndyDDC_1-1698318453486.png

     

 

 

 

You could go one step further and just write the source data to the lakehouse tables and use Overwrite (although it does "overwrite" the data in the destination table, although keeps history using delta versioning).

Hi @heastham ,
We haven’t heard from you on the last response and was just checking back to see if your query got resolved. Please let us know if you have any further queries.

Hi @v-nikhilan-msft - I've been frozen out of the fatory for the past few days due to 'capacity' issues. We don't have nearly enough data for this to even be a problem yet so I'm logging a ticket currently with Microsoft to see what the issue is. Once I'm successfully back in then I can attempt the solutions given above. Thanks! 

Hi @heastham ,
Thanks for the update. 

Just a thought, but if you use the Overwrite option when writing to a Lakehouse table (not Files as the schema could evolve anyway if you were just writing parquet files) then that will evolve the schema of the lakehouse table automatically when source columns get added to the source.

 

Not sure if that could help in your situation?  Overwrite would replace the data downstream with the latest incoming so not sure if that's what you want

 

AndyDDC_0-1698317461666.png

 

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!

Jan NL Carousel

Fabric Community Update - January 2025

Find out what's new and trending in the Fabric community.