Hello,
I'm facing issues loading data from a Fabric Lakehouse table/file into a Fabric Warehouse. Below is my notebook code. I'm using PySpark.
import pandas as pd
import pyodbc
from sqlalchemy import create_engine

# Read the Lakehouse Delta table
lakehouse_table_path = f"{lakehouse_path}/Tables/{table_name}"
lakehouse_df = spark.read.format("delta").load(lakehouse_table_path)

# Infer the schema
schema = lakehouse_df.schema

# Function to map Spark SQL data types to SQL Server data types
def map_data_type(spark_type):
    if spark_type == "string":
        return "VARCHAR(4000)"
    elif spark_type == "int":
        return "INT"
    elif spark_type == "bigint":
        return "BIGINT"
    elif spark_type == "smallint":
        return "INT"
    elif spark_type == "double":
        return "FLOAT"
    elif spark_type == "boolean":
        return "BIT"
    elif spark_type == "timestamp":
        return "DATETIME2(3)"
    else:
        return spark_type  # Fall back to the Spark type name for other types

# Generate the CREATE TABLE statement
create_table_query = f"CREATE TABLE {DestTableSchemaName}.{DestTableName} ("
for field in schema.fields:
    sql_type = map_data_type(field.dataType.simpleString())
    create_table_query += f"{field.name} {sql_type}, "
create_table_query = create_table_query.rstrip(", ") + ")"
print(create_table_query)

# Read the source parquet file into a Pandas dataframe
df = pd.read_parquet(ParquetFileLocation)

# Connection string for the Fabric Warehouse (service principal authentication)
constr = (
    f"Driver={{ODBC Driver 18 for SQL Server}};"
    f"Server={server};"
    f"Database={database};"
    f"UID={clientId};"
    f"PWD={clientSecret};"
    f"Authentication=ActiveDirectoryServicePrincipal;"
    f"Encrypt=yes;"
    f"TrustServerCertificate=no;"
    f"Connection Timeout=30;"
)

# Establish connection and create the destination table
conn = pyodbc.connect(constr)
cursor = conn.cursor()
print(create_table_query)
cursor.execute(create_table_query)
conn.commit()

# Append the dataframe into the created table
tblnm = "dbo.contract"
print(tblnm)
engine = create_engine(f"mssql+pyodbc:///?odbc_connect={constr}")
df.to_sql(tblnm, engine, if_exists='append', index=False)

cursor.close()
conn.close()
Now when I run the script, the table is created in the Fabric Warehouse based on the schema. However, when I try to push the dataframe data into the created table, I get the error below.
ProgrammingError: (pyodbc.ProgrammingError) ('42000', "[42000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]The data type 'datetime' is not supported in this edition of SQL Server. (24574) (SQLExecDirectW)") [SQL:
CREATE TABLE dbo.contract ( [Id] BIGINT NULL, [ContractId] BIGINT NULL, [VIN] VARCHAR(max) NULL, [RatedVIN] VARCHAR(max) NULL, [Make] VARCHAR(max) NULL, [Model] VARCHAR(max) NULL, [ModelYear] SMALLINT NULL, [Class] VARCHAR(max) NULL, [SeriesTrim] VARCHAR(max) NULL,
[FuelType] VARCHAR(max) NULL, [Cylinders] VARCHAR(max) NULL, [InServiceDate] DATETIME NULL, [IsNew] BIT NULL,
[PurchasePrice] VARCHAR(max) NULL, [LastModifiedBy] VARCHAR(max) NULL,[CreatedOnDateUtc] DATETIME NULL )
However, the table I created has the correct data types. I'm not sure why, when I try to push the dataframe data, it tries to create the same table again with unsupported data types.
Hi @westf ,
It seems like there is an issue with the datetime data type. Can you check your mapping function? DATETIME2(6) should be used.
Ref: https://learn.microsoft.com/en-us/fabric/data-warehouse/data-types
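For example, the mapping could look something like this, with the timestamp branch returning DATETIME2(6) (just a sketch based on your map_data_type function; I have not run it):

# Sketch: map Spark simpleString() type names to Fabric Warehouse types,
# using DATETIME2(6) because the warehouse does not support datetime
def map_data_type(spark_type):
    mapping = {
        "string": "VARCHAR(4000)",
        "int": "INT",
        "bigint": "BIGINT",
        "smallint": "INT",
        "double": "FLOAT",
        "boolean": "BIT",
        "timestamp": "DATETIME2(6)",
    }
    return mapping.get(spark_type, spark_type)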
Regards,
Srisakthi
Hello, thanks for the reply. I don't have any problem creating the table in the Warehouse, because I'm generating the CREATE SQL myself. The problem is when I try to push the dataframe data into the newly created table.
For some reason, even though I set if_exists to 'append', df.to_sql is trying to create another table by itself, and I have no control over the CREATE statement it generates.
Hi @westf
df.to_sql() seems to be a Pandas dataframe function; I haven't tested it myself.
BTW, have you considered using a copy data activity in a data pipeline to copy the data from the lakehouse into the warehouse? You can set up the column mapping within the activity, which would be easier.
How to copy data using copy activity - Microsoft Fabric | Microsoft Learn
Best Regards,
Jing
Community Support Team
Thanks for the reply. I can't use a copy activity because I'm building a dynamic pipeline using a notebook. All I need is a way to push Lakehouse table data into the Warehouse.
Hi @westf
You may check the data type of the datetime column in the Pandas dataframe. In my testing, I got datetime64. It may not be compatible with the datetime2 type in the destination warehouse, and I guess this might be the cause of the error. I'm not sure whether this data type can be converted to meet the target; you may research this further.
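If it helps, one direction to explore might be checking df.dtypes and passing an explicit dtype mapping to to_sql, so that any table Pandas creates uses DATETIME2 instead of DATETIME (just a sketch, not tested against a Fabric Warehouse; it assumes SQLAlchemy's mssql DATETIME2 type and splits the schema out into to_sql's schema argument):

from sqlalchemy.dialects.mssql import DATETIME2

# Inspect the Pandas dtypes that will be sent to the warehouse
print(df.dtypes)

# Map datetime64 columns to DATETIME2(6) in case Pandas creates the table itself
dtype_map = {
    col: DATETIME2(precision=6)
    for col, t in df.dtypes.items()
    if str(t).startswith("datetime64")
}

df.to_sql("contract", engine, schema="dbo", if_exists="append", index=False, dtype=dtype_map)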
Best Regards,
Jing