amaaiia
Super User

Error partitioning lakehouse tables depending on the order of partitions

Hi,

I have a Data Pipeline that COPYs tables from one lakehouse into another. The tables are partitioned by some columns. I'm getting this error:

amaaiia_1-1741686395411.png

 

I've tried to reproduce it in a simple way:

1. Create random dataframe:

 

from pyspark.sql import SparkSession
import pandas as pd
import numpy as np

spark = SparkSession.builder.appName("CrearDataFrame").getOrCreate()

# 20 rows: one date column, three string columns, three integer columns
datos = {
    'Fecha': pd.date_range(start='2023-01-01', periods=20, freq='D'),
    'Texto1': [f'Texto1_dato{i}' for i in range(1, 21)],
    'Texto2': [f'Texto2_dato{i}' for i in range(1, 21)],
    'Texto3': [f'Texto3_dato{i}' for i in range(1, 21)],
    'Entero1': np.random.randint(1, 100, size=20),
    'Entero2': np.random.randint(1, 100, size=20),
    'Entero3': np.random.randint(1, 100, size=20)
}

df_pandas = pd.DataFrame(datos)
df_spark = spark.createDataFrame(df_pandas)

 

2. Write it into a lakehouse table partitioned by one string field and one integer field:

 

df_spark.write.format('delta').partitionBy('Texto1','Entero1').saveAsTable('tmp_1t_1e')

 

3. Write it into the lakehouse again, partitioned by the same columns in the opposite order:

 

df_spark.write.format('delta').partitionBy('Entero1','Texto1').saveAsTable('tmp_1e_1t')

 

4. In a Data Pipeline, I try to COPY each of them into a new delta table in the same lakehouse:

amaaiia_3-1741686678110.png

amaaiia_8-1741687013676.png

 

If the table is partitioned with the string field first, the copy works. If it is partitioned with the integer field first, it fails with the error above.
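For context, Delta writes Hive-style partition directories whose nesting follows the partition column order, so the two tables lay out the same rows in different directory structures. A minimal pure-Python sketch of that layout (the column names come from the example above; the `partition_path` helper is hypothetical, for illustration only):

```python
def partition_path(row, partition_cols):
    """Build the Hive-style relative directory path for one row,
    nesting directories in partition-column order."""
    return "/".join(f"{col}={row[col]}" for col in partition_cols)

row = {"Texto1": "Texto1_dato1", "Entero1": 42}

# tmp_1t_1e: string column first
print(partition_path(row, ["Texto1", "Entero1"]))  # Texto1=Texto1_dato1/Entero1=42

# tmp_1e_1t: integer column first
print(partition_path(row, ["Entero1", "Texto1"]))  # Entero1=42/Texto1=Texto1_dato1
```

The data in both tables is identical; only the directory nesting differs, which is what makes the failure depending on partition order surprising.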

 

I've checked the source table files and they are correctly partitioned:

tmp_1e_1t:

amaaiia_4-1741686796585.png

amaaiia_5-1741686824357.png

 

tmp_1t_1e:

amaaiia_6-1741686917556.png

 

amaaiia_7-1741686944406.png

 

Why is this happening?

 

 

V-yubandi-msft
Community Support

Hello @amaaiia ,

 

May I ask if you have resolved this issue? If so, please mark the helpful reply and accept it as the solution. This will help other community members with similar problems solve them faster.

Thank you.

No, I haven't. I have a ticket open with support. They've been able to reproduce the same issue in their environment, but I'm still waiting for a solution.

Okay, thank you for your response. Once you receive any updates, please share them here if possible - it could help others as well.

V-yubandi-msft
Community Support

Hello @amaaiia ,

Thank you for reaching out to the Microsoft Fabric Community. When you partition with the string field first, the Copy operation proceeds smoothly as everything is in the expected order. However, starting with the integer field causes errors due to a mismatch in partition order.

 

1. Delta Lake requires consistency in partition column order for both reads and writes. The Copy activity in Fabric Data Pipeline encounters issues when partition orders differ, resulting in errors.

 

2. To resolve this, ensure the partition column order matches between the source and destination tables. If a change in partition order is necessary, write the data to a non-partitioned table first, then use a CTAS statement to create a properly partitioned table.
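A minimal sketch of that staging-plus-CTAS workaround as it might look in a Fabric notebook. The table names are taken from the example above, the `ctas_partitioned` helper is hypothetical, and the statements would be executed via `spark.sql(...)`; whether this actually avoids the Copy error is not verified here:

```python
# Sketch: stage the data unpartitioned, then recreate the partitioned
# table with a CTAS in the desired partition-column order.
stage_sql = "CREATE TABLE tmp_staging AS SELECT * FROM tmp_1e_1t"

def ctas_partitioned(target, source, partition_cols):
    """Build a Delta CTAS statement partitioned in the given column order.
    (Hypothetical helper, for illustration only.)"""
    cols = ", ".join(partition_cols)
    return (f"CREATE TABLE {target} USING DELTA "
            f"PARTITIONED BY ({cols}) AS SELECT * FROM {source}")

# In a notebook: spark.sql(stage_sql); spark.sql(ctas_partitioned(...))
print(ctas_partitioned("tmp_1t_1e_fixed", "tmp_staging", ["Texto1", "Entero1"]))
```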

 

I hope this clarifies the process. Please let me know if you need further assistance.

The source table is partitioned, but the destination table is not. In fact, the destination table doesn't exist before the COPY; it's a new table.

 

The issue seems to be caused when reading data from the source, I guess.

Hello @amaaiia ,

Thank you for the clarification. Since the destination table does not exist before the COPY operation, Fabric should create it automatically. The issue appears to occur when reading from the source, indicating that the organization of the source table might be causing the problem.

 

Could you please attempt to query the source table directly in a Spark notebook to verify if it reads without any issues? Additionally, reviewing the partition metadata in the Delta table could help identify any inconsistencies in partition handling.
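One way to review that partition metadata is to inspect the first commit in the table's `_delta_log` directory: the `metaData` action there records `partitionColumns` in order. A sketch, assuming a trimmed, hypothetical commit entry (real `metaData` actions carry more fields):

```python
import json

# Trimmed example of one line from _delta_log/00000000000000000000.json
# (hypothetical content; real metaData actions contain more fields)
commit_line = '{"metaData": {"id": "abc", "partitionColumns": ["Entero1", "Texto1"]}}'

action = json.loads(commit_line)
if "metaData" in action:
    # Partition columns, in the order they were declared at write time
    print(action["metaData"]["partitionColumns"])  # ['Entero1', 'Texto1']
```

In a notebook, `spark.sql("DESCRIBE DETAIL tmp_1e_1t")` should surface the same `partitionColumns` list without reading the log by hand.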

 

Regards,

Yugandhar.

The source table can be read from a Spark notebook:

amaaiia_2-1741763591799.png

 

And also from the SQL analytics endpoint:

amaaiia_1-1741763515908.png

And I don't see anything weird in the metadata. As you can see, the source table was created right before trying to COPY it. You can reproduce the issue with the code provided above.

Hello @amaaiia ,

Thank you for the clarification. Based on your previous response, it appears the issue occurs while reading the source table. Since the problem arises specifically when the first partition column is an integer, it is possible that Fabric's COPY operation struggles to interpret the partition metadata in Delta Lake. Could you please try using a different read method in the COPY activity to see if it resolves the issue?

 

I hope this is helpful.

What do you mean with different read method?

If possible, consider loading the data into a temporary unpartitioned table first and then running the COPY operation. This may not address the root cause, but it could work around the problem and improve data transfer efficiency.


The data is partitioned in the source, not in the destination. tmp_1e_1t and tmp_1t_1e are the source tables; they're partitioned by string and integer columns. What I'm trying to do is COPY them into a new, non-partitioned table.

Finally, the problem appears to occur during the data reading process from the source.

If the issue continues after attempting the recommended solutions, I advise submitting a support ticket with Microsoft. This will enable their support team to conduct a deeper investigation and offer a more customized resolution.
How to create a Fabric and Power BI Support ticket - Power BI | Microsoft Learn

 

Regards,

Yugandhar.
