Using Fabric, I created a dataset stored in delta parquet format and partitioned by EventData=YYYY-MM-DD. Then I run a PySpark script to load this data into "Tables". It generates a table named "pageview_delta_small", but without any columns. If I create my data without partitions, it works. What am I doing wrong?
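For context, here is a minimal sketch of how a dataset partitioned like this might be produced; the sample columns, values, and path below are assumptions for illustration, not details from the original post.
# Hypothetical sketch: writing a dataset partitioned by EventData
from pyspark.sql import Row
df = spark.createDataFrame([
    Row(EventData="2024-01-01", page="home", views=3),
    Row(EventData="2024-01-02", page="docs", views=5),
])
# partitionBy creates subfolders such as EventData=2024-01-01 under the target path
df.write.format("delta").mode("overwrite").partitionBy("EventData").save("Files/pageviews")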
Hi @Krumelur ,
Thanks for using Fabric Community.
You can explicitly define the schema for your DataFrame before writing it to the table. This ensures Spark uses the correct schema regardless of how the data is partitioned.
from pyspark.sql.types import StructType, StructField, StringType, IntegerType
# Define the schema explicitly (replace with your actual column names and types)
schema = StructType([
    StructField("column1", StringType(), True),
    StructField("column2", IntegerType(), True),
])
# Load your delta parquet data (for plain parquet files the schema can be enforced with .schema(schema))
df = spark.read.format("delta").load("path/to/your/data")
# Write the data as a table, partitioned by the EventData column
df.write.format("delta").partitionBy("EventData").saveAsTable("pageview_delta_small")
Can you please try the above code?
Hope this is helpful. Please do let me know in case of further queries.
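As a quick check (a sketch added here, not part of the original reply), you can confirm that the table was registered with its columns:
# Verify that the table now exposes the expected columns
spark.table("pageview_delta_small").printSchema()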
Your solution uses Python, and I can confirm that it works fine there, even without specifying a schema.
However, when I use the SQL syntax, the resulting table is empty.
Hi @Krumelur ,
Can you please try the below Spark SQL code?
CREATE TABLE IF NOT EXISTS pageview_delta_small -- Ensure this matches the expected table name
USING DELTA
PARTITIONED BY (EventData) -- Specify the partitioning column
LOCATION '/data/pageviews'; -- Location of your Delta table data
This won't work; Spark rejects it with the error: "It is not allowed to specify partitioning when the table schema is not defined."
At this point, I just use Python. 🙂
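For completeness, one pattern that typically avoids this error when the folder already contains a valid Delta table is to omit both the column list and the PARTITIONED BY clause and let the Delta transaction log supply them. This is a hedged sketch, assuming '/data/pageviews' already holds Delta data, run here through spark.sql from PySpark:
# Hedged sketch: schema and partitioning are read from the Delta log at LOCATION,
# so they do not need to be declared in the CREATE TABLE statement
spark.sql("""
CREATE TABLE IF NOT EXISTS pageview_delta_small
USING DELTA
LOCATION '/data/pageviews'
""")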
Hi @Krumelur ,
Thanks for your reply.
Glad to know that you were able to reach a resolution using PySpark. Please continue using the Fabric Community for your further queries.