Hi there - I'm trying to load a table from a semantic model into a lakehouse, but the issue is that the table in the semantic model I want to load from has column names that start with capital letters (for ease of use).
Obviously this breaks the rules for lakehouse tables - how do I fix this, ideally dynamically?
Code and traceback below:
Cell In[8], line 13
10 workspace_Safe_ID = workspace_ID.replace("-","_")
11 df_table = fabric.read_table("PSG_CTD_GDS_OMNI_Production", "_Environments",workspace=workspace_ID)
---> 13 df_table.to_lakehouse_table("Environments", "overwrite")
File ~/cluster-env/trident_env/lib/python3.10/site-packages/sempy/_utils/_log.py:273, in mds_log.<locals>.get_wrapper.<locals>.log_decorator_wrapper(*args, **kwargs)
270 raise
272 try:
--> 273 result = func(*args, **kwargs)
275 # The invocation for get_message_dict moves after the function
276 # so it can access the state after the method call
277 message.update(extractor.get_completion_message_dict(result, arg_dict))
File ~/cluster-env/trident_env/lib/python3.10/site-packages/sempy/fabric/_dataframe/_fabric_dataframe.py:559, in FabricDataFrame.to_lakehouse_table(self, name, mode, spark_schema)
550 with SparkConfigTemporarily(spark, "spark.sql.parquet.int96RebaseModeInWrite", "CORRECTED"):
551 spark_df = spark.createDataFrame(df_converted, schema=converted_schema)
553 (spark_df.write
554 .option("parquet.vorder.enabled", True)
555 .mode(mode)
556 .format("delta")
557 # enable column mapping to support special characters common w/ Power BI (e.g. [])
558 .option("delta.columnMapping.mode", "name")
--> 559 .saveAsTable(name))
File /opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py:1521, in DataFrameWriter.saveAsTable(self, name, format, mode, partitionBy, **options)
1519 if format is not None:
1520 self.format(format)
-> 1521 self._jwrite.saveAsTable(name)
File ~/cluster-env/trident_env/lib/python3.10/site-packages/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
1316 command = proto.CALL_COMMAND_NAME +\
1317 self.command_header +\
1318 args_command +\
1319 proto.END_COMMAND_PART
1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
1323 answer, self.gateway_client, self.target_id, self.name)
1325 for temp_arg in temp_args:
1326 if hasattr(temp_arg, "_detach"):
File /opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py:175, in capture_sql_exception.<locals>.deco(*a, **kw)
171 converted = convert_exception(e.java_exception)
172 if not isinstance(converted, UnknownException):
173 # Hide where the exception came from that shows a non-Pythonic
174 # JVM exception message.
--> 175 raise converted from None
176 else:
177 raise
Did you have a chance to try @frithjof_v 's solution? Did it work for you, or did you find another solution? Let us know if you have any questions.
Best Regards,
Jing
ChatGPT helped me with this. The main item in the code is this:
df_lower = df.toDF(*[c.lower() for c in df.columns])
ChatGPT created this example:
# Example DataFrame
data = [("John", 28), ("Alice", 23), ("Bob", 35)]
columns = ["Name", "Age"]
df = spark.createDataFrame(data, columns)
# Convert all column names to lowercase
df_lower = df.toDF(*[c.lower() for c in df.columns])
# Show the resulting DataFrame
df_lower.show()
Explanation:
This will rename all columns in the DataFrame to their lowercase equivalents.
Here is a StackOverflow thread mentioning the same method. Perhaps ChatGPT learned from it:
https://stackoverflow.com/questions/43005744/convert-columns-of-pyspark-data-frame-to-lowercase
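If the lakehouse also rejects other characters besides capitals (spaces, brackets, etc.), the same toDF pattern can apply a more general cleanup. This is only a sketch, assuming a regex-based rename is enough for your column names; sanitize_column_name is a hypothetical helper, not part of sempy or Spark:
import re
def sanitize_column_name(name: str) -> str:
    # Lowercase, then replace anything that is not a letter, digit,
    # or underscore with an underscore (assumption: this covers the
    # characters the lakehouse rejects)
    return re.sub(r"[^a-z0-9_]", "_", name.lower())
df_clean = df.toDF(*[sanitize_column_name(c) for c in df.columns])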
So in your case, the code could be like this:
import sempy.fabric as fabric
#Dev
#workspace_ID = "XXXXXX"
#Stage
#workspace_ID = "XXXXXX"
#Prod
workspace_ID = "XXXXXX"
workspace_Safe_ID = workspace_ID.replace("-","_")
df_table = fabric.read_table("PSG_CTD_GDS_OMNI_Production", "_Environments",workspace=workspace_ID)
df_lower = df_table.toDF(*[c.lower() for c in df_table.columns])
df_lower.to_lakehouse_table("Environments", "overwrite")
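One thing to check: fabric.read_table returns a sempy FabricDataFrame, which is pandas-based rather than a Spark DataFrame, so toDF may not be available on it. If that's the case, a pandas-style rename should achieve the same thing - a minimal sketch, assuming the rest of your code stays as above:
import sempy.fabric as fabric
workspace_ID = "XXXXXX"
df_table = fabric.read_table("PSG_CTD_GDS_OMNI_Production", "_Environments", workspace=workspace_ID)
# Lowercase all column names in place using the pandas columns attribute
df_table.columns = [c.lower() for c in df_table.columns]
df_table.to_lakehouse_table("Environments", "overwrite")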