Hi,
I maintain several lakehouses and, due to issues with the deployment pipelines, tend to apply schema changes through scripts, using a multi-step process to retain the data in the table.
For Step 1, to be able to base the new table on the existing table I need to identify the existing data types for all existing fields, and it appears there is no reliable way of doing this. The main issue relates to Char and Varchar fields, as there appears to be no current method for determining the defined character length.
I have tried various methods on the lakehouse, but those always show the fields to be String, with no maximum size.
I have also tried querying INFORMATION_SCHEMA.COLUMNS through the SQL endpoint, and the problem here is that the value for CHARACTER_MAXIMUM_LENGTH appears to be 4 times the actual defined maximum number of characters, up to a maximum of 8000. For example, a character length of 100 is shown as 400, 1000 is shown as 4000, and 2000 and higher are always shown as 8000.
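For reference, this is roughly the query involved; a minimal sketch using pyodbc against the SQL analytics endpoint, where the server, database and table names are placeholders and interactive Entra sign-in is assumed:

import pyodbc

# Placeholder connection details for the lakehouse SQL analytics endpoint.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<your-sql-endpoint>;"
    "Database=<your-lakehouse>;"
    "Authentication=ActiveDirectoryInteractive;"
)

cursor = conn.cursor()
cursor.execute("""
    SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = '<your-table>'
""")
for row in cursor.fetchall():
    # CHARACTER_MAXIMUM_LENGTH comes back inflated (4x, capped at 8000),
    # so it cannot be used to recover the defined varchar length.
    print(row.COLUMN_NAME, row.DATA_TYPE, row.CHARACTER_MAXIMUM_LENGTH)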
Does anyone know of a reliable way of generating a create statement for an existing lakehouse table?
Hi @JonBFabric,
What is a lakehouse? - Microsoft Fabric | Microsoft Learn
Table utility commands | Delta Lake
Thanks. Great to get the explanation as to what is happening and why. But going back to the original question...
Is it possible to identify the SQL statement used to originally create a table? I'm getting the impression that the answer is no. And given that the maximum record length that can be handled by the SQL endpoint is 8060 bytes, those character limits are crucial and need to be tightly controlled.
Hi @JonBFabric , Thank you for reaching out to the Microsoft Community Forum.
Fabric lakehouse cannot return the original CREATE TABLE statement because that information is never stored. Delta Lake only keeps a structural schema in its transaction log and all string columns are recorded simply as string without any notion of the original CHAR(n) or VARCHAR(n) definitions. Because the lakehouse storage layer does not preserve fixed length constraints, there is no system level metadata you can query later to recover them.
The SQL analytics endpoint also cannot help because it exposes a compatibility projection rather than the real underlying schema. Its inflated character lengths, including the 4× multiplier and the cap at 8000, are generated by the endpoint itself and do not represent the actual table definition or any original DDL. That surface is designed for querying, not schema reconstruction, and therefore does not retain the information you are looking for.
Given these constraints, there is no reliable way to extract the exact SQL used to create a lakehouse table after the fact. If those character limits matter for downstream SQL workloads, the only viable approach is to rebuild the DDL by measuring actual data lengths in the table and defining controlled column sizes going forward. For future tables, the only dependable method is to version control the DDL at creation time or store it explicitly as metadata, because the platform does not preserve it automatically.
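As a rough illustration of the "measure actual data lengths" approach, a minimal PySpark sketch (the table name is a placeholder) that reports the observed maximum character length per string column, which can then be rounded up to a controlled VARCHAR size:

from pyspark.sql import functions as F

# Placeholder table name; run in a Fabric Spark notebook with the lakehouse attached.
df = spark.table("<your-table>")

# All columns that Spark surfaces as plain strings.
string_cols = [f.name for f in df.schema.fields if f.dataType.simpleString() == "string"]

# Observed maximum character length per string column.
max_lengths = df.select(
    [F.max(F.length(F.col(c))).alias(c) for c in string_cols]
).collect()[0].asDict()

for col_name, observed in max_lengths.items():
    print(f"{col_name}: observed max length {observed}")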
What Is Data Warehousing in Microsoft Fabric? - Microsoft Fabric | Microsoft Learn
What is a lakehouse? - Microsoft Fabric | Microsoft Learn
Data Types in Fabric Data Warehouse - Microsoft Fabric | Microsoft Learn
What is the SQL analytics endpoint for a lakehouse? - Microsoft Fabric | Microsoft Learn
Whilst I understand everything that you are saying, I would like to describe another scenario which suggests that some of the above is not actually correct.
Using only SparkSQL in a notebook I have created a lakehouse table with a single varchar(10) field. If I try to insert any value with more than 10 characters I get the following error:
[DELTA_EXCEED_CHAR_VARCHAR_LIMIT] Exceeds char/varchar type length limitation. Failed check: (isnull('String) OR (length('String) <= 10)).
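For reference, a minimal repro along these lines, run from a Fabric Spark notebook (the table and column names are illustrative):

# Create a Delta table with a declared varchar(10) column.
spark.sql("CREATE TABLE demo_varchar_limit (code VARCHAR(10)) USING DELTA")

# Succeeds: value is within the declared limit.
spark.sql("INSERT INTO demo_varchar_limit VALUES ('short')")

# Fails with DELTA_EXCEED_CHAR_VARCHAR_LIMIT: value is longer than 10 characters.
spark.sql("INSERT INTO demo_varchar_limit VALUES ('this value is too long')")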
Obviously something in the Spark engine or the Delta table metadata is storing the size restriction.
Hi @JonBFabric , Thank you for reaching out to the Microsoft Community Forum.
Yes, Spark/Delta can record and enforce CHAR(n) / VARCHAR(n) when a table is created through Spark or other Delta-aware APIs; the engine stores that constraint in the Delta metadata and will reject writes that exceed the declared width (hence the DELTA_EXCEED_CHAR_VARCHAR_LIMIT error). The authoritative place to get that declaration is the Spark/Delta surface (for example, run SHOW CREATE TABLE or DESCRIBE TABLE EXTENDED in a Spark notebook or read the Delta transaction log/Delta Table API). Those commands return the DDL/metadata that Spark/Delta actually enforces.
Do not rely on the Fabric SQL analytics endpoint or INFORMATION_SCHEMA alone to recover declared widths. Those surfaces present a T-SQL compatibility projection that can inflate, cap or otherwise transform reported lengths (the 4×/8000 behaviour you saw) and therefore are not a trustworthy source of the originally declared Spark sizes. If you cannot run Spark against the table, your fallback is to inspect the _delta_log or compute observed maximum character/byte lengths and reconstruct conservative VARCHAR widths, and for long-term safety you must version-control the DDL or persist it as table metadata at creation time.
Good Morning,
I'm not looking for a way to access the metadata through the SQL endpoint, whether by using INFORMATION_SCHEMA or any other objects/functions; I just used it as an example of the only place that displayed anything other than string. I have already tried both SHOW CREATE TABLE and DESCRIBE TABLE EXTENDED: the first is not supported by Fabric ([DELTA_OPERATION_NOT_ALLOWED] Operation not allowed: `SHOW CREATE TABLE` is not supported for Delta tables) and the second only shows string.
Please could you provide me with an example of how to access the delta metadata responsible for enforcing the DELTA_EXCEED_CHAR_VARCHAR_LIMIT error. It doesn't need to be pretty.
Thanks again
Hi @JonBFabric , Thank you for reaching out to the Microsoft Community Forum.
If Fabric is blocking SHOW CREATE TABLE and DESCRIBE TABLE EXTENDED only shows string, the next step is to read the Delta metadata directly through Spark, because the enforcement you are seeing (DELTA_EXCEED_CHAR_VARCHAR_LIMIT) comes from the schema stored in the Delta transaction log, not from the SQL endpoint. The length constraint is kept in the Delta log under metaData.schemaString and Spark/Delta will surface it correctly when you query the table through the Delta APIs. The simplest approach is to use a Spark notebook and load the table with DeltaTable.forPath(...).toDF(), which will show VarcharType(n) in the schema if the table was created with varchar(n). If that surface is not available, you can read the latest commit JSON in the _delta_log folder and print the metaData.schemaString field; that text contains the exact schema Spark is enforcing, including declared lengths.
Spark example you can run in a Fabric notebook to retrieve the metadata responsible for the enforcement:
from delta.tables import DeltaTable
import json

table_path = "/lakehouses/<your-lakehouse>/Tables/<your-table>"  # update this

dt = DeltaTable.forPath(spark, table_path)
print(dt.toDF().schema)  # shows VarcharType(n) if declared

log_dir = f"{table_path}/_delta_log"
files = [f.path for f in dbutils.fs.ls(log_dir) if f.name.endswith(".json")]
latest = sorted(files)[-1]

content = dbutils.fs.head(latest, 500000)
commit = json.loads(content)
print(commit.get("metaData", {}).get("schemaString"))
This will show you the exact schema stored in Delta and the constraint that triggers the length violation error. If the table is large and uses checkpoint parquet files, the same field appears in the checkpoint’s metaData struct. In short, the SQL endpoint cannot return the declared widths, but Spark and the Delta log always can.
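As a rough sketch of the checkpoint case, reusing table_path from the example above (the checkpoint file name is a placeholder), the same schemaString can be read from the checkpoint parquet:

# Delta checkpoints are parquet files in _delta_log; the metaData struct carries schemaString.
checkpoint = spark.read.parquet(f"{table_path}/_delta_log/<checkpoint-file>.checkpoint.parquet")
meta = checkpoint.filter("metaData IS NOT NULL").select("metaData.schemaString").collect()
if meta:
    print(meta[0]["schemaString"])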
Explore the lakehouse data with a notebook - Microsoft Fabric | Microsoft Learn
Data Types in Fabric Data Warehouse - Microsoft Fabric | Microsoft Learn
Delta Lake Logs in Warehouse - Microsoft Fabric | Microsoft Learn
Thanks.
This didn't quite work out of the box, possibly because it was originally written for Databricks rather than Fabric, but I have got it working. I will explain the differences as I go:
Hi @JonBFabric , Thanks for the update and the insights on how to solve this issue. We really appreciate it.
If you have any queries, please feel free to create a new post; we are always happy to help.
One final update on this.
The logfile containing the schema is not necessarily the most recent, and there is not necessarily only one version of the schema. There is a schema associated with every modification made to the table structure, be that the original creation or a subsequent alteration. Consequently, the logfile we need is the most recent one that contains a schema.
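For anyone landing on this later, a rough sketch of that approach in a Fabric notebook, assuming mssparkutils is available in place of dbutils and that the path is a placeholder: walk the _delta_log JSON commits from newest to oldest and take the first one that carries a metaData action.

import json

# Placeholder path; update for your lakehouse and table.
table_path = "/lakehouses/<your-lakehouse>/Tables/<your-table>"
log_dir = f"{table_path}/_delta_log"

# mssparkutils is assumed here as the Fabric counterpart of dbutils.
json_files = sorted(
    f.path for f in mssparkutils.fs.ls(log_dir) if f.name.endswith(".json")
)

schema_string = None
# Newest commit first; each commit file is JSON lines, one action per line.
for path in reversed(json_files):
    content = mssparkutils.fs.head(path, 1000000)
    for line in content.splitlines():
        if not line.strip():
            continue
        action = json.loads(line)
        if "metaData" in action:
            schema_string = action["metaData"]["schemaString"]
            break
    if schema_string:
        break

print(schema_string)  # carries the declared char/varchar length information, if any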