When uploading CSV files to Microsoft Fabric's Open Mirroring landing zone with a correctly configured _metadata.json (specifying types such as datetime2 and decimal(18,2)), why are columns consistently created as int or varchar in the mirrored database, even when the source CSV data strictly conforms to the declared types? Are there unstated requirements or known limitations for type inference and conversion from delimited text files that go beyond the _metadata.json specification, or are there additional properties we should be using within _metadata.json to force these non-string/non-integer data types?
Hi @fabricpoweruser,
The varchar columns appear in the mirrored table, even though you specified datetime2 and decimal(18,2) in _metadata.json, because Open Mirroring for CSV uses basic type inference and does not enforce complex SQL types. This is known, by-design behaviour, as acknowledged in Microsoft's documentation: open-mirroring-landing-zone-format
To avoid this, when you are working with Microsoft Fabric's Open Mirroring and require exact schema mapping, particularly for precision data types such as decimal(18,2) or datetime2, the most reliable approach is to use Parquet or Delta file formats. These formats carry embedded schema definitions, allowing the Fabric ingestion engine to preserve the declared types accurately without depending on type inference.
When working with CSV files, you can still define your schema using the _metadata.json file, but it’s important to understand that these definitions are interpreted as best-effort suggestions rather than strict enforcement. Due to this, you should be prepared to post-process the ingested tables by casting the columns to their correct data types using SQL, or by creating views that apply the appropriate type conversions after the data is loaded.
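As a sketch of that post-processing step, assuming the mirrored table landed with varchar columns (the table and column names below are hypothetical; substitute your own), a view in the SQL analytics endpoint can apply the type conversions:

```sql
-- Hypothetical mirrored table dbo.Orders with OrderDate/Amount ingested as varchar.
-- TRY_CAST returns NULL instead of failing if a row does not parse cleanly.
CREATE VIEW dbo.Orders_Typed AS
SELECT
    OrderId,
    TRY_CAST(OrderDate AS datetime2)     AS OrderDate,
    TRY_CAST(Amount    AS decimal(18,2)) AS Amount
FROM dbo.Orders;
```

Using a view rather than a one-off CAST keeps downstream reports pointed at correctly typed columns without re-ingesting the data.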
To align more closely with how Fabric handles delimited text, it’s advisable to use only the core set of supported data types in your schema definitions. For example, you should use Double instead of decimal(18,2) and DateTime instead of datetime2. These types are more consistently interpreted during ingestion, reducing the likelihood of columns defaulting to generic types like varchar.
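For illustration only, a _metadata.json restricted to those core types might look like the fragment below. The property names ("schema", "name", "dataType") and column names here are assumptions for the sake of the example, not the authoritative format; verify the exact keys against the open-mirroring-landing-zone-format documentation linked above.

```json
{
  "keyColumns": [ "OrderId" ],
  "schema": [
    { "name": "OrderId",   "dataType": "Int64" },
    { "name": "OrderDate", "dataType": "DateTime" },
    { "name": "Amount",    "dataType": "Double" }
  ]
}
```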
We understand this might not meet every use case. If schema strictness for CSVs is critical to your workflow, we encourage submitting feedback through the Microsoft Fabric Ideas portal so it can be prioritized by the product team.
Thanks,
Prashanth
MS Fabric community support