Why? If I have Chinese characters, I need NVARCHAR; without it, VARCHAR is not well suited to that data.
What are the workarounds?
Hi @tan_thiamhuat , Thank you for reaching out to the Microsoft Community Forum.
You need Unicode to store Chinese characters. If your engine supports UTF-8 collations, the most space-efficient option is VARCHAR with a _UTF8 collation. This lets you store full Unicode while keeping ASCII at 1 byte per character and Chinese at about 3 bytes. If UTF-8 collations aren't supported in your environment, use NVARCHAR for any column that might store Chinese: it is fully Unicode-compliant, based on UTF-16 encoding, and safely handles all major language scripts. To support emoji or rare characters outside the Basic Multilingual Plane, such as extended CJK ideographs, use an _SC (supplementary characters) collation, which enables proper storage, sorting and comparison of those symbols.
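A minimal sketch of both options side by side. The table and column names here are made up for illustration; the _UTF8 variant assumes an engine where UTF-8 collations exist (SQL Server 2019+ or Azure SQL):

```sql
-- Illustrative only: dbo.Products and its columns are hypothetical names.
CREATE TABLE dbo.Products
(
    ProductId  INT IDENTITY PRIMARY KEY,
    -- UTF-8 option: ASCII stays 1 byte/char, Chinese ~3 bytes/char.
    -- Note VARCHAR(n) counts BYTES, so size multi-byte text accordingly.
    NameUtf8   VARCHAR(300) COLLATE Latin1_General_100_CI_AS_SC_UTF8,
    -- UTF-16 fallback: 2 bytes per BMP character, 4 for supplementary.
    NameUtf16  NVARCHAR(100)
);

INSERT INTO dbo.Products (NameUtf8, NameUtf16)
VALUES (N'笔记本电脑', N'笔记本电脑');  -- "laptop" in both encodings
```

Note the N'...' literal: it keeps the value Unicode on the way in, and the implicit conversion into the _UTF8 VARCHAR column is lossless.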
When using UTF-8, watch index key byte limits, since character size varies, and make sure your client drivers fully support UTF-8 collations. If you stick with NVARCHAR, limit it to columns that truly require it, and consider applying row compression to reduce storage overhead. Avoid storing text as bytes (VARBINARY) or relying on legacy code pages; both approaches break sorting, searching and proper collation behaviour.
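The byte-versus-character distinction above can be checked directly with LEN (characters) and DATALENGTH (bytes), and row compression is a single REBUILD. A self-contained sketch, using the same five-character string as an example:

```sql
-- LEN counts characters; DATALENGTH counts bytes.
-- For NVARCHAR (UTF-16), each BMP character takes 2 bytes.
SELECT
    LEN(N'笔记本电脑')        AS Chars,       -- 5
    DATALENGTH(N'笔记本电脑') AS Utf16Bytes;  -- 10

-- Sketch: enable row compression on an existing table (hypothetical name)
-- so Unicode compression can shrink NVARCHAR storage for mostly-ASCII rows.
ALTER TABLE dbo.Products
REBUILD WITH (DATA_COMPRESSION = ROW);
```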
In Microsoft Fabric, Lakehouse (Spark) strings are UTF-8 by default, so mixed-language text works out of the box and there is no need to choose between VARCHAR and NVARCHAR; just make sure your ingested files are actually UTF-8 encoded. In Fabric Warehouse, treat it like Azure SQL: use a _UTF8 collation at the database or column level if it's available in your tenant; otherwise, fall back to NVARCHAR. In Power BI, Import and Direct Lake models use Unicode and dictionary encoding, so the source type doesn't materially affect model size. However, in DirectQuery, sorting and filtering follow the source collation, so it's critical to configure the correct Unicode or UTF-8 collation at the source.
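To find out which case you are in, you can inspect the collation actually in effect. A quick sketch (the dbo.Products table name is a hypothetical placeholder; substitute one of your own columns):

```sql
-- Database default collation: look for a _UTF8 suffix to confirm
-- UTF-8 collations are in effect at the database level.
SELECT DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS DatabaseCollation;

-- Per-column collations, which can override the database default.
SELECT name, collation_name
FROM sys.columns
WHERE object_id = OBJECT_ID('dbo.Products');
```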
nchar and nvarchar (Transact-SQL) - SQL Server | Microsoft Learn
char and varchar (Transact-SQL) - SQL Server | Microsoft Learn
COLLATE (Transact-SQL) - SQL Server | Microsoft Learn
Data compression - SQL Server | Microsoft Learn
Data types (Transact-SQL) - SQL Server | Microsoft Learn
A note on platform differences: NVARCHAR has long been supported in on-premises SQL Server on Windows, but with its newer data platforms Microsoft has moved toward Linux-based systems. In Fabric Warehouse, NVARCHAR is not supported; the underlying Delta Lake format instead uses the STRING data type, which represents character string values. Another option is to move this workload to Notebooks, where Spark strings handle Unicode natively.