Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
fredforest
New Member

Data types in Delta

In the olden days of data warehousing, a developper/modeler would be careful to optimize data types (especially on large fact tables) for a variety of reasons.

I'm wondering if there's still a reason to care about this is a Fabric world. 

So... any reasons to keep learning about data type optimization or should we just drop this whole topic altogether?

1 ACCEPTED SOLUTION

Well, I did offer some advice as to why we still want to optimise data types.  The engine will provide the resources necesarry to process data operations based on data types and precision.  The bigger the data type and precision, the bigger the resources are used to process.  If you keep your data type sizes to a minimum, this will help the engine only provision what it needs.  

 

Also, if we define our tables with data type sizes in Delta, it will stick to the data type size (e.g. cannot declare a string as 100 and then insert 200 chars in)

View solution in original post

8 REPLIES 8
v-cboorla-msft
Community Support
Community Support

Hi @fredforest 

 

Glad that your query got resolved.

Please continue using Fabric Community for any help regarding your queries.

v-cboorla-msft
Community Support
Community Support

Hi @fredforest 

 

We haven’t heard from you on the last response and was just checking back to see if you have a resolution yet.
In case if you have any resolution please do share that same with the community as it can be helpful to others.
Otherwise, will respond back with the more details and we will try to help.


Thanks

Hi,

 

No, beside @AndyDDC who shared the same question I had, I have not had any further information on the topic sadly.

Hi @fredforest 

In addition to @AndyDDC, Microsoft Fabric is a unified platform that allows you to store, process, and analyze data using various compute engines, such as Power BI, SQL, Spark, and others. All these engines can access the same data stored in a central lakehouse, which uses the Delta Lake table format. Delta Lake is an open-source storage layer that supports ACID transactions, schema enforcement, and time travel on top of Parquet files.

When you create a table in Microsoft Fabric, you can specify the data types for each column using T-SQL syntax. However, not all T-SQL data types are supported by Parquet, and some of them may have different precisions or representations. Therefore, Microsoft Fabric will map the T-SQL data types to the corresponding Parquet types under the hood and apply some optimizations to improve the performance and compression of the data.

Microsoft Fabric offers significant optimizations, data type optimization remains a valuable practice for storage efficiency, query performance, data integrity, compatibility, and best practices. Understanding these implications will enable you to make informed decisions for your data warehouse design and ensure optimal performance and data quality within Microsoft Fabric.

For more details please refer: Link

I hope this helps. Please do let us know if you have any further questions.

Hi @fredforest 

 

We haven’t heard from you on the last response and was just checking back to see if you have a resolution yet.
In case if you have any resolution please do share that same with the community as it can be helpful to others.
Otherwise, will respond back with the more details and we will try to help.


Thanks

Hi @fredforest 


We haven’t heard from you on the last response and was just checking back to see if you have a resolution yet.
In case if you have any resolution please do share that same with the community as it can be helpful to others.
If you have any question relating to the current thread, please do let us know and we will try out best to help you.
In case if you have any other question on a different issue, we request you to open a new thread.


Thanks

Well, I did offer some advice as to why we still want to optimise data types.  The engine will provide the resources necesarry to process data operations based on data types and precision.  The bigger the data type and precision, the bigger the resources are used to process.  If you keep your data type sizes to a minimum, this will help the engine only provision what it needs.  

 

Also, if we define our tables with data type sizes in Delta, it will stick to the data type size (e.g. cannot declare a string as 100 and then insert 200 chars in)

AndyDDC
Solution Sage
Solution Sage

That's a great question @fredforest personally I'm going to keep optimising data types even in this new world of delta and parquet. One of the best practices I worked with in synapse serverless (which the lakehouse endpoint and warehouse endpoint are built from) was optimising data types. Reason for this was that the data types of the columns being used in a workload would directly impact the amount of cluster resources was given to process the workload. The larger the data type, the more resources were provided. Also with joining data together across different delta tables, still can't beat integers.

Helpful resources

Announcements
RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

Expanding the Synapse Forums

New forum boards available in Synapse

Ask questions in Data Engineering, Data Science, Data Warehouse and General Discussion.

MayFabricCarousel

Fabric Monthly Update - May 2024

Check out the May 2024 Fabric update to learn about new features.

LearnSurvey

Fabric certifications survey

Certification feedback opportunity for the community.