Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Be one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now

Reply
GulianiG
Helper II
Helper II

Should a text column having only 2 values be a candidate for binary data type?

Should a text column having only 2 values be a candidate for binary data type?

I have a column named Ee Status- it has 2 values only Full-Time & Part-Time
My query is should I get source files to call Full-Time as 1 and Part-Time as 0
That ways I can change this column from text to binary saving storage space
Is there any benefit in this approach?

1 ACCEPTED SOLUTION
Greg_Deckler
Super User
Super User

@GulianiG I don't believe switching to binary will work. Switching to True/False should if you first convert the column to a whole number and then change it's type to True/False. The benefits of this however are probably near zero because of columnar compression within tabular analysis services. The column is going to get compressed regardless if it originally contained text or whole number, etc. And since in all cases it only contains 2 values, it's unlikely you would get any benefit.

 

In fact, I tested this with a 1M row dataset. I had a column that contained text "Full Time"/"Part Time". The dataset size was 11,662 KB. I then converted the column to 0's and 1's, still 11,662 KB. I then converted this to True/False. Dataset size remained at 11,662 KB. So, you would just be adding steps for no good reason and losing the friendly name for the attribute to boot. 



Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
Power BI Cookbook Third Edition (Color)

DAX is easy, CALCULATE makes DAX hard...

View solution in original post

2 REPLIES 2
Tom-Garvin-2023
Advocate I
Advocate I

I certainly agree with @Greg_Deckler 's analysis, no significant savings in "space". It may be relevant in a source system where the values are explicity stored and repeated in each row. But there are other considerations beyond sheer performance.

 

I assume that "binary" is referring to a bit column (as in SQL Server) or a Yes/No column in other contexts. The question is whether you can really apply two-valued logic, or three-valued if null is in play. Employee status of Living or Deceased seems to be a clear binary distinction. Employment status might eventually be a lower-level of granularity in many instances. For example, Full-Time and Not-Full-Time pretty much covers your use-case. But there are cases where less than F/T has legal and financial implications. There may be a threshold where benefits and unemployment coverage apply. Similarly, a "temp" might be working 40 hours/week, usually considered F/T, except for benefits. In think in this case you really want to create a Dimension which reflects the spectrum of employment status. Use a surrogate key in your fact table. Column compression will continue to be highly efficient, but your application may have much greater information and analytical content. 

Greg_Deckler
Super User
Super User

@GulianiG I don't believe switching to binary will work. Switching to True/False should if you first convert the column to a whole number and then change it's type to True/False. The benefits of this however are probably near zero because of columnar compression within tabular analysis services. The column is going to get compressed regardless if it originally contained text or whole number, etc. And since in all cases it only contains 2 values, it's unlikely you would get any benefit.

 

In fact, I tested this with a 1M row dataset. I had a column that contained text "Full Time"/"Part Time". The dataset size was 11,662 KB. I then converted the column to 0's and 1's, still 11,662 KB. I then converted this to True/False. Dataset size remained at 11,662 KB. So, you would just be adding steps for no good reason and losing the friendly name for the attribute to boot. 



Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
Power BI Cookbook Third Edition (Color)

DAX is easy, CALCULATE makes DAX hard...

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!

Dec Fabric Community Survey

We want your feedback!

Your insights matter. That’s why we created a quick survey to learn about your experience finding answers to technical questions.

ArunFabCon

Microsoft Fabric Community Conference 2025

Arun Ulag shares exciting details about the Microsoft Fabric Conference 2025, which will be held in Las Vegas, NV.

December 2024

A Year in Review - December 2024

Find out what content was popular in the Fabric community during 2024.