Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
GulianiG
Helper II
Helper II

Should a text column having only 2 values be a candidate for binary data type?

Should a text column having only 2 values be a candidate for binary data type?

I have a column named Ee Status- it has 2 values only Full-Time & Part-Time
My query is should I get source files to call Full-Time as 1 and Part-Time as 0
That ways I can change this column from text to binary saving storage space
Is there any benefit in this approach?

1 ACCEPTED SOLUTION
Greg_Deckler
Super User
Super User

@GulianiG I don't believe switching to binary will work. Switching to True/False should if you first convert the column to a whole number and then change it's type to True/False. The benefits of this however are probably near zero because of columnar compression within tabular analysis services. The column is going to get compressed regardless if it originally contained text or whole number, etc. And since in all cases it only contains 2 values, it's unlikely you would get any benefit.

 

In fact, I tested this with a 1M row dataset. I had a column that contained text "Full Time"/"Part Time". The dataset size was 11,662 KB. I then converted the column to 0's and 1's, still 11,662 KB. I then converted this to True/False. Dataset size remained at 11,662 KB. So, you would just be adding steps for no good reason and losing the friendly name for the attribute to boot. 


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

View solution in original post

2 REPLIES 2
Tom-Garvin-2023
Advocate I
Advocate I

I certainly agree with @Greg_Deckler 's analysis, no significant savings in "space". It may be relevant in a source system where the values are explicity stored and repeated in each row. But there are other considerations beyond sheer performance.

 

I assume that "binary" is referring to a bit column (as in SQL Server) or a Yes/No column in other contexts. The question is whether you can really apply two-valued logic, or three-valued if null is in play. Employee status of Living or Deceased seems to be a clear binary distinction. Employment status might eventually be a lower-level of granularity in many instances. For example, Full-Time and Not-Full-Time pretty much covers your use-case. But there are cases where less than F/T has legal and financial implications. There may be a threshold where benefits and unemployment coverage apply. Similarly, a "temp" might be working 40 hours/week, usually considered F/T, except for benefits. In think in this case you really want to create a Dimension which reflects the spectrum of employment status. Use a surrogate key in your fact table. Column compression will continue to be highly efficient, but your application may have much greater information and analytical content. 

Greg_Deckler
Super User
Super User

@GulianiG I don't believe switching to binary will work. Switching to True/False should if you first convert the column to a whole number and then change it's type to True/False. The benefits of this however are probably near zero because of columnar compression within tabular analysis services. The column is going to get compressed regardless if it originally contained text or whole number, etc. And since in all cases it only contains 2 values, it's unlikely you would get any benefit.

 

In fact, I tested this with a 1M row dataset. I had a column that contained text "Full Time"/"Part Time". The dataset size was 11,662 KB. I then converted the column to 0's and 1's, still 11,662 KB. I then converted this to True/False. Dataset size remained at 11,662 KB. So, you would just be adding steps for no good reason and losing the friendly name for the attribute to boot. 


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

Helpful resources

Announcements
LearnSurvey

Fabric certifications survey

Certification feedback opportunity for the community.

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.