Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
Should a text column having only 2 values be a candidate for binary data type?
I have a column named Ee Status- it has 2 values only Full-Time & Part-Time
My query is should I get source files to call Full-Time as 1 and Part-Time as 0
That ways I can change this column from text to binary saving storage space
Is there any benefit in this approach?
Solved! Go to Solution.
@GulianiG I don't believe switching to binary will work. Switching to True/False should if you first convert the column to a whole number and then change it's type to True/False. The benefits of this however are probably near zero because of columnar compression within tabular analysis services. The column is going to get compressed regardless if it originally contained text or whole number, etc. And since in all cases it only contains 2 values, it's unlikely you would get any benefit.
In fact, I tested this with a 1M row dataset. I had a column that contained text "Full Time"/"Part Time". The dataset size was 11,662 KB. I then converted the column to 0's and 1's, still 11,662 KB. I then converted this to True/False. Dataset size remained at 11,662 KB. So, you would just be adding steps for no good reason and losing the friendly name for the attribute to boot.
I certainly agree with @Greg_Deckler 's analysis, no significant savings in "space". It may be relevant in a source system where the values are explicity stored and repeated in each row. But there are other considerations beyond sheer performance.
I assume that "binary" is referring to a bit column (as in SQL Server) or a Yes/No column in other contexts. The question is whether you can really apply two-valued logic, or three-valued if null is in play. Employee status of Living or Deceased seems to be a clear binary distinction. Employment status might eventually be a lower-level of granularity in many instances. For example, Full-Time and Not-Full-Time pretty much covers your use-case. But there are cases where less than F/T has legal and financial implications. There may be a threshold where benefits and unemployment coverage apply. Similarly, a "temp" might be working 40 hours/week, usually considered F/T, except for benefits. In think in this case you really want to create a Dimension which reflects the spectrum of employment status. Use a surrogate key in your fact table. Column compression will continue to be highly efficient, but your application may have much greater information and analytical content.
@GulianiG I don't believe switching to binary will work. Switching to True/False should if you first convert the column to a whole number and then change it's type to True/False. The benefits of this however are probably near zero because of columnar compression within tabular analysis services. The column is going to get compressed regardless if it originally contained text or whole number, etc. And since in all cases it only contains 2 values, it's unlikely you would get any benefit.
In fact, I tested this with a 1M row dataset. I had a column that contained text "Full Time"/"Part Time". The dataset size was 11,662 KB. I then converted the column to 0's and 1's, still 11,662 KB. I then converted this to True/False. Dataset size remained at 11,662 KB. So, you would just be adding steps for no good reason and losing the friendly name for the attribute to boot.
Check out the September 2024 Power BI update to learn about new features.
Learn from experts, get hands-on experience, and win awesome prizes.
User | Count |
---|---|
113 | |
94 | |
88 | |
35 | |
32 |
User | Count |
---|---|
153 | |
101 | |
82 | |
63 | |
52 |