Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Geradav

Data Types, Data Formats, and Data Category

What's your type?

For a beginner in Power BI and in business intelligence in general, it is not easy to understand data types and differentiate them from data format. And then comes data category... So, what's all that about, really?


It's true that in spreadsheet applications like Excel, data type is not really of a bother to the user, as there is no functionality to define the type of the data present in cells. We just enter values and it's all fine (somehow). In fact, I only started to bother about data types when starting to use Power Query, Power BI and Power Pivot (data modelling tool in Excel).


Power Query is less friendly with typeless (or rather untyped) columns. Try creating a dataflow in the Power BI Service without specifying a data type for each column, and you will get the following message.

fFdrnujrRxGIum1Fdgdn_temp-M

 

Not cool, right? What's wrong with my data? It all looks good to me after all.

So, the Power Query team tried to make it easy by automatically detecting column types from the imported data.

0lSHgaipTj2AUbTQmzy8_temp-M

Though, sometimes it is more disturbing than anything else and you end up deleting that step Power Query creates for you. After all, it's my data and I'll give it the type I want.

8NPahQMS7CNeexPDPwPY_temp-M

Back in Excel you can only disguise your data, playing with formatting options. Formatting is just a way of altering your data appearance, without changing anything of its value.

T72hl1LQhe1GIh4dBrZp_temp-M

How does Excel represent data types?

XBUcNRGkSU69woy7gszd_temp

In Excel, data are naturally aligned based on their type. It gives a visual clue of the data type we are dealing with.

- Left aligned => text
- Right aligned => numbers and date and time
- Center aligned => Boolean (logical, true/false)

NOTE: It is a good practice to NOT alter the natural alignment of data in Excel.
39kfkcerQlCUNkAhm33n_temp

Notice, here I have altered the format of numbers, forcing them into text format, and thus naturally aligning to the left. Though I haven't altered its type. Beneath the disguise, they remain numbers, as we can see at the bottom that aggregation like SUM() and AVERAGE() are well performed.


Again, in Excel, data type is not well emphasized and enforced.

Why bother?

So, ok, in Excel we didn't really care about types, and Power Query takes care about it (in its own way), so, really, why bother?
Well, because you want control over your data, yep, total control. So, you get to know about data types.
And because you don't want to end up with errors like this in your data set. Your analysis would be affected.

Q4py35hATAqrADkDG9av_temp

And just in case you try to load your data into the data model of Excel...

7od9LTRnRVmwdIoZFL3f_temp

And, actually, when looking at the data model in Power Pivot, well Excel (or Power Pivot, who knows...) got rid of your errors.

q3JLJRTqrdOM3ktDklAB_temp

Furthermore, data type errors are common, and you will often find an error message similar to this one

nfQysx92QmSeIk2LehyE_temp

In Excel, adding a number to a date would have worked without any problem. So, when used to Excel and its way of handling data, you don't really care about types, but with Power Query, you have to very much care about what type it is.

So then what are data types?

There are several types of data, but let's say the most common ones are:

  • Text (or sometimes referred to as string)

  • Number (numeric values)

  • Date and time

  • Boolean, or logical (also referred to as true/false types)

Data type is very important, and most often you are required to declare the data type you are working with.

  • Text data type is very simple and straight forward. It's just plain text. Nothing much to say.

  • With number data types, it becomes more interesting because with numbers you can use it for aggregation and extract meaningful information. You can sum it, add or subtract a value, calculate averages, round it up or down, and many more things can be done.

  • With date and time data type, it's a wealth of information, because you can extract the time (hours and minutes), the date alone, the month, the year, the quarter, the day, the week number, the day of the week... And, whether in Excel, in Power BI or in any other tool you always find date and time specific functions (MONTH(), YEAR(), WEEKDAY(), WEEKNUM()...)

    You can add or subtract days, months, years, hours to a date.
    In fact, a date is just a number disguised into a date and time. If you change the format of a date in Excel, you will get a bunch of meaningless digits.
    The numbers to the right of the decimal represent the time. The numbers to the left of the decimal represent the number of days starting from midnight of the 1 January of 1900. And today is the 43,969 after the start of the 1st day of the first month of 1900.

    A lot of information can be extracted from a date and time data type.

    NOTE: Dates can be formatted differently depending on your regional settings, American standards are different from European standards for instance. => And this is often source of errors and maybe source of misinterpretation.

  • For Boolean values, well, those are pretty simple, it's either TRUE, either FALSE. 👌🏼

Be my type!

In, Power Query, one of the very first functions (and step) you see appearing in the formula bar is Table.TransformColumnTypes() (though you never really write it manually)

346qobUUS4qxOOQpshzO_temp

And, whether in Excel, or in Power BI (Power Query, Power Pivot), it is a good practice to NOT mix data types. Don't even dare messing up with types! Kind warning 😋
JTW1Fy3mQOqUKJaCWJJ7_temp

 

For that, Power Query offers you a variety of data types to choose from.

Notice that here we have more than the 4 different data types we discussed earlier.

Numbers are sub-divided into:

  • Whole number

  • Decimal Number

  • Currency (Fixed decimal number in Power Query for Power BI => Yep! Go wonder why 😕)

  • and Percentage

Date and Time format is also sub-divided into:

  • Date/Time

  • Date

  • Time

  • Date/Time/Timezone

  • and Duration

We can also notice a new data type => "Binary" -> 😳 What on earth is that -> Let's keep that for another time maybe. 😋

And, also Using Locale, as if we didn't have enough data types, right? We can review that one when we discuss date and time data types

What about data category?

Wait! What? Data what again? You're kidding right?

Nope, not kidding. It's a real thing 😁
PIn0NpCTSNyKSFm84qvq_temp

 



The list of categories available actually depends on the column data type selected.

jBaOFzorSJC3lILYokgH_temp

Alright, cool! That's in Power BI. What about Power Pivot, or Excel?
Power Pivot doesn't offer the option to categorize data, but Excel... hmmmm... does try something similar, though it is just bringing more to the confusion.

AvCpnOIRSCKBeobxKG1w_temp

Yep, that's right, the guys of the Excel team decided to name that group "Data Types" => Really confusing, right? Anyway, the majority of users don't even know those exist 😝

Enough!

That is it for now. That's enough for a good start. Hope you find it useful. Let me know if you have any questions, comments, or suggestions.


Thanks for reading!

 

 

Comments