Advance your Data & AI career with 50 days of live learning, dataviz contests, hands-on challenges, study groups & certifications and more!
Get registeredGet Fabric Certified for FREE during Fabric Data Days. Don't miss your chance! Request now
When you import data into Power BI, the VertiPaq engine compresses it using a columnar storage format built for analytics. This greatly reduces memory usage, so datasets often shrink to a fraction of their original size, making it possible to handle millions of rows while still getting very fast query performance. The limitation is that VertiPaq treats text as case-insensitive and keeps only the first version it encounters for identical strings, which can sometimes lead to issues with how the data is represented.
In this blog, we’ll walk through a way to keep the original letter casing when loading data, so your column values appear exactly as they do in the source.
We will use the example below (full data in the attached Excel file). Notice that names of fruits have different cases - some all caps, some lower caps and some a combination of both.
If we load this directly into the model, the VertiPaq engine will keep the first version of the identical strings it encounters. For instance, Row 1 should be APPLE and Row 6 should be Apple, but after loading, both rows appear as APPLE in all uppercase.
What we need to do now is programmatically distinguish each instance of identical strings so that APPLE and Apple are loaded exactly as they appear, rather than being treated as the same. Since VertiPaq is case-insensitive, we’ll handle this visually by adding invisible characters to the strings. The approach is to concatenate a zero-width space a certain number of times, with the count determined by Table.AddRankColumn M function. This way, only duplicates within the same string group get adjusted. We can’t use an index column for this purpose, because that would make every row unique, which is not what we want. Because M is case-sensitive, we can rely on it to respect the original casing while still distinguishing duplicates.
Since Table.AddRankColumn is a table function, we’ll be adding a custom step. Right click the latest applied step in the query editor and add the following code. Source is the name of the previous step and Text Rank is the name of the new column. With RankKind.Dense rank kind type, items which compare as equal receive the same ranking number and the next item is numbered consecutively. With no gap. Whether the order should be descending or ascending doesn’t matter as we only need to create a rank column.
= Table.AddRankColumn(
Source,
"Text Rank",
{"Text", Order.Descending},
[RankKind = RankKind.Dense]
)
Rename the step from Custom to Added Rank Column.
Next is to add a custom column concatenating the Text column values with a no width-space repeated several times based on the value in Text Rank column. The custom column name in this example is Text 2.
[Text] & Text.Repeat(Character.FromNumber(8203),[Text Rank])
Delete the Text Rank column and load the data into the model.
Considerations:
A vlog version of this tutorial is available on YouTube - https://youtu.be/zucfRfE2MEA
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.