Data reduction techniques for high cardinality text column?

delaneyjwh · 04-13-2023

I have some columns in different tables that take up a LOT of space. One column specifically consumes more than 40% of our data model size.

I know that the typical data reduction methods are:

- Remove columns you don't need

- Remove rows you don't need

- Convert data types to numeric values when possible

I do need these columns, and I have already reduced the number of rows as much as I can. The data type for these columns is text because the values are in this format: "a36be-f3c5-d293f93da2-f03df-a49f".

The high cardinality of the data for these columns is blowing up our model size. What would be the best way to reduce our data size without removing data from our model entirely?

lbendlin · 04-17-2023

You cannot apply techniques like separation of date and time parts to GUIDs. GUIDs by their very nature have to have high cardinality. You could theoretically replace the GUID with an integer index column but that would only reduce the storage needs, not the cardinality.
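
To make the integer index idea concrete, here is a minimal sketch in Python. It is not Power BI code, and the function name and sample values are made up for illustration. It assumes the mapping is built upstream of the model and that the same mapping is reused for every table that references these IDs.

```python
from typing import Dict, List, Tuple


def build_surrogate_keys(guids: List[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map each distinct GUID string to a small integer, in first-seen order.

    Hypothetical helper for illustration only.
    """
    lookup: Dict[str, int] = {}
    keys: List[int] = []
    for guid in guids:
        # setdefault stores the next index only the first time a GUID appears;
        # for a GUID seen before, it returns the index already assigned.
        keys.append(lookup.setdefault(guid, len(lookup) + 1))
    return keys, lookup


# Made-up sample values in the format quoted in the question
sample = [
    "a36be-f3c5-d293f93da2-f03df-a49f",
    "b71c0-09aa-4e2d1c77f0-1b2c3-d4e5",
    "a36be-f3c5-d293f93da2-f03df-a49f",
]

keys, lookup = build_surrogate_keys(sample)
print(keys)                              # [1, 2, 1]
print(len(set(sample)), len(set(keys)))  # 2 2 (distinct count is unchanged)
```

Each 32-character string becomes a small integer, so every stored value is cheaper, but the number of distinct values stays exactly the same. Also note that if the original GUID text still has to be shown in reports, the lookup table has to live somewhere, and loading it into the model brings the text column back.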
