Supplies are limited. Contact info@espc.tech right away to save your spot before the conference sells out.
Get your discountScore big with last-minute savings on the final tickets to FabCon Vienna. Secure your discount
I have some columns in different tables that take up a LOT of space. One column specifically consumes more than 40% of our data model size.
I know that the typical data reduction methods are either:
- Remove columns you don't need
- Remove rows you don't need
- Convert data types to numeric values when possible
I do need these columns, I have already reduced the number of rows as much as I can, and the data types for these columns are text because the values are in this format: "a36be-f3c5-d293f93da2-f03df-a49f".
The high cardinality of the data for these columns is blowing up our model size. What would be the best way to reduce our data size without removing data from our model entirely?
You cannot apply techniques like separation of date and time parts to GUIDs. GUIDs by their very nature have to have high cardinality. You could theoretically replace the GUID with an integer index column but that would only reduce the storage needs, not the cardinality.