The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredEnhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.
I have some columns in different tables that take up a LOT of space. One column specifically consumes more than 40% of our data model size.
I know that the typical data reduction methods are either:
- Remove columns you don't need
- Remove rows you don't need
- Convert data types to numeric values when possible
I do need these columns, I have already reduced the number of rows as much as I can, and the data types for these columns are text because the values are in this format: "a36be-f3c5-d293f93da2-f03df-a49f".
The high cardinality of the data for these columns is blowing up our model size. What would be the best way to reduce our data size without removing data from our model entirely?
You cannot apply techniques like separation of date and time parts to GUIDs. GUIDs by their very nature have to have high cardinality. You could theoretically replace the GUID with an integer index column but that would only reduce the storage needs, not the cardinality.