I am looking for advice on how best to model the data in the scenario outlined below. The core of this problem is balancing scalability with simplicity.
Consider a scenario with three tables (this is a simplified use case). The first is an object table containing three columns: a unique object id, a name, and an object type. The second is a feature value table containing object id, feature id, and feature value. The third and final table is a feature table containing a unique feature id and a feature name.
In terms of scale, a customer may typically have 100,000s of objects with 1,000s of features; some customers much more and some much less. The data is likely extremely sparse: an object may have only 40 or 50 feature values, but these will be spread across the many different features.
I am sure this row type structure of data storage has a name but it eludes me at the moment.
The problem is how to analyze the data by specific features. Consider as an example the features activity type, launch date, color and weight. I want to use the structure above to report on all objects of object type footwear and create a heat map by activity type with drill-down to color, then review this by product at different points in time, such as before 2024 or after 2023.
The challenge is how to report on the featureValues table, whose values could be of type date, integer, float, list of values, text, etc. For a given feature all values will be of the same type, but the table contains 1,000s of features of various types for 100,000s of objects, so it can run to millions of rows.
Obviously I would wish to avoid creating separate tables for each feature, as this has scalability and manageability issues.
What is the best approach to modeling this data so that business analysts can work with it easily in Power BI? The actual data source is not Excel but JSON loaded into my MS Fabric Lakehouse and transformed using PySpark.
Thoughts and pointers on how to do this are appreciated. I have a very limited scope for remodeling the data to simplify the problem if needed.
Sample data below
object
| id | name | type |
|---|---|---|
| A0001 | Sneaker 1 | footwear |
| A0002 | Sneaker 2 | footwear |
| A0003 | Sneaker 3 | footwear |
| A0004 | Sneaker 4 | footwear |
| A0101 | Ski Glove 1 | gloves |
| A0102 | Ski Glove 2 | gloves |
featureValues
| id | featureId | value |
|---|---|---|
| A0001 | FID00001 | 2024-01-01 |
| A0001 | FID00002 | Tan |
| A0001 | FID00003 | Running |
| A0001 | FID00004 | 900 |
| A0002 | FID00001 | 2024-01-31 |
| A0002 | FID00002 | Blue |
| A0002 | FID00003 | Hiking |
| A0002 | FID00004 | 1200 |
| A0003 | FID00001 | 2023-12-21 |
| A0003 | FID00002 | White |
| A0003 | FID00003 | Tennis |
| A0003 | FID00004 | 950 |
| A0004 | FID00001 | 2023-03-01 |
| A0004 | FID00002 | Blue |
| A0004 | FID00003 | Running |
| A0004 | FID00004 | 775 |
| A0101 | FID00001 | 2024-01-01 |
| A0101 | FID00003 | Skiing |
| A0101 | FID00004 | 267 |
| A0102 | FID00001 | 2023-10-01 |
| A0102 | FID00003 | Sailing |
| A0102 | FID00004 | 201 |
featureNames
| featureId | name |
|---|---|
| FID00001 | Launch Date |
| FID00002 | Color |
| FID00003 | Activity |
| FID00004 | Weight (g) |
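To make the intended report concrete, here is a minimal pandas sketch of the analysis described above (heat map of activity by color, sliced by launch date). It assumes the featureValues rows have already been pivoted into a wide table and joined to the object table; the column names and the wide layout are assumptions for illustration, using the sample data.

```python
import pandas as pd

# Hypothetical wide table: the sample featureValues pivoted into columns
# and joined to the object table (layout assumed for illustration).
wide = pd.DataFrame(
    {
        "id": ["A0001", "A0002", "A0003", "A0004"],
        "type": ["footwear", "footwear", "footwear", "footwear"],
        "launch_date": pd.to_datetime(
            ["2024-01-01", "2024-01-31", "2023-12-21", "2023-03-01"]
        ),
        "color": ["Tan", "Blue", "White", "Blue"],
        "activity": ["Running", "Hiking", "Tennis", "Running"],
    }
)

# Slice footwear launched before 2024, then count activity x color
# to get the matrix behind a heat map.
before_2024 = wide[(wide["type"] == "footwear") & (wide["launch_date"] < "2024-01-01")]
heatmap = pd.crosstab(before_2024["activity"], before_2024["color"])
print(heatmap)
```

The same slice-then-aggregate shape is what a Power BI matrix visual would compute once the data is wide, which is why the modeling question below matters.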
Hi @Anonymous ,
Thanks for lbendlin's reply.
Based on your description, your data fits an EAV (entity-attribute-value) model. Since a feature value can be of various types (date, integer, float, text, etc.), it is possible to store all the values as strings and convert them to the appropriate type when analyzing. Make sure to index object_id and feature_id in the feature value table to speed up queries. Import the converted data into Power BI; you can use DirectQuery if the dataset is too large.
The EAV data model: when to use it (inviqa.com)
Best regards,
Albert He
If this post helps, then please consider accepting it as the solution to help the other members find it more quickly
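A minimal sketch of the "store as strings, convert when analyzing" idea from the reply above, in pandas. The feature-type registry and the helper function are hypothetical names introduced for illustration, not part of the original data.

```python
import pandas as pd

# Hypothetical registry: each featureId maps to one declared type.
feature_types = {"FID00001": "date", "FID00004": "int"}

# EAV rows with every value held as a string.
eav = pd.DataFrame(
    [("A0001", "FID00001", "2024-01-01"), ("A0001", "FID00004", "900")],
    columns=["id", "featureId", "value"],
)

def cast_feature(df: pd.DataFrame, feature_id: str) -> pd.Series:
    """Return the string values for one feature, cast to its declared type."""
    values = df.loc[df["featureId"] == feature_id, "value"]
    kind = feature_types.get(feature_id, "str")
    if kind == "date":
        return pd.to_datetime(values)
    if kind == "int":
        return pd.to_numeric(values).astype("Int64")
    return values

launch_dates = cast_feature(eav, "FID00001")
weights = cast_feature(eav, "FID00004")
```

Because all values for a given feature share one type (as stated in the question), a per-feature cast like this is safe; the registry would live alongside the featureNames table.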
I guess as an obvious follow-up question: what is the best approach to transforming EAV data for analysis in Power BI?
My inclination right now is to avoid doing this at the SQL level due to the overhead of dealing with the high volumes of data and updates.
I am considering whether it could be best to do this using dataframes and pyspark.
Any thoughts, opinions and/or references on this?
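For reference, the core transform is transposing the key/value rows back into columns. A minimal pandas sketch (sample values taken from the data above; everything else is illustrative):

```python
import pandas as pd

# EAV rows, all values held as strings.
eav = pd.DataFrame(
    [
        ("A0001", "FID00002", "Tan"),
        ("A0001", "FID00003", "Running"),
        ("A0002", "FID00002", "Blue"),
        ("A0002", "FID00003", "Hiking"),
    ],
    columns=["id", "featureId", "value"],
)

# One row per object, one column per feature; objects missing
# a feature end up with NaN in that column (preserving sparsity).
wide = eav.pivot(index="id", columns="featureId", values="value").reset_index()
```

In PySpark the equivalent transpose would be roughly `df.groupBy("id").pivot("featureId").agg(F.first("value"))`, though with thousands of distinct features the resulting column count is worth testing at scale.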
dataframes are kinda the opposite of key/value pairs. I would think that in-memory columnar data stores are better suited at handling that kind of data, especially when it is presorted (resulting in substantial compression). The question will be how much the presentation layer (and I include the Vertipaq engine here) can help mitigate the additional administrative effort to transpose the key/values back into columns and rows.
This is a great reference and helps me dig deeper to implement a solution using EAV models.
> I am sure this row type structure of data storage has a name
It's called "the dimension table with (too) high cardinality primary key"
Power BI struggles with any relationship where the key column has a cardinality over 50000. The issue is that the query (each query) will enumerate all values in the key column.
I am not aware of a remedy. Maybe Qlik with its associative model is a better fit.