Anonymous
Not applicable

Best practice for modeling large volumes of row-oriented data with multiple data types

I am looking for advice on how best to model the data in the scenario outlined below, with scalability and simplicity in mind.

 

Consider a scenario where we have three tables (this is a simplified use case). The first is an object table that contains three things: a unique object id, a name, and an object type. The second is a feature value table that contains object id, feature id, and feature value. The third and final table is a feature table that contains a unique feature id and a feature name.

 

[attached image]

 

In terms of scale, a customer may typically have 100,000s of objects with 1,000s of features; some customers much more and some much less. The data is likely extremely sparse: an object may have only 40 or 50 feature values, but these will be spread across the many different features.

 

I am sure this row-type structure of data storage has a name, but it eludes me at the moment.

 

The problem is how to analyze the data according to specific features. Consider as an example the features activity type, launch date, color, and weight. I want to use the structure above to report on all objects of the object type footwear and create a heat map by activity type with drill-down to color, then review this by product at different points in time, such as before 2024 or after 2023.

 

The challenge is how to report on the feature value table, whose values could be of type date, integer, float, list of values, text, etc. For a given feature, all values will be of the same type, but the feature values table contains 1,000s of features of various types for 100,000s of objects, and can run to millions of rows.

 

Obviously I would wish to avoid creating separate tables for each feature, as this has scalability and manageability issues.

 

What is the best approach to modeling this data to allow business analysts to work with it easily in Power BI? The actual data source is not Excel but JSON loaded into an MS Fabric Lakehouse and transformed using PySpark.

 

Thoughts and pointers on how to do this are appreciated. I have very limited scope to remodel the data to simplify the problem if needed.

 

Sample data below

 

object

id    | name        | type
A0001 | Sneaker 1   | footwear
A0002 | Sneaker 2   | footwear
A0003 | Sneaker 3   | footwear
A0004 | Sneaker 4   | footwear
A0101 | Ski Glove 1 | gloves
A0102 | Ski Glove 2 | gloves

 

featureValues

id    | featureId | value
A0001 | FID00001  | 2024-01-01
A0001 | FID00002  | Tan
A0001 | FID00003  | Running
A0001 | FID00004  | 900
A0002 | FID00001  | 2024-01-31
A0002 | FID00002  | Blue
A0002 | FID00003  | Hiking
A0002 | FID00004  | 1200
A0003 | FID00001  | 2023-12-21
A0003 | FID00002  | White
A0003 | FID00003  | Tennis
A0003 | FID00004  | 950
A0004 | FID00001  | 2023-03-01
A0004 | FID00002  | Blue
A0004 | FID00003  | Running
A0004 | FID00004  | 775
A0101 | FID00001  | 2024-01-01
A0101 | FID00003  | Skiing
A0101 | FID00004  | 267
A0102 | FID00001  | 2023-10-01
A0102 | FID00003  | Sailing
A0102 | FID00004  | 201

 

featureNames

featureId | name
FID00001  | Launch Date
FID00002  | Color
FID00003  | Activity
FID00004  | Weight (g)
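
For convenience, the sample tables above can be recreated as PySpark DataFrames in a Fabric notebook. This is a minimal sketch; all values are deliberately kept as strings, matching the single mixed-type value column:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Object table: one row per object.
objects = spark.createDataFrame(
    [("A0001", "Sneaker 1", "footwear"),
     ("A0002", "Sneaker 2", "footwear"),
     ("A0003", "Sneaker 3", "footwear"),
     ("A0004", "Sneaker 4", "footwear"),
     ("A0101", "Ski Glove 1", "gloves"),
     ("A0102", "Ski Glove 2", "gloves")],
    ["id", "name", "type"])

# Long feature-value table: every value held as a string.
feature_values = spark.createDataFrame(
    [("A0001", "FID00001", "2024-01-01"),
     ("A0001", "FID00002", "Tan"),
     ("A0001", "FID00003", "Running"),
     ("A0001", "FID00004", "900"),
     ("A0002", "FID00001", "2024-01-31"),
     ("A0002", "FID00002", "Blue"),
     ("A0002", "FID00003", "Hiking"),
     ("A0002", "FID00004", "1200"),
     ("A0003", "FID00001", "2023-12-21"),
     ("A0003", "FID00002", "White"),
     ("A0003", "FID00003", "Tennis"),
     ("A0003", "FID00004", "950"),
     ("A0004", "FID00001", "2023-03-01"),
     ("A0004", "FID00002", "Blue"),
     ("A0004", "FID00003", "Running"),
     ("A0004", "FID00004", "775"),
     ("A0101", "FID00001", "2024-01-01"),
     ("A0101", "FID00003", "Skiing"),
     ("A0101", "FID00004", "267"),
     ("A0102", "FID00001", "2023-10-01"),
     ("A0102", "FID00003", "Sailing"),
     ("A0102", "FID00004", "201")],
    ["id", "featureId", "value"])

# Feature dimension: feature id -> friendly name.
feature_names = spark.createDataFrame(
    [("FID00001", "Launch Date"),
     ("FID00002", "Color"),
     ("FID00003", "Activity"),
     ("FID00004", "Weight (g)")],
    ["featureId", "name"])
```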
1 ACCEPTED SOLUTION
Anonymous
Not applicable

Hi @Anonymous ,
Thanks for lbendlin's reply.

Based on your description, your data fits an EAV (Entity-Attribute-Value) model. Since feature_value can be of various types (date, integer, float, text, etc.), it is possible to store all the values as strings and convert them to the appropriate type when analyzing. Make sure to index object_id and feature_id in the feature_value table to speed up queries. Import the converted data into Power BI; you can use DirectQuery if the dataset is too large.

The EAV data model: when to use it (inviqa.com)
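
As a rough sketch of the "store as strings, cast when analyzing" idea, reusing the sample DataFrames built earlier in the thread. The feature_types table and its dataType column are assumptions added for illustration, not part of the original model:

```python
from pyspark.sql import functions as F

# Hypothetical per-feature data types -- an assumption for this
# sketch, not part of the original model.
feature_types = spark.createDataFrame(
    [("FID00001", "date"),
     ("FID00002", "string"),
     ("FID00003", "string"),
     ("FID00004", "int")],
    ["featureId", "dataType"])

# Keep the raw string value and add typed columns that are only
# populated when the feature's declared type matches.
typed = (feature_values
         .join(feature_types, "featureId")
         .withColumn("valueDate",
                     F.when(F.col("dataType") == "date",
                            F.to_date("value")))
         .withColumn("valueNum",
                     F.when(F.col("dataType") == "int",
                            F.col("value").cast("double"))))
```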

Best regards,
Albert He


If this post helps, then please consider accepting it as the solution to help other members find it more quickly.


5 REPLIES
Anonymous
Not applicable

I guess, as an obvious follow-up question: what is the best approach to transforming EAV data for analysis in Power BI?

My inclination right now is to avoid doing this at a SQL level, due to the overhead of dealing with the high volumes of data and updates.

I am considering whether it would be best to do this using DataFrames and PySpark.

Any thoughts, opinions and/or references on this?
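
One way to illustrate the PySpark route (a minimal sketch reusing the sample DataFrames from above, not an authoritative recipe): pivot the long EAV table into one wide row per object, rename the pivoted columns to their friendly feature names, and cast the columns whose types are known before loading the flat table into Power BI.

```python
from pyspark.sql import functions as F

# Pivot the long EAV table: one row per object, one column per feature.
wide = (feature_values
        .groupBy("id")
        .pivot("featureId")
        .agg(F.first("value")))

# Rename the FID columns to their friendly names.
for row in feature_names.collect():
    wide = wide.withColumnRenamed(row["featureId"], row["name"])

# Cast the columns whose types are known from the sample data.
wide = (wide
        .withColumn("Launch Date", F.to_date("Launch Date"))
        .withColumn("Weight (g)", F.col("Weight (g)").cast("int")))

# Join the object attributes so analysts get one flat table.
flat = objects.join(wide, "id", "left")
```

At real scale, passing an explicit list of feature ids as the optional second argument to pivot() avoids the extra pass Spark otherwise needs to discover the distinct values, and the casts would be driven from feature metadata rather than hard-coded.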

DataFrames are kinda the opposite of key/value pairs. I would think that in-memory columnar data stores are better suited to handling that kind of data, especially when it is presorted (resulting in substantial compression). The question will be how much the presentation layer (and I include the Vertipaq engine here) can help mitigate the additional administrative effort to transpose the key/values back into columns and rows.
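
Building on the presorting point above: writing the flattened table back to the Lakehouse sorted by its low-cardinality columns first gives both Delta/Parquet and Vertipaq long runs of repeated values to compress. A minimal sketch, assuming the flat DataFrame from the previous snippet (the table name is hypothetical):

```python
# Sort by low-cardinality columns first so runs of repeated values
# compress well, then land the result as a Delta table for Power BI.
(flat
 .orderBy("type", "Activity", "id")
 .write
 .format("delta")
 .mode("overwrite")
 .saveAsTable("object_features_wide"))  # hypothetical table name
```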

Anonymous
Not applicable

This is a great reference and helps me dig deeper into implementing a solution using EAV models.

lbendlin
Super User

"I am sure this row type structure of data storage has a name"

It's called "the dimension table with (too) high-cardinality primary key".

 

Power BI struggles with any relationship where the key column has a cardinality over 50000.  The issue is that the query (each query) will enumerate all values in the key column.

 

I am not aware of a remedy.  Maybe Qlik with its associative model is a better fit.
