Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get Fabric certified for FREE! Don't miss your chance! Learn more

Reply
mynameisvincent
Regular Visitor

Model & measure design for large survey data

Dears,

 

I've encountered an issue where PowerBI desktop is running in Memory Error at Power Query stage because it contains a lot of duplicate tables containing all my survey data. I am seraching for a more efficient design, however, I'm at the end of my wits here. I'd appreciate any hints, ideas, or maybe it is not possible at all?

 

So, about my data.

I have survey data for, currently, 6 years. Each year has ~3000 respondents. For each respondent I have around 500 questions. Due to how the questions are represented, in the long data format, the data for one year contains ~14 million rows. So for 6 years, it's ~84 million rows. The largest proporiton of this data & all the labels, all years combined is 200 MB in csv format using gzip compression, but this is only for context, currently I don't see an issue with that.

About my use-case.

I am building an app-like dashboard where the user can interact with all this data. The problematic requirement – the one for which I am asking for help – is that the user should be able to select respondent samples. So, for example, the user should be able to select respondent sample based on Category1_Question2_Answer5, AND, e.g., Category2_Question1_Answer5. For this respondent (as opposed to dimension) sample the user then can view what proportion of it responded Answer5 OR Answer4 for Category1_QuestionX (including Question2 (in this case, the proportion is 100%)), Category2_QuestionX (including Question1 (in this case, also the proporiton is 100%)), and any arbitrary CategoryX_QuestionX.

About my current solution.

I have a star schema. With data tables only. This creates the memory error because each table is so big. Is there any better design possible? Is there a model & measure design combination that would allow me to use only one or two data tables and some dimension tables?

About the example in the link.

I have entered dummy data and recreated my model, visual, and measure examples in a way that reflects my current situation with the sample selection. There are some more comments in the file.

https://we.tl/t-M9Dxv5KVZN 

Pictures from real solution.

In the first picture the user can define desired respondent samples based on any available dimension. In the second picture the user can view any dimension data for that respondent sample.

sample_selection.pngviewing_sample_data.png

1 REPLY 1
lbendlin
Super User
Super User

Please familiarize yourself with the concept of incremental refresh. Your data seems ideally suited for that.

Helpful resources

Announcements
Sticker Challenge 2026 Carousel

Join our Community Sticker Challenge 2026

If you love stickers, then you will definitely want to check out our Community Sticker Challenge!

January Power BI Update Carousel

Power BI Monthly Update - January 2026

Check out the January 2026 Power BI update to learn about new features.

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.