Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
I have a matrix table with over 500K rows and over 100 columns. A few columns have TRUE FALSE based values
I am trying to use a slicer filter by condition, where if selected then the table gets filtered to rows that have a TRUE or FALSE value for a column based on the slicer value. So the slicer filters columns and gets me the population that for that column are applicable
Then based on the selction for example if MUSCULOSKELETAL is selected then I want to know for persons that are a TRUE for MUSCULOSKELETAL how many have chd, and how many have anxiety etc. The prevalence of chd and anxiety have already been created as measures like CHD Prev = COUNT(Epi[chd active])/COUNT(Epi[ID]) and Anxiety Prev = COUNT(Epi[anxiety active])/COUNT(Epi[ID]) respectively.
What I tried was
COMORBIDITY = IF(ISCROSSFILTERED(Diagnosis[Diagnosis]),
VAR
depression = CALCULATE(COUNTROWS(Epi),Epi[depression active] = "true")
VAR
diabetes = [DIABETES]
VAR
etc. many VAR like the above and then
RETURN
SWITCH(ALLSELECTED(Diagnosis[Diagnosis]), "acute renal failure", acute_renal_failure, "anxiety", anxiety, "asthma", asthma, "bipolar", bipolar, "chd", chd, "chronic renal failure", etc.
While this worked to give me the total number of the selcted slicer it does not filter the prevalence measures - rightly so as the row context and filter context are not being met.
So any suggestions of how I can look at the co-morbidity of selected groups - i.e. use the slicer to filter though the table for a particular condition and for patients of that condition find the prevalence (measures) of other conditions.
Your scenario requires a deeper analysis, which I cannot perform using the forum, but at first sight, I would say that you have a wrong model. Well, wrong is probably too bad, let us say "suboptimal". In other words, the code tend to be very complex to write.
Instead of having 100 columns, most of which are empty, I would create 100 rows, removing the empty ones. You will increase the number of rows from 500K to several millions, but the size is still very manageable and the model probably results smaller.
Then, using techniques like the one described here for basket analysis, you can achieve the results you need.
I worked on several healthcare models like yours, and I have always been impressed by the complexity of the queries that you guys generate. On the wrong (suboptimal) model they become a nightmare. Anyway, there's no quick solution, as I said, it would require more dedicated time to produce a good result. Not that I want to sell you my time, it is only that a discussion in the forum is likely not enough for your needs, you will probably need to roll up your sleeves and be prepared to change the model and learn some advanced DAX.
A scenari similar to this can also be found in The Definitive Guide to DAX, on page 392 where we speak about "Frequent itemset search", which is a relaxed version of basket analysis.
Have fun with DAX!
Alberto Ferrari
http://www.sqlbi.com
Many thanks for that @AlbertoFerrari
I have heard several of your sessions and will try to get your book. I actually did purchase your initial book "Excel 2013 Building Data Models with PowerPivot". So I am very aware of the amazing work you and Mark do.
I will look into your trainings/mentorship/consultation.
In general what would be your advise as in ideal steps in setting up models for cases like these where the fact table (currently just the one called Epi) is 550k rows x 100 columns - should I split it into 3-4 smaller fact tables like conditions, investigations, intervention dates and patient details all lonked via the same primary key such as patient ID??? And have all smaller filter tables like geography, time, age all linked to all the patient ID column of all these smaller fact tables??? That way I will be able to pivot columns and hence not have as many columns as you have suggested
Currently I have the typical star schema as follows
Abhijeet
PS: Would you revommend a mentoring arrangement with SQLBI or just the gebric trainibgs that you guys have?
Hi,
It is very hard to give you any advice without analyzing much deeper your requirements, sorry for that but whatever I tell you would likely be a wrong answer.
As a general idea, for the kind of queries that I think you are trying, I would go much farther in normalizing the model.
Instead of having 100 columns, I think you can end up with a small set of columns (patient, date, doctor, hospital and stuff like this) to identify the event (a visit, perhaps) and then another table where your 100 columns become 100 rows and the column names become a new "Symptom" dimension.
So, something like:
Patient Date Symptom Measure Alberto 09/11 Headache YES Alberto 09/11 Pain YES Alberto 09/11 Dead NO
This lets you compute questions like: give the percentage of patients that had a headache and were not dead, or show me the patients without a headache, with pain and died. or any combination of AND, OR and NOT conditions. The code is not easy to write, but can be very generic and driven by the user through slicers.
Of course, the size of the table increases by a lot, but this is not an issue at all, a few hundreds of millions of rows are not scary at all.
Alberto
Hi,
It is very hard to give you any advice without analyzing much deeper your requirements, sorry for that but whatever I tell you would likely be a wrong answer.
As a general idea, for the kind of queries that I think you are trying, I would go much farther in normalizing the model.
Instead of having 100 columns, I think you can end up with a small set of columns (patient, date, doctor, hospital and stuff like this) to identify the event (a visit, perhaps) and then another table where your 100 columns become 100 rows and the column names become a new "Symptom" dimension.
So, something like:
Patient
Any assistance please
@Eric_Zhang, @kolson256, @konstantinos, @Phil_Seamark, @ImkeF, @v-ljerr-msft, @MattAllington, @Sean
Just to give everyone a bit more clarity what I am trying to do is
I have prevalence of conditions in a chart like this
and I am trying to use a slicer to automatically pick up a section of the population as if I was creating a chart like this
and then use a selction of the 2nd chart to find prevalence in the 1st chart like this
BUT INSTAED OF CHART 1 which is based on one column of the table I am trying to use a slicer to automatically pick up the right column. And also the slicer fields to have composite columns too as in if the slicer value is DIABETES then the population selcted is FILTER(Epi, NOT(ISBLANK(Epi[diabetes type1 active])) || NOT(ISBLANK(Epi[diabetes type2 active])) || NOT(ISBLANK(Epi[diabetes undefined active])))
You may learn to use the SELECTEDVALUE function.
https://www.sqlbi.com/blog/marco/2017/08/01/new-function-and-the-importance-of-variables-in-dax-dax/
Not sure how this helps in my model.
Coukd you please elaborate on how you'd expect me to use SELECTEDVALUE for my particular scenario
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
108 | |
105 | |
88 | |
74 | |
66 |
User | Count |
---|---|
126 | |
111 | |
100 | |
83 | |
71 |