I have a 700x800 dataset coming from a long questionnaire (700 questions x 800 submissions).
To work with the data in Power BI, I need to unpivot the few columns that represent each question together.
I thought it would be a bad idea to do all that unpivoting in one query, so I started referencing the main cleaned query and using "Choose Columns" to keep only a few related columns to unpivot in a separate query.
It worked great for a while, but as the number of referencing queries increased, I started noticing performance issues, which raised several questions:
1- Is Power BI really copying the whole table and then deleting the excluded columns in every sub-query? If so, is there a way to copy/reference only a selected group of columns (maybe something like DAX's SELECTCOLUMNS) without wasting so much processing time?
2- Am I mistaken to think that this segmentation is my only option here?
Hi, @Nart , you're only one inch away from the goal. After unpivoting the dataset, extract the serial number of each question, i.e. "Q1" from "Q1-A1", "Q1-A2", "Q1-A3", etc. The transformed dataset looks like this, and it's ready to slice and dice across quite a few dimensions, such as index, gender, age, question, etc.
let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUfIFYkMzIOEM4hmAeFAamW1IBBvBj9WJVjICstxAPAuQ2UYEzMNlrwGGepDZxlCzjYxgZmMzC7vbsPMRZpvAwsQSZLYxHrOwmYnbj7GxAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Index = _t, gender = _t, age = _t, camp = _t, #"Q1-A1" = _t, #"Q1-A2" = _t, #"Q1-A3" = _t, #"Q1-A4" = _t, #"Q1-A5" = _t, #"Q2-A1" = _t, #"Q2-A2" = _t, #"Q2-A3" = _t, #"Q2-A4" = _t, #"Q2-A5" = _t, #"Q2-A6" = _t, #"Q3-A1" = _t, #"Q3-A2" = _t, #"Q3-A3" = _t, #"Q4-A1" = _t, #"Q4-A2" = _t, #"Q4-A3" = _t, #"Q4-A4" = _t]),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Index", "gender", "age", "camp"}, "Opt", "Answer"),
    #"Inserted Text Before Delimiter" = Table.AddColumn(#"Unpivoted Other Columns", "Question", each Text.BeforeDelimiter([Opt], "-"), type text),
    #"Changed Type" = Table.TransformColumnTypes(#"Inserted Text Before Delimiter", {{"Answer", Int64.Type}, {"Index", Int64.Type}})
in
    #"Changed Type"
Thanks to the great efforts by MS engineers to simplify the syntax of DAX! Most beginners are SUCCESSFULLY MISLED into thinking they can easily master DAX; but it turns out that the intricacy of the most frequently used RANKX() is still way beyond their comprehension!
DAX is simple, but NOT EASY!
Hi @Nart ,
Please let us know if the replies above are helpful.
If they are, please accept the reply that resolves it as the solution to your question, so that people who have the same question can find the solution directly.
If not, please let us know.
Best Regards,
Icey
Nevertheless, I'm still interested in knowing whether there's a way to pick a group of columns in a new query without pulling the whole original query first.
Hi, @Nart , sure thing, you might want to use Table.SelectColumns() to extract specific columns from the original dataset.
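For instance, a minimal sketch in M, assuming your cleaned query is named MainQuery (a placeholder name):

```m
let
    // Reference the cleaned query instead of duplicating it
    Source = MainQuery,
    // Keep only the key columns plus one question's answer columns
    Q1Only = Table.SelectColumns(
        Source,
        {"Index", "gender", "age", "camp", "Q1-A1", "Q1-A2", "Q1-A3", "Q1-A4", "Q1-A5"}
    )
in
    Q1Only
```

Depending on the data source, selecting columns early may fold back to the source so that only those columns are retrieved, which can reduce the work done by downstream steps.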
Hi, @Nart
How do you define correlations among questions? I think another table is necessary for the dataset regardless of its form (a one-dimensional table after unpivoting, or the original two-dimensional table).
As you know, DAX runs on the VertiPaq engine, which is an in-memory columnar database. Here's a brief introduction for your reference.
https://www.microsoftpressstore.com/articles/article.aspx?p=2449192&seqNum=2
To my superficial understanding, VertiPaq favors a one-dimensional (long) table.
"...
Single-column access is very fast, because it reads a single block of memory, and then it computes whatever aggregation you need on that memory block.
If an expression uses many columns, the algorithm is more complex because it requires the engine to access different memory areas at different times, keeping track of the progress in some temporary area.
The more columns you need to compute an expression, the harder it becomes to produce a final value, up to a point where it is easier to rebuild the row storage out of the column store to compute the expression.
..."
(quotation from the aforementioned article)
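To make the quoted single-column vs. multi-column point concrete, here is a hedged DAX sketch; the table and column names (Responses, Answer, Weight) are hypothetical:

```dax
-- Fast: a single-column aggregation scans one column's memory block
Total Answers := SUM ( Responses[Answer] )

-- Harder for the engine: a row-by-row expression touching several columns
Weighted Answers := SUMX ( Responses, Responses[Answer] * Responses[Weight] )
```

In a long (unpivoted) table, most measures become single-column aggregations like the first one, which is what plays to VertiPaq's strengths.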
@CNENFRNL Thanks a lot! That was right on point.
I guess I'll just create a long table and use a duplicate with questions that require cross-question filtering.
I'll let you know how that works.
I am using the Reference feature, as I said, not duplicating. But still, the referencing step seems to pull the entire table into the new query when I just need a few columns.
The reason I went with segmenting is that I need to unpivot groups of columns together, so if I did this while all columns were in the same query, I'd be creating thousands of new rows every time.
Here's what my dataset looks like.
Each question is represented across several columns.
To visualize and analyze the answers, I'm unpivoting the columns for each question separately to get all of its answers in one column.
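The per-question segmentation described above can be sketched in M like this; the query name MainQuery and the column names are illustrative:

```m
let
    // Reference the main cleaned query
    Source = MainQuery,
    // Keep the key column plus one question's answer columns
    Kept = Table.SelectColumns(
        Source,
        {"Index", "Q1-A1", "Q1-A2", "Q1-A3", "Q1-A4", "Q1-A5"}
    ),
    // Unpivot that question's columns into a single Answer column
    Unpivoted = Table.UnpivotOtherColumns(Kept, {"Index"}, "Option", "Answer")
in
    Unpivoted
```

Repeating this per question yields one narrow table per question, at the cost of many referencing queries, which is the performance concern raised above.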
Thanks @CNENFRNL , I did try that earlier, but what I lost is the ability to filter questions by other questions to see how they correlate. Plus, wouldn't the whole (very long) table have to be loaded and filtered in every visual in that case? Wouldn't that cause report pages and interactive filters to load much more slowly?
Do you think it would be a reasonable performance tradeoff?
Power Query can handle much more than 700 columns. Why did you initially decide to segment? In any case, are you using the Reference feature between your queries?
Reference vs Duplicate in Power BI; Power Query Back to Basics - RADACAD
Regards,
Pat
To learn more about Power BI, follow me on Twitter or subscribe on YouTube.