March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
Register NowBe one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now
Hi,
Assume I have like 5 Excel tables I have imported into Power BI. Now these tables share many columns and they also contain unique columns. Every week more data will be appended to these tables. Suppose I want to normalize the tables to the 3rd normal form, what would be the best way of doing this?.
Thanks in advance.
Solved! Go to Solution.
We would need more details specifically of the data and information those tables contain but it would need to be done in Power Query. The techniques will vary on the quality of the data but essentially you would be duplicating/referencing queries, removing columns, removing duplicate rows, etc.
Proud to be a Super User!
We would need more details specifically of the data and information those tables contain but it would need to be done in Power Query. The techniques will vary on the quality of the data but essentially you would be duplicating/referencing queries, removing columns, removing duplicate rows, etc.
Proud to be a Super User!
I have a follow up question. If I have imported these Excel files (which don't support query folding) and let's say I do perform the normalization process in Power Query. Wouldn't that then lead to loads of duplicated tables that have been imported because even if I filter the table, delete columns and so forth, the data has all been imported. And as I append more rows it will drastically increase the size of the model. At one end I'm normalzing the tables to make queries perform more efficiently but at the same time I'm drastically increasing the size of the data by duplicating tables. Please let me know if I've misunderstood. Thanks.
Also I'm keen on hearing what others have to say about this. @amitchandak, @lbendlin , @parry2k.
Like with any other ETL tool there is no free lunch. The price must be paid either by you (the developer) or by them (the end users). General guidance is that you have to pay, so that the users have an enjoyable UX.
Thanks. I appreciate your point about the necessary effort on the developer's part to ensure a good user experience. However, I'm still seeking specific guidance on the initial question regarding the normalization of tables in Power BI to the 3rd normal form. Could you provide more detailed advice or steps on how to effectively normalize multiple Excel tables within Power BI, particularly considering the challenges of data duplication and model size increase as new data is appended weekly? Any specific techniques or best practices in Power Query for handling this type of data transformation would be greatly appreciated. I appreciate @ray_aramburo mentioned data duplication would be involved, removing columns and rows etc. If this were to be done, would the model size not then affect query performance?. Or it would this have negligble effect?.
There's a balance of normalization form and business scenario. The extremes would be one giant flat table vs 6NF. You will have to find your own balance. 3NF sounds reasonable.
The more you normalize the lower the model's storage requirements. You have to pay the price for that in Cartesian currency later.
Normalize until your model is usable, but not more.
If you were to normalize would you duplicate or reference your tables in Power Query?.
Doesn't matter for Power BI. Power BI doesn't know what Power Query did. All it knows is partitions.
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!
Arun Ulag shares exciting details about the Microsoft Fabric Conference 2025, which will be held in Las Vegas, NV.
User | Count |
---|---|
115 | |
76 | |
57 | |
52 | |
44 |
User | Count |
---|---|
164 | |
116 | |
63 | |
57 | |
50 |