Solved: Star Schema - Lines in my fact table increase trem...

Iva_C_Maia · 07-21-2022

Hi,

I'm new to Power BI. I have an initial CSV file with 45 million rows x 14 columns (10 GB file). After adding dimension tables and creating a star schema, my fact table gets loaded with 400+ million rows! What am I doing wrong? I'd really appreciate your help.

I describe what I've done below:

I import the CSV file into Power BI Desktop / Power Query Editor and do some minor transformations (replace values, change data types, and filter) in a small number of steps.

Then I start building a star schema. I create each dimension table by duplicating the original fact table, removing the columns not needed, removing duplicates, adding an index column, and renaming columns. Then I go back to the fact table and link the two; finally, I remove from the fact table the columns made redundant. I do this 4 times (I have created 4 dimension tables), but by the end the process is very, very slow, and when I finally close and apply, after loading the model, my fact table has 400+ million rows! Thanks for your help.

Iva

amitchandak · 07-21-2022

@Iva_C_Maia, in such a case DAX can be a better option.

Use DISTINCT for a single column:

Tab = DISTINCT(Table[Col])

Use SUMMARIZE for more than one column:

Tab2 = SUMMARIZE(Table, Table[Col], Table[Col2])

For a single column in Power Query, use

List.Distinct(Table[Col])

in a blank query, and then convert the result to a table.

List.Distinct: https://youtu.be/zNREVnoAHwM
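
As a rough illustration of the Power Query route above, here is a minimal sketch of a blank query that takes the distinct values of one column and converts them to a table. The inline `Fact` table and the names `Col`, `Amount`, and `ColKey` are hypothetical stand-ins, not taken from the original model; in practice `Fact` would reference the existing fact query by name. The index step is optional and is only there because the question mentions adding an index column.

```powerquery
let
    // Hypothetical stand-in for the fact query; replace with a reference
    // to the real query in an actual model.
    Fact = #table(
        {"Col", "Amount"},
        {{"A", 10}, {"B", 20}, {"A", 30}}
    ),
    // Distinct values of a single column, returned as a list
    DistinctValues = List.Distinct(Fact[Col]),
    // Convert the list to a one-column table named "Col"
    DimTable = Table.FromList(DistinctValues, Splitter.SplitByNothing(), {"Col"}),
    // Optional: add an index column starting at 1, incrementing by 1
    WithIndex = Table.AddIndexColumn(DimTable, "ColKey", 1, 1)
in
    WithIndex
```

Because the list comes from a single column, each value appears once in the resulting table, which is what a relationship or merge key on the dimension side needs.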

Iva_C_Maia · 07-24-2022

Hi,

I used the SUMMARIZE DAX function to create my dimension tables in the data view (rather than in the query editor), and it was super easy. However, I needed to create index columns in those dim tables, and I couldn't find how to load them into my model.

I could have used RANKX for the index, but instead I simply copied the dim tables from the table view into the query editor using the Enter Data option.

Eventually I realised what was causing that strange and tremendous multiplication of rows in my fact table once I loaded the model, after connecting the fact and dim tables. It had nothing to do with the way I was creating my dim tables, but with one particular dim table.

One of my dimension tables had 3 columns, and "remove duplicates" had been applied with all 3 columns selected, which left duplicate values in the bottom-level column (the one used to relate to my fact table). Once I removed duplicates selecting only that column, I had no more issues, and my fact table loaded correctly.

Thanks for your reply!
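
The cause described in this reply can be reproduced on a tiny example. The following is a sketch only, assuming the fact and dimension tables were linked with a merge step in Power Query; all table contents and column names (`Category`, `Subcategory`, `Product`, `Qty`) are hypothetical. It contrasts removing duplicates over all columns with removing duplicates on the key column only, and counts the fact rows after the merge.

```powerquery
let
    // Hypothetical three-column dimension source; "Product" plays the role
    // of the bottom-level column used to relate to the fact table.
    Source = #table(
        {"Category", "Subcategory", "Product"},
        {
            {"Food", "Fruit", "Apple"},
            {"Food", "Snacks", "Apple"},  // same Product, different Subcategory
            {"Food", "Fruit", "Pear"}
        }
    ),
    // Remove duplicates over all columns: both "Apple" rows survive
    DistinctAllColumns = Table.Distinct(Source),
    // Remove duplicates on the key column only: one row per Product
    DistinctOnKey = Table.Distinct(Source, {"Product"}),
    // Hypothetical fact table with two rows
    Fact = #table({"Product", "Qty"}, {{"Apple", 1}, {"Pear", 2}}),
    // Merging on a non-unique key repeats each fact row once per match
    Merged = Table.NestedJoin(
        Fact, {"Product"}, DistinctAllColumns, {"Product"}, "Dim", JoinKind.LeftOuter
    ),
    Expanded = Table.ExpandTableColumn(Merged, "Dim", {"Subcategory"}),
    // Summary record for comparison
    Result = [
        KeyUniqueAfterAllColumns = List.IsDistinct(DistinctAllColumns[Product]),
        KeyUniqueAfterKeyOnly = List.IsDistinct(DistinctOnKey[Product]),
        FactRows = Table.RowCount(Fact),
        RowsAfterMerge = Table.RowCount(Expanded)
    ]
in
    Result
```

In this sketch the fact table has 2 rows but the merged result has 3, because "Apple" matches two dimension rows. With several dimensions and millions of fact rows, the same effect can plausibly account for a jump like 45 million to 400+ million. A `List.IsDistinct` check on the key column before merging is one inexpensive way to catch it.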

Star Schema - Lines in my fact table increase tremendously after adding dimension tables
