Hi there Super Users,
I'm trying to import a dataset with just over 1 million rows and it is taking hours (2.5 hrs so far). I am concerned it may have something to do with the way I've structured my query, particularly in relation to the merged queries.
In essence, there are some rows that have to be merged. The way I've achieved this is to import the dataset twice, applying an index to the first dataset (inpatient_all) and also to the second dataset (inpatient), with the second dataset being the one I manipulate (merging the rows).
If you're able to make sense of the M code (pasted below the sign-off) and provide any suggestions on how I might make this more efficient, I'd really appreciate it.
Thanks heaps in advance for your time and assistance (really appreciate it as I've only been using Power BI for a week).
Cheers
Bec
let
Source = Csv.Document(File.Contents("C:\Users\rlindberg\Desktop\DoH Data\hmdc_event_data_v1.csv"),[Delimiter=",", Columns=48, Encoding=1252, QuoteStyle=QuoteStyle.None]),
#"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"enc_person", type text}, {"age", Int64.Type}, {"hospital", Int64.Type}, {"mh_unit", Int64.Type}, {"sor_prof", Int64.Type}, {"sor_loc", Int64.Type}, {"afrom", Int64.Type}, {"diag", type text}, {"dagger", type text}, {"ediag1", type text}, {"ediag2", type text}, {"ediag3", type text}, {"ediag4", type text}, {"ediag5", type text}, {"ediag6", type text}, {"ediag7", type text}, {"ediag8", type text}, {"ediag9", type text}, {"ediag10", type text}, {"ediag11", type text}, {"ediag12", type text}, {"ediag13", type text}, {"ediag14", type text}, {"ediag15", type text}, {"ediag16", type text}, {"ediag17", type text}, {"ediag18", type text}, {"ediag19", type text}, {"ediag20", type text}, {"ecode1", type text}, {"ecode2", type text}, {"ecode3", type text}, {"ecode4", type text}, {"mdc_c", Int64.Type}, {"mhlstat", Int64.Type}, {"los", Int64.Type}, {"adm", type date}, {"sep", type date}, {"mos", Int64.Type}, {"dischto", Int64.Type}, {"epicar", Int64.Type}, {"mh_pdx", Int64.Type}, {"mh_adx", Int64.Type}, {"readm", Int64.Type}, {"Ever_Inpatient_MH", Int64.Type}, {"Ever_ED_MH", Int64.Type}, {"Ever_Community_MH", Int64.Type}, {"inpatient_mh", Int64.Type}}),
#"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"adm", "START_DATE"}, {"sep", "END_DATE"}}),
#"Added Custom2" = Table.AddColumn(#"Renamed Columns", "EVENT_SUB_CATEGORY", each if ([diag] <> null and Text.Start([diag],1) = "F") then "HOSP_MH_DIAGNOSIS_F"
else if ([diag] = "R45.81" or [diag] = "Z00.4" or [diag] = "Z03.2" or [diag] = "Z04.6" or [diag] = "Z09.3" or [diag] = "Z13.3" or [diag] = "Z13.4" or [diag] = "Z50.2" or [diag] = "Z50.3" or [diag] = "Z50.4" or [diag] = "Z54.3" or [diag] = "Z86.5" or [diag] = "Z91.4" or [diag] = "Z91.5") then "HOSP_MH_DIAGNOSIS_Z"
else if ([mdc_c] = 19 or [mdc_c] = 20) then "HOSP_MH_MDC"
else if [mh_unit] = 1 then "HOSP_MH_UNIT"
else if [mhlstat] = 1 then "HOSP_MH_INVOLUNTARY"
else if [mhlstat] = 2 then "HOSP_MH_VOLUNTARY"
else if ([ecode1] <> null and (
Text.Start([ecode1],2) = "X6" or
Text.Start([ecode1],2) = "X7" or
Text.Start([ecode1],3) = "X80" or
Text.Start([ecode1],3) = "X81" or
Text.Start([ecode1],3) = "X82" or
Text.Start([ecode1],3) = "X83" or
Text.Start([ecode1],3) = "X84" ) ) then "HOSP_MH_INTHARM"
else if ([ecode2] <> null and (
Text.Start([ecode2],2) = "X6" or
Text.Start([ecode2],2) = "X7" or
Text.Start([ecode2],3) = "X80" or
Text.Start([ecode2],3) = "X81" or
Text.Start([ecode2],3) = "X82" or
Text.Start([ecode2],3) = "X83" or
Text.Start([ecode2],3) = "X84" ) ) then "HOSP_MH_INTHARM"
else if ([ecode3] <> null and (
Text.Start([ecode3],2) = "X6" or
Text.Start([ecode3],2) = "X7" or
Text.Start([ecode3],3) = "X80" or
Text.Start([ecode3],3) = "X81" or
Text.Start([ecode3],3) = "X82" or
Text.Start([ecode3],3) = "X83" or
Text.Start([ecode3],3) = "X84" ) ) then "HOSP_MH_INTHARM"
else if ([ecode4] <> null and (
Text.Start([ecode4],2) = "X6" or
Text.Start([ecode4],2) = "X7" or
Text.Start([ecode4],3) = "X80" or
Text.Start([ecode4],3) = "X81" or
Text.Start([ecode4],3) = "X82" or
Text.Start([ecode4],3) = "X83" or
Text.Start([ecode4],3) = "X84" ) ) then "HOSP_MH_INTHARM"
else if ([dagger] <> null and Text.Start([dagger],1) = "F") then "HOSP_MH_CODIAGNOSIS_F"
else if ( ([ediag1] <> null and Text.Start([ediag1],1) = "F") or ([ediag2] <> null and Text.Start([ediag2],1) = "F") or ([ediag3] <> null and Text.Start([ediag3],1) = "F") or ([ediag4] <> null and Text.Start([ediag4],1) = "F") or ([ediag5] <> null and Text.Start([ediag5],1) = "F") or ([ediag6] <> null and Text.Start([ediag6],1) = "F") or ([ediag7] <> null and Text.Start([ediag7],1) = "F") or ([ediag8] <> null and Text.Start([ediag8],1) = "F") or ([ediag9] <> null and Text.Start([ediag9],1) = "F") or ([ediag10] <> null and Text.Start([ediag10],1) = "F") or ([ediag11] <> null and Text.Start([ediag11],1) = "F") or ([ediag12] <> null and Text.Start([ediag12],1) = "F") or ([ediag13] <> null and Text.Start([ediag13],1) = "F") or ([ediag14] <> null and Text.Start([ediag14],1) = "F") or ([ediag15] <> null and Text.Start([ediag15],1) = "F") or ([ediag16] <> null and Text.Start([ediag16],1) = "F") or
([ediag17] <> null and Text.Start([ediag17],1) = "F") or ([ediag18] <> null and Text.Start([ediag18],1) = "F") or
([ediag19] <> null and Text.Start([ediag19],1) = "F") or
([ediag20] <> null and Text.Start([ediag20],1) = "F") ) then "HOSP_MH_ADDDIAG_F"
else "HOSP_OTHER"),
#"Sorted Rows" = Table.Sort(#"Added Custom2",{{"enc_person", Order.Ascending}, {"START_DATE", Order.Ascending}, {"END_DATE", Order.Ascending}}),
#"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 0, 1),
#"Added Conditional Column" = Table.AddColumn(#"Added Index", "HomeIndex", each if [mos] = 5 then null else [Index]),
#"Grouped Rows" = Table.Group(#"Added Conditional Column", {"enc_person"}, {{"SecondTable", each _, type table}}),
#"Added Custom3" = Table.AddColumn(#"Grouped Rows", "Custom4", each Table.FillUp ([SecondTable],{"HomeIndex"})),
#"Expanded Custom1" = Table.ExpandTableColumn(#"Added Custom3", "Custom4", {"enc_person", "START_DATE", "END_DATE", "mos", "los", "Index", "HomeIndex"}, {"enc_person.1", "START_DATE", "END_DATE", "mos", "los", "Index", "HomeIndex"}),
#"Added Conditional Column1" = Table.AddColumn(#"Expanded Custom1", "Custom", each if [HomeIndex] = null then [Index] else [HomeIndex]),
#"Removed Columns" = Table.RemoveColumns(#"Added Conditional Column1",{"SecondTable", "enc_person.1", "HomeIndex"}),
#"Grouped Rows1" = Table.Group(#"Removed Columns", {"Custom", "enc_person"}, {{"START_DATE", each List.Min([START_DATE]), type anynonnull}, {"END_DATE", each List.Max([END_DATE]), type anynonnull}, {"los", each List.Sum([los]), type number}, {"min_INDEX", each List.Min([Index]), type number}, {"max_INDEX", each List.Max([Index]), type number}, {"S", each _, type table}}),
#"Added Custom4" = Table.AddColumn(#"Grouped Rows1", "No_Stat_Sep", each Table.RowCount( Table.SelectRows([S], each [mos]=5))),
#"Removed Columns2" = Table.RemoveColumns(#"Added Custom4",{"S", "Custom"}),
#"Merged Queries" = Table.NestedJoin(#"Removed Columns2",{"min_INDEX"},inpatient_all,{"Index"},"NewColumn",JoinKind.LeftOuter),
#"Expanded NewColumn" = Table.ExpandTableColumn(#"Merged Queries", "NewColumn", {"hospital", "mh_unit", "EVENT_SUB_CATEGORY","age"}, {"NewColumn.hospital", "NewColumn.mh_unit", "NewColumn.EVENT_SUB_CATEGORY","NewColumn.age"}),
#"Merged Queries1" = Table.NestedJoin(#"Expanded NewColumn",{"max_INDEX"},inpatient_all,{"Index"},"NewColumn",JoinKind.LeftOuter),
#"Expanded NewColumn1" = Table.ExpandTableColumn(#"Merged Queries1", "NewColumn", {"EVENT_SUB_CATEGORY"}, {"NewColumn.EVENT_SUB_CATEGORY.1"}),
#"Renamed Columns1" = Table.RenameColumns(#"Expanded NewColumn1",{{"NewColumn.hospital", "Location"}, {"NewColumn.mh_unit", "Dedicated_MH_Unit"}, {"NewColumn.EVENT_SUB_CATEGORY", "EVENT_SUB_CATEGORY"}, {"NewColumn.EVENT_SUB_CATEGORY.1", "EVENT_SUB_CATEGORY_SEP"}, {"NewColumn.age", "age"}}),
#"Removed Columns1" = Table.RemoveColumns(#"Renamed Columns1",{"los", "min_INDEX", "max_INDEX"}),
#"Reordered Columns" = Table.ReorderColumns(#"Removed Columns1",{"enc_person", "age", "Location", "START_DATE", "END_DATE", "EVENT_SUB_CATEGORY", "EVENT_SUB_CATEGORY_SEP", "No_Stat_Sep", "Dedicated_MH_Unit"}),
#"Merged Queries2" = Table.NestedJoin(#"Reordered Columns",{"Location"},#"WA Health Establishments",{"Estab ID"},"NewColumn",JoinKind.LeftOuter),
#"Expanded NewColumn2" = Table.ExpandTableColumn(#"Merged Queries2", "NewColumn", {"Establishment Name", "Suburb"}, {"NewColumn.Establishment Name", "NewColumn.Suburb"}),
#"Removed Columns3" = Table.RemoveColumns(#"Expanded NewColumn2",{"Location"}),
#"Reordered Columns1" = Table.ReorderColumns(#"Removed Columns3",{"enc_person", "age", "NewColumn.Suburb", "NewColumn.Establishment Name", "START_DATE", "END_DATE", "EVENT_SUB_CATEGORY", "EVENT_SUB_CATEGORY_SEP", "No_Stat_Sep", "Dedicated_MH_Unit"}),
#"Renamed Columns3" = Table.RenameColumns(#"Reordered Columns1",{{"NewColumn.Establishment Name", "Location"}, {"NewColumn.Suburb", "Region"}}),
#"Added Custom" = Table.AddColumn(#"Renamed Columns3", "EVENT_CATEGORY", each "INPATIENT"),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "EVENT_GROUP_CAT", each if [EVENT_SUB_CATEGORY] = "HOSP_MH_DIAGNOSIS_F" then "HOSP_MH_1"
else if [EVENT_SUB_CATEGORY] = "HOSP_OTHER" then "HOSP_OTHER"
else "HOSP_MH_OTHER"),
#"Reordered Columns2" = Table.ReorderColumns(#"Added Custom1",{"enc_person", "age", "Region", "Location", "START_DATE", "END_DATE", "EVENT_CATEGORY", "EVENT_SUB_CATEGORY", "EVENT_GROUP_CAT", "EVENT_SUB_CATEGORY_SEP", "No_Stat_Sep", "Dedicated_MH_Unit"})
in
#"Reordered Columns2"
It's a bit tricky to go through your M, but here are a few tips that might help.
Get your initial dataset loaded and as small as possible, e.g. remove unwanted columns and get your data types correct (e.g. date, not datetime). You can then create a 2nd dataset with a reference so you don't do that work twice.
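As a rough sketch of that pattern (the query name and the trimmed column list below are my own placeholders, not from your model), the base query does the load, column trimming and typing once:

// inpatient_base: read the CSV once, keep only the columns used downstream, set types
let
    Source = Csv.Document(File.Contents("C:\Users\rlindberg\Desktop\DoH Data\hmdc_event_data_v1.csv"), [Delimiter=",", Encoding=1252]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Kept Columns" = Table.SelectColumns(#"Promoted Headers", {"enc_person", "age", "hospital", "mh_unit", "diag", "mdc_c", "mhlstat", "los", "adm", "sep", "mos"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Kept Columns", {{"adm", type date}, {"sep", type date}, {"los", Int64.Type}, {"mos", Int64.Type}})
in
    #"Changed Type"

and the second query is just a reference to it:

// inpatient: starts from the base query instead of repeating the load and clean-up steps
let
    Source = inpatient_base
in
    Source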
There is quite a bit of conditional and string manipulation. Maybe try creating a new column for the Text.Start([ecode1],2) and Text.Start([ecode1],3) items and replace nulls with a dummy value. Then create a lookup query that can be joined in the model rather than trying to do it for each row.
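Something along these lines (the column name, lookup query and prefix rows below are only illustrative):

// in the main query: one reusable prefix column instead of repeated Text.Start calls
#"Added Ecode1 Prefix" = Table.AddColumn(#"Changed Type", "ecode1_prefix",
    each if [ecode1] = null then "NONE" else Text.Start([ecode1], 3), type text),

// a separate lookup query (e.g. EcodeLookup) mapping prefixes to categories; only a few rows shown
let
    Source = Table.FromRecords({
        [ecode_prefix = "X60", category = "HOSP_MH_INTHARM"],
        [ecode_prefix = "X84", category = "HOSP_MH_INTHARM"],
        [ecode_prefix = "NONE", category = "HOSP_OTHER"]
    }, type table [ecode_prefix = text, category = text])
in
    Source

The prefix column can then be related to the lookup in the model (or merged once) instead of walking the whole if/else chain for every row.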
There is probably scope for other lookups that might have a composite key. I tend to concatenate fields with a | to get my keys.
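For example (hypothetical columns; replace nulls first, otherwise the whole key comes out null):

#"Added Key" = Table.AddColumn(#"Previous Step", "LookupKey",
    each [enc_person] & "|" & Text.From([hospital]) & "|" & Text.From([mh_unit]), type text),

with the same key built on the lookup side, so the join or model relationship is on a single column.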
Lookups can be generated from your source data, but it may be easier to create a manual lookup feed that is much quicker to load than getting a distinct list of values from 1 million rows.
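For instance, deriving one from the trimmed base query sketched above looks like:

// distinct list of hospital codes from the (hypothetical) inpatient_base query
let
    Source = Table.Distinct(Table.SelectColumns(inpatient_base, {"hospital"}))
in
    Source

whereas a manual feed can be keyed in via Enter Data / Table.FromRows and loads almost instantly.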
Try moving some of your column expressions to DAX and see if that performs better.
Can the merge be moved into the model as a relationship? If you need to see it as a single table then you can use DAX to combine tables, e.g. NATURALINNERJOIN(<leftJoinTable>, <rightJoinTable>), then hide the 2 source tables.
Other options are to do some preprocessing outside of M. I have a really nasty file with 25,000 columns and I do preprocessing in R before loading a nice neat set of files into Power BI.
Hi @stretcharm
Thanks very much for the suggestion to use R. With your inspiration, we are now developing a solution: creating a SQL database, importing the 5 flat files, completing the "wrapping up" of rows and any other manipulation and analysis in the database, and then importing the final product into Power BI to allow the use of the custom visuals. Thanks again. Cheers 🙂
Thanks heaps for the tips. I appreciate that the M code is rather difficult to decipher, so thanks for taking the time to look at it. I'll try performing some of the transformations using DAX and see if that helps. Cheers and thanks