SignorSoprano
Helper I

Appending queries containing random number generation is super slow (never finishes)

I am trying to append multiple queries together as one big table. There are 6 tables to append and 5 of them are created by randomly choosing rows from the first table and then doing some minor transformations to a couple of columns afterwards.

 

My main table query is this:


let
      Source = Locations
    , AddRowForEachDepth =
        Table.AddColumn(
            Source
          , "DepthDesignations"
          , each dimDepth[LetterDesignation]
        )
    , ExpandDepths =
        Table.ExpandListColumn(
            AddRowForEachDepth
          , "DepthDesignations"
        )
    , AddedSampleType =
        Table.AddColumn(
            ExpandDepths
          , "SampleType"
          , each "REG"
        )
in
    AddedSampleType


(Locations is the result of a query that just pulls in a two-column table that exists in the Excel workbook.) The additional tables, built from the one above by choosing random rows, look like this:


let
      Source = generatedSampleIds_Reg
    , AddSampleType =
        Table.ReplaceValue(
            Source
          , "REG"
          , "FD"
          , Replacer.ReplaceText
          , {"SampleType"}
        )
    , numRows = Table.RowCount(AddSampleType)
    , randNums =
        List.Transform(
            List.Random(
                Number.RoundDown((0.1 * numRows))
              , 1324654366
            )
          , each Number.RoundDown((_) * numRows)
        )
    , AddIndex =
        Table.AddIndexColumn(
            AddSampleType
          , "Index"
          , 1
          , 1
        )
    , MarkSampling =
        Table.AddColumn(
            AddIndex
          , "TheChosen"
          , each List.Contains(randNums, [Index])
          , Logical.Type
        )
    , ChosenSamples = Table.SelectRows(MarkSampling, each ([TheChosen] = true))
    , RemoveColumns = Table.RemoveColumns(ChosenSamples,{"Index", "TheChosen"})
in
    RemoveColumns


I then combine all 6 of the tables and do some grouping (for sequential numbering starting from 1 for each grouping) like this:


let
      Source =
        Table.Combine(
            {
                generatedSampleIds_Reg
              , generatedSampleIds_FD
              , generatedSampleIds_CF
              , generatedSampleIds_MS
              , generatedSampleIds_MSD
              , generatedSampleIds_ER
            }
        )
    , GroupedParcels =
        Table.Group(
            Source
          , {"ppin"}
          , {
                {
                    "GroupedParcels"
                  , each _
                  , type table [ppin=nullable number, locationcode=nullable text, DepthDesignations=text, SampleType=text]
                }
            }
        )
    , AddIndex =
        Table.AddColumn(
            GroupedParcels
          , "GroupedCount"
          , each
                Table.AddIndexColumn(
                    [GroupedParcels]
                  , "Index"
                  , 1
                  , 1
                )
        )
    , ExpandGroupedCount =
        Table.ExpandTableColumn(
            AddIndex
          , "GroupedCount"
          , {
                "ppin"
              , "locationcode"
              , "DepthDesignations"
              , "SampleType"
              , "Index"
            }
          , {
                "PPIN"
              , "LocationCode"
              , "DepthDesignations"
              , "SampleType"
              , "Index"
            }
        )
    , RemoveHelperColumns = Table.RemoveColumns(ExpandGroupedCount, {"ppin", "GroupedParcels"})
    , SetDataType =
        Table.TransformColumnTypes(
            RemoveHelperColumns
          , {
                {"PPIN", type text}
              , {"LocationCode", type text}
              , {"DepthDesignations", type text}
              , {"SampleType", type text}
              , {"Index", Int64.Type}
            }
        )
    , AddSampleNumber =
        Table.AddColumn(
            SetDataType
          , "SampleNumber"
          , each (if [SampleType] = "ER" then "W" else "C") & Text.PadStart(Text.From([Index]), 4, "0")
        )
    , AddSampleId =
        Table.AddColumn(
            AddSampleNumber
          , "SampleId"
          , each [LocationCode] & "-" & [DepthDesignations] & "C" & "-" & [SampleNumber] & "-" & [SampleType]
        )
in
    AddSampleId


This query has been running for over 30 minutes and hasn't finished. The Locations table has 26k rows, and the first query built from it multiplies that by 4, so that one has 104k rows. Each of the other 5 queries is only 5 to 10% of that 104k, so this is not a large amount of data by any means. Why is this query not finishing? Does it have to do with the random number generation, or with the way I am filtering the table by the random numbers?

 

Thanks for any help.

2 REPLIES
v-xinruzhu-msft
Community Support

Hi @SignorSoprano 

You could consider using Table.Buffer to hold the table data in memory, so that it is not re-evaluated each time it is referenced. See the following link for details on the function.

Table.Buffer - PowerQuery M | Microsoft Learn
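
As a rough illustration of where buffering might go in the sampling query from the question, here is a minimal sketch. It reuses the step names from the original post; the step name BufferedTable is made up for this example, and placing Table.Buffer after the index step and wrapping the random list in List.Buffer are assumptions about where buffering could help, not a confirmed fix.

```powerquery
let
      Source = generatedSampleIds_Reg
    , AddSampleType =
        Table.ReplaceValue(
            Source
          , "REG"
          , "FD"
          , Replacer.ReplaceText
          , {"SampleType"}
        )
    , AddIndex =
        Table.AddIndexColumn(
            AddSampleType
          , "Index"
          , 1
          , 1
        )
      // Hypothetical step: hold the indexed table in memory so later steps
      // read from the buffered copy instead of re-evaluating upstream steps
    , BufferedTable = Table.Buffer(AddIndex)
    , numRows = Table.RowCount(BufferedTable)
      // Assumption: List.Buffer keeps the random positions in memory so the
      // list is not regenerated while the rows are being filtered
    , randNums =
        List.Buffer(
            List.Transform(
                List.Random(
                    Number.RoundDown(0.1 * numRows)
                  , 1324654366
                )
              , each Number.RoundDown(_ * numRows)
            )
        )
    , ChosenSamples = Table.SelectRows(BufferedTable, each List.Contains(randNums, [Index]))
    , RemoveColumns = Table.RemoveColumns(ChosenSamples, {"Index"})
in
    RemoveColumns
```

Whether this helps depends on how the engine evaluates the downstream append, so it is worth timing each sampling query on its own before and after the change.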

 

Best Regards!

Yolo Zhu

If this post helps, then please consider accepting it as the solution to help other members find it more quickly.

Yeah. I have seen many people suggest this, and I don't know whether it actually works for most of them, but every time I have used buffering, it has slowed my queries down tremendously! It seems useless in my use cases. Still, just for kicks, I tried it, and it does nothing to speed up my query.

 

Perhaps I should move the work to a Python script?
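
Before moving to Python, one variation that could be tried in M is to replace the per-row List.Contains check with a join against a small table of the random positions. The sketch below covers only the row-selection part of the sampling query (the SampleType replacement is left out). The names RandTable, RandIndex, and Joined are made up for this example, and it is an untested assumption that the join would be faster on this data.

```powerquery
let
      Source = generatedSampleIds_Reg
    , AddIndex =
        Table.AddIndexColumn(
            Source
          , "Index"
          , 1
          , 1
        )
    , numRows = Table.RowCount(AddIndex)
      // Same random positions as in the original query; List.Distinct drops
      // repeats so the join cannot return the same row twice
    , randNums =
        List.Distinct(
            List.Transform(
                List.Random(
                    Number.RoundDown(0.1 * numRows)
                  , 1324654366
                )
              , each Number.RoundDown(_ * numRows)
            )
        )
      // Hypothetical helper table holding one random position per row
    , RandTable = Table.FromColumns({randNums}, {"RandIndex"})
      // Inner join keeps only the rows whose Index appears in RandTable
    , Joined =
        Table.Join(
            AddIndex
          , "Index"
          , RandTable
          , "RandIndex"
          , JoinKind.Inner
        )
    , RemoveColumns = Table.RemoveColumns(Joined, {"Index", "RandIndex"})
in
    RemoveColumns
```

The join does not promise to keep the original row order, so a sort on Index before removing it may be needed if the order matters for the sequential numbering later on.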
