I am migrating an Alteryx workflow that uses an iterative macro to a Dataflow Gen2 equivalent, using a Delta Lake table from the lakehouse as the data source.
The function below carries out the required transformations on the table; the table's structure (column names, number of columns, column order, and row count) remains unchanged. The transformation only affects the underlying data. Here is the function:
Suppose the name of this function is fxMyFunc
(tbl as table) =>
let
    Source = tbl,
    // Running on-hand quantity, first pass: row 1 keeps its value; every other
    // row adds its transaction quantity to the previous row's quantity from the input table.
    AddedCustom = Table.TransformColumnTypes(
        Table.AddColumn(
            Source,
            "On-Hand_Inventory_Quantity_Inter",
            each
                if [RecordID] = 1 then
                    [#"On-Hand_Inventory_Quantity"]
                else
                    [Transaction_Quantity__NOS_]
                        + Source[#"On-Hand_Inventory_Quantity"]{[RecordID] - 2}
        ),
        {{"On-Hand_Inventory_Quantity_Inter", type number}}
    ),
    RemovedCols = Table.RemoveColumns(AddedCustom, {"On-Hand_Inventory_Quantity"}),
    // Second pass: for rows after the first, rebuild the on-hand quantity by
    // summing the intermediate values from row 2 through the current row.
    AddedCustom1 = Table.TransformColumnTypes(
        Table.AddColumn(
            RemovedCols,
            "On-Hand_Inventory_Quantity",
            each
                if [RecordID] = 1 then
                    [#"On-Hand_Inventory_Quantity_Inter"]
                else
                    List.Sum(
                        List.Range(RemovedCols[#"On-Hand_Inventory_Quantity_Inter"], 1, [RecordID] - 1)
                    )
        ),
        {{"On-Hand_Inventory_Quantity", type number}}
    ),
    RemovedCols1 = Table.RemoveColumns(
        AddedCustom1,
        {"On-Hand_Inventory_Quantity_Inter", "Inbound/Outbound_Value_Adjustment"}
    ),
    // Transfer-out ("TOUT") rows get their value from the prior row's unit cost.
    AddedCustom2 = Table.TransformColumnTypes(
        Table.AddColumn(
            RemovedCols1,
            "Inbound/Outbound_Value_Adjustment",
            each
                if [Type_of_Entry__RECPT_SUB_TYP_CD_] = "TOUT" then
                    RemovedCols1[Unit_Cost]{[RecordID] - 2} * [Transaction_Quantity__NOS_]
                else
                    [#"Inbound/Outbound_Value"]
        ),
        {{"Inbound/Outbound_Value_Adjustment", type number}}
    ),
    // Running on-hand value, mirroring the quantity logic above.
    AddedCustom3 = Table.TransformColumnTypes(
        Table.AddColumn(
            AddedCustom2,
            "On-Hand_Inventory_Value_Inter",
            each
                if [RecordID] = 1 then
                    [#"On-Hand_Inventory_Value"]
                else
                    [#"Inbound/Outbound_Value_Adjustment"]
                        + AddedCustom2[#"On-Hand_Inventory_Value"]{[RecordID] - 2}
        ),
        {{"On-Hand_Inventory_Value_Inter", type number}}
    ),
    RemovedCols2 = Table.RemoveColumns(AddedCustom3, {"On-Hand_Inventory_Value"}),
    #"Added custom 4" = Table.TransformColumnTypes(
        Table.AddColumn(
            RemovedCols2,
            "On-Hand_Inventory_Value",
            each
                if [RecordID] = 1 then
                    [#"On-Hand_Inventory_Value_Inter"]
                else
                    List.Sum(List.Range(RemovedCols2[#"On-Hand_Inventory_Value_Inter"], 1, [RecordID] - 1))
        ),
        {{"On-Hand_Inventory_Value", type number}}
    ),
    #"Removed columns 3" = Table.RemoveColumns(
        #"Added custom 4",
        {"On-Hand_Inventory_Value_Inter", "Unit_Cost"}
    ),
    // Recompute unit cost from the updated value and quantity.
    #"Added custom" = Table.TransformColumnTypes(
        Table.AddColumn(
            #"Removed columns 3",
            "Unit_Cost",
            each
                if [#"On-Hand_Inventory_Quantity"] = 0 then
                    0
                else
                    [#"On-Hand_Inventory_Value"] / [#"On-Hand_Inventory_Quantity"]
        ),
        {{"Unit_Cost", type number}}
    ),
    #"Rounded off" = Table.TransformColumns(
        #"Added custom",
        {{"Unit_Cost", each Number.Round(_, 2), type number}}
    ),
    #"Reordered columns" = Table.ReorderColumns(
        #"Rounded off",
        {
            "RecordID",
            "Posted_Date",
            "Type_of_Entry__RECPT_SUB_TYP_CD_",
            "Transaction_Quantity__NOS_",
            "Inbound/Outbound_Value",
            "Inbound/Outbound_Value_Adjustment",
            "Unit_Cost",
            "On-Hand_Inventory_Quantity",
            "On-Hand_Inventory_Value"
        }
    )
in
    #"Reordered columns"
The iterative macro in Alteryx repeats the transformation 57 times, corresponding to the maximum value of the [RecordID] column, and completes within seconds.
I attempted to replicate this process with the List.Generate() function in Power Query. I'm running the transformation 5 times for testing, although I ultimately need to run it 57 times. Below is the M query for this operation:
Suppose the name of this function is OutputQuery
let
    Source = List.Generate(
        () => [x = 1, y = fxMyFunc(BaseTable)],
        each [x] <= 5,
        each [x = [x] + 1, y = fxMyFunc([y])],
        each [y]
    )
in
    Source
Accessing Source{0} in the Navigation step takes approximately 17 to 18 seconds. However, accessing Source{1} can take up to 10 minutes, after which Dataflow Gen2 indicates that the evaluation has stopped. It is unclear why there is such a significant time difference just to access the next item, Source{1}.
Here is the query for BaseTable:
let
    Source = Lakehouse.Contents(null){[workspaceId = "xxxxxx-xxx-xxxx-xxx"]}[Data]{[lakehouseId = "xxxx-xxxx-xxx-xxx-xx-xx"]}[Data],
    Navigation = Source{[Id = "MyTableName", ItemKind = "Table"]}[Data]
in
    Navigation
To provide context on the dataset: it is only 57 rows and 9 columns. You can refer to fxMyFunc for the specific column names and the transformations being applied.
I have also tried plain recursion instead of List.Generate() (roughly like the sketch below), but it did not help either. Could you assist me with this? I am not sure whether I am making a mistake or running into an optimization issue. Your guidance would be appreciated.
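Sketch of the recursive attempt (the helper name fxRepeat is illustrative only; M needs the @ prefix for the self-reference inside a let):
let
    // Hypothetical recursive equivalent: apply fxMyFunc n times to the table.
    fxRepeat = (t as table, n as number) as table =>
        if n <= 0 then t else @fxRepeat(fxMyFunc(t), n - 1)
in
    fxRepeat(BaseTable, 57)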
Thanks,
Udit
Instead of wrapping each of your Table.AddColumn calls in Table.TransformColumnTypes, you can just end the Table.AddColumn call with ", type number". In other words, define the column type as you create the column; see the sketch below.
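For example, applied to the first step of fxMyFunc (reusing the step names from the question), a minimal sketch would be:
// The optional fourth argument of Table.AddColumn ascribes the column type,
// so the separate Table.TransformColumnTypes wrapper goes away.
AddedCustom = Table.AddColumn(
    Source,
    "On-Hand_Inventory_Quantity_Inter",
    each
        if [RecordID] = 1 then
            [#"On-Hand_Inventory_Quantity"]
        else
            [Transaction_Quantity__NOS_] + Source[#"On-Hand_Inventory_Quantity"]{[RecordID] - 2},
    type number
),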
That being said, there's a lot of work going on for each of these custom columns.
--Nate
Thank you, @lbendlin, for your response to this topic.
I understand the pitfalls of recursion. Initially, I attempted it, but soon realized it was not efficient for even 57 rows of data, which led me to choose List.Generate(). Although I have used List.Buffer() in some instances, I never fully explored it because there was no necessity. However, it appears this is the situation where it might be required.
Here is a sample dataset with 57 rows for your reference (this is the Delta Lake table, the BaseTable):
I would greatly value your assistance.
Thanks,
Udit
Table.Buffer seems to work just fine
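For example (a minimal sketch; the exact placement is an assumption, since the thread doesn't show the final code), buffer the table each time fxMyFunc returns, so the positional row lookups inside the function read from memory instead of going back to the lakehouse:
let
    Source = List.Generate(
        () => [x = 1, y = Table.Buffer(fxMyFunc(BaseTable))],
        each [x] <= 57,
        each [x = [x] + 1, y = Table.Buffer(fxMyFunc([y]))],
        each [y]
    )
in
    Source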
Thanks @lbendlin
I had not envisioned the solution being so straightforward. The capabilities of Table.Buffer() and List.Buffer() appear to surpass my expectations. I am eager to understand how they work and why they are so much more efficient than standard operations.
Could you share resources such as blogs or videos to help me understand these functions?
Thanks,
Udit
You can assume that Power Query runs "on disk", interacting with storage media for each iteration. In contrast, Power BI (DAX) for the most part runs "in memory", i.e. much faster. Table.Buffer loads data into memory for faster re-use.
Recursion carries (literally carries) a penalty of having to lug your results around. Try to avoid it if possible.
Read about the buffering functions like Table.Buffer and List.Buffer and evaluate whether you can apply them to your scenario, as in the sketch below.
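For illustration, here is List.Buffer applied inside fxMyFunc. This is a hypothetical rework of the AddedCustom1 step, not code from this thread; it buffers the intermediate column once so each row's List.Range scan reads from an in-memory list:
BufferedInter = List.Buffer(RemovedCols[#"On-Hand_Inventory_Quantity_Inter"]),
// The running sum now scans the buffered list instead of reaching
// back through the RemovedCols step for every row.
AddedCustom1 = Table.AddColumn(
    RemovedCols,
    "On-Hand_Inventory_Quantity",
    each
        if [RecordID] = 1 then
            [#"On-Hand_Inventory_Quantity_Inter"]
        else
            List.Sum(List.Range(BufferedInter, 1, [RecordID] - 1)),
    type number
),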
Can you provide a sample of the Delta Lake table?