Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get Fabric Certified for FREE during Fabric Data Days. Don't miss your chance! Request now

Reply
saturated_snail
Frequent Visitor

Extracting all related materials from description strings

Sample File

PBIX file: Sample File 

 

Overview

I have 1 table (50k rows) with [Material], [Description] and [Creation Date].

 

I need to:

1) Extract the related materials from the description column. The description may contain some (not all) related versions of the material within its string. The description can also contain multiple related materials. These related materials also appear elsewhere in the table, in the material column.

MaterialDescriptionCreation DateStep 1
FA37000DISC-PRINT HEAD,IG148V XP-6000(FA37020)23.08.2017FA37020
FA37010DISC-PRINT HEADIG148V-1 XP-6000[FA37000]24.10.2019FA37000
FA37020PRINT HEAD,IG148V-2[FA37000]30.07.2019FA37000

 

2) Find unique related materials. This is the complicated bit I need help with, it is like a massive web of related materials! There is no limit to how many related materials there are and no reliable rule to determine how/when they appear in the description. eg material (a) could link to part (b), which in turn links to (c), which in turn links (d) etc... meaning (a)(b)(c)(d)... are all related.

saturated_snail_0-1656370341605.png

 

3) Sort resulting list by [Creation Date]. If created on same date, sort by a-z.

Step 3
FA37000, FA37020, FA37010

 

4)On my original table, add a column to lookup results of step3, returning blank if no related materials found.

Sample data:

 

MaterialDescriptionCreation DateExpected Result
1489361DISC-PULLEY MID /B-11 FRAME A4(1540962)16.01.20151489361, 1489366, 1489370, 1540962, 1588714
1489366DISC-HOLDER DU /B-11 DU A1(1540962)16.01.20151489361, 1489366, 1489370, 1540962, 1588714
1489370DISC-SHAFT PULLEY /B-11 FRAME (1540962)16.01.20151489361, 1489366, 1489370, 1540962, 1588714
1540962DISC-PARTS HOLDER [1489366](1588714)16.01.20151489361, 1489366, 1489370, 1540962, 1588714
1588714PARTS HOLDER DU ASSY SP[1540962]16.01.20151489361, 1489366, 1489370, 1540962, 1588714
2204648DISC-SMF1;EU1Android TV(2208947)19.11.20192204648, 2208947, 2209340
2208947DISC-SMF1;EU2[2204648](2209340)28.07.20202204648, 2208947, 2209340
2209340SMF1;EU3[2208947]22.02.20212204648, 2208947, 2209340
F179000DISC-LFP PRINT HEAD(F179010)16.01.2015F179000, F179010, F179020, F179030
F179010DISC-LFP PRINT HEAD,[F179000](F179020)16.01.2015F179000, F179010, F179020, F179030
F179020DISC-LFP PRINT HEAD,-[F179010](F179030)06.08.2015F179000, F179010, F179020, F179030
F179030RES LFP PRINT HEAD,I0G65V-3[F179020]15.03.2016F179000, F179010, F179020, F179030
FA37000DISC-PRINT HEAD,IG148V XP-6000(FA37020)23.08.2017FA37000, FA37020, FA37010
FA37010DISC-PRINT HEADIG148V-1 XP-6000[FA37000]24.10.2019FA37000, FA37020, FA37010
FA37020PRINT HEAD,IG148V-2[FA37000]30.07.2019FA37000, FA37020, FA37010
1001346DETEC ASSY PULL TRA16.01.2015 
1001728OIL PAD PLAIN UPPER16.01.2015 
1001740COMPRESSION SPRING 64516.01.2015 
1001108TRACTOR LEFT[1010272]16.01.2015 

 

What I've tried

Power Query:

 Queries:

Material Master - Final table to actually load with [Material], [Description], [Creation Date] and [Expected Results]

 

Material List - Temporary table with list of [Material] from original source. This is a filtered version of the original material list as I exclude certain materials for business purposes.

 

Output - This contains 1 column, [Custom], which is the result of extracting related materials from the description and concatenating it with the original material and then removing duplicates.

Custom

1489361, 1540962
1489366, 1540962
1489370, 1540962
1489366, 1540962, 1588714
1540962, 1588714
2204648, 2208947
2204648, 2208947, 2209340
2208947, 2209340
F179000, F179010
F179000, F179010, F179020
F179010, F179020, F179030
F179020, F179030
FA37000, FA37020
FA37000, FA37010

 

Iterate Function - A function that iterates through the Output table above to find all related materials until it can no longer find any more.

This is only usable on very small datasets. When using this on 50k rows, I gave up waiting when I left it overnight and it still had not refreshed by morning.

 

 

(Tbl as table) =>
let
    x = Table.RowCount(Tbl),
    y =
        Table.Buffer(
            Table.Distinct(
                Table.RemoveColumns(
                    Table.AddColumn(
                        Table.RemoveColumns(
                            Table.Group(
                                Table.ExpandTableColumn(
                                    Table.NestedJoin(
                                        Table.ExpandListColumn(
                                            Table.AddColumn(
                                                Tbl,
                                                "Custom.1",
                                                each Text.Split(
                                                    [Custom],
                                                    ", "
                                                )
                                            ),
                                            "Custom.1"
                                        ),
                                        {"Custom.1"},
                                        Table.ExpandListColumn(
                                            Table.AddColumn(
                                                Tbl,
                                                "Custom.1",
                                                each Text.Split(
                                                    [Custom],
                                                    ", "
                                                )
                                            ),
                                            "Custom.1"
                                        ),
                                        {"Custom.1"},
                                        "Expanded Custom.1",
                                        JoinKind.LeftOuter
                                    ),
                                    "Expanded Custom.1",
                                    {"Custom"},
                                    {"Expanded Custom.1.Custom"}
                                ),
                                {"Custom"},
                                {{"Result",
                                each Text.Combine(
                                    [Expanded Custom.1.Custom],
                                    ", "
                                ),
                                type nullable text}}
                            ),
                            {"Custom"}
                        ),
                        "Custom",
                        each Text.Combine(
                            List.Sort(
                                List.Distinct(
                                    Text.Split(
                                        [Result],
                                        ", "
                                    )
                                )
                            ),
                            ", "
                        )
                    ),
                    {"Result"}
                )
            )
        ),
    z = Table.RowCount(y),
    Output =
        if z = x
        then y
        else 
            let Tbl = y
            in
            @#"Iterate Function"(Tbl)
in
    Output

 

 

 

Invoked Function - Result of the Iterate Function. This can then be sorted and linked back to the Material Master query.

Custom

1489361, 1489366, 1489370, 1540962, 1588714
2204648, 2208947, 2209340
F179000, F179010, F179020, F179030
FA37000, FA37010, FA37020

 

DAX:

I've attempted to write DAX queries to replace the custom function in power query. I  managed to replicate certain steps (like merges), which action 100x faster than in power query, suggesting this may be the better route. However I do not have enough knowledge to be able to do this.

 

Issue Summary

The function I used to loop through my data to get the desired output is not practical as it takes too long to run on larger datasets. How can I achieve my desired outcome without waiting potentially days? Is a DAX solution bettter/possible?

1 ACCEPTED SOLUTION
Anonymous
Not applicable

Hi  @saturated_snail ,

Here are the steps you can follow:

1. Create calculated column.

Step1 =
var _search1=
    SEARCH("(",'Material Master'[Description],1,BLANK())
var _search2=
    SEARCH("[",'Material Master'[Description],1,BLANK())
    var _if1=
IF(
    _search1=BLANK(),BLANK(),
MID('Material Master'[Description],_search1+1,7))
var _if2=
IF(
    _search2=BLANK(),BLANK(),
MID('Material Master'[Description],_search2+1,7))
var _switch=
SWITCH(
    TRUE(),
    _if1= BLANK() ,_if2,
    _if2=BLANK(), _if1,
CONCATENATE(_if2, CONCATENATE("、", _if1))
)
return
_switch
Rank123 =
  VAR __Item =[Description]
  VAR __Text = CONCATENATEX('Material Master',[Description],"|")
  VAR __Count = PATHLENGTH(__Text)
  VAR __Table =
    ADDCOLUMNS(
      GENERATESERIES(1,__Count,1),
      "__Item",PATHITEM(__Text,[Value])
    )
  VAR __TableFinal =
    SUMMARIZE(__Table,[__Item],"Index",MINX(FILTER(__Table,[__Item]=EARLIER([__Item])),[Value]))
RETURN
  MINX(FILTER(__TableFinal,[__Item] = __Item),[Index])
group =
LEFT(
    'Material Master'[Material],2)
Step2.1 =
CONCATENATEX(
    FILTER(ALL('Material Master'),'Material Master'[Rank123]<=EARLIER('Material Master'[Rank123])+1&&'Material Master'[group]=EARLIER('Material Master'[group])),
    [Material],
 CONCATENATE("、", " "))
Step2.2 =
CALCULATE(MAX('Material Master'[Step2.1]),FILTER(ALL('Material Master'),
'Material Master'[group]=EARLIER('Material Master'[group])&&'Material Master'[Rank123]=MAXX(FILTER(ALL('Material Master'),'Material Master'[group]=EARLIER('Material Master'[group])),[Rank123])))

2. Result:

vyangliumsft_0-1656577582710.png

 

Best Regards,

Liu Yang

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly

View solution in original post

1 REPLY 1
Anonymous
Not applicable

Hi  @saturated_snail ,

Here are the steps you can follow:

1. Create calculated column.

Step1 =
var _search1=
    SEARCH("(",'Material Master'[Description],1,BLANK())
var _search2=
    SEARCH("[",'Material Master'[Description],1,BLANK())
    var _if1=
IF(
    _search1=BLANK(),BLANK(),
MID('Material Master'[Description],_search1+1,7))
var _if2=
IF(
    _search2=BLANK(),BLANK(),
MID('Material Master'[Description],_search2+1,7))
var _switch=
SWITCH(
    TRUE(),
    _if1= BLANK() ,_if2,
    _if2=BLANK(), _if1,
CONCATENATE(_if2, CONCATENATE("、", _if1))
)
return
_switch
Rank123 =
  VAR __Item =[Description]
  VAR __Text = CONCATENATEX('Material Master',[Description],"|")
  VAR __Count = PATHLENGTH(__Text)
  VAR __Table =
    ADDCOLUMNS(
      GENERATESERIES(1,__Count,1),
      "__Item",PATHITEM(__Text,[Value])
    )
  VAR __TableFinal =
    SUMMARIZE(__Table,[__Item],"Index",MINX(FILTER(__Table,[__Item]=EARLIER([__Item])),[Value]))
RETURN
  MINX(FILTER(__TableFinal,[__Item] = __Item),[Index])
group =
LEFT(
    'Material Master'[Material],2)
Step2.1 =
CONCATENATEX(
    FILTER(ALL('Material Master'),'Material Master'[Rank123]<=EARLIER('Material Master'[Rank123])+1&&'Material Master'[group]=EARLIER('Material Master'[group])),
    [Material],
 CONCATENATE("、", " "))
Step2.2 =
CALCULATE(MAX('Material Master'[Step2.1]),FILTER(ALL('Material Master'),
'Material Master'[group]=EARLIER('Material Master'[group])&&'Material Master'[Rank123]=MAXX(FILTER(ALL('Material Master'),'Material Master'[group]=EARLIER('Material Master'[group])),[Rank123])))

2. Result:

vyangliumsft_0-1656577582710.png

 

Best Regards,

Liu Yang

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly

Helpful resources

Announcements
Fabric Data Days Carousel

Fabric Data Days

Advance your Data & AI career with 50 days of live learning, contests, hands-on challenges, study groups & certifications and more!

October Power BI Update Carousel

Power BI Monthly Update - October 2025

Check out the October 2025 Power BI update to learn about new features.

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.

Top Solution Authors