Join us at FabCon Atlanta from March 16 - 20, 2026, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.
Register now!The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now! Learn more
Hi Guys,
I am trying to retrive a hierarchy beween P/N from a Website by scraping data each individual P/N html page.
I have a list of all the P/N that are active with their name and their unique #id on an Excel spreadsheet like the table below :
| P/N - Name | id |
| Cylinder head | 12 |
| Rod (bearing) | 35 |
| liner type B | 27 |
The html pages of each P/N are build like this : https://<my_domain>/<random_name>/PN=id
and on each page I can easily retrive the P/N that are directly linked to this P/N (see function Get_hierarchy)
On the table below, I would have scrapped the P/N Actuator (in green) which would then have given me the 3 sub components (in blue).
| level | P/N - Name |
| 0 | Actuator |
| 1 | Cylinder |
| 1 | Rod |
| 1 | Sealing system |
I would then need to retrive the ID of each sub component to issue the same request for each of those P/N and get the list of P/N that are directly linked to each each sub component. The end table will look a bit like this :
| level | P/N - Name | id |
| 0 | Actuator | 3 |
| 1 | Cylinder | 46 |
| 2 | Cylinder body | 78 |
| 2 | Cylinder head | 12 |
| 1 | Rod | 33 |
| 2 | Rod (bar) | 35 |
| 2 | Rod (bearing) | 40 |
| 1 | Sealing system | 112 |
Everything is working fine with the code below except when I have more that 8 row to scrap on the same level where the Query freezes and I can't get a decent result. Issue being, I have at least 100.000 P/N to scrap on 15 hierarchy levels so I can't do it manualy.
Code on the Hierachy_full table (for the 1st 2 levels but the remaining lignes are similar) :
let
Source = <Top P/N with its ID>
#"Fonction personnalisée appelée" = Table.AddColumn(Source, "N-1", each Get_hierarchy([N.ID])),
#"N-1 développé" = Table.ExpandTableColumn(#"Fonction personnalisée appelée", "N-1", {"P/N"}, {"N-1.P/N"}),
#"Requêtes fusionnées" = Table.NestedJoin(#"N-1 développé", {"N-1.P/N"}, #"P/N actifs", {"P/N"}, "P/N actifs", JoinKind.LeftOuter),
#"P/N actifs développé" = Table.ExpandTableColumn(#"Requêtes fusionnées", "P/N actifs", {"P/N"}, {"N-1.ID"}),
#"Fonction personnalisée appelée1" = Table.AddColumn(#"P/N actifs développé", "N-2", each #"Get_hierarchy"([#"N-1.ID"])),
#"N-2 développé" = Table.ExpandTableColumn(#"Fonction personnalisée appelée1", "N-2", {"P/N"}, {"N-2.P/N"}),
in
#"N-2 développé"
And the code of the function Get_hierarchy :
let
Source = (#"ID" as any) => let
Source = Web.Page(Web.Contents("https://<my_domain>/<random_name>/PN= " & #"id")),
Data = Source{2}[Data],
#"Type modifié" = Table.TransformColumnTypes(Data,{{"Column1", type text}, {"Column2", type text}}),
#"Diviser la colonne selon les transitions de caractères" = Table.SplitColumn(#"Type modifié", "Column2", Splitter.SplitTextByCharacterTransition({"a".."z"}, {"A".."Z"}), {"Column2.1", "Column2.2"}),
#"Colonnes renommées" = Table.RenameColumns(#"Diviser la colonne selon les transitions de caractères",{{"Column2.1", "P/N"}}),
#"Autres colonnes supprimées" = Table.SelectColumns(#"Colonnes renommées",{"P/N"})
in
#"Autres colonnes supprimées"
in
Source
Does any of you have an idea of what I did wrong or where does the problem could came from ?
And how to fix it of course.
Many thanks,
Romuald.
Is it possible that the web server is blocking your scraping/slowing you down because you cause too much traffic for them? Maybe you should ask them if they would be willing to give you a data extract.
The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now!
| User | Count |
|---|---|
| 19 | |
| 10 | |
| 9 | |
| 8 | |
| 7 |