Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now! Learn more

Reply
Anonymous
Not applicable

Query freezes while web scraping

Hi Guys,

 

I am trying to retrive a hierarchy beween P/N from a Website by scraping data each individual P/N html page.

 

I have a list of all the P/N that are active with their name and their unique #id on an Excel spreadsheet like the table below :

P/N - Nameid
Cylinder head12
Rod (bearing)35
liner type B27

 

The html pages of each P/N are build like this : https://<my_domain>/<random_name>/PN=id 

and on each page I can easily retrive the P/N that are directly linked to this P/N (see function Get_hierarchy)

On the table below, I would have scrapped the P/N Actuator (in green) which would then have given me the 3 sub components (in blue).

levelP/N - Name
0Actuator
1Cylinder
1Rod
1Sealing system

 

I would then need to retrive the ID of each sub component to issue the same request for each of those P/N and get the list of P/N that are directly linked to each each sub component. The end table will look a bit like this :

levelP/N - Nameid
0Actuator3
1Cylinder46
2Cylinder body78
2Cylinder head12
1Rod33
2Rod (bar)35
2Rod (bearing)40
1Sealing system112

 

Everything is working fine with the code below except when I have more that 8 row to scrap on the same level where the Query freezes and I can't get a decent result. Issue being, I have at least 100.000 P/N to scrap on 15 hierarchy levels so I can't do it manualy.

 

Code on the Hierachy_full table (for the 1st 2 levels but the remaining lignes are similar) : 

 

let
Source = <Top P/N with its ID>
#"Fonction personnalisée appelée" = Table.AddColumn(Source, "N-1", each Get_hierarchy([N.ID])),
#"N-1 développé" = Table.ExpandTableColumn(#"Fonction personnalisée appelée", "N-1", {"P/N"}, {"N-1.P/N"}),
#"Requêtes fusionnées" = Table.NestedJoin(#"N-1 développé", {"N-1.P/N"}, #"P/N actifs", {"P/N"}, "P/N actifs", JoinKind.LeftOuter),
#"P/N actifs développé" = Table.ExpandTableColumn(#"Requêtes fusionnées", "P/N actifs", {"P/N"}, {"N-1.ID"}),
#"Fonction personnalisée appelée1" = Table.AddColumn(#"P/N actifs développé", "N-2", each #"Get_hierarchy"([#"N-1.ID"])),
#"N-2 développé" = Table.ExpandTableColumn(#"Fonction personnalisée appelée1", "N-2", {"P/N"}, {"N-2.P/N"}),
in
#"N-2 développé"

 

And the code of the function Get_hierarchy :

 

let
Source = (#"ID" as any) => let
Source = Web.Page(Web.Contents("https://<my_domain>/<random_name>/PN= " & #"id")),
Data = Source{2}[Data],
#"Type modifié" = Table.TransformColumnTypes(Data,{{"Column1", type text}, {"Column2", type text}}),
#"Diviser la colonne selon les transitions de caractères" = Table.SplitColumn(#"Type modifié", "Column2", Splitter.SplitTextByCharacterTransition({"a".."z"}, {"A".."Z"}), {"Column2.1", "Column2.2"}),
#"Colonnes renommées" = Table.RenameColumns(#"Diviser la colonne selon les transitions de caractères",{{"Column2.1", "P/N"}}),
#"Autres colonnes supprimées" = Table.SelectColumns(#"Colonnes renommées",{"P/N"})
in
#"Autres colonnes supprimées"
in
Source

 

Does any of you have an idea of what I did wrong or where does the problem could came from ?

And how to fix it of course.

 

Many thanks,

Romuald.

1 REPLY 1
lbendlin
Super User
Super User

Is it possible that the web server is blocking your scraping/slowing you down because you cause too much traffic for them? Maybe you should ask them if they would be willing to give you a data extract.

Helpful resources

Announcements
Power BI DataViz World Championships

Power BI Dataviz World Championships

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now!

December 2025 Power BI Update Carousel

Power BI Monthly Update - December 2025

Check out the December 2025 Power BI Holiday Recap!

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.

Top Solution Authors