I'm working with a data reporting platform that provides outputs as HTML documents. I'd like to import those documents into Power BI to work with the data.
Each document has a series of forms and subforms (think workbook and worksheets) which are produced using HTML such as the following:
<div id='formID' class='title formTitle'>
<hr>FORM NAME<hr>
</div>
<div class='title miniTitle'>SubForm_1_Name</div>
<table class='topTable'>
<thead>
<tr class='underlined'>
<th scope='col' class='underlined'>Field</th>
<th scope='col' class='underlined'>Column1</th>
<th scope='col' class='underlined'>Column2</th>
<th scope='col' class='underlined'>Total</th>
</tr>
</thead>
<tbody>
<tr>
<td class='label labelWidth'>Field_Value</td>
<td class='number numberWidth'>Column1_Value</td>
<td class='number numberWidth'>Column2_Value</td>
<td class='number numberWidth'>Total_Value</td>
</tr>
</tbody>
</table>
<div class='title miniTitle'>SubForm_2_Name</div>
<table> ... </table>
<div class='title miniTitle'>SubForm_3_Name</div>
<table> ... </table>
As you can see, the only place the names of the forms and subforms appear is in separate divs before each table. Using the "Web" connector, it's easy to get the actual content of the tables. Does anyone know of a way I can access the names of the forms?
For completeness, the eventual goal is to unpivot everything and wind up with a data structure like:
Form_Name | Subform_Name | Field | Column_Name | Value |
I don't anticipate challenges with the unpivot portion once I can figure out how to access the Form and Subform names.
Thanks for your help!
Hi @JrBIAnalyst
Open the file as text, filter for the title lines, then extract the text between delimiters to strip the HTML tags.
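let
    // Read the HTML file line by line into a one-column table (code page 1252)
    Source = Table.FromColumns({Lines.FromBinary(File.Contents("d:\temp\forms.htm"), null, null, 1252)}),
    // Keep only the subform title lines (miniTitle) and the form title lines (<hr>)
    #"Filtered Rows" = Table.SelectRows(Source, each Text.Contains([Column1], "miniTitle") or Text.Contains([Column1], "<hr>")),
    // Take the text between the first ">" and the next "<", which drops the surrounding tags
    #"Extracted Text Between Delimiters" = Table.TransformColumns(#"Filtered Rows", {{"Column1", each Text.BetweenDelimiters(_, ">", "<"), type text}})
in
    #"Extracted Text Between Delimiters"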
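The query above returns form and subform titles mixed together in one column. The sketch below is one possible way to split them into the Form_Name and Subform_Name columns from the question. It assumes each title sits on its own line as in the sample HTML, that only form titles contain `<hr>`, and that subform titles appear in the same order as the tables returned by the Web connector. The step names and the `TableIndex` column are illustrative only and are not part of the original answer.

```powerquery
let
    // Same file path as in the query above; replace with your own
    Source = Table.FromColumns({Lines.FromBinary(File.Contents("d:\temp\forms.htm"), null, null, 1252)}),
    // Keep only the form title lines (<hr>) and subform title lines (miniTitle)
    Titles = Table.SelectRows(Source, each Text.Contains([Column1], "miniTitle") or Text.Contains([Column1], "<hr>")),
    // Assumption: a line containing <hr> is a form title
    AddForm = Table.AddColumn(Titles, "Form_Name",
        each if Text.Contains([Column1], "<hr>") then Text.BetweenDelimiters([Column1], ">", "<") else null,
        type nullable text),
    // Assumption: a line containing miniTitle is a subform title
    AddSubform = Table.AddColumn(AddForm, "Subform_Name",
        each if Text.Contains([Column1], "miniTitle") then Text.BetweenDelimiters([Column1], ">", "<") else null,
        type nullable text),
    // Carry each form name down onto the subform rows that follow it
    FilledDown = Table.FillDown(AddSubform, {"Form_Name"}),
    // Drop the form title rows, leaving one row per subform
    SubformsOnly = Table.SelectRows(FilledDown, each [Subform_Name] <> null),
    Names = Table.SelectColumns(SubformsOnly, {"Form_Name", "Subform_Name"}),
    // Hypothetical helper column: position of the subform, to pair with the Nth table
    Indexed = Table.AddIndexColumn(Names, "TableIndex", 0, 1)
in
    Indexed
```

If the table order assumption holds for your documents, `TableIndex` could serve as a merge key against a similarly indexed list of the tables from the Web connector. Check this against a real file before relying on it.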
regards
Phil