The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredEnhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.
Hi,
I'm trying to get 13 years of financial data for Apple from this page https://www.macrotrends.net/stocks/charts/AAPL/apple/income-statement .
When I'm accessing the web page in browser I'm able to see almost all data:
But when I'm trying to scrap de data using PowerBI I only get the first 6 years of data.
Although in browser I see details from 2021 back to 2009, in PowerBI I see only 2021 to 2016
Is there a solution to be able to take at least 10 years of data?
Solved! Go to Solution.
If you look in the HTML code, there's a section that looks like
var originalData = [{"field_name": ...bunch of data... }];
This contains all of the data in the table and is in a JSON format, so we can write the full query along these lines:
let
#"HTML Code" = Web.BrowserContents("https://www.macrotrends.net/stocks/charts/AAPL/apple/income-statement"),
#"Split Text" = "[" & Text.BetweenDelimiters(List.Single(List.Select(Text.Split(#"HTML Code", "var "), each Text.StartsWith(_, "originalData = "))),"[","]") & "]",
#"Parsed JSON" = Json.Document(#"Split Text"),
#"Converted to Table" = Table.FromList(#"Parsed JSON", Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"field_name", "2021-09-30", "2020-09-30", "2019-09-30", "2018-09-30", "2017-09-30", "2016-09-30", "2015-09-30", "2014-09-30", "2013-09-30", "2012-09-30", "2011-09-30", "2010-09-30", "2009-09-30"}, {"field_name", "2021-09-30", "2020-09-30", "2019-09-30", "2018-09-30", "2017-09-30", "2016-09-30", "2015-09-30", "2014-09-30", "2013-09-30", "2012-09-30", "2011-09-30", "2010-09-30", "2009-09-30"}),
#"Extracted Text Between Delimiters" = Table.TransformColumns(#"Expanded Column1", {{"field_name", each Text.BetweenDelimiters(_, ">", "<"), type text}}),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Extracted Text Between Delimiters", {"field_name"}, "Date", "Value"),
#"Changed Type" = Table.TransformColumnTypes(#"Unpivoted Other Columns",{{"Date", type date}, {"Value", Currency.Type}})
in
#"Changed Type"
This results in a nice clean unpivoted tabular format.
If you look in the HTML code, there's a section that looks like
var originalData = [{"field_name": ...bunch of data... }];
This contains all of the data in the table and is in a JSON format, so we can write the full query along these lines:
let
#"HTML Code" = Web.BrowserContents("https://www.macrotrends.net/stocks/charts/AAPL/apple/income-statement"),
#"Split Text" = "[" & Text.BetweenDelimiters(List.Single(List.Select(Text.Split(#"HTML Code", "var "), each Text.StartsWith(_, "originalData = "))),"[","]") & "]",
#"Parsed JSON" = Json.Document(#"Split Text"),
#"Converted to Table" = Table.FromList(#"Parsed JSON", Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"field_name", "2021-09-30", "2020-09-30", "2019-09-30", "2018-09-30", "2017-09-30", "2016-09-30", "2015-09-30", "2014-09-30", "2013-09-30", "2012-09-30", "2011-09-30", "2010-09-30", "2009-09-30"}, {"field_name", "2021-09-30", "2020-09-30", "2019-09-30", "2018-09-30", "2017-09-30", "2016-09-30", "2015-09-30", "2014-09-30", "2013-09-30", "2012-09-30", "2011-09-30", "2010-09-30", "2009-09-30"}),
#"Extracted Text Between Delimiters" = Table.TransformColumns(#"Expanded Column1", {{"field_name", each Text.BetweenDelimiters(_, ">", "<"), type text}}),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Extracted Text Between Delimiters", {"field_name"}, "Date", "Value"),
#"Changed Type" = Table.TransformColumnTypes(#"Unpivoted Other Columns",{{"Date", type date}, {"Value", Currency.Type}})
in
#"Changed Type"
This results in a nice clean unpivoted tabular format.
When you need to scrap data from tables which are constructed from html pages using <div> tags, then situation becomes challenging. Html.Table function is used by Power BI to extract data from children and it can go to a depth of 7 children only. That is the limit. Hence, only 7 columns are shown and you will not get columns beyond first 7 columns.