Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Power BI is turning 10! Let’s celebrate together with dataviz contests, interactive sessions, and giveaways. Register now.

Reply
clongaw
Frequent Visitor

Custom Connector ignores XML Fields that appear in later results when not using XML Schema

 I have  a Custom Connector built that uses an API to bring in XML results. However, the XML I'm returned only contains Fields for data that has a value. For example, if I have Field1, Field2, and Field3, my XML may look like:

<XML>
<Row Field1="Test" Field3="NewTest/>
<Row Field1="Test" Field2="SecondTest" Field3="FinalTest"/>
</XML>

 This works if I have the Fields designated in first few calls, but if my first several pages don't have a field value for Field2, but page 5 has a result with this value, Power BI ignores the value and doesn't create a column for this. Is this some setting I can turn off in Power BI, or is this something I need to alter in my code?

 

Here's my handling for the XML results and pagination:

 

// This function grabs a single page using the API
// XML is parsed with the startRow and pageSize variables to determine which records to detect
GetPage = (url as text, xml as text, company as text, pageSize as number, startRow as number) as table =>
    let
        // There have been issues automatically UrlEncoding, so it is being done in a manual function
        adjustedxml = ModifiedParams(xml, pageSize, startRow),        
        textResult = Text.FromBinary(Web.Contents(url,
                    [
                        Headers=[#"accept"="*/*",
                            #"Company"=company
                        ],
                        Content=Text.ToBinary(adjustedxml)
                    ])),

        cleanData = Text.Replace(textResult, "&nbsp;", " "),
        Data = Xml.Tables(cleanData),
        NullTest = Data{0}?,
        TableInfo = if (NullTest = null) then
            Table.FromRows({})
        else Data{0}[Table],

        // Adapt XML to new startRow to determine startpoint as well as potentially setting pageSize
        ModifiedParams = (inputXmlText as text, pageSize as number, startRow as number) as text =>
            let
                modifiedXml = 
                if (startRow <> 1) then
                    Text.Replace(inputXmlText, "RowLimitsFirst Value=""" & Text.From(startRow - pageSize) & """", "RowLimitsFirst Value=""" & Text.From(startRow) & """")
                else
                    inputXmlText
            in
                modifiedXml,

        // Encoding the XML special characters
        UrlEncode = (xmlText as text) as text =>
            let
                encodedText = Text.Replace(xmlText, "&", "%26"),
                encodedText1 = Text.Replace(encodedText, "<", "%3C"),
                encodedText2 = Text.Replace(encodedText1, ">", "%3E"),
                encodedText3 = Text.Replace(encodedText2, "/", "%2F"),
                encodedText4 = Text.Replace(encodedText3, """", "%22"),
                encodedText5 = Text.Replace(encodedText4, " ", "%20"),
                encodedText6 = Text.Replace(encodedText5, "=", "%3D"),
                encodedText7 = Text.Replace(encodedText6, "#", "%23"),
                encodedText8 = Text.Replace(encodedText7, "?", "%3F")
            in
                encodedText8
    in
        TableInfo meta [startRow = startRow + pageSize, Xml = adjustedxml];

// Read all pages of data.
// Table.GenerateByPage will keep asking for more pages until we return null or </XML>.
// Function declaration starts at first page, then keeps iterating prevset until null
GetAllPages = (url as text, xml as text, company as text, pageSize as number) as table =>
    let
        Source = Table.GenerateByPage(
            (previous) =>
                let
                    // startRow determines which row we start on
                    // This is recorded in MetaData from the previous results
                    newStartRow = if (previous = null) then 1 else Value.Metadata(previous)[startRow],

                    // inputXml is preserved each time before encoding for easier value replacement
                    // This is recorded in MetaData from the previous results
                    inputXml = if (previous = null) then xml else Value.Metadata(previous)[Xml],

                    // Obtain a new page of records if the previous was null (first page) or the previous was a full call
                    // This allows returning null quicker in the case that the previous call was only a partial page
                    //page = if (previous = null) or Table.RowCount(previous) = pageSize then GetPage(url, inputXml, company, pageSize, newStartRow) else null
                    page = 
                        if (previous = null) or (Table.RowCount(previous) = pageSize and not Table.IsEmpty(previous)) then 
                            let
                                newPage = GetPage(url, inputXml, company, pageSize, newStartRow)
                            in
                                if Table.IsEmpty(newPage) then null else newPage
                        else 
                            null
                in 
                    page
        )
    in 
        Source;

// The getNextPage function takes a single argument and is expected to return a nullable table
Table.GenerateByPage = (getNextPage as function) as table =>
    let        
        listOfPages = List.Generate(
            () => getNextPage(null),            // get the first page of data
            (lastPage) => lastPage <> "<XML/>" and lastPage <> null,     // stop when the function returns <XML/> or null (Should always be null)
            // (lastPage) => lastPage <> null,     // stop when the function returns <XML/> or null (Should always be null)
            (lastPage) => getNextPage(lastPage) // pass the previous page to the next function call
        ),
        // concatenate the pages together
        tableOfPages = Table.FromList(listOfPages, Splitter.SplitByNothing(), {"Column1"}),
        firstRow = tableOfPages{0}?
    in
        // if we didn't get back any pages of data, return an empty table
        // otherwise set the table type based on the columns of the first page
        if (firstRow = null) then
            Table.FromRows({})
        // check for empty first table
        else if (Table.IsEmpty(firstRow[Column1])) then
            firstRow[Column1]
        else
            Value.ReplaceType(
                Table.ExpandTableColumn(tableOfPages, "Column1", Table.ColumnNames(firstRow[Column1])),
                Value.Type(firstRow[Column1])
            );

 

 

I would use an XML Schema to define these columns, but I do not have access to a schema and the schema is dynamic according to user setup.

1 ACCEPTED SOLUTION
clongaw
Frequent Visitor

Just in case this problem occurs for others, the solution is to create a list of all possible columns grabbed from all pages and use that as your column declaration. The previous code was using only the first page of results.

 // Filter out null and empty tables
        validPages = List.Select(listOfPages, each _ <> null and not Table.IsEmpty(_)),

        // Accumulate all unique column names from valid pages
        allColumns = List.Accumulate(
            validPages,
            {},
            (state, currentPage) => List.Union({state, Table.ColumnNames(currentPage)})
        ),

        // Function to add missing columns with null values
        AddMissingColumns = (table as table, columns as list) as table =>
            let
                currentColumns = Table.ColumnNames(table),
                missingColumns = List.Difference(columns, currentColumns),
                tableWithMissingColumns = List.Accumulate(
                    missingColumns,
                    table,
                    (tbl, col) => Table.AddColumn(tbl, col, each null)
                )
            in
                tableWithMissingColumns,

        // Add missing columns to each page
        standardizedPages = List.Transform(validPages, each AddMissingColumns(_, allColumns)),

        // Combine all pages into one table
        combinedTable = Table.Combine(standardizedPages),

        // Check if there are any pages and handle accordingly
        finalTable = 
            if (Table.IsEmpty(combinedTable)) then
                Table.FromRows({}, allColumns)  // Return an empty table with all columns if no data
            else
                combinedTable
    in
        finalTable;

View solution in original post

2 REPLIES 2
clongaw
Frequent Visitor

Just in case this problem occurs for others, the solution is to create a list of all possible columns grabbed from all pages and use that as your column declaration. The previous code was using only the first page of results.

 // Filter out null and empty tables
        validPages = List.Select(listOfPages, each _ <> null and not Table.IsEmpty(_)),

        // Accumulate all unique column names from valid pages
        allColumns = List.Accumulate(
            validPages,
            {},
            (state, currentPage) => List.Union({state, Table.ColumnNames(currentPage)})
        ),

        // Function to add missing columns with null values
        AddMissingColumns = (table as table, columns as list) as table =>
            let
                currentColumns = Table.ColumnNames(table),
                missingColumns = List.Difference(columns, currentColumns),
                tableWithMissingColumns = List.Accumulate(
                    missingColumns,
                    table,
                    (tbl, col) => Table.AddColumn(tbl, col, each null)
                )
            in
                tableWithMissingColumns,

        // Add missing columns to each page
        standardizedPages = List.Transform(validPages, each AddMissingColumns(_, allColumns)),

        // Combine all pages into one table
        combinedTable = Table.Combine(standardizedPages),

        // Check if there are any pages and handle accordingly
        finalTable = 
            if (Table.IsEmpty(combinedTable)) then
                Table.FromRows({}, allColumns)  // Return an empty table with all columns if no data
            else
                combinedTable
    in
        finalTable;
lbendlin
Super User
Super User

You need to alter that in your code, and you need to read about how to handle missing columns in Power Query.

Helpful resources

Announcements
Join our Fabric User Panel

Join our Fabric User Panel

This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.

June 2025 Power BI Update Carousel

Power BI Monthly Update - June 2025

Check out the June 2025 Power BI update to learn about new features.

June 2025 community update carousel

Fabric Community Update - June 2025

Find out what's new and trending in the Fabric community.