Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now! Learn more

Reply
Einomi
Helper V
Helper V

Delete Blank Columns in Nested Tables

Hi,

I have quickly searched on the forum but did not find what I need, I want to combine multiple PDF, but I prefer to write my own M code than using the UI, which is sometime limited and maybe my M code will be more performant than the UI (maybe)

Here is my M code so far

 

 

 

let
    Source = Folder.Files(Ffiles),
    FilterPDF = Table.SelectRows(Source, each ([Extension] = ".pdf")),
    BinaryToTables = Table.TransformColumns(FilterPDF, {"Content", each Pdf.Tables(_)}),
    KeepContent = Table.SelectColumns(BinaryToTables,{"Content"}),
    KeepPages = Table.TransformColumns(KeepContent,{"Content", each Table.SelectRows(_, each [Kind] = "Page")}),
    CombineData = Table.TransformColumns(KeepPages, {"Content", each Table.Combine(_[Data])}),
    SkipInvoice = Table.TransformColumns(CombineData,{"Content", each Table.Skip(_, each [Column1] <> "Invoice")}),
    ColumnCount = Table.AddColumn(SkipInvoice,"Columns", each List.Count (Table.ColumnNames([Content]))),
    SortRows = Table.Sort(ColumnCount,{{"Columns", Order.Ascending}})
in
    SortRows

 

 

 

I did a count on the columns of each table and it goes from 12 columns to 20 columns...
My approach is to remove all blank columns - and this is my question because I am not sure how to do it
then combine some columns together to get the same number of columns in each table

@lbendlin @dufoq3 @AlienSx @ManuelBolz @jennratten  @ronrsnfld @m_dekorte @Rickmaurinus 

 

 

thanks for your time and for your help

1 ACCEPTED SOLUTION

@Einomi my bad - wrong name after "in". I'll correct that soon. You may create blank query and place the following code in there (half of job is done by @dufoq3 - check his query as well, it's also works)

(tbl) => 
    [ReplaceBlankToNull = Table.TransformColumns(tbl, {}, each if _ = "" then null else _),
    profile = Table.Buffer(Table.Profile(ReplaceBlankToNull)),
    columns_to_delete = Table.SelectRows(profile, each [Count] = [NullCount])[Column],
    delete_columns = Table.RemoveColumns(ReplaceBlankToNull, columns_to_delete)][delete_columns]

name the query as you wish. E.g. "fxRemoveBlankColumns". This is now a custom function that removes all null/blank columns from the table. In your "main" column make the following statement: 

remove_blanks = Table.TransformColumns(ref_to_last_step, {"Content",  fxRemoveBlankColumns})

Let us know if it worked. 

View solution in original post

12 REPLIES 12
ManuelBolz
Responsive Resident
Responsive Resident

Hello @Einomi 

If my post helped you, please give me a 👍kudos and mark this post with Accept as Solution.

 

When it comes to my solutions, I don't focus 100% on performance optimization. I always try to develop a compromise between performance and "code understanding". @dufoq3´s code is probably more performant than my solution. Your "problem" is multiple file reads (unknown number of files). Therefore, I would generally advise you to choose a high-performance solution.

 

let
    Source = Folder.Files(Ffiles),
    FilterPDF = Table.SelectRows(Source, each ([Extension] = ".pdf")),
    Content = Table.TransformColumns(FilterPDF, {"Content", each Pdf.Tables(_)}),
    Columns = Table.SelectColumns(Content,{"Content"}),
    Rows = Table.TransformColumns(Columns,{"Content", each Table.SelectRows(_, each [Kind] = "Page")}),
    Combine = Table.TransformColumns(Rows, {"Content", each Table.Combine(_[Data])}),
    
    Function = (table as table) as table =>
        let
            NonEmptyColumns = List.Select(Table.ColumnNames(table), each List.NonNullCount(Table.Column(table, _)) > 0),
            Result = Table.SelectColumns(table, NonEmptyColumns)
        in
            Result,

    ColumnsRemove = Table.TransformColumns(Combine, {"Content", each Function(_)}),
    ColumnsList = List.Accumulate(ColumnsRemove[Content], {}, (state, current) => List.Union({state, Table.ColumnNames(current)})),
    ColumnsNormalize = Table.TransformColumns(ColumnsRemove, {"Content", each Table.SelectColumns(_, List.Intersect({Table.ColumnNames(_), ColumnsList}))}),
    CombinedTables = Table.Combine(ColumnsNormalize[Content]),
    Headers = Table.PromoteHeaders(CombinedTables, [PromoteAllScalars=true])
in
    Headers


Best regards from Germany
Manuel Bolz


🟦Follow me on LinkedIn
🟨How to Get Your Question Answered Quickly
🟩Fabric Community Conference
🟪My Solutions on Github

Hi @ManuelBolz 

I would like to try your solution, but when I paste your M code into my Excel file, it just refreshing without stopping and I cannot see if this solution works or not, I have only 50 pdf files at the moment, each of them is less than 1 MB

If you have any clue 😉 except the source step I did not change anything

ManuelBolz
Responsive Resident
Responsive Resident

Hello @Einomi 

in my solution the source step is relatively important. Your example shows that you are using a Power Query parameter here.

 

I didn't fully understand your error with my solution. Here are a few comments/ideas:
- Make sure the "Ffiles" parameter points to the correct folder,
- First add only 2 - 4 PDF files to the folder to test the code,
- Make sure none of these files are open (PFD viewer) while testing my solution. A file handler is usually active here and blocks access to the file.

 

If my post helped you, please give me a 👍kudos and mark this post with Accept as Solution.

 

 

Best regards from Germany
Manuel Bolz


🟦Follow me on LinkedIn
🟨How to Get Your Question Answered Quickly
🟩Fabric Community Conference
🟪My Solutions on Github

@ManuelBolz thanks for the comment

 

- I use a path in my query, I replaced it with a parameter for confidentialy purposes in the forum

- No PDF file is open

 

I will test again your solution and post the error message I get (it is French so I need to change the language of PQ then send the screenshot)

dufoq3
Super User
Super User

Hi @Einomi, you can delete blank columns this way:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WUtIBIUcIpRSrAxMBIkNUEScgNsJQZIwq4ozNIBOwSCwA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t, Column2 = _t, Column3 = _t, Column4 = _t, Column5 = _t]),
    // You can probably delete this step.
    ReplaceBlankToNull = Table.TransformColumns(Source, {}, each if _ = "" then null else _),
    RemoveBlankColumns = Table.SelectColumns(ReplaceBlankToNull,  
        [ a = Table.ColumnNames(ReplaceBlankToNull),
          b = List.Select(a, (x)=> List.NonNullCount(Table.Column(ReplaceBlankToNull, x)) > 0)
        ][b] )
in
    RemoveBlankColumns

Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

or 

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WUtIBIUcIpRSrAxMBIkNUEScgNsJQZIwq4ozNIBOwSCwA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t, Column2 = _t, Column3 = _t, Column4 = _t, Column5 = _t]),
    // You can probably delete this step.
    ReplaceBlankToNull = Table.TransformColumns(Source, {}, each if _ = "" then null else _),
    profile = Table.Buffer(Table.Profile(ReplaceBlankToNull)),
    columns_to_delete = Table.SelectRows(profile, each [Count] = [NullCount])[Column],
    delete_columns = Table.RemoveColumns(ReplaceBlankToNull, columns_to_delete)
in
    delete_columns

 

Thanks @AlienSx  but I must miss something, when I paste your code, I get a list of Column1, Column 2 and Column3 (or something similar I did not pay attention to the column numbers)

I want to delete the columns containing only null values from my nested tables that are in the "Content" column 😉

@Einomi my bad - wrong name after "in". I'll correct that soon. You may create blank query and place the following code in there (half of job is done by @dufoq3 - check his query as well, it's also works)

(tbl) => 
    [ReplaceBlankToNull = Table.TransformColumns(tbl, {}, each if _ = "" then null else _),
    profile = Table.Buffer(Table.Profile(ReplaceBlankToNull)),
    columns_to_delete = Table.SelectRows(profile, each [Count] = [NullCount])[Column],
    delete_columns = Table.RemoveColumns(ReplaceBlankToNull, columns_to_delete)][delete_columns]

name the query as you wish. E.g. "fxRemoveBlankColumns". This is now a custom function that removes all null/blank columns from the table. In your "main" column make the following statement: 

remove_blanks = Table.TransformColumns(ref_to_last_step, {"Content",  fxRemoveBlankColumns})

Let us know if it worked. 

I understood what the issue was, I assumed that some columns are only null because when I have previewed my nested tables, I was quite sure that that the preview is really the full data because my pdf contains maximum 2 pages and there is not much in them
So, I duplicated my query and expanded one of the tables and ideed there was one row at the bottom which was not null...
I'll test it and will come back to you, thanks all for your time

Hi @dufoq3 

 

thannks for your proposition, I have tried this and it does not seem to work, do I miss something ?

 

let
    Source = Folder.Files(Ffiles),
    FilterPDF = Table.SelectRows(Source, each ([Extension] = ".pdf")),
    BinaryToTables = Table.TransformColumns(FilterPDF, {"Content", each Pdf.Tables(_)}),
    KeepContent = Table.SelectColumns(BinaryToTables,{"Content"}),
    KeepPages = Table.TransformColumns(KeepContent,{"Content", each Table.SelectRows(_, each [Kind] = "Page")}),
    CombineData = Table.TransformColumns(KeepPages, {"Content", each Table.Combine(_[Data])}),
    SkipInvoice = Table.TransformColumns(CombineData,{"Content", each Table.Skip(_, each [Column1] <> "Invoice")}),
    ColumnCount = Table.AddColumn(SkipInvoice,"Columns", each List.Count (Table.ColumnNames([Content]))),
    SortRows = Table.Sort(ColumnCount,{{"Columns", Order.Ascending}}),
    BlankCol = Table.TransformColumns(SortRows, {"Content", each Table.SelectColumns(_,  
        [ a = Table.ColumnNames(_),
          b = List.Select(a, (x)=> List.NonNullCount(Table.Column(_, x)) > 0)
        ][b] )})
in
    BlankCol

 

I do not see your data, but maybe you don't have blank values as null. Try this:

 

let
    Source = Folder.Files(Ffiles),
    FilterPDF = Table.SelectRows(Source, each ([Extension] = ".pdf")),
    BinaryToTables = Table.TransformColumns(FilterPDF, {"Content", each Pdf.Tables(_)}),
    KeepContent = Table.SelectColumns(BinaryToTables,{"Content"}),
    KeepPages = Table.TransformColumns(KeepContent,{"Content", each Table.SelectRows(_, each [Kind] = "Page")}),
    CombineData = Table.TransformColumns(KeepPages, {"Content", each Table.Combine(_[Data])}),
    SkipInvoice = Table.TransformColumns(CombineData,{"Content", each Table.Skip(_, each [Column1] <> "Invoice")}),
    ColumnCount = Table.AddColumn(SkipInvoice,"Columns", each List.Count (Table.ColumnNames([Content]))),
    SortRows = Table.Sort(ColumnCount,{{"Columns", Order.Ascending}}),
    BlankCol = Table.TransformColumns(SortRows, {"Content", each Table.SelectColumns(_,  
        [ a = Table.ColumnNames(_),
          b = List.Select(a, (x)=> List.NonNullCount(List.Select(Table.Column(_, x), (y)=> Text.Trim(y) <> "")) > 0)
        ][b] )})
in
    BlankCol

 


Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

@dufoq3 

 

I copied paste your query into my PQ, and unfortunately as shown in the picture, Column18 and Column20 are still around 🤔

 

Null.png

Helpful resources

Announcements
Power BI DataViz World Championships

Power BI Dataviz World Championships

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now!

December 2025 Power BI Update Carousel

Power BI Monthly Update - December 2025

Check out the December 2025 Power BI Holiday Recap!

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.

Top Solution Authors