Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get inspired! Check out the entries from the Power BI DataViz World Championships preliminary rounds and give kudos to your favorites. View the vizzies.

Reply
Antmkjr
Post Patron
Post Patron

PDF header inconstistency across pages

Hi ,

 

I have a pdf document in which my data spans across multiple tables in multiple pages. But the header in different pages , even though it appears exactly similar structure in the pdf document, when pulled in power query , they are bit inconsistent.

Example shown: is the header in Page 1 and Page 14. Page 14 is having null in Column 8 Vs D= A+B+C as header in Page 1.

 

Antmkjr_0-1736620640248.png

there are even more differences like this, how do I make all headers in all pages consistent .

8 REPLIES 8
v-shamiliv
Community Support
Community Support

Hi @Antmkjr 
I hope this information is helpful. Please let me know if you have any further questions or if you'd like to discuss this further. If this answers your question, please Accept it as a solution and give it a 'Kudos' so others can find it easily.
Thank you.

v-shamiliv
Community Support
Community Support

Hi @Antmkjr 

May I ask if you have resolved this issue? If so, please mark the helpful reply and accept it as the solution. This will be helpful for other community members who have similar problems to solve it faster.

Thank you.

v-shamiliv
Community Support
Community Support


Hi  @Antmkjr 
Thank you for reaching out to the Microsoft Fabric Community Forum!

We just wanted to check in and see if your issue has been resolved. If it has, it would be great if you could mark the reply that helped as the accepted solution. This can be really helpful for other community members who might face similar issues in the future.

If you have any other questions or need further assistance, please feel free to reach us.

Thank you

Anonymous
Not applicable

True but now that you know that you are going to have all of the columns, you can just (before you expand the tables) you can either add a column or transform the existing table column like

 

= Table.AddColumn(PriorStepOrTableName, "Ordered", each Table.ReorderColumns(Table.PromoteHeaders([TableColumnName]), {"Column1", "Column2", "Column3"}, MissingFields.Ignore))

 

So you have to sort by number of columns, then promote the headers of the tables, and finally re-order the columns. Then expand the tables. 

 

The purpose of sorting descending by number of columns then promoting headers is to make sure that the table with the most columns can be at the top of the list of tables, thus ensuring that there are names for all of your columns when you expand.

 

--Nate

Hi Nate,

I am not sure if I understood you correctly. But in actually scenario there are around more than 10 pages, and unwanted columns(Columns with null values) can appear at any position. how does the reordering technique work when the position of columns with null value are uncertain? Would you be so kind to explain further.

 

The table with most columns -- there are more columns because those are unwanted columns which has to be removed. Eg: Column 8 in Page 14 is extra and is unwanted. So I dont understand the need for sorting also.

Antmkjr_0-1736692364582.png

 

Anonymous
Not applicable

It appears that you don't have any header rows. Were they taken out in the transform process or do they not exist? I would first select only the tables, and get rid of the pages. Then you should be able to promote the headers. Then you can choose the column names that you want to keep. Having the table with the greatest number of columns in the first row means that none of the subsequent tables' column names will be missing. After that, try to filter out any tables that you don't need--you know how sometimes on PDFs there will be some random three column table that you don't want as part of your dataset. Assuming the data you need is in table format in the PDFs, you should be able to filter to just the tables you need. Sometimes it's a pattern like every third table--you'll figure that part out. My point is that you need to get some headers in those tables before you expand them.

If there are missing fields when you reorder, the MissingFields.Ignore parameter handles that.

 

These are the steps I always take when combining PDFs.

 

--Nate

Anonymous
Not applicable

Before you expand the tables. You have to sort the tables by number of columns greatest to least. so before you expand the tables, add a custom column using each Table.ColumnCount. Then sort, the resulting column, descending. Then expand your columns.

 

--Nate

By doing this , I will be able to see that there is a difference in the no. of columns . But im still not able to align the coloums in the correct place.

Just showing expected output:

Antmkjr_0-1736656520917.png

 

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!

FebPBI_Carousel

Power BI Monthly Update - February 2025

Check out the February 2025 Power BI update to learn about new features.

March2025 Carousel

Fabric Community Update - March 2025

Find out what's new and trending in the Fabric community.

Top Solution Authors