Solved: Re: Missing data when loading files from a folder

ripstaur · ‎02-25-2021

I have used the "Folder" option in "Get Data" with a lot of success before. I am attempting to build a report now that combines several files, all in exactly the same format (I was very careful about this). I have tried using "Load" and "Transform and Load" options, but the same thing happens...the columns all come in, but some of the data is missing.

I have tried looking at each file before I load them all, and the data are all there when I look at them individually. However, when I load the entire folder, some of the data is missing again.

Any ideas? One note - I cannot convert the individual worksheets to tables, because I need the dynamic headers to make the user inputs work correctly.

For now, I am just going to append. It's not the best or most efficient way to go, though, so if anyone has run across this problem (and a solution), it would be great! Thanks in advance.

PhilipTreacy · ‎02-25-2021

Hi @ripstaur

Your query steps may be inadvertently excluding some rows of data. The query editor only previews a max of 1000 rows. If there's data beyond that it can get left out of your steps exclude it. What rows are missing? What's the data in them? Check your steps to see what they are doing and make sure they don't exclude anything.

For testing purposes, you can also try creating a separate query for each file (rather than using the From Folder connector), load the file and then check if all your expected data is there. At least that may give you some pointers as to what rows are missing,and indicate why this is happening.

Not sure what you mean by I cannot convert the individual worksheets to tables, because I need the dynamic headers to make the user inputs work correctly - it's better to work with Excel tables. Maybe you could create the Excel tables from the ranges where this data is being input?

Regards

Phil

Did I answer your question? Then please mark my post as the solution.
If I helped you, click on the Thumbs Up to give Kudos.

Blog :: YouTube Channel :: Connect on Linkedin

Proud to be a Super User!

View solution in original post

ripstaur · ‎02-26-2021

Phil,

Thanks for giving me some things to check..Your second paragraph helped a lot. Here's what ended up somehow working: I went in and loaded each file separately in PBI. I learned a couple of things:

1. I had some spurious hidden data in one of them (this seems to have been what kept the file load from working). So I went back into each file and deleted 3 (blank) rows from beneath the array I wanted to use.

2. If you have an array where some columns contain no data (only a header), PBI won't recognize your headers as headers.

After deleting those rows in each file, I went back to PBI and used the folder option. This time it loaded. I just had to click on "use first row as headers" to promote the header, then everything worked fine.

ripstaur · ‎02-26-2021

BTW, what I meant about the table is this: The data files in the folder are results from a questionnairre, which is sent out to respondents as an Excel workbook. They provide their answers on one sheet, and then the answers are compiled/simplified in another worksheet. The completed questionnaire workbooks are kept in the folder, and the compilation worksheets are what I select when I load the data from the folder. Each compilation sheet contains an array of 12 columns and 11 rows. This particular questionnaire is sent to up to 10 individuals.

The headers in the compilation worksheet come from formulas which bring product model names in from another sheet. I tried inserting a table, but I get an error telling me that the headers have to be text and cannot be formulas.

Best regards,

Rip

PhilipTreacy · ‎02-26-2021

Hi @ripstaur

Yes Excel table headers must be text. But if you were using tables you'd have avoided this particular issue with hidden data/blank rows.

You can create the tables with generic column names and then after import to PBI, rename the columns.

Regards

Phil

Did I answer your question? Then please mark my post as the solution.
If I helped you, click on the Thumbs Up to give Kudos.

Blog :: YouTube Channel :: Connect on Linkedin

Proud to be a Super User!

PhilipTreacy · ‎02-25-2021

Hi @ripstaur

Your query steps may be inadvertently excluding some rows of data. The query editor only previews a max of 1000 rows. If there's data beyond that it can get left out of your steps exclude it. What rows are missing? What's the data in them? Check your steps to see what they are doing and make sure they don't exclude anything.

For testing purposes, you can also try creating a separate query for each file (rather than using the From Folder connector), load the file and then check if all your expected data is there. At least that may give you some pointers as to what rows are missing,and indicate why this is happening.

Not sure what you mean by I cannot convert the individual worksheets to tables, because I need the dynamic headers to make the user inputs work correctly - it's better to work with Excel tables. Maybe you could create the Excel tables from the ranges where this data is being input?

Regards

Phil

Did I answer your question? Then please mark my post as the solution.
If I helped you, click on the Thumbs Up to give Kudos.

Blog :: YouTube Channel :: Connect on Linkedin

Proud to be a Super User!

SterlingTX · ‎04-16-2024

I recently experienced this when using the From Folder process in Excel and imagine the same would occur in Power BI when trying to consolidate Excel files where the first few rows contain non-column header information. In my case, the first two rows of data contained a report name and other ancillary information which sounds similar to your own. The third row contained column headers with data following thereafter. In my case, only the first column generated null values for all files other than the Sample File and the ancillary info in the first two rows was in the first column; this is a clue to the issue.

If the files were saved as CSV, the Folder process worked with no issues. If the content from the first two rows were removed where the first available row with data is the column headings, that resolved the issue as well. However, I did not want to need to manually touch any of the files in order for the process to work. It occurred to me that every time I have used the from Folder process with Excel files, the files always had the first-row containing column headers. Again, another clue. Here is what I discovered.

In the Transform Sample File associated with your query, check to see if it has a Promoted Headers step at the end as seen in the second pic.

My guess is you probably do which is what is creating this issue.

It seems that Power Query is expecting the first row of data in Excel to contain column headings. This works most of the time, however, when the first few rows contains ancillary information it promotes something that is not a usable column header. For this reason in my case, it explains why the Sample File had values in the first column but none of the subsequent files did because to do so, they would need their first row to match the promoted header shown above.

Therefore, remove the Promoted Headers step in the Transform Sample File. This will leave you with the first column being Column1 which can then align correctly with all other files Column1.

When you remove Promoted Headers, your query may now generate an error like below.

Now, simply remove the Changed Type step that is looking for the previously promoted Column1 or remove the portion of the formula that is looking for the column that is no longer there.

In retrospect, if there had been other non-header information on row 1 in columns 2, 3, etc., we could expect null values to be present in all those columns as well for all files except the sample file.

Hope this helps!

Sterling @ Ember Analytics