Hi
I have snapshot report output files stored in a folder. Files in the folder:
ReportID_100 12012016.txt
...
ReportID_100 11012019.txt
ReportID_100 12012019.txt
ReportID_100 01012020.txt
...
Each file looks like this:
ReportID_100 Report Name: "Antibiotics Group" Report Date: 12/01/2019 Report Page 1
11/01/2019 - 11/30/2019
Prescriptions Group by Antibiotics by Department
Prescription PSCP 2 Count PSCP 3 Count PSCP 4 and above Count
---------------------------------------------------------------------------------
Radiology 1 0 1
Outpatient 1 0 1
Pathology 0 0 0
Surgery 1 0 1
Medicine 1 0 1
Orthopedics 1 0 1
ReportID_100 Report Name: "Antibiotics Group" Report Date: 12/06/2019 Report Page 2
11/01/2019 - 11/30/2019
Prescriptions Group by Antibiotics by Department
Prescription PSCP 2 Count PSCP 3 Count PSCP 4 and above Count
---------------------------------------------------------------------------------
Adult/Family Medicine 1 0 1
Dermatology 1 0 1
Gastroenterology 1 0 1
...
ReportID_100 Report Name: "Antibiotics Group" Report Date: 12/06/2019 Report Page 3
11/01/2019 - 11/30/2019
Prescriptions Group by Antibiotics by Department
Prescription PSCP 2 Count PSCP 3 Count PSCP 4 and above Count
---------------------------------------------------------------------------------
Urology 1 0 1
Pulmonary Medicine 1 0 1
...
Total: 11 10 222
#END OF REPORT#
Each file contains multiple pages of data, dynamic page numbers, and a date range. I am trying to import and model this data.
a) Any tips on modeling or importing this kind of file?
b) I am able to get the data from each file and parse it, but how do I identify the header rows between pages and strip them out? (The first few rows of each file are dynamic, and the same rows also repeat between pages.) For example:
ReportID_100 Report Name: "Antibiotics Group" Report Date: 12/06/2019 Report Page 2
11/01/2019 - 11/30/2019
Prescriptions Group by Antibiotics by Department
Prescription PSCP 2 Count PSCP 3 Count PSCP 4 and above Count
---------------------------------------------------------------------------------
Thanks in advance
Hi @sevenhills ,
I'm unable to go into too much detail on your specific scenario, as I think it will require some quite creative transformations. However, your starting point should be to use either the Folder or SharePoint folder data source in Power Query. Once you've selected the folder containing all these reports, you will get the option to Combine & Transform them.
This will allow you to select an example file (ideally one that includes all the issues you'd like to fix across the group), and perform transformations on it that will be automatically replicated across all the files in that folder, before they are all finally appended together into a single query.
While I can't specifically advise on the actual transformations you would need to do, my guess would be that you would use basic filters on certain words to remove header rows, use 'Fill Up' and 'Fill Down' functions to move values up and down to rows that aren't going to be filtered out, and functions like Text.Contains etc. to identify certain words/phrases to be retained.
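As a rough illustration of that filtering idea, here is a minimal Power Query M sketch. The inline sample rows and the list of header phrases are assumptions based on the report layout shown above, not a complete rule set (the date-range line, for instance, would need its own rule):

```
let
    // Hypothetical sample rows standing in for the combined file contents
    Source = Table.FromColumns(
        {{
            "ReportID_100 Report Name: ""Antibiotics Group"" Report Date: 12/01/2019 Report Page 1",
            "Prescriptions Group by Antibiotics by Department",
            "Radiology 1 0 1",
            "#END OF REPORT#"
        }},
        {"Column1"}
    ),
    // Keep only rows that do not start with a known header/footer phrase
    HeaderStarts = {"ReportID_", "Prescriptions Group", "Prescription ", "-----", "#END OF REPORT#"},
    DataRows = Table.SelectRows(
        Source,
        each not List.MatchesAny(
            HeaderStarts,
            (h) => Text.StartsWith([Column1], h)
        )
    )
in
    DataRows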
Again, I think you're going to have to get pretty creative to knock these into a workable shape, but this is the process you need to complete what you require.
Pete
Proud to be a Datanaut!
Thanks for your reply.
To clarify, my questions again with more details:
a) Any tips on modeling or importing this kind of file?
b) I am able to get the data from each file and parse it, but how do I identify the header rows between pages and strip them out? (The first few rows of each file are dynamic, and the same rows also repeat between pages.)
Hope this clarifies the needs 🙂
Thanks
Thanks
For (b), I solved it in Power Query as follows:
* Step 1: Analyze the report and identify the row sets (hard-coded per report):
sets: page header, table header, footer row, page-end row, etc.
This is purely for categorization.
* Step 2:
Get the data from all files, with a row number for each line in each file:
File Name | Row Number |
File 1 | 1 |
File 1 | 2 |
File 1 | 3 |
File 1 | … |
File 2 | 1 |
File 2 | 2 |
File 2 | 3 |
File 2 | … |
File 3 | 1 |
File 3 | 2 |
File 3 | 3 |
File 3 | … |
For each file, go through each row and, based on the content, identify the first row of each set.
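A minimal M sketch of this numbering step, using a hypothetical inline table in place of the real file contents (group by file, then index each group):

```
let
    // Hypothetical stand-in: one row per text line, tagged with its file
    FilesTable = #table(
        {"File Name", "Line"},
        {
            {"File 1", "header"}, {"File 1", "data A"}, {"File 1", "data B"},
            {"File 2", "header"}, {"File 2", "data C"}
        }
    ),
    // Group by file so the index restarts at 1 for each file
    Grouped = Table.Group(
        FilesTable,
        {"File Name"},
        {{"Rows", each Table.AddIndexColumn(_, "Row Number", 1, 1), type table}}
    ),
    Numbered = Table.ExpandTableColumn(Grouped, "Rows", {"Line", "Row Number"})
in
    Numbered
```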
* Step 3: Remove the dynamic text from these rows, such as the page number. For example,
ReportID_100 Report Name: "Antibiotics Group" Report Date: 12/06/2019 Report Page 2
becomes
ReportID_100 Report Name: "Antibiotics Group" Report Date: 12/06/2019 Report Page
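One way to do this normalization in M is to trim trailing digits and spaces, so every page-header row reduces to the same text and can be matched. A sketch, with the header hard-coded purely for illustration:

```
let
    Header = "ReportID_100 Report Name: ""Antibiotics Group"" Report Date: 12/06/2019 Report Page 2",
    // Trim trailing digits and spaces; the row now ends with "Report Page"
    Normalized = Text.TrimEnd(Header, {"0".."9", " "})
in
    Normalized
```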
* Step 4: Scan the whole table using these category rows, mark the first and last row of each set, and build a per-file summary of the row ranges that need to be deleted:
File Name | Set Name | Row Begin | Row End |
File 1 | Page Header | 1 | 6 |
File 1 | Table Header | 7 | 9 |
File 1 | Page Header | 22 | 27 |
File 1 | Table Header | 28 | 31 |
... | |||
File 1 | Report End | 1011 | 1011 |
* Expand the summary by adding a custom column holding the list of row numbers to delete:
{[Row Begin]..[Row End]}
* Then do a left anti join against the numbered rows, keeping only the data rows.
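These last two steps might look like this in M. The inline tables are hypothetical stand-ins for the delete-range summary above and the numbered rows from Step 2:

```
let
    // Stand-in for the "Delete Row Details" summary table
    DeleteRanges = #table(
        {"File Name", "Set Name", "Row Begin", "Row End"},
        {{"File 1", "Page Header", 1, 2}}
    ),
    // One list item per row number to delete, then one table row per item
    AddRows = Table.AddColumn(DeleteRanges, "Row Number", each {[Row Begin]..[Row End]}),
    Expanded = Table.ExpandListColumn(AddRows, "Row Number"),
    // Stand-in for the full numbered-lines table from Step 2
    AllLines = #table(
        {"File Name", "Row Number", "Line"},
        {{"File 1", 1, "header"}, {"File 1", 2, "dates"}, {"File 1", 3, "Radiology 1 0 1"}}
    ),
    // Left anti join: keep only lines NOT listed in the delete ranges
    Joined = Table.NestedJoin(
        AllLines, {"File Name", "Row Number"},
        Expanded, {"File Name", "Row Number"},
        "Deleted", JoinKind.LeftAnti
    ),
    DataOnly = Table.RemoveColumns(Joined, {"Deleted"})
in
    DataOnly
```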
🙂 Hope this helps others later.
(FYI: I am moving this logic to a dataflow, which is outside the scope of this post.)