sevenhills
Super User

Import Report output file

Hi

I have snapshot report output files stored in a folder. Files in the folder:

   ReportID_100 12012016.txt

   ...

   ReportID_100 11012019.txt

   ReportID_100 12012019.txt

   ReportID_100 01012020.txt

   ...

 

Each file looks like the example below:

 

ReportID_100    Report Name: "Antibiotics Group"       Report Date: 12/01/2019        Report Page 1

                                         11/01/2019 - 11/30/2019

Prescriptions Group by Antibiotics by Department

Prescription       PSCP 2 Count	    PSCP 3 Count	PSCP 4 and above Count
---------------------------------------------------------------------------------

Radiology                  1                     0                     1

Outpatient                 1                     0                     1

Pathology                  0                     0                     0

Surgery                    1                     0                     1

Medicine                   1                     0                     1

Orthopedics                1                     0                     1

ReportID_100    Report Name: "Antibiotics Group"       Report Date: 12/06/2019        Report Page 2

                                         11/01/2019 - 11/30/2019

Prescriptions Group by Antibiotics by Department

Prescription       PSCP 2 Count	    PSCP 3 Count	PSCP 4 and above Count
---------------------------------------------------------------------------------

Adult/Family Medicine      1                     0                     1

Dermatology                1                     0                     1

Gastroenterology           1                     0                     1

...

ReportID_100    Report Name: "Antibiotics Group"       Report Date: 12/06/2019        Report Page 3

                                         11/01/2019 - 11/30/2019

Prescriptions Group by Antibiotics by Department

Prescription       PSCP 2 Count	    PSCP 3 Count	PSCP 4 and above Count
---------------------------------------------------------------------------------

Urology                  1                     0                     1

Pulmonary Medicine       1                     0                     1

...

Total:                  11                    10                   222 
#END OF REPORT#

 

Each file contains multiple pages of data, dynamic page numbers, and a date range. I am trying to bring this data in and model it.

 

a) Any tips on modeling when importing this kind of file?

b) I am able to get the data from each file and parse it. But how do I identify the header rows between pages and strip them off? (The first few rows of each file are dynamic, and they also repeat between pages.)

 

 

For example, this is the page-header block that repeats at the start of every page and needs to be stripped:

ReportID_100    Report Name: "Antibiotics Group"       Report Date: 12/06/2019        Report Page 2

                                         11/01/2019 - 11/30/2019

Prescriptions Group by Antibiotics by Department

Prescription       PSCP 2 Count	    PSCP 3 Count	PSCP 4 and above Count
---------------------------------------------------------------------------------

 

 

Thanks in advance

3 REPLIES
BA_Pete
Super User

Hi @sevenhills ,

 

I'm unable to go into too much detail on your specific scenario, as I think it will require some quite creative transformations. However, your starting point should be to use either the Folder or SharePoint folder data source in Power Query. Once you've selected the folder with all these reports in it, you will get the option to Combine & Transform them:

[Screenshot: Combine & Transform option - BA_Pete_0-1636536655741.png]

 

This will allow you to select an example file (ideally one that includes all the issues you'd like to fix across the group), and perform transformations on it that will be automatically replicated across all the files in that folder, before they are all finally appended together into a single query.
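If you prefer to build that starting point by hand, a minimal M sketch might look like the following ("C:\Reports" is a hypothetical path, and SharePoint.Files would be the equivalent for a SharePoint library); the Combine & Transform button then generates the per-file sample function for you:

    // Sketch only: the folder path is an assumption.
    let
        Source   = Folder.Files("C:\Reports"),
        // keep only the snapshot report text files before combining them
        TxtFiles = Table.SelectRows(Source, each [Extension] = ".txt")
    in
        TxtFiles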

 

While I can't specifically advise on the actual transformations you would need to do, my guess would be that you would use basic filters on certain words to remove header rows, use 'Fill Up' and 'Fill Down' functions to move values up and down to rows that aren't going to be filtered out, and functions like Text.Contains etc. to identify certain words/phrases to be retained.
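As a rough illustration of that idea (not the exact transformations you will need), assuming each file has already been split into one row per text line in a hypothetical column called Line, the repeated header rows could be filtered out and the report date carried down like this:

    // Sketch only: "RawLines" and the column name "Line" are assumptions.
    let
        Source = RawLines,
        // pull the report date out of the page-header line into its own column
        WithDate = Table.AddColumn(Source, "Report Date", each
            if Text.Contains([Line], "Report Date:")
            then Text.BetweenDelimiters([Line], "Report Date: ", " ")
            else null),
        // carry the date down onto the data rows that follow each header
        Filled = Table.FillDown(WithDate, {"Report Date"}),
        // drop the repeating page/table header and separator rows
        NoHeaders = Table.SelectRows(Filled, each
            not Text.Contains([Line], "Report Page") and
            not Text.Contains([Line], "Prescriptions Group by") and
            not Text.Contains([Line], "PSCP 2 Count") and
            not Text.StartsWith(Text.Trim([Line]), "---"))
    in
        NoHeaders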

Again, I think you're going to have to get pretty creative to knock these into a workable shape, but this is the process you need to complete what you require.

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




Thanks for your reply. 

 

My questions were as in the original post; I am adding more details to each below:

a) Any tips on modeling when importing this kind of file?

 

More details:

  • I have 100,000 files covering over 150 reports, in one Linux folder. I got them into Azure.
  • I created Power BI dataflows, one per report type, with multiple queries in each dataflow.
  • Note: we could have put everything in one dataflow instead of one per report type, but we decided to go with one dataflow per report type.

b) I am able to get the data from each file and parse it. But how do I identify the header rows between pages and strip them off? (The first few rows of each file are dynamic, and they also repeat between pages.)

More details:

  • Before posting and asking in the forum, I was already able to do the Combine and Transform of the files per report.
  • Each snapshot report output file has header rows between pages.
  • The issue is the dynamic content in each file when processing the header rows on each page. I want to strip them out.
  • After stripping all the header and footer rows, it will be easy to shape the data into columns and use it for visualization.

Hope this clarifies the needs 🙂 

 

Thanks

For (b), using Power Query, I solved it as below:

* Step 1: Analyzed and identified the sets (hard-coded for each report)

           Sets: page header, table header, footer row, page end row, ...

           This is mainly for categorizing the rows.
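A hedged sketch of what that categorization could look like as an M function (the match phrases are taken from the sample report above; the query and column names are assumptions, and each report type gets its own hard-coded rules):

    let
        ClassifyLine = (line as text) as text =>
            let t = Text.Trim(line) in
                if Text.StartsWith(t, "ReportID_")
                    or Text.StartsWith(t, "Prescriptions Group by") then "Page Header"
                else if Text.Contains(t, "PSCP 2 Count")
                    or Text.StartsWith(t, "---") then "Table Header"
                else if Text.StartsWith(t, "Total:") then "Footer"
                else if Text.Contains(t, "#END OF REPORT#") then "Report End"
                else "Data"
    in
        ClassifyLine

It can then be applied per line with something like Table.AddColumn(AllRows, "Set Name", each ClassifyLine([Line Text])).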

 

* Step 2:

       Get the data from all the files, plus the line number within each file

File Name    Row Number
File 1       1
File 1       2
File 1       3
File 1       ...
File 2       1
File 2       2
File 2       3
File 2       ...
File 3       1
File 3       2
File 3       3
File 3       ...

For each file, go through each row and, based on the data, identify the first row of each set.
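A minimal M sketch of this step, assuming a hypothetical folder path; the per-file index becomes the Row Number used in the following steps:

    // Sketch only: the path and the column names are assumptions.
    let
        Files = Folder.Files("C:\Reports\ReportID_100"),
        // read each file as a list of text lines
        WithLines = Table.AddColumn(Files, "Line", each Lines.FromBinary([Content])),
        // turn each list into a nested table with a 1-based Row Number
        Numbered = Table.AddColumn(WithLines, "Rows", each
            Table.AddIndexColumn(
                Table.FromList([Line], Splitter.SplitByNothing(), {"Line Text"}),
                "Row Number", 1, 1)),
        // one row per file line, keyed by File Name and Row Number
        AllRows = Table.ExpandTableColumn(
            Table.RenameColumns(
                Table.SelectColumns(Numbered, {"Name", "Rows"}),
                {{"Name", "File Name"}}),
            "Rows", {"Line Text", "Row Number"})
    in
        AllRows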

 

* Step 3: Remove the dynamic text involved in these rows, such as the page number. Say, turn

 

ReportID_100    Report Name: "Antibiotics Group"       Report Date: 12/06/2019        Report Page 2

 

into

 

ReportID_100    Report Name: "Antibiotics Group"       Report Date: 12/06/2019        Report Page 
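A small M sketch of this normalization, assuming the line text sits in a hypothetical "Line Text" column; only the page-header lines are touched, so the data rows keep their counts:

    // Sketch only: "NumberedRows" and "Line Text" are assumed names.
    let
        Source = NumberedRows,
        Normalized = Table.TransformColumns(Source,
            {{"Line Text", each
                // strip trailing spaces, then the trailing page-number digits,
                // but only on the page-header lines
                if Text.Contains(_, "Report Page")
                then Text.TrimEnd(Text.TrimEnd(_), {"0".."9"})
                else _,
              type text}})
    in
        Normalized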

 

 

* Step 4: Look at the whole table of categorized rows, mark the first row (and last row) of each set, and create a per-file summary of the rows that need to be deleted, as below (a grouping sketch follows the table).

 

Delete Row Details:

File Name    Set Name        Row Begin    Row End
File 1       Page Header     1            6
File 1       Table Header    7            9
File 1       Page Header     22           27
File 1       Table Header    28           31
...
File 1       Report End      1011         1011
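One way to build that summary in M is a local grouping (a rough sketch, assuming a hypothetical CategorizedRows query where every line already carries its File Name, Row Number, and Set Name from Step 1):

    let
        // GroupKind.Local groups only consecutive rows with the same key, so each
        // run of header/footer lines becomes one Row Begin / Row End range per file
        Grouped = Table.Group(CategorizedRows,
            {"File Name", "Set Name"},
            {{"Row Begin", each List.Min([Row Number]), Int64.Type},
             {"Row End",   each List.Max([Row Number]), Int64.Type}},
            GroupKind.Local),
        // keep only the sets that should be removed; this is the table shown above
        DeleteRowDetails = Table.SelectRows(Grouped, each [Set Name] <> "Data")
    in
        DeleteRowDetails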

 

* Expand by adding a custom column to the table:

{[Row Begin]..[Row End]}

 

* Later, did a left anti join against the full row list from Step 2 to keep only the data rows (see the sketch below).
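Putting the last two bullets together in M (sketch only: DeleteRowDetails is the summary table above and AllRows is the Step 2 table of every line with its per-file Row Number; both names are assumptions):

    let
        // one row per (File Name, Row Number) that should be deleted
        ToDelete = Table.ExpandListColumn(
            Table.AddColumn(DeleteRowDetails, "Row Number",
                each {[Row Begin]..[Row End]}),
            "Row Number"),
        // the left anti join keeps only the rows of AllRows with no match in
        // ToDelete, i.e. the real data rows
        DataRowsOnly = Table.NestedJoin(
            AllRows, {"File Name", "Row Number"},
            ToDelete, {"File Name", "Row Number"},
            "Deleted", JoinKind.LeftAnti)
    in
        DataRowsOnly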

 

🙂 Hope this helps others later.

 

(FYI, I am moving this logic into a dataflow, which is outside the scope of this post.)
