JMil_2988
Regular Visitor

Advice on how to structure horrific data

Hi there! 
Long-time self-taught user here. I recently started a position where I'm extremely limited in the tools available to me, with some absolutely horrific raw data and no data science/engineering team, so I'm completely alone and overwhelmed.

My main experience is modelling through dataflows: ETL of raw Excel exports (stored in SharePoint) into a Dataflow Gen1, modelling in the Power BI Service, then bringing the dataflows into Power BI Desktop to visualise and publish. I previously had access to some databases with beautifully clean data, which were a pleasure to work with and from which I gained some basic-to-intermediate SQL knowledge.

Since starting at my new position, I'm left with un-delimited .txt files. Every morning I run a macro to convert about 6 of them into .xlsx files and store them in SharePoint. There are approximately 5,000 of these .xlsx files, which I've cleaned (via VBA) so the headers are the same and appended into a dataflow with an incremental refresh, which is currently working OK. There are more folders of 5,000+ files in different areas of SharePoint, which I'm also appending and incrementally refreshing. The problem occurs when I need to make an amendment, whether in the Power BI Service or in the .pbix itself: it takes far too long to refresh each time! Currently there are approximately 5 sources, each with ~5,000 files of 100,000 rows, so we're talking about 2.5 billion rows. That sounds insane, but each row is required. For example, there could be 50 rows relating to one job, each describing a different visit to that job: what happened, who completed or abandoned the work, and so on, all of which I need for future analysis. It's about 2 years of data that I need to analyse to see progress and failure over that period, and also to gauge a forecast for multiple criteria.
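As a quick sanity check on the volume described above, the arithmetic can be sketched in a few lines of Python. The figures are the approximate ones from the post; the variable names are only illustrative.

```python
# Approximate figures taken from the post; real counts will vary per source.
sources = 5
files_per_source = 5_000
rows_per_file = 100_000

rows_per_source = files_per_source * rows_per_file
total_rows = sources * rows_per_source

print(f"{rows_per_source:,} rows per source, {total_rows:,} rows in total")
```

Each source alone is on the order of half a billion rows, which helps explain why any change that forces a full reload is so slow.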

I just feel like I've lost all the knowledge I had, and I feel that I'm potentially going about this the wrong way and need to speed things up. Sometimes I'll make an amendment in the morning and it won't finish refreshing until late evening, which is just stupidly inefficient. Are there any tips people could provide that don't require additional licenses for anything? I've been told that isn't a possibility. I also need to share access with a colleague who's working with me on projects, so tips that work with shared access would be ideal.

 

Thanks in advance!

1 ACCEPTED SOLUTION
Mauro89
Solution Sage

Hi @JMil_2988,

 

I would like to give you an honest answer. Your situation sounds horrible 😅

 

You're dealing with a data infrastructure problem, not just a Power BI challenge. Here's how I would approach this strategically:

 

Document the business impact: Calculate the actual cost - if you're spending 8+ hours daily waiting for refreshes, that's wasted salary. Present this to management with specific numbers.

Make the case for proper tools: A basic Azure SQL Database costs less than $100/month - far cheaper than your lost productivity. 

Risk assessment: What happens when this breaks? When you leave? Document the fragility of the current approach and the business continuity risk.

Propose a phased solution: Start with the most critical analysis using a subset of data, prove the value, then justify proper investment for the full solution.
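To put numbers behind the first two points, a rough comparison could be sketched as follows. This is only an illustration: the hourly rate and the number of working days are hypothetical placeholders (not figures from this thread), the 8 hours per day is the waiting time mentioned above, and the $100/month is used as the upper bound quoted for a basic database.

```python
def monthly_wait_cost(hours_per_day: float, hourly_rate: float,
                      working_days: int = 21) -> float:
    """Salary cost of time spent waiting on refreshes in one month."""
    return hours_per_day * hourly_rate * working_days


# Hypothetical inputs: replace them with your own figures.
wait_cost = monthly_wait_cost(hours_per_day=8, hourly_rate=30.0)
database_cost = 100.0  # upper bound per month quoted above

print(f"Waiting on refreshes: ~{wait_cost:,.0f} per month")
print(f"Basic database:       <{database_cost:,.0f} per month")
```

Even if only part of the waiting time is truly lost, presenting the two figures side by side makes the case easier for management to follow.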

 

You're being set up to fail with inadequate resources. 

 

Best regards!

PS: If you find this post helpful, consider leaving kudos or marking it as the solution.


5 REPLIES 5
v-prasare
Community Support

Hi @JMil_2988,

We would like to confirm whether our community member's answer resolves your query or if you need further help. If you still have any questions or need more support, please feel free to let us know. We are happy to help you.

 

 

Thank you for your patience and look forward to hearing from you.
Best Regards,
Prashanth Are
MS Fabric community support

ChielFaber
Solution Specialist

I agree with @Mauro89, but I want to add some words of encouragement. Keep your spirits up and don't lose faith in your own ability. The fact that you're actively thinking about how to improve the infrastructure is very positive.

 

Try to work on the points Mauro89 mentioned and see this as a learning opportunity to enhance your skills to advocate for a better infrastructure.

 


[Tip] Keep CALM and DAX on.
[Solved?] Hit “Accept as Solution” and leave a Kudos.
[About] Chiel | SuperUser (2023–2) |

❤️ Thank you for the kind words and inspiration. I knew it was going to be a challenge prior to starting, but wasn't aware it was going to be this challenging! But you're right, spirits will be kept high and thanks to the replies to this post, I've a way to approach going forward now. Thank you.

Thank you for that insight! It's constructive and definitely will be something to potentially improve the situation going forward 🙂 Thanks again!
