March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
Register NowBe one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now
The current Finance role that I am in involves standardizing reports for a wide audience to use. I have developed multiple Power BI reports that others are using, and I want to have a data repository for all current and future Power BI reports. The current location of my data is on my personal computer, but I want the data to be available for other analysts to grab when they want to develop their own Power BI reports/analysis. Unfortunately, I cannot connect directly to SAP HANA (IT will now allow it) but instead have to query the data out of SAP HANA using Analysis for Microsoft Excel and connect to flat files/workbooks that hold the data.
I am currently trying to decide on the best location to house the data for the Power BI reports, and I am currently looking into using SharePoint, SQL database, a shared server, or Azure. I think Azure is the least likely option due to cost and the required involvement of IT, but I still want to include it as an option, nonetheless.
Is there someone that can give the pros and cons of each (to use as a data repository) for the organization’s Power BI reports? I have never used SQL or Azure as a data connection in Power BI, so any information would be helpful. It is just me that will be maintaining this data repository, so I am trying to keep it simple but efficient. Ideally, I would not be adding a week’s working hours as a database administrator on top of my current role. Below are what I have so far as my options:
SharePoint- This is the option I am leaning towards mostly as of now.
Pros
-It would not require an on-premises data connection for Power BI Service.
-Can use a macro to refresh the queries held in each excel workbook that would be held on SharePoint and then refresh the reports in Power BI Service
-The team I work for manages the SharePoint site for the organization
-It would not require additional cost unless we start to get close to the storage capacity of our SP site
Cons
-Takes a long time to bring data into Power BI Desktop and slower refresh connecting to an online data source
-Can potentially be a lot of excel workbooks and folders
SQL database
Pros
-Can use Select statements within Power BI reports to gather only the data I would need instead of pulling in an entire dataset (Not a huge benefit I think but a benefit, regardless)
-Additional capabilities in Power BI because of connection to database? (Direct Query?)
-Can create relationships between datasets (Not sure if I would use/need this feature)
Cons
-Would have to use an on-premises data connection for Power BI Service. Would maybe require a virtual machine to run scheduled refreshes.
-Would require the skill and time to manage the database
Shared Server
Pros
-Would be faster than using SharePoint (Online data repository)
Cons
-Would have to use an on-premises data connection for Power BI Service. Would maybe require a virtual machine to run scheduled refreshes.
Azure- Do not know too much about this option.
Pros
-Can handle a lot of data (More than a SQL db I believe)
-Additional capabilities with Power BI?
-Would I need an on-premises data connection in Power BI Service for this, since the data is housed in the cloud?
Cons
-Most complex solution
-Costly
-Would most likely require assistance of IT
Solved! Go to Solution.
There are definitely plenty of ways to store data. If you are currently building reports off of a bunch of Excel files on your local hard drive, then probably the simplest approach is to save those same Excel files to the cloud (e.g. SharePoint) instead so you don't need to worry about data gateways or converting Excel to SQL tables. Not ideal for huge amounts of data (maybe setting up intermediate Power BI dataflows would help some?) but should be fine for lighter loads.
A couple points from my experience: Any kind of flat file store isn't going to have query folding--the Power Query engine will have to read the whole file to load in the data, which can be slow and run into timeouts.
SharePoint file storage can run into throttling, trying to ingest somewhere after 100 files, SharePoint will throttle your connection and your refresh will fail.
IMO a database is best: Azure SQL DB if you can get it provisioned and pay for it; on-prem SQL-based if you have an always-on computer to host the gateway.
@jeffshieldsdev Can't I use the incremental refresh ability to pull in only the necessary files/workbooks I need to refresh from SharePoint? This way I would not have to refresh the 300+ workbooks I am appending into one dataset (similar to what query folding does... I think).
If your data is partitioned that way, I think you can tweak the M code to use the RangeStart and RangeEnd parameters to control which folders/files you're reading from.
EDIT: similar approach https://www.youtube.com/watch?v=x7q1DHf8wE4
Thank you @jeffshieldsdev . I can see this being very useful, however I would still need to refresh the dataflow AND the dataset if using Import instead of Direct Query within my report. That being the case, I would probably need to set up Power Automate flows to refresh the dataset after the dataflow is done refreshing. Can you confirm this or see a better way of going about using dataflows within a report?
@jeffshieldsdev I understand it is possible, but I am asking myself is it worth it. I would rather refresh one source instead of refreshing everything twice (Dataflow + Dataset). Seems like simpler approach. And Direct Query seems too slow to be considered the best option.
I see. Some reasons for splitting dataset into dataset and dataflows:
@astanfo Azure blob or file storage is going to be like exponentially cheaper than using SQL to store your data. Plus, if you already have the data in CSV or workbook files, you can just dump them into Azure storage versus any kind of import process for SQL.
There are definitely plenty of ways to store data. If you are currently building reports off of a bunch of Excel files on your local hard drive, then probably the simplest approach is to save those same Excel files to the cloud (e.g. SharePoint) instead so you don't need to worry about data gateways or converting Excel to SQL tables. Not ideal for huge amounts of data (maybe setting up intermediate Power BI dataflows would help some?) but should be fine for lighter loads.
@AlexS The problem I see with dataflows is the need to refresh the dataflow AND the dataset if looking to update a report for a new month. There is the capability to use direct query, but it can really slow down the report is what it seems (not best for end-user experience) and there is also not the option to view the data in the "Data" view (under report view). Other than that, I do like the idea of having dataflows because it makes the data already prepped and ready for someone to grab. I just have a hard time getting passed the long refresh for imported data (dataflow + dataset).
I use this approach often, and do all my transformation in dataflows, so the dataset is just reading in the output of dataflows. One of my larger models is 8 GB and takes an hour to refresh from dataflows.
@astanfo There are a ton of options here. OneDrive would work (basically SharePoint) or an Azure SQL DB instance or Azure SQL Managed Instance would eliminate the need for a on-prem gateway. Azure blob storage or Azure file storage, cheap, doesn't require an on-prem data gateway. Tons of ways of doing it. Pros and cons, you have the basics. But, why wouldn't the other analysts just be connecting Live to your Power BI dataset? Then you don't have to worry about access to the source files.
Thank you @Greg_Deckler ! Are there many additional benefits of using Azure versus a SQL db?
The problem I see with connecting to a Power BI dataset is there is not the option to view the data in a table format, you can only view the report and model. When slicing and dicing data in a multitude of ways, I think it is good to have this option so the analyst can fully understand everything they just pulled into their report. Another reason is that analysts may not want to connect to an entire Power BI dataset, but instead want to only use one table within the model. Pulling in an entire dataset (maybe 12 tables or more) would be uneccessary.
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!
Arun Ulag shares exciting details about the Microsoft Fabric Conference 2025, which will be held in Las Vegas, NV.
User | Count |
---|---|
2 | |
1 | |
1 | |
1 | |
1 |