I have data extractions in .CSV format saved to a SharePoint folder. Dataflows pull the data from the SharePoint folder and perform transformations. Power BI then connects to the dataflows, I add some DAX, and the job's done.
I have to rely on extractions and can't get a direct connection to our database. I am wondering if there is a more efficient way to do it.
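For reference, the Dataflow query that pulls the extractions looks roughly like the sketch below in Power Query M (the site URL, folder, and CSV options here are placeholders rather than the actual setup):

    let
        // Placeholder SharePoint site URL
        Source = SharePoint.Files("https://contoso.sharepoint.com/sites/Reporting", [ApiVersion = 15]),
        // Keep only the CSV extractions in the folder
        CsvFiles = Table.SelectRows(Source, each [Extension] = ".csv"),
        // Parse each file and promote its header row; assumes every extraction shares the same layout
        Parsed = Table.AddColumn(CsvFiles, "Data", each Table.PromoteHeaders(Csv.Document([Content], [Delimiter = ",", Encoding = 65001]))),
        Combined = Table.Combine(Parsed[Data])
    in
        Combined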
Depends on your definition of efficient - what is the database source? Usually the way this is handled is with an ETL tool that puts the data into some kind of SQL database. SharePoint will work in a pinch if you don't have a data engineer to integrate it into a data warehouse/mart/lake/whatever, but you'll start running into issues once you get into the hundreds of thousands of rows or more.
The dataset sources are CSV extractions from our system saved in a SharePoint folder and then fed into Dataflows where transformations happen before connecting to PBIX.
My question is whether this would be more efficient than the current setup. My reasoning is that query folding is not possible with CSV files, so if there were a database between the extractions and the Dataflows, it should speed up the process.
Current: Extraction > SharePoint Folder > Dataflows > Power BI
Proposal: Extraction > SharePoint Folder > Database > Dataflows > Power BI
Could a Datamart function as the database that feeds the Dataflows?
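On the query folding point, a filter against a SQL source gets translated into a WHERE clause and runs on the server, while the same filter against CSVs means downloading and scanning every file locally. A Datamart should be able to play that role, since it exposes a SQL endpoint that a Dataflow can read like any other SQL Server source. A minimal sketch, with made-up server, database, and table names:

    let
        // Made-up connection details; a Datamart's SQL endpoint is queried like a SQL Server source
        Source = Sql.Database("your-datamart-sql-endpoint", "SalesDatamart"),
        Extractions = Source{[Schema = "dbo", Item = "Extractions"]}[Data],
        // Against a SQL source this filter folds back to the server as a WHERE clause
        Recent = Table.SelectRows(Extractions, each [ExtractDate] >= #date(2024, 1, 1))
    in
        Recent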
I mean the source of the CSVs. Usually what people do with external systems is go:
System database -> data integration tool w/ scheduled ETL -> organizational database -> Power BI
SharePoint is an unnecessary step unless you have no other place to put it (e.g. you don't have a data engineer). The number of rows will probably drive whether it needs to go in a database or not... there will probably be performance issues with the SharePoint connector once you get to around 100k+ rows. If you can get it in a database, you can connect directly to that - the dataflow is a bit superfluous in the latter case.
Hi @Brianii,
In my mind, the goal is always to connect directly to the source. But, as stated, you can't do that, so getting the CSV files is probably the best method.
However, my suggestions would be:
Why can't you do the DAX in the dataflow?
Why can't you link directly to the CSV files and not have to use SharePoint?
My thinking is that those two changes would make the whole thing slightly faster. Not necessarily a huge time savings, but the fewer steps between start and finish (like going through SharePoint, or having the DAX done in the file itself), the faster and easier everything gets.
All that said, you might already have the best solution for your current constraints.
1. I do all the heavy lifting in Power Query (M) and use DAX only for small things. Is it possible to use actual DAX in Dataflows?
2. Could you elaborate on what you mean by linking directly to the CSVs? In the current setup, all extractions are automatically saved to a SharePoint folder and the Dataflows feed from that folder.
Hey @Brianii ,
My apologies, I misspoke. I was thinking of complex M transformations and not DAX.
As for the CSV files, I was wondering whether you have somewhere other than SharePoint (like OneDrive or a network share) to put them, to remove one more step.
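If a network share works, only the first step of the query changes and the rest of the combine logic stays as it is; just note that a cloud Dataflow would need an on-premises data gateway to reach the share. A minimal sketch with a placeholder path:

    let
        // Placeholder UNC path; a Dataflow needs an on-premises data gateway to read a network share
        Source = Folder.Files("\\fileserver\extractions"),
        CsvFiles = Table.SelectRows(Source, each [Extension] = ".csv"),
        Parsed = Table.AddColumn(CsvFiles, "Data", each Table.PromoteHeaders(Csv.Document([Content], [Delimiter = ","]))),
        Combined = Table.Combine(Parsed[Data])
    in
        Combined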