AndrewWestran
Helper I

Dataflow and PowerBI Gateway - Sourcing data dynamically

Hi all,

 

I have a Dataflow Gen2 which is sourcing data from an on-premises SQL Server via PowerBI Gateway.

The server contains many databases, each representing a retail POS.

These databases may change from day to day, i.e., there may be new databases today that were not there yesterday, without notice.

The Dataflow allows me to access the server in a fixed manner (nothing changes in this regard), but I need to be able to look up the databases to be queried and access them dynamically.

The tables in question will be consistent between the databases.

 

Based on what I have seen so far, the Dataflow does not support this level of flexibility.

Is there something I am missing in the capabilities of the tool?

I can obviously push the data from the source environment and pre-consolidate to solve this issue, however, I was hoping to solve this issue from the Fabric environment due to constraints on the source side.

 

Any thoughts and comments are appreciated.

 

Regards,

Andrew

20 REPLIES
Element115
Power Participant

I was re-reading your original post and something occurred to me... have you tried maintaining a metadata table, that is, a table that keeps track of all the DB servers and DB names as they come and go? That would allow you to create a query that connects to the database containing this metadata table and, using that table, select any or all of your databases in M and ingest their data.
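Something along these lines, for instance (a rough sketch only; the control database "MetaDb" and table "dbo.DatabaseCatalog" are made-up names):

let
    // Connect to the control database that holds the metadata table
    Meta = Sql.Database("MyServer", "MetaDb"),
    Catalog = Meta{[Schema = "dbo", Item = "DatabaseCatalog"]}[Data],
    // The DatabaseName column drives which databases get queried
    DatabaseNames = Catalog[DatabaseName]
in
    DatabaseNames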

Hi @Element115 

 

I gather that I will be able to use the M Sql.Databases function to acquire the details of the databases that I need to access, so no additional metadata tables should be necessary.
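For example, a minimal call along these lines should list the databases on the server (the server name is a placeholder):

let
    // One row per database currently on the server:
    // [Name] holds the database name, [Data] its navigation table
    Databases = Sql.Databases("MyServer"),
    DatabaseNames = Databases[Name]
in
    DatabaseNames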

I do appreciate the time and thought you have taken to respond to this query. I am open to any further comments you may have to offer. 🙂

 

Thanks and regards

Interesting!  You're right, that should do the trick.  I was not aware M had such a function.  Pretty nifty. 😊

AndrewWestran
Helper I

(screenshot attached)

 

This is good news for me. Depending on the release date, of course. Sooner would be better... 😁

(I suspect a Pipeline will facilitate the dynamic sourcing more easily than the Dataflow.)

Hi @AndrewWestran 

 

The internal team has confirmed that there is no dynamic ability to switch between different databases today. The connections themselves must be validated and are therefore static.

 

I hope this helps. Please do let us know if you have any further questions.

 

Thanks

Hi @v-cboorla-msft 

 

I appreciate that the connection would need to be validated and static. However, I would hope that database selection within the context of a static connection could be dynamic. Privileges from the established connection would apply.

I also appreciate that this would introduce complexity to the static mechanisms of the Dataflow, which are hard-wired to a particular table, for example, to which the transformation steps would be applied.

It feels like this would be an ideal application of the Notebook capabilities, for example, to select a data source and manage the data definition, sourcing, and push to the final destination in code, rather than using a graphical tool.

 

Thanks

Can you migrate some of the data to a lakehouse? Perhaps an aggregate version of the source data? Storage is very inexpensive from what I've seen.

Thanks for your comment, @Element115.

To manage this migration to the lakehouse without the dynamic DB access capability, we would need to move the functionality to the source system, e.g., export the data to Azure Blob storage and then import it into the lakehouse.

This is our only alternative at this point, and it is unnecessarily complicated given that Fabric touts a unified toolset, which falls short in this scenario.

Hi @AndrewWestran 

 

We haven't heard from you since the last response and were just checking back to see whether you have a resolution yet. If you do, please share it with the community, as it can be helpful to others.
Otherwise, we will respond with more details and try to help.


Thanks.

Hi @v-cboorla-msft 

 

I have responded to all replies on this thread.

Please let me know if there is a specific comment that I have missed.

 

Many thanks

Hi @AndrewWestran 

 

Apologies for my previous comment. I just wanted to check whether your query has been resolved. If not, please let us know.

Otherwise, we will respond with more details and try to help.

 

Thanks.

Hi @v-cboorla-msft 

 

No problem 🙂

This issue is not resolved.

I would be interested to know whether this functionality might be supported by Notebooks/PySpark or other features of the Fabric environment. Perhaps I have missed something.

 

Regards

If you're trying to connect to a single source that can have many databases (such as SQL Server), then you can use the SQL Server connector without defining a database to connect to and then define your own M query logic as to what table you'd like to query from which specific database inside of the server.

Dataflows are flexible enough in this regard, but the approach depends on your requirements and the logic you wish to apply.

 

The SQL Server connector without a defined database will use the Sql.Databases function. If you prefer to write your own M code and take more of a code-first approach, as you would in a notebook, you can also take that route with Dataflows.
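For example (a sketch only; "MyServer", "MyDatabase" and "MyTable" are placeholders), the code-first equivalent of picking a table from the navigator looks something like this:

let
    // Connecting without a database gives the server-level navigation table
    Source = Sql.Databases("MyServer"),
    // Pick a database by name, then a table by schema and item name
    MyDatabase = Source{[Name = "MyDatabase"]}[Data],
    dbo_MyTable = MyDatabase{[Schema = "dbo", Item = "MyTable"]}[Data]
in
    dbo_MyTable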

 

Hope this helps.

Hi @miguel 

 

As long as this functionality works with the on-premises data gateway, it sounds viable.

I will look into this option.

 

Thanks

Hi @miguel , @Element115 , @v-cboorla-msft 

 

I have built M code to produce the effect that I am trying to achieve as follows:

 

let
    // Function to return the top 100 rows of dbo.MyTable from a specific database.
    // The try covers both the connection and the navigation, so a missing database,
    // missing table, or permission problem yields null instead of an error.
    GetTableFromDatabase = (server as text, database as text) as nullable table =>
        let
            Source = Sql.Database(server, database),
            MyTable = try Source{[Schema = "dbo", Item = "MyTable"]}[Data] otherwise null,
            Top100Rows = if MyTable <> null then Table.FirstN(MyTable, 100) else null
        in
            Top100Rows,

    // Get the list of databases currently on the SQL Server
    GetDatabaseList = Sql.Databases("MyServer"),
    DatabaseNames = Table.SelectColumns(GetDatabaseList, {"Name"}),

    // Iterate through the databases, accumulating 100 rows of MyTable
    // from each database into a list of tables
    LoadDataFromAllDatabases = List.Accumulate(
        DatabaseNames[Name],
        {},
        (state, currentDatabase) =>
            let
                TableData = GetTableFromDatabase("MyServer", currentDatabase),
                NewState = if TableData <> null then state & {TableData} else state
            in
                NewState
    )
in
    LoadDataFromAllDatabases
 
This is working for me up to a point, but I am having some issues with credentials, which I am working on resolving with the relevant teams at my organisation.
So far, this approach seems to be viable and appears to be functioning fine in conjunction with the Gateway to the on-premises data.
 
Once I have the credential issues resolved, I will follow up with confirmation of success and/or any modifications required for the final solution.
Naturally, the code above has been masked and pruned to hide specific details for my implementation, but the code is generic and would work as a framework for any similar requirement.
Thanks for all of the assistance and guidance thus far.
😊

I don't think there's a need to use Sql.Database when you're already querying and gaining access to all databases on the server with Sql.Databases. You're introducing the issue of unbound connections by adding new data source functions to the mix - if you just leverage the tree view given by Sql.Databases, you should be able to achieve the same without accessing new data sources (and then hitting the credentials issue that you're seeing).
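Roughly something like this, keeping your placeholder names (a sketch only):

let
    // One data source function call = one connection binding for the whole server
    Databases = Sql.Databases("MyServer"),
    // Navigate to dbo.MyTable through each database row's [Data] column
    // instead of calling Sql.Database again per database
    GetTop100 = (dbNav as table) as nullable table =>
        let
            MyTable = try dbNav{[Schema = "dbo", Item = "MyTable"]}[Data] otherwise null
        in
            if MyTable <> null then Table.FirstN(MyTable, 100) else null,
    PerDatabase = List.Transform(Databases[Data], GetTop100),
    Results = List.RemoveNulls(PerDatabase)
in
    Results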

Hi @miguel 

 

I am using the Sql.Database function since I will need to address 2,000 databases containing the same table name (MyTable) and extract the data from each of them. Will using the tree view not require me to manually select all 2,000 tables to achieve the same effect?

Perhaps I am not clear on the tree view functionality that you are referring to here.

Are you perhaps suggesting using navigation code from the Sql.Databases output to reach the second-level detail, instead of Sql.Database with unbound connections, as you say?

Only seeing this now, but yes. The output of Sql.Databases is a table with all the databases inside that server.

You can then create a function that goes through each of those databases from the output of Sql.Databases, retrieves the data that you want, and combines it as you see fit.

Sql.Database is not really an object, but rather a data source function. Every time you use that function, it has the potential to require a new linkage of a connection to its data source path.
Are those databases on the same server, or are you trying to access different servers?

If they are on the same server, have you considered using Sql.Databases instead, creating a function that explores the databases on that server, extracts the data from each table, and then combines them as you see fit?
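For example, something along these lines (placeholder names again, and assuming dbo.MyTable has the same columns in every database):

let
    Databases = Sql.Databases("MyServer"),
    // For each database, try to reach dbo.MyTable through its [Data] navigation table;
    // databases that do not contain the table become null
    Tables = List.Transform(
        Databases[Data],
        each try _{[Schema = "dbo", Item = "MyTable"]}[Data] otherwise null
    ),
    // Drop the nulls and stack the remaining tables into one combined table
    Combined = Table.Combine(List.RemoveNulls(Tables))
in
    Combined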

v-cboorla-msft
Community Support

Hi @AndrewWestran 

 

Thanks for using Microsoft Fabric Community.

At this time, we are reaching out to the internal team to get some help on this.
We will update you once we hear back from them.

Appreciate your patience.

 
