sam-roy
Frequent Visitor

Ideas to keep the existing data in the dataset after refresh

I've created a custom connector that uses an authorization flow to connect to third-party APIs, and I use an enterprise gateway to schedule the refresh for the dataset. The issue is that the old data is replaced by the new data after every refresh. The refresh is scheduled to run every hour, so by the end of the day I lose all the old report data. I want to retain the historical data.

 

As a solution, I created a push dataset, which I believe is backed by a database, and I use the REST API to push the refreshed data into it. But that requires a second authorization flow, which is not supported in the connector. We have two different client_ids and need two different tokens to access these two APIs: one for the Microsoft API and one for the third-party API. So that didn't work. Are there any ways to fix this?
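For reference, the push-rows call looks roughly like this. It's only a sketch: datasetId, the table name and aadToken are placeholders, and obtaining aadToken is exactly the second authorization flow the connector can't host.

let
    // Build the JSON body: {"rows":[{...}]}
    body = Json.FromValue([rows = {[category = "example"]}]),
    // POST to the push dataset's rows endpoint; datasetId and aadToken are placeholders
    push = Web.Contents(
        "https://api.powerbi.com/v1.0/myorg/datasets/" & datasetId & "/tables/ReportEntries/rows",
        [
            Headers = [#"Authorization" = "Bearer " & aadToken, #"Content-Type" = "application/json"],
            Content = body
        ]
    )
in
    push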

 

On another note, are there any methods to check whether the dataset has values before building the table and combining the data?

Below is the code I'm using to build the table. Let me know how I should change it.

 

 

let
    // Convert the list of records returned by the API into a single-column table
    TableFromList = Table.FromList(reportEntries, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    // Expand the "category" field from each record into its own column
    ExpandColumn = Table.ExpandRecordColumn(TableFromList, "Column1", {"category"}, {"Column1.category"})
in
    ExpandColumn

 

11 REPLIES
jbwtp
Memorable Member

Hi @sam-roy,

 

From the context of your post I suspect that you are using a Power BI dataflow (considering you are talking about the push). In that case, you should be able to fetch the existing data (before it gets refreshed) by self-querying the dataflow. This is possible in Power BI: just select your dataflow as a data source, as you would any other.

 

Since you can fetch the data before the refresh, you can then append it to, for example, a "historical" table in the same dataflow (which you would also fetch during the refresh) at the same time as you refresh the "live" one. Because the commit does not happen until all data queries in the flow refresh successfully, you can still get hold of the "pre-refresh" data.

 

The sequence of steps would be (see the sketch after this list):

 

1. Get the stored "historical" table

2. Get the stored "live" table

3. Append the "live" table to the "historical" table

4. Refresh the "live" table.
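A minimal sketch of steps 1-3, assuming this query lives in the same dataflow as a "Live" query that calls the third-party API. GetStoredHistorical and GetStoredLive are placeholders for however you self-query the dataflow (e.g. via the Dataflows connector); they are not real functions.

let
    StoredHistorical = GetStoredHistorical(),   // 1. history as committed by the previous refresh
    StoredLive = GetStoredLive(),               // 2. "live" rows as committed by the previous refresh
    // Stamp the snapshot time so each hourly load can be told apart later
    Snapshot = Table.AddColumn(StoredLive, "SnapshotAt", each DateTimeZone.UtcNow(), type datetimezone),
    Historical = Table.Combine({StoredHistorical, Snapshot})   // 3. append
in
    Historical   // step 4: the "live" query refreshes separately in the same dataflow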

 

 

sam-roy
Frequent Visitor

Hi @jbwtp 

 

Thanks for your reply. I'm not using dataflows. I've built a Power BI custom connector because the third-party APIs I'm using to get the data use OAuth authentication and involve a series of HTTP callouts to get the required data. So for this case, I don't think dataflows will be useful. Is there a way to self-query the dataset before refreshing it, from the connector code itself?

jbwtp
Memorable Member

Hi @sam-roy,

 

Try this one:

DirectQuery for Power BI dataset: How does it work? - RADACAD

This is the same logic as described above for dataflows.

 

I am sure I saw somewhere how to access Power BI datasets via Analysis Services (can't seem to find it now), but just thought to give you this lead in case the above does not work, so you can investigate further yourself.
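If that lead pans out, a minimal sketch of reading a published dataset back through the XMLA endpoint might look like this. The workspace name, dataset name and DAX query are placeholders, and the XMLA endpoint is only available on Premium/PPU workspaces.

let
    // Connect to the workspace's XMLA endpoint and run a DAX query against the dataset
    Source = AnalysisServices.Database(
        "powerbi://api.powerbi.com/v1.0/myorg/MyWorkspace",
        "MyDataset",
        [Query = "EVALUATE 'ReportEntries'"]
    )
in
    Source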

 

Cheers,

John

BA_Pete
Super User

Hi @sam-roy ,

 

First question:

Power Query isn't a data warehouse, it's a mashup engine. It won't store data for you to a production-level standard. Invest in a data warehouse or, at minimum, a basic SQL Server instance on your gateway that you can push data into.

 

Second question:

I'm not really sure what you're getting at here. Do you mean you want to check if there's data/tables present before running transformations to avoid errors? If so, then check out Chris Webb's blog on this subject:

https://blog.crossjoin.co.uk/2015/07/03/ensuring-columns-are-always-present-in-a-table-returned-by-p... 

https://blog.crossjoin.co.uk/2014/09/18/handling-data-source-errors-in-power-query/ 
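Applied to your query, the idea in those posts might look roughly like this; it's only a sketch, and the fallback column list is an assumption about what your report needs.

let
    TableFromList = Table.FromList(reportEntries, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    // Only expand if the API actually returned rows
    HasRows = Table.RowCount(TableFromList) > 0,
    Expanded =
        if HasRows then
            Table.ExpandRecordColumn(TableFromList, "Column1", {"category"}, {"Column1.category"})
        else
            #table({"Column1.category"}, {}),
    // If anything upstream errored, fall back to an empty table with the expected column
    Result = try Expanded otherwise #table({"Column1.category"}, {})
in
    Result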

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




Thanks for your reply @BA_Pete 

 

I'm not expecting a production-level standard, but is there a way to store data for 24 hours? The third-party API will update the dataset every hour. Maybe I can download the data at the end of the day. If that is not possible, then can you elaborate on how to push data from the dataset to SQL Server and then bring it back into Power BI so that we can visualize it? Can that be automated through code or flows?

Hi @sam-roy ,

 

If you only want to hold data for 24 hours and are then happy for it to be overwritten, I'd recommend putting your 24hr queries into a Dataflow that refreshes once per day:

https://learn.microsoft.com/en-us/power-bi/transform-model/dataflows/dataflows-create 

 

Regarding pushing from Power Query to SQL Server, here's the first relevant hit on Google:

https://www.biinsight.com/exporting-power-bi-data-to-sql-server/ 

It would usually be an automated process unless you specifically want to keep it a manual one.

However, there are many ways to skin this cat, and it's really going to depend on your in-house expertise in the area, the ability/appetite to use external tools, data security policies and constraints, etc.
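Purely as an illustration of the push-to-SQL idea (not necessarily the method in the linked post), a native-query insert from Power Query might look roughly like this. The server, database, table and column names are placeholders, the staging table is assumed to exist already, and writing this way needs native-query permission; it's a workaround rather than a supported export path.

let
    // Connect to the staging database (placeholder server/database names)
    Target = Sql.Database("myserver.database.windows.net", "Staging"),
    // Run a parameterised INSERT against an assumed dbo.ReportSnapshot table
    Pushed = Value.NativeQuery(
        Target,
        "INSERT INTO dbo.ReportSnapshot (Category, LoadedAt) VALUES (@category, SYSUTCDATETIME())",
        [category = "example"]
    )
in
    Pushed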

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




Hi @BA_Pete 

 

But my use case is to refresh the dataset every hour, to get real-time data. I also need the old data to compare and visualize. And I'm using a custom connector because the third-party APIs I'm using to get the data use OAuth authentication and involve a series of HTTP callouts to get the required data. So for this case, I don't think dataflows will be useful, or will they? I'm quite new to Power BI, so please advise me. Can we connect the Power BI custom connector to a dataflow?

 

Is there a possibility to export data from the dataset to SQL Server automatically after each refresh, perhaps using Power Automate?

 

 

 

If you need to update your data every hour but keep a rolling 24 hours of history, then no, Dataflows aren't going to help you with this. For this scenario, you may want to look into incremental refresh, the closest thing you'll find to a native PBI data warehouse:

https://learn.microsoft.com/en-us/power-bi/connect-data/incremental-refresh-overview 

Be mindful that the use of incremental refresh comes with some quite severe caveats, such as not being able to download or edit your accumulated data, for one.
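For context, the query-side part of incremental refresh boils down to filtering on the RangeStart/RangeEnd parameters described in the link above. A minimal sketch, assuming your source records carry a timestamp field (here called "loadedAt", which is a placeholder):

let
    TableFromList = Table.FromList(reportEntries, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    Expanded = Table.ExpandRecordColumn(TableFromList, "Column1", {"category", "loadedAt"}, {"category", "loadedAt"}),
    Typed = Table.TransformColumnTypes(Expanded, {{"loadedAt", type datetime}}),
    // RangeStart/RangeEnd must be datetime parameters with exactly these names;
    // the service substitutes them per partition when incremental refresh runs
    Filtered = Table.SelectRows(Typed, each [loadedAt] >= RangeStart and [loadedAt] < RangeEnd)
in
    Filtered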

 

If you just need to grab data once per day and hold it for 24hrs, then I think Dataflows would be the simplest solution.

 

In theory, you can use any connector that you use in Desktop in Dataflows as well. It may take a bit of configuration but, if you're in the game of making/using custom connectors, this shouldn't prove difficult.

 

Regarding exporting from datasets to SQL, I think that's getting beyond the scope of this thread, and probably this forum as well. I think your actual implementation would probably involve exporting a duplicated query from PQ to SQL, an example of which I provided a link for previously.

 

I'm sorry I can't give you a step-by-step walkthrough of what you should do and how you should do it, but we're getting into pretty deep "it depends" waters, where almost every choice around what to do and how it's done will be so heavily dependent on your specific scenario, that it makes it nearly impossible to give the 'right' solution for you.

Have a look into both incremental refresh and the PQ>SQL export link and this should give you the broad options for quickly, cheaply, and fairly easily (i.e. without paying an ETL/DBA expert to set up a DWH for you) retaining historical data in the Power BI universe.

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




Hi @sam-roy ,

 

Did you get a chance to look into any of the options I suggested? How did you get on?

I notice that the Microsoft Customer Service Team have already marked an answer as the solution to your issue, but I just wanted to make sure that you actually got the solution you needed.

 

Let me know if there's anything else I can expand on or help with.

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




Hi @BA_Pete ,

 

As you said, I looked into the SQL export and used an R script to connect to MySQL, but it seems R is supported only on the personal gateway. I'm currently using the enterprise gateway and can't change to the personal gateway. So I was thinking: is it possible to connect to MySQL from the custom connector, which is built in the M language? I'm trying to extract the data from the source, export it directly to the MySQL database, and use a MySQL connector to publish that data to the Power BI service. Also, can we automate the refresh process for the MySQL connector?

Hi @sam-roy ,

 

That's a real pain with R not working on Enterprise gateways.

Connecting to MySQL *should* be fine, although I'm not an expert on all the capabilities of this source type, to be honest.

You'll need to install the connector on both your local machine (for development) and your gateway(s) for automating the refresh process.

You can get it here if you've not got it already:

https://dev.mysql.com/downloads/connector/net/ 
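For reference, once the connector is installed, the read side from M is just the built-in MySQL source. A minimal sketch with placeholder host, database and table names (this only reads from MySQL; writing back to it is a separate problem):

let
    // Requires MySQL Connector/NET on the dev machine and on the gateway
    Source = MySQL.Database("my-mysql-host", "reporting"),
    Snapshot = Source{[Schema = "reporting", Item = "report_snapshot"]}[Data]
in
    Snapshot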

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!



