Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started

Reply
rademar
Frequent Visitor

API connection, time series analysis issue

Hi Everyone,

 

I've never userd API as a data source so far (always had a direct access to db). But for this project I got API documentation and I don't have a clue what to do...

 

This is a small part from API documentation. There are 50 "Get" tables in api documentation. For each, subsidaryId and pwdWebService are required fields, for some operationdate is also required:

 

GetIncomes

GetIncomes_Xml

Parameters:

Field 

Description  

Data Type

Required

subsidiaryId

Client code

Integer

Yes

pwdWebService

Client password

String

Yes

operationdate

Operation date

DateTime

Yes

 

XML example:

rademar_2-1709374064239.png

 

Response:

rademar_1-1709373309028.png

 

These are my questions:

1. Beginner question: How do I connect to the api? Get data > Web > Basic > Enter URL > enter credentials under "basic" when I get username and pass, or enter API key if I get key and that's it? In documentation there's over 50 tables. Do I get access to all of them or I need to setup api connecction for each of them?

2. I notice operationdate as mandatory field. Does this mean that I can get data from api for just a specific date? I need to make some time series analisys, how do I do that?

3. Is it possible to schedule a refresh? What will happen with previous data from other dates?

4. I need a complete access to all the historical and up to date data from all of these tables in order to make dashboards that will update daily. Is that possible to do via API, or I need direct connection to database?

 

Waiting for help, cheers!

 

 

 

 

1 ACCEPTED SOLUTION

Choose a balance between reducing the strain on the source system (pull only the data that you need) and your maintenance effort (DB management) .  Instead of a db you can also consider CSV or Parquet files for your storage.

View solution in original post

11 REPLIES 11
rademar
Frequent Visitor

Because I need up to date data + historical data, I guess I need to make a database and to schedule api calls each day so the dataset is up to date.

 

This is my line of thinking:

1. Setup Azure SQL database

2. Scrypt makes api calls each day and fills the database with new data every day.

3. PowerBI connects to Azure SQL database

4. Scheduled refresh in PBI Service.

 

Did I miss something? Does this setup make any sence? Any alternative ideas?

 

 

Does the data behind the API change after the fact?  For example if you pull yesterday's data, will that then be it or do you have to repull it after some time to catch the changes?

Good question.

 

In perfect world the data shouldn't change. But some changes due to errors are possible (manual changes by admin to the original data due to human error or similar).. So it's possible that it will be needed to repull the data, let's say once a week or once a month? 

 

Any ideas how to address the possible issue?

I would recommend you consider setting up incremental refresh.  Then you can decide if you want to schedule the refresh automatically, or if you want to manually update certain partitions. 

 

The API query may not fold, so the performance improvements will be minimal. But the refresh flexibility may be worth it.

So you don't recommend makind some database in between powerbi and api? Will it be possible tu pull years of data with incremental refresh? Where is all the data stored that way?

 

A side question: If we plan to make a future proof solution, with possibility that data from other APIs is loaded in the future to the model, would you then recomend making a database? So to make a database structure, map the corresponding fields in api and fill the database, possibly from 2 or more different sources and make reports as it's one dataset. Hope I made this part clear.. 

Kudos for talking this through.  "Storage Guarantee" is one of the weak points of Power BI. Basically there is none.  This may be changing with Fabric,  but it's still a good idea to have a plan B.  I call it "Source Data Recoverability".  Yes, it's a made up word, but it conveys the idea. If your APIs are guaranteed to be able to produce the same data reliably over many years then you don't need a  database in between.  But if they don't, or if you want to do CDC, then a  db certainly makes sense.  It creates additional cost, of course.

Thanks a lot for patience..

 

The APIs are connectet to pos/erp software, which has it's own database, so deffinetly they're guaranteed to produce the same data reliably.

 

So, if there's let's say 50 different api calls each day, that makes 50 different tables in powerbi? If all the data (maybe 5-6 years of data) is pulled from 50 api calls each day, is it ok? Can pbi handle that?

 

I'm still thinking about database, that it might be more elegant solution, because I could make db schema only with data I need for my reports, and then pull the data from APIs selectivly to the db. Correct me if I'm wrong.

Choose a balance between reducing the strain on the source system (pull only the data that you need) and your maintenance effort (DB management) .  Instead of a db you can also consider CSV or Parquet files for your storage.

Hm.. CSV sounds like interesting idea..

 

If I got this right: Make scheduled API call each day and save it as csv file in some shared folder (google drive for example). Use that folder as data source for the table in PowerBi. Each day when new csv shows up in that folder it's appended to that table in pbi. If there's some change in previous data, I can manually make api call for that day and change that day's csv file.  Sounds right? 

I guess auto refresh shouldn't be an issue since it's possible to do it if data is in shared folder?

Correct. CSVs ingest very fast, so you can do a full load of all CSV files in that folder each time if you want to minimize the maintenance effort.

lbendlin
Super User
Super User

1. You may need separate queries, especially if you are interested in acceptable performance.

2. Please refer to the API documentation for that

3. Yes, but you must eliminate/hide all dynamic parts of your URL.  Use RelativePath and Query attributes.  Power BI has no memory, your previous data will be lost.  If that data is immutable you will want to store that outside of Power BI (for example in a OneDrive).

4. See 3.   use the API and your own storage

Helpful resources

Announcements
July 2024 Power BI Update

Power BI Monthly Update - July 2024

Check out the July 2024 Power BI update to learn about new features.