fenixen
Advocate II

Can I connect to an Amazon S3 bucket using Power Query?

Hi,


Amazon stores billing data in S3 buckets. I want to retrieve the CSV files and consolidate them. Is Power BI/Power Query able to connect to S3 buckets?

 

 

Regards

Remi 

2 ACCEPTED SOLUTIONS
v-qiuyu-msft
Community Support

Hi @fenixen,

 

Since Amazon S3 is a web service and supports a REST API, you can try using the Web data source to get the data. See:

Amazon S3 REST API Introduction

How to call REST APIs and parse JSON with Power BI
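
As a rough illustration (not part of the original reply), one way to make a single S3 object readable by the Web data source is to generate a time-limited pre-signed URL with boto3 and paste it into Get Data > Web; the bucket and key below are placeholders:

import boto3

# Generate a pre-signed GET url that Power BI's Web connector can download
# without any AWS request signing
s3 = boto3.client('s3')
url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'your-billing-bucket', 'Key': 'billing/2019-01.csv'},  # placeholders
    ExpiresIn=3600)  # valid for one hour
print(url)  # paste this URL into Get Data > Web in Power BI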

 

Another option I can think of is importing the data from Amazon S3 into Amazon Redshift, then using the Amazon Redshift connector in Power BI Desktop to get the data. See: Loading Data from Amazon S3.
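
For the Redshift route, a hedged sketch of the load step (assuming the Redshift Data API is available; cluster, database, table, and IAM role names are placeholders) could look like this, after which Power BI's Amazon Redshift connector reads the loaded table:

import boto3

# Run a COPY from S3 into a Redshift table via the Redshift Data API
rsd = boto3.client('redshift-data')
rsd.execute_statement(
    ClusterIdentifier='my-cluster',   # placeholder
    Database='billing',               # placeholder
    DbUser='loader',                  # placeholder
    Sql="""
        COPY billing_lines
        FROM 's3://your-billing-bucket/billing/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-s3-read'
        CSV IGNOREHEADER 1;
    """)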

 

Best Regards,
Qiuyun Yu

Community Support Team _ Qiuyun Yu
If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.


ZappySys
Helper I

Hi,

 

ZappySys has released API drivers (XML and JSON) with AWS API support. The blog post below explains how to access AWS S3 data in Power BI, and it also covers the Billing / Cost API use case via API calls. ZappySys will release a CSV driver very soon, which will support your scenario of reading CSVs from S3 in Power BI, but until then you can call the Billing API (JSON format):

 

https://zappysys.com/blog/read-amazon-s3-data-power-bi-aws-json-xml-api/

 

Screenshots: Query Amazon Billing API; Configure AWS Connection using ZappySys ODBC Driver (XML or JSON); Import Amazon Billing data in Power BI using ZappySys Driver.


21 REPLIES
Anonymous
Not applicable

Hi,

I used a Python script in Power BI. See my suggestion at:

https://stackoverflow.com/questions/51801521/connecting-power-bi-to-s3-bucket/63185488#63185488

Hope it helps.
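
For anyone trying the same route, here is a minimal sketch of what such a Python data source script could look like, assuming boto3 and pandas are available to Power BI's Python environment; the bucket and prefix names are placeholders:

import io

import boto3
import pandas as pd

# Credentials come from the AWS profile / environment that Power BI's Python uses
s3 = boto3.client('s3')

bucket = 'your-billing-bucket'   # placeholder bucket name
prefix = 'billing/'              # placeholder key prefix for the CSV reports

# List the CSV objects under the prefix and read each one into a DataFrame
frames = []
for page in s3.get_paginator('list_objects_v2').paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get('Contents', []):
        if obj['Key'].endswith('.csv'):
            body = s3.get_object(Bucket=bucket, Key=obj['Key'])['Body'].read()
            frames.append(pd.read_csv(io.BytesIO(body)))

# Power BI imports any pandas DataFrame defined by the script
billing = pd.concat(frames, ignore_index=True)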

DB Waller

Hello! I applied a similar solution to our process for extracting data from an MP tool, but to auto-refresh a dashboard you need to set up an on-premises data gateway, and this lives on your local computer; if the machine is not on, the dashboard will not refresh because the script runs there.

Anonymous
Not applicable

Hi,

 

I have tried using Azure Data Factory to pull CSVs from an S3 bucket. If you have some ETL or SSIS experience, then you are good to go with ADF. I managed to pull the CSVs into blob storage. You can then insert the data from the CSVs into any SQL database and connect to Power BI.

 

Good luck!

Peter

Great Peter ... do you have any examples (code, screenshots) that you could share with me?

Thank you

Anonymous
Not applicable

@gdecome 

Let me know if you need more help! Hope the Data Factory solution meets your expectations.

Good luck and have a good day!

 

Peter

Anonymous
Not applicable

@gdecome 

Check this website

https://medium.com/@brijrajsingh/azure-data-factory-v2-pipelines-for-copying-large-aws-s3-buckets-to...

 

First build the connections: one for S3 with the bucket name, access key, and secret, and another for your SQL database. Then run a Copy pipeline to pull from S3 and sink the data into Azure SQL.
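
As a hedged sketch (my assumption, not from the reply above): once the S3 and Azure SQL connections and the Copy pipeline exist in Data Factory, a run can also be triggered and monitored from Python with the azure-mgmt-datafactory package; all names below are placeholders:

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# Authenticate and point the client at your subscription
adf = DataFactoryManagementClient(DefaultAzureCredential(), '<subscription-id>')

# Trigger the copy pipeline (S3 source -> Azure SQL sink) and check its status
run = adf.pipelines.create_run(
    resource_group_name='rg-billing',   # placeholder
    factory_name='adf-billing',         # placeholder
    pipeline_name='CopyS3ToAzureSql')   # placeholder
print(adf.pipeline_runs.get('rg-billing', 'adf-billing', run.run_id).status)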

gdecome
Helper III

Hi @fenixen, were you able to solve your problem? I'm facing the same situation now. I'm able to get the URL, credentials, and S3 path with Power Query (Power BI), but I'm not sure how to use them to get the CSV file with the data that I need.

Hi @gdecome 

I landed on a semi-manual solution using https://s3browser.com/ to retrieve the data files and then connecting to the files using Power Query. I didn't have people available to help me with the API.

Anonymous
Not applicable

The best idea is to use Redshift in this case.

PBI Architect

Anonymous
Not applicable

This helps!


I've been through this information and the process seems non-trivial, as compared to connecting to other REST API services.  Has anyone actually connected PBI to an S3 bucket as a data source?   I'm looking for actual experience and not theory per se.  Thanks!

Is anyone still looking for this? I have this up and running .... I can't share my query, but I might be able to point you in the right direction. 

Please share the steps to connect to an S3 bucket from Power BI Desktop.

Sure. This is how I do it:

 

1. Create a new Lambda Function ... it can be empty for now
2. Set up a new API in API Gateway
3. Create a new GET method
3.1 Select Lambda Function for the integration type
3.2 Select the Use Lambda Proxy integration option
3.3 Select the region and type in the name of the lambda function you created in step 1
4. Edit your lambda function.
4.1 Using the AWS SDK, generate a url w/ pre-signed key for your file
4.2 Return a 303 redirect to the url from step 4.1

 

Here is a sample Lambda function in Python 2.7:

 

 

import boto3

def lambda_handler(event, context):
    bucket = 'bucket-name'
    key = 'path-to-file'

    # Generate a pre-signed GET url for the object, valid for 2 hours
    client = boto3.client('s3')
    link = client.generate_presigned_url(
        'get_object',
        Params={'Bucket': bucket, 'Key': key},
        ExpiresIn=7200,
        HttpMethod='GET')

    # 303 redirect to the pre-signed url (Lambda proxy integration response format)
    return {
        "statusCode": 303,
        "headers": {'Location': link}
    }

 

 

You can use this PowerBI query as a starting point:

 

 

let
    // Replace the URI with the "invoke url" of your API Gateway endpoint
    // See: https://docs.aws.amazon.com/apigateway/latest/developerguide/how-to-call-api.html#how-to-call-api-console
    uri = "https://your-web-api.execute-api.us-east-1.amazonaws.com/stage/endpoint",

    // PowerBI should understand the 303 redirect and follow it to your presigned s3 url
    // Make sure to set IsRetry to true so that PowerBI gets a fresh url with a
    // fresh presigned key every time
    raw = Web.Contents(uri, [IsRetry=true]),

    // My file is a gzipped utf-8 tsv, so I need to decompress it
    // and tell the csv parser the delimiter and encoding
    binary = Binary.Decompress(raw, Compression.GZip),
    csv = Csv.Document(binary, [
        Delimiter="#(tab)",
        extraValues=ExtraValues.Ignore,
        Encoding=65001]) // 65001 is utf8
in
    csv

 

Once you get everything working with a single file, you can parameterize your API and Lambda function to accept an s3 path so that you can pull in anything from your S3 account. 
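
As an illustrative sketch (placeholders, not the poster's exact code), the parameterized Lambda could read the S3 key from the proxy request's query string, and the Power BI side could pass it through Web.Contents's Query option, e.g. Web.Contents(uri, [Query=[key="path/to/file.csv"], IsRetry=true]):

import boto3

def lambda_handler(event, context):
    bucket = 'bucket-name'   # placeholder

    # With Lambda proxy integration, query string parameters arrive on the event
    key = (event.get('queryStringParameters') or {}).get('key')
    if not key:
        return {"statusCode": 400, "body": "missing 'key' query string parameter"}

    client = boto3.client('s3')
    link = client.generate_presigned_url(
        'get_object',
        Params={'Bucket': bucket, 'Key': key},
        ExpiresIn=7200)

    return {"statusCode": 303, "headers": {'Location': link}}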

 

 

 

 

 

Surely this helped many people; that is the way I ended up setting up the Lambda + Power BI connection. But now what is happening is that the Lambda is being called twice by Power BI. I had tables referencing the main table, but I disabled their refresh, and only the main call is refreshing. Do you know what could be happening?

Hi,

This is great. I am trying to understand: if in my case I have a CSV, what do I replace this part of the code with?

 

// My file is a gzipped utf-8 tsv, so I need to decompress it
    // and tell the csv parser the delimiter and encoding
    binary = Binary.Decompress(raw, Compression.GZip),
    csv = Csv.Document(binary, [
        Delimiter="#(tab)",
        extraValues=ExtraValues.Ignore,
        Encoding=65001]) // 65001 is utf8

 

Or do I just ignore it?

 

Thank you,

 

Best 


Before you roll this out to production you should think about some kind of authentication. The normal AWS IAM authentication tokens don't play nicely with PowerBI, so you will need to use a custom authentication method for your API Gateway: https://docs.aws.amazon.com/apigateway/latest/developerguide/use-custom-authorizer.html. I ended up using basic auth which is not the strongest thing in the world, but API Gateway uses https, so it should be secure enough. 
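
For reference, a minimal sketch of a TOKEN-type Lambda authorizer that checks basic auth (the username/password check below is a placeholder) might look like this:

import base64

def lambda_handler(event, context):
    # For a TOKEN authorizer the Authorization header arrives as authorizationToken,
    # e.g. "Basic base64(user:password)"
    token = event.get('authorizationToken', '')
    allowed = False
    if token.startswith('Basic '):
        try:
            user, _, password = base64.b64decode(token[6:]).decode('utf-8').partition(':')
            allowed = (user == 'powerbi' and password == 'change-me')  # placeholder check
        except Exception:
            allowed = False

    # API Gateway expects an IAM policy allowing or denying execute-api:Invoke
    return {
        'principalId': 'powerbi',
        'policyDocument': {
            'Version': '2012-10-17',
            'Statement': [{
                'Action': 'execute-api:Invoke',
                'Effect': 'Allow' if allowed else 'Deny',
                'Resource': event['methodArn'],
            }]
        }
    }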

 

 

 
