Hi,
One data table in my Power BI report is imported from Amazon S3.
The only way I know to load it is with a Python script like the one below.
import io

import boto3
import pandas as pd

bucket = 'name of your bucket'
key = 'name of your file'

# Download the object from S3 and load the CSV into a DataFrame
s3 = boto3.client('s3')
f = s3.get_object(Bucket=bucket, Key=key)
shape = pd.read_csv(io.BytesIO(f['Body'].read()), header=0, index_col=0)

# Replace missing values with 0
shape = shape.fillna(0)
print(shape)
However, it seems that the Power BI scheduled refresh service doesn't support Python.
Is it possible to set up a scheduled refresh for data imported from Amazon S3?
Hey @asfdwd ,
at the moment there is no way other than Python to get S3 data into Power BI. This might change in the future when Power BI Dataflows Gen 2 (a feature of Microsoft Fabric) becomes available.
Using Python has its own downsides: it requires a gateway in personal mode, and the gateway machine also needs a Python installation. A gateway in personal mode comes with limitations of its own: https://learn.microsoft.com/en-us/power-bi/connect-data/service-gateway-personal-mode?WT.mc_id=DP-MV...
My recommendation: use Azure Data Factory to get the S3 data (https://learn.microsoft.com/en-us/azure/data-factory/connector-amazon-s3-compatible-storage?tabs=dat...), and once that is done, trigger a dataset refresh from an Azure Data Factory pipeline (https://community.fabric.microsoft.com/t5/Service/Trigger-the-dataset-refresh-after-Azure-ETL-proces...). I have to admit this is not trivial compared to the Power BI Desktop solution you have, but it is the most stable solution I can currently think of.
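For the "trigger a dataset refresh" step, the underlying call is the Power BI REST API's Refresh Dataset In Group endpoint. Here is a minimal sketch in Python using only the standard library; the group ID, dataset ID, and access token are placeholders, and I'm assuming you have already acquired an Azure AD access token with the Dataset.ReadWrite.All scope (token acquisition is not shown).

import json
import urllib.request

POWER_BI_API = "https://api.powerbi.com/v1.0/myorg"

def refresh_url(group_id: str, dataset_id: str) -> str:
    # Endpoint from the Power BI REST API ("Datasets - Refresh Dataset In Group")
    return f"{POWER_BI_API}/groups/{group_id}/datasets/{dataset_id}/refreshes"

def trigger_refresh(group_id: str, dataset_id: str, access_token: str) -> int:
    # POST with an Azure AD bearer token; HTTP 202 means the refresh was
    # accepted and queued by the Power BI service.
    req = urllib.request.Request(
        refresh_url(group_id, dataset_id),
        data=json.dumps({"notifyOption": "NoNotification"}).encode(),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

In an Azure Data Factory pipeline you would typically make this same POST from a Web activity after the copy from S3 completes, rather than running the Python yourself.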
Everything S3-related will become simpler with Microsoft Fabric, but Fabric is in preview at the moment and comes with extra costs (as does the Azure Data Factory approach).
Hopefully this provides some new ideas to help you tackle your challenge.
Regards,
Tom