Hi,
One data table in my Power BI report is imported from Amazon S3.
The only way I know to load it is with a Python script like the one below.
import io

import boto3
import pandas as pd

bucket = 'name of your bucket'
key = 'name of your file'

# Download the object from S3 and load the CSV into a DataFrame
s3 = boto3.client('s3')
f = s3.get_object(Bucket=bucket, Key=key)
shape = pd.read_csv(io.BytesIO(f['Body'].read()), header=0, index_col=0)

# Replace missing values with 0
shape = shape.fillna(0)
print(shape)
However, it seems that the Power BI scheduled refresh service doesn't support Python.
Is it possible to set up a scheduled refresh for data imported from Amazon S3?
Hey @asfdwd ,
at the moment there is no way other than Python to get S3 data into Power BI. This might change in the future when Power BI Dataflows Gen 2 (a feature of Microsoft Fabric) becomes available.
Using Python has its own downsides: it requires a gateway in personal mode, and the gateway machine also needs a Python installation. A gateway in personal mode comes with limitations of its own: https://learn.microsoft.com/en-us/power-bi/connect-data/service-gateway-personal-mode?WT.mc_id=DP-MV...
My recommendation: use Azure Data Factory to get the S3 data (https://learn.microsoft.com/en-us/azure/data-factory/connector-amazon-s3-compatible-storage?tabs=dat...), and once that is done, trigger a dataset refresh from an Azure Data Factory pipeline (https://community.fabric.microsoft.com/t5/Service/Trigger-the-dataset-refresh-after-Azure-ETL-proces...). I have to admit this is not trivial compared to the Power BI Desktop solution you have, but it is the most stable solution I can currently think of.
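For the "trigger a dataset refresh" step, the underlying call is the Power BI REST API's Refresh Dataset In Group endpoint. Here is a minimal sketch in Python using only the standard library; the group ID, dataset ID, and access token are placeholders, and I'm assuming you have already acquired an Azure AD access token with the Dataset.ReadWrite.All scope (token acquisition is not shown).

import json
import urllib.request

POWER_BI_API = "https://api.powerbi.com/v1.0/myorg"

def refresh_url(group_id: str, dataset_id: str) -> str:
    # Endpoint from the Power BI REST API ("Datasets - Refresh Dataset In Group")
    return f"{POWER_BI_API}/groups/{group_id}/datasets/{dataset_id}/refreshes"

def trigger_refresh(group_id: str, dataset_id: str, access_token: str) -> int:
    # POST with an Azure AD bearer token; HTTP 202 means the refresh was
    # accepted and queued by the Power BI service.
    req = urllib.request.Request(
        refresh_url(group_id, dataset_id),
        data=json.dumps({"notifyOption": "NoNotification"}).encode(),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

In an Azure Data Factory pipeline you would typically make this same POST from a Web activity after the copy from S3 completes, rather than running the Python yourself.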
Everything S3-related will become simpler with Microsoft Fabric, but Fabric is in preview at the moment and comes with extra costs (as does the Azure Data Factory approach).
Hopefully this provides some new ideas to help you tackle your challenge.
Regards,
Tom