Because I'm using Elasticsearch as my database, I need to create historic data visualisations. Every week I push about 30 MB of JSON data into the Elasticsearch index, and I have imported my data from Elasticsearch into Power BI using Python scripting; everything was working fine. After a few weeks, the data inside Elasticsearch has grown to around 3 GB, and Power BI Desktop now throws the following exception when trying to refresh the data.
What is the best way to handle this scenario? Will I need to change the input method, or is there a better way to handle it?
Code:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
import pandas as pd

# Connect to the local Elasticsearch cluster
client = Elasticsearch([{'host': 'localhost'}])

# Collect one DataFrame per matching index
my_datasets1 = []
for index in client.indices.get('mydata_*'):
    s = Search(using=client, index=index)
    if 'myindexname_' in index:
        # scan() pages through every hit, so all documents end up in memory
        my_datasets1.append(pd.DataFrame([hit.to_dict() for hit in s.scan()]))

# Combine all indices into the single DataFrame that Power BI reads
metrics_dataset = pd.concat(my_datasets1)
@nicpenning, can you please look into this?
Sorry Maggi029, I am not sure how Python ties into your PowerBi integration.
I query Elasticsearch directly with PowerBi and use pagination to get all results on GBs of data.
However, I recommend looking into Elasticsearch Data Transforms, which allow you to aggregate large sets of data into a new index with the values you need instead of working with raw documents.
https://www.elastic.co/guide/en/elasticsearch/reference/current/transform-overview.html
As an example, say you have 10,000 documents with data such as:
{"computer_name": "computer1", "organization" : "Finance", "country" : "United States of America"}
{"computer_name": "computer2", "organization" : "Finance", "country" : "United States of America"}
....
{"computer_name": "computer10000", "organization" : "Finance", "country" : "United States of America"}
You can use transforms to get counts, averages, etc. to reduce the data down to just 3 documents:
{"unique_computer_names": "700", "organization" : "Finance", "country" : "United States of America"}
{"unique_computer_names": "100", "organization" : "Human Resources", "country" : "Nigeria"}
{"unique_computer_names": "200", "organization" : "Information Technology", "country" : "France"}
Working with something like this at scale, you can take billions of documents down to thousands of documents when you narrow in on the specific data you want to display as high-level metrics in Power BI.
This might be a lot of work and research, but you can dramatically reduce large datasets by creating summaries of the data and displaying only what is needed. Not a quick solution by any means, but it helps Power BI out by using summarized data.
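For illustration only (this is not code from the thread): a minimal sketch of what such a pivot transform could look like from Python, assuming an elasticsearch-py client recent enough to expose the transform API; the index names computers-* and computer-summary, the transform id, and the .keyword field suffixes are all placeholders.
Code:
from elasticsearch import Elasticsearch

client = Elasticsearch([{'host': 'localhost'}])

# Hypothetical pivot transform: group the raw documents by organization and
# country and keep only the distinct computer count per group in a new,
# much smaller summary index.
client.transform.put_transform(
    transform_id="computer-summary",
    body={
        "source": {"index": "computers-*"},
        "dest": {"index": "computer-summary"},
        "pivot": {
            "group_by": {
                "organization": {"terms": {"field": "organization.keyword"}},
                "country": {"terms": {"field": "country.keyword"}},
            },
            "aggregations": {
                "unique_computer_names": {
                    "cardinality": {"field": "computer_name.keyword"}
                },
            },
        },
    },
)

# Start the transform; once it has run, Power BI only needs to read the small
# computer-summary index instead of the raw documents.
client.transform.start_transform(transform_id="computer-summary")
The same transform can also be created directly with a PUT _transform request; the point is that Power BI only ever sees the summarized index.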
Hi @nicpenning
Thanks for looking into this.
Previously, as you suggested (Solved: Re: Getting Data from Elastic Stack with REST with... - Page 2 - Microsoft Power BI Communit...), I used the REST API to import the data from Elasticsearch. However, I was unable to refresh the data in the Power BI online service using this method, because the recursive calls turn the data source into a dynamic data source, and Power BI fails to refresh dynamic data sources. As a result, I rely on the Python method. Do you believe my requirement can still be fulfilled using the REST API, and if so, could you please elaborate?
I am not familiar with dynamic data sources or the online Power BI service, so I won't be much help here. I hope you can find a solution soon!
Hey @Maggi029 ,
from my perspective, you are facing, or will soon be facing, some challenges with the growing data volume and with refreshing that data in Power BI.
This is how I would tackle it:
1. Separate the content from the data by using two different pbix files.
2. Move the data preparation out of Power BI, e.g. to Azure, and feed Power BI from files in a blob store.
Some architectural considerations:
To overcome the limit on the size of the file that can be uploaded, I would consider moving the Python code to Azure, e.g. creating an Azure Automation script or writing an Azure Function. This code then creates JSON documents; each document contains data for a day or for a month, whatever you consider useful. The JSON documents can be stored inside an Azure blob store, and the files will then be picked up by Power Query. You can create your data model using just a small number of JSON documents, publish the pbix file to the service, copy all the JSON documents to the blob container, manually refresh the dataset, done!
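As a rough illustration of this idea (not code from the thread): a minimal sketch of such an export script, assuming the azure-storage-blob package; the storage connection string, the container name elastic-exports, the timestamp field, and the date range are placeholders.
Code:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
from azure.storage.blob import BlobServiceClient
import json

client = Elasticsearch([{'host': 'localhost'}])

# Placeholder connection string and container name
blob_service = BlobServiceClient.from_connection_string("<storage-connection-string>")
container = blob_service.get_container_client("elastic-exports")

# Export one month of documents (assuming a "timestamp" field) into a single
# JSON file that Power Query can later pick up from the blob container
s = Search(using=client, index="mydata_*") \
    .filter("range", timestamp={"gte": "2023-01-01", "lt": "2023-02-01"})
docs = [hit.to_dict() for hit in s.scan()]

container.upload_blob(
    name="mydata-2023-01.json",
    data=json.dumps(docs),
    overwrite=True,
)
Run on a schedule (one file per day or per month), this leaves the blob container holding many small JSON files instead of one huge dataset, which is what Power Query then consumes.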
Hopefully, this provides some ideas on how to overcome your challenges.
Regards,
Tom
Hi @TomMartens
Thanks for your reply. I think your suggestion has some useful benefits. I have a few doubts that need clarifying, since I'm new to this method.
1. Separate the content from the data by using two different pbix files - what is the use case for this method, and does it also support size management?
2. Azure Automation script - according to my understanding, you're suggesting writing an Azure Automation script that pulls data from Elasticsearch, converts it into small documents, and saves them in blob storage, which can then be connected to Power BI via Azure Blob Storage. Correct me if I'm wrong.
Thanks
Hey @Maggi029 ,
in regards to 1
besides being able to develop the data model and the content in parallel, it's important to know that you can't download the pbix file from the service once you have enabled incremental refresh.
Incremental refresh does not directly address handling large files, but it reduces the duration of a data refresh, as only the increment is added to the model. This becomes a little more complex if the sources are files, but there are many posts out there describing the proper configuration. The one by @GilbertQ is my favorite: How you can incrementally refresh any Power BI data source (This example is a CSV File) - Reporting/...
in regards to 2
Yes, accessing files from a blob store is simpler than publishing a single large pbix file to the Power BI service.
Regards,
Tom