Starting December 3, join live sessions with database experts and the Microsoft product team to learn just how easy it is to get started
Learn moreGet certified in Microsoft Fabric—for free! For a limited time, get a free DP-600 exam voucher to use by the end of 2024. Register now
I'm testing PowerBi and the azure documentdb datasource. It works fine, but when I try to Refresh the data it takes forever. I have 500 000 records in this database, and it uses around 15 minutes to refresh. In a production environment it would be 25 million records in this database and I would like it to update as often as possible (every 1 hour it seems), but with this refresh speed it would not finnish an update within 1 hour. Am I doing someting wrong? I was thinking that PowerBi should handle large amount of data, but it looks like every time I update it fetches all records from the database again (even if none of them has changed).
Solved! Go to Solution.
@gulrog Below is the screenshot of data refresh rate in power bi based on subscription. Even if you have Pro it is going to take a lot to refresh 25 million records. Data refresh in power bi is not delta so each time you refresh it gets all the data.
I would suggest you look at limiting your data you import. Also power bi has direct query feature that doesn't import but connects live to data source. DirectQuery is available with azure sql db and azure sql dw but i don't think it is yet available with document db since it is still in beta. You can submit that as an idea to get it prioritised.
@gulrog Below is the screenshot of data refresh rate in power bi based on subscription. Even if you have Pro it is going to take a lot to refresh 25 million records. Data refresh in power bi is not delta so each time you refresh it gets all the data.
I would suggest you look at limiting your data you import. Also power bi has direct query feature that doesn't import but connects live to data source. DirectQuery is available with azure sql db and azure sql dw but i don't think it is yet available with document db since it is still in beta. You can submit that as an idea to get it prioritised.
Hmm... I get a feeling that the whole product is in beta. I was first trying to use the REST API, but it turned out it was not possible to do something as simple as search out records from yesterday (because of missing DAX support). I then I then decided to use Azure stream analytics together with power bi to analyze realtime events and have an auto updating dashboard. It worked fine, but I also need ot analyze the historical data, so I tried with azure documentdb, but as it turns out it is not a good match with PowerBI, so how I see it now, only traditional sql server will work, but then I need to investigate a lot into tuning indexes and partitions to make it perform good enough with 25 million records. SQL DW could be an option, but I fear the learning curve is a bit steep since I don't know anything about DW.
I read that the team behind PowerBI had a goal that people that see the product should Wow within 5 minutes. I really did, but sadly that Wow turned into a dissapointment when digging a bit deeper.
Anyhow - thank you for taking time to answer my question.
Filtering DocumentDB data based on relative times at the source is possible with some tweaking:
1. Create a DocumentDB data source as normal. Do not enter any custom SQL.
2. Right click the query and select "Advanced Editor"
3. Immediately after the let line add in your custom query such as the example below. This uses the Power Query functions to construct a string on the fly that will be sent to the database. In this case it works out the date for 30 days ago and embeds that date and the current date into the string to get data for the last 30 days only. If done correctly you should see a new "querystring" step added under Query Settings.
let
querystring = "SELECT * FROM yourdoc WHERE (yourdoc.datetimefield BETWEEN '" & DateTimeZone.ToText(DateTimeZone.UtcNow() - #duration(30,0,0,0),"yyyy-MM-ddTHH:mm:ss.000Z" ) & "' AND '" & DateTimeZone.ToText(DateTimeZone.UtcNow(),"yyyy-MM-ddTHH:mm:ss.000Z") & "')",
*** rest of original script ***
4. Under query settings go to the Source step. Should look something like:
= DocumentDB.Contents("https://yourserver.documents.azure.com", "yourdb", "yourcollection")
5. Add a 4th parameter to the end exactly as shown here-> [Query = querystring]
= DocumentDB.Contents("https://yourserver.documents.azure.com", "yourdb", "yourcollection", [Query = querystring])
PowerBI will now use a filtered query against DocumentDB
Dear all,
How should my query look like when I specifically need to get data where "Document.LaneId"=1
let Source = DocumentDB.Contents("https://xxx-iot-service-data.documents.azure.com/"), iotreportdata = Source{[id="iotreportdata"]}[Collections], iotreportdata_xxxTestingAggregatedBy5Minutes = iotreportdata{[db_id="iotreportdata",id="XXXTestingAggregatedBy5Minutes"]}[Documents], #"Expanded Document" = Table.ExpandRecordColumn(iotreportdata_XXXTestingAggregatedBy5Minutes, "Document", #"Changed Type2" = Table.TransformColumnTypes(#"Renamed Columns",{{"Document.ProductsInRange", Int64.Type}, {"Time", type time}}) in #"Changed Type2"
Starting December 3, join live sessions with database experts and the Fabric product team to learn just how easy it is to get started.
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early Bird pricing ends December 9th.
User | Count |
---|---|
37 | |
30 | |
18 | |
12 | |
8 |
User | Count |
---|---|
49 | |
40 | |
32 | |
16 | |
14 |