Hello Community,
I have run into a Power BI dataflow issue. My source table is in BigQuery; it is updated daily and holds data for the last 90 days on a rolling basis.
What I would like to accomplish is to provide this data to another team in the org using dataflows.
Can't use direct connection to BQ for access reasons
Can't use a dataset, as the team has its own dataset with other sources
The error: when I create the dataflow and run the first refresh, it fails with a "response too large to return" error.
I tried filtering to 30 days and that worked, but even if I create three different dataflows I can't see a way to schedule this, because I am manually adding a last-30-days filter on one while the other two use hard-coded dates for the earlier days that make up the rest of the 90.
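To illustrate, the filters I'm maintaining by hand look roughly like this in M (the date column name and the hard-coded dates are placeholders, and I'm assuming the date column is of type date):

```
// Sketch of just the filter steps; "Source" is the navigated BigQuery table
// and [event_date] stands in for my actual date column.
let
    // Dataflow 1: rolling "last 30 days", recomputed at every refresh
    Last30Days = Table.SelectRows(
        Source,
        each [event_date] >= Date.AddDays(Date.From(DateTime.LocalNow()), -30)
    ),
    // Dataflows 2 and 3: hard-coded windows (placeholder dates) that I have
    // to keep editing by hand to stay aligned with the rolling 90 days
    Days31To60 = Table.SelectRows(
        Source,
        each [event_date] >= #date(2024, 6, 11) and [event_date] < #date(2024, 7, 11)
    )
in
    Last30Days
```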
I have also made the table as lean as possible, and it can't be aggregated (that level of granularity is a requirement).
I am considering bringing in just a small amount of data (30 days) and then running an incremental load (loading the data piece by piece until I get to 90 days), but I have yet to try that out. Before I begin, I just wanted to find out whether there are other options or different approaches.
Thanks
Can't use direct connection to BQ for access reasons
Challenge that assumption. Tell them that 650 GB is too much.
If they don't budge, use Incremental Refresh. Start VERY small, with a day or so. See how many rows you get. Recommended partition size is around 8 to 20 million rows.
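As a rough sketch of what that looks like in the dataflow query (the project, dataset, table, and column names below are placeholders, not your schema), the key piece is a filter on the two DateTime parameters, RangeStart and RangeEnd, which incremental refresh requires and fills in per partition:

```
// Sketch only. RangeStart and RangeEnd must exist as DateTime parameters in
// the dataflow; the incremental refresh policy substitutes the partition
// boundaries at refresh time.
let
    Source    = GoogleBigQuery.Database(),
    Project   = Source{[Name = "my-gcp-project"]}[Data],        // placeholder names
    Dataset   = Project{[Name = "my_dataset"]}[Data],
    Rolling90 = Dataset{[Name = "rolling_90_day_table"]}[Data],
    // Use >= on one boundary and < on the other so adjacent partitions
    // never double-count rows
    Filtered  = Table.SelectRows(
        Rolling90,
        each [event_date] >= RangeStart and [event_date] < RangeEnd
    )
in
    Filtered
```

With this in place you can start the policy at a one-day window as suggested, check the row counts per partition, and then widen it toward the full 90 days.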
Please explain "import does not work" - are you getting an error message?
Direct Query on a dataflow = datamart = lipstick on a pig (Azure SQL db being the lipstick and the dataflow being the pig). Possible, sure, but a travesty nevertheless.
You can consider switching the incremental refresh to the dataset instead and removing the dataflow from the equation entirely.
You can consider Dataflow Gen2 and storing the results as Delta Lake tables in Fabric.
You have two partitions, but you should have at least 17 (14 for the days in August and 3 for the prior months)
Did you limit the source data in a way that you are not covering the cold partition range?
The only time it ran longer than the average of 15 seconds was on the 2nd of August; before that it was the regular 15 seconds back to the 27th of July, which is the earliest run I can see.
But if you consume the dataflow you only see two days' worth of data?
Not sure if I mentioned this before, but the source data runs on a two-day delay (including today), meaning today I only have data in that table up to 08/12/2024.
And since I kept the incremental refresh at "refresh rows from the past 1 day before the refresh date", it will not get any data because that data simply does not exist yet?
As for consuming the data once it is loaded into Power BI, or even in Power Query to begin with, I only see data from 05/01/2024 to 07/08/2024.
So, the source data in BigQuery is maintained by a stored procedure that runs daily: it deletes the oldest day's data and inserts the newest day's data into this table from a master table that holds the last two years of data.
Querying the 90-day table gives me a distinct count of dates for the last 90 days, so it is functioning as intended.
In the dataflow, the only other transformation steps I have in place are reducing the column count from 50 to 30 (remove columns) and changing the data type of the date column to date/time to support the incremental refresh.
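For reference, those two steps look roughly like this in the query (the column names are shortened placeholders; my real list keeps 30 of the 50 columns), with the RangeStart/RangeEnd filter applied on top:

```
// Steps inside the dataflow query, after navigating to the BigQuery table
// ("Rolling90"); column names are shortened placeholders.
    KeptColumns = Table.SelectColumns(
        Rolling90,
        {"event_date", "customer_id", "order_id", "amount"}   // ...30 columns kept in total
    ),
    // Incremental refresh needs a date/time column to filter on
    Typed = Table.TransformColumnTypes(
        KeptColumns,
        {{"event_date", type datetime}}
    ),
    // The RangeStart / RangeEnd filter step then follows on "Typed"
```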
That worked! Thanks a ton for all your help!
Yeah, so here's what the error reads; apologies, I copy-pasted the entire thing:
Mashup Exception Data Source Error
Couldn't refresh the entity because of an issue with the mashup document
MashupException.Error: DataSource.Error: Error in replacing table's content with new data in a version: #{0}., InnerException: #{0}: #{1}
Underlying error: ODBC: ERROR [HY000] [Microsoft][BigQuery] (100) HTTP Error 403: responseTooLarge (Response too large to return. Consider specifying a destination table in your job configuration. For more details, see https://cloud.google.com/bigquery/troubleshooting-errors).
Details:
Reason = DataSource.Error;
Message = ODBC: ERROR [HY000] [Microsoft][BigQuery] (100) HTTP Error 403: responseTooLarge (Response too large to return. Consider specifying a destination table in your job configuration. For more details, see https://cloud.google.com/bigquery/troubleshooting-errors).;
Detail = [DataSourceKind = "GoogleBigQuery", DataSourcePath = "GoogleBigQuery", OdbcErrors = error "Microsoft.Mashup.Engine1.Runtime.ValueException: [Expression.Error] Value was not specified.#(cr)#(lf) at Microsoft.Mashup.Engine1.Language.ValueCreator.CreateValueForThrow(IThrowExpression throwExpr)#(cr)#(lf) at Microsoft.Mashup.Engine1.Language.ValueCreator.<>c__DisplayClass23_0.<CreateValueForRecord>b__0(Int32 index)#(cr)#(lf) at Microsoft.Mashup.Engine1.Runtime.RecordValue.DemandRecordValue.get_Item(Int32 index)#(cr)#(lf) at Microsoft.Data.Mashup.ProviderCommon.MashupResource.TryGetValue(Func`1 getValue, IValue& value, String& errorMessage)#(cr)#(lf)Record"];
Message.Format = #{0}: #{1};
Message.Parameters = {"ODBC", "ERROR [HY000] [Microsoft][BigQuery] (100) HTTP Error 403: responseTooLarge (Response too large to return. Consider specifying a destination table in your job configuration. For more details, see https://cloud.google.com/bigquery/troubleshooting-errors)."};
Microsoft.Data.Mashup.Error.Context = User
Alternatively, please let me know if there is any other approach or method I can use to accomplish the task.
Thanks
@Sam_Jain Sorry, I'm not fully sure what your current data movement process looks like. If the data has already landed in the lakehouse, you have many options for providing a rolling 90 days of data to end users. You can create a view on the SQL endpoint of the lakehouse with a 90-day rolling filter and let the users consume that view in their model; I'm not sure why you would want another dataflow on top of the lakehouse. You have a lot more options available once the data is already in the lakehouse.
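To be clear on the consumption side: if the 90-day data were in a lakehouse, the other team could connect to such a rolling-window view on its SQL endpoint straight from Power Query. A rough sketch, where the endpoint address, lakehouse name, and view name are placeholders you would copy from your own lakehouse settings:

```
// Sketch only - replace the endpoint, lakehouse, and view names with the
// values from your Fabric lakehouse settings.
let
    Source = Sql.Database(
        "your-endpoint.datawarehouse.fabric.microsoft.com",   // SQL endpoint connection string
        "YourLakehouse"
    ),
    Rolling90View = Source{[Schema = "dbo", Item = "vw_rolling_90_days"]}[Data]
in
    Rolling90View
```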
Subscribe to the @PowerBIHowTo YT channel for an upcoming video on List and Record functions in Power Query!!
Learn Power BI and Fabric - subscribe to our YT channel - Click here: @PowerBIHowTo
If my solution proved useful, I'd be delighted to receive Kudos. When you put effort into asking a question, it's equally thoughtful to acknowledge and give Kudos to the individual who helped you solve the problem. It's a small gesture that shows appreciation and encouragement! ❤
Did I answer your question? Mark my post as a solution. Proud to be a Super User! Appreciate your Kudos 🙂
Feel free to email me with any of your BI needs.
Update: the Gen2 dataflow failed as well, at the same 1:44 mark as the Gen1.
So, back to square one.
Oh, I wasn't planning on a lakehouse. I just looked up Dataflow Gen2, and the method it suggested was to create a lakehouse first, after which the Gen2 dataflow option appears.
If there is a different method, I'd like to try that.
As for your first question: the data itself is housed in BigQuery. The original dataset refreshes daily and is about 650 GB, so the solution was to create a subset. I created a new table with just the last 90 days of data, along with a stored procedure (also run daily) that deletes the oldest day's data and adds the newest day's data from the original table, maintaining that 90-day rolling window.
My only goal here is to provide this data without giving direct access to BQ, so that the team can use it in their own datasets/reports as just another source.
Thanks
@Sam_Jain is it DF Gen1 or Gen2?
That's a great question, sorry! I am not a hundred percent sure, but the workspace itself is Premium (SKU: P3), and I just created the dataflow from the workspace by hitting New -> Dataflow and then connecting to BigQuery. I did not encounter an option for Gen1 or Gen2; if there is a different method to get to them, I can definitely try it.
Thanks
Attempting the Gen2 dataflow via the create-a-lakehouse-first method.
Requesting a check from @parry2k @SaiTejaTalasila @lbendlin.
Please share your opinions.
Much appreciated.
Hi @Sam_Jain ,
Option 1: suppose you bring your 90 days into dataflow 1 and apply incremental refresh on dataflow 2, appending the new data on top of those 90 days; or you can take a dataflow 3 and append dataflow 1 and dataflow 2 into it.
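For that last step, once the two upstream dataflow entities are referenced in dataflow 3 (the entity names below are placeholders), the append itself is a single step:

```
// Sketch only - Dataflow1_Last30Days and Dataflow2_PriorDays stand for the
// entities referenced from the two upstream dataflows.
let
    Appended = Table.Combine({Dataflow1_Last30Days, Dataflow2_PriorDays})
in
    Appended
```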
Or
Option 2: using Fabric.
You can bring your data into a lakehouse and save it as a Parquet file.
You can refer to this guide and try it:
Loading data from Google Big Query into Microsoft Fabric using Python Notebooks: A Simple Guide
https://www.linkedin.com/pulse/loading-data-from-google-big-query-microsoft-fabric-using-jakub-vala-...
I hope it will be helpful
Thanks,
Sai Teja
Hello @SaiTejaTalasila
Thanks for your response. Both are good approaches; the only challenge is that I need to maintain that 90-day rolling window. And if I have a consolidated dataflow, my source would be the other two dataflows, and I would have to somehow make them all run on an incremental refresh.
I figured out the solution with another member's help. Start small and then use incremental refresh!
Thanks again for your input.