Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
matt-stuart
New Member

Using local cache for table union on multiple tables?

We've connected one of our Power BI desktop instances to an ODataV4 feed that provides several entity sets, in this case we're only interested in 4 of them. All 4 share at least a few columns and we'd like to do a union of all 4 tables of the common columns into a new table that can then be used to power a few visualziations. What we're finding is that the merge table query, which I'll include below, goes back to the origin servers to grab _all_ the data from all 4 tables before doing the merge rather than using the data which should already be in Power Bi. I'm wondering if we're missing something here in our configuration or in how PowerBI works..

 

Technical:

 

Tables: TA, TB, TC, TD

Common Columns: Date (Date), Type (string)

Merge Table Query:

 

let
    CA=Table.SelectColumns(TA, {"Date", "Type"}),
    CB=Table.SelectColumns(TB, {"Date", "Type"}),
    CC=Table.SelectColumns(TC, {"Date", "Type"}),
    CD=Table.SelectColumns(TD, {"Date", "Type"}),
    AggregateTable=Table.Combine({CA, CB, CC, CD}),
    #"Grouped"=Table.Group(AggregateTable, {"Date", "Type"], {{"Count", each Table.RowCount(_), type number}}),
    #"Pivoted"=Table.Pivot(#"Grouped", List.Distinct(#"Grouped"[Type]), "Type", "Count", List.Sum)
in
    #"Pivoted"
    

Basically we want to end up with data that looks like:

 

Date | TypeValue1 | TypeValue2 | TypeValue3
--------------------------------------------------
2017-01-01 | 10 | 0 | 5
2017-01-01 | 3 | 2 | 7
2017-01-02 | 0 | 10 | 6
2017-01-03 | 1 | 8 | 0

The issue is each of the 4 tables contains millions of rows, we do the refresh often enough on those tables for our purposes but by god it goes and asks for all the data again one it calculates the aggregate table! Is there a behaviour we can use to force the aggregate to just pull the data from the on disk copies of the table?

 

Notes:

* This data is used primarily to drive visualizations, if there's other ways to get the line graphs we're looking for then by all means poke me in the eye!

* We'd be happy to look at aggregation severs that are responsible for pulling in the raw data and storing on premise / cloud that PowerBI would then pull from (Gateway perhaps?)

 

Thanks for your help & thoughts!

2 REPLIES 2
v-chuncz-msft
Community Support
Community Support

@matt-stuart,

 

You may try Aggregation Extension first.

Community Support Team _ Sam Zha
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
matt-stuart
New Member

We've connected one of our Power BI desktop instances to an ODataV4 feed that provides several entity sets, in this case we're only interested in 4 of them. All 4 share at least a few columns and we'd like to do a union of all 4 tables of the common columns into a new table that can then be used to power a few visualziations. What we're finding is that the merge table query, which I'll include below, goes back to the origin servers to grab _all_ the data from all 4 tables before doing the merge rather than using the data which should already be in Power Bi. I'm wondering if we're missing something here in our configuration or in how PowerBI works..

 

Technical:

 

Tables: TA, TB, TC, TD

Common Columns: Date (Date), Type (string)

Merge Table Query:

 

let
    CA=Table.SelectColumns(TA, {"Date", "Type"}),
    CB=Table.SelectColumns(TB, {"Date", "Type"}),
    CC=Table.SelectColumns(TC, {"Date", "Type"}),
    CD=Table.SelectColumns(TD, {"Date", "Type"}),
    AggregateTable=Table.Combine({CA, CB, CC, CD}),
    #"Grouped"=Table.Group(AggregateTable, {"Date", "Type"], {{"Count", each Table.RowCount(_), type number}}),
    #"Pivoted"=Table.Pivot(#"Grouped", List.Distinct(#"Grouped"[Type]), "Type", "Count", List.Sum)
in
    #"Pivoted"
    

Basically we want to end up with data that looks like:

 

Date | TypeValue1 | TypeValue2 | TypeValue3
--------------------------------------------------
2017-01-01 | 10 | 0 | 5
2017-01-01 | 3 | 2 | 7
2017-01-02 | 0 | 10 | 6
2017-01-03 | 1 | 8 | 0

The issue is each of the 4 tables contains millions of rows, we do the refresh often enough on those tables for our purposes but by god it goes and asks for all the data again one it calculates the aggregate table! Is there a behaviour we can use to force the aggregate to just pull the data from the on disk copies of the table?

 

Notes:

* This data is used primarily to drive visualizations, if there's other ways to get the line graphs we're looking for then by all means poke me in the eye!

* We'd be happy to look at aggregation severs that are responsible for pulling in the raw data and storing on premise / cloud that PowerBI would then pull from (Gateway perhaps?)

 

Thanks for your help & thoughts!

Helpful resources

Announcements
LearnSurvey

Fabric certifications survey

Certification feedback opportunity for the community.

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.