Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Level up your Power BI skills this month - build one visual each week and tell better stories with data! Get started

Reply
liveincolorado
New Member

Inferring Missing Columns and Imputing Missing Data

I am in the analysis stage of creating a PBI Report.  Here is what I am working with (not final...just preliminary  steps):

Tabular Editor / The Cube
Oracle SQL Developer

Power BI Desktop
Excel (used by stakeholder to show me how/what they want)

In Excel, the stakeholder's SQL is restricted by date guardrails (...and start_date <= to_date('31-MAR-2025', 'DD-MM-YY') etc).

We're going to let PBI slice and dice dates.  

acct_id,

rec_start_dt,

rec_end_dt,

sa_start_dt,s

a_end_dt,

tot_bal,

curr_bal

Here is an example of the sql output that will become source in cube, just the first few rows:

ACCT_IDCLI_START_DTCLI_END_DTSA_START_DTSA_END_DTTOTAL_BALANCECURRENT_BALANCE=IF(AND(B2<=DATE(2022,8,31), C2>=DATE(2022,8,1)), "Yes", "No")=IF(AND(D2<=DATE(2022,8,31), E2>=DATE(2022,8,1)), "Yes", "No")=IF(AND(H2="Yes", I2="Yes"), "Yes", "No")
12345613-Jul-2130-May-2427-Apr-2116-May-23107.44107.44YesYesYes
5647891-Sep-2216-Oct-227-Feb-2220-Jun-25103103NoYesNo
56478917-Oct-22        7-Feb-2220-Jun-25103103NoYesNo
63524128-Aug-231-May-2410-May-1929-Sep-25247.15247.15NoYesNo
6352412-May-2429-Sep-2610-May-1929-Sep-25247.15247.15NoYesNo
56892318-May-23        5-Feb-202-Dec-22-962-962NoYesNo
56892329-Oct-2017-May-235-Feb-202-Dec-22-962-962YesYesYes
95184715-Mar-213-Jun-2326-Oct-183-Jun-22539.3539.3YesNoNo
59263423-Sep-19         12-Sep-1727-Feb-231540.351511.79YesYesYes
95847126-Aug-25        8-Mar-2326-Aug-25876.254NoNoNo
25896328-Feb-1120-Jun-2514-Sep-9220-Jun-2400YesYesYes
85214728-Nov-225-Dec-2412-Oct-1523-May-251542.38457.97NoYesNo
4569871-Aug-232-Sep-263-Mar-212-Sep-2549.90NoYesNo
45632120-Nov-2322-Dec-2630-Dec-2230-Mar-26337.54337.54NoNoNo
          
 4 11 685.79 4  
 =COUNTIFS(B2:B15,"<=8/31/2022", C2:C15,">=8/1/2022") =COUNTIFS(D2:D15,"<=8/31/2022", E2:E15,">=8/1/2022") =SUMIF(J2:J15, "Yes", F2:F15) =COUNTIFS(H2:H15,"Yes", I2:I15,"Yes")  

 

 

Ask 1: Active Accounts (measure in Power BI)
I need to count the number of active accounts (monthly) and that definition of active is:


between (and including) cli_start_dt and cli_end_dt
AND
between (and including) sa_start_dt and sa_end_dt
AND if no cli_end_dt and no sa_end_dt, assume account is still active.

 

So, ACCT_ID 123456 is active for August 2022, even though that date is not explicitly stated in the table.

 

Ask 2: Total Balance (measure in Power BI)
I need to sum the TOTAL_BALANCE of ONLY the active accounts in the months they were active (Ask 1)

1.  Need to infer the value of the assumed missing dates cells

2.  If real or assumed cell is empty, the value will be the prior month's value.

ACCT_ID 123456 will have 107.44 in the assumed (but missing) date of April 2023, but 0.00 in June 2024

Ask 3: Avg Payoff Balance per Account

I need the average payoff balance per Account (dates will be filtered by slicers)

1.  Active accounts only

2.  If cell is empty, then use previous month's values


August 2022 had 4 active acounts, Total balance for those active account for Aug 2022 is 685.79 (assume every single value for all inferred months/years is from column F.  Average Payoff Balance per Account is $171.48 (685.75/4 active accounts)
Month

The table or matrix would look like this:

2022

MonthNumber of active AccountsTotal balanceAverage Payoff Balance per Account
August4685.79171.48



What is the DAX for the three columns

 

I have written soooo many iterations

1 ACCEPTED SOLUTION

Hi @liveincolorado,

 

Thank you for the update.

Total Balance =
VAR MonthStart = MIN('Date'[Date])
VAR MonthEnd = MAX('Date'[Date])

RETURN
SUMX(
VALUES(Accounts[ACCT_ID]),

VAR LatestDate =
CALCULATE(
MAX(Accounts[CLI_START_DT]),
FILTER(
Accounts,
Accounts[ACCT_ID] = EARLIER(Accounts[ACCT_ID]) &&
Accounts[CLI_START_DT] <= MonthEnd
)
)

 

VAR Balance =
CALCULATE(
MAX(Accounts[TOTAL_BALANCE]),
Accounts[CLI_START_DT] = LatestDate
)

 

VAR IsActive =
CALCULATE(
COUNTROWS(Accounts),
FILTER(
Accounts,
Accounts[ACCT_ID] = EARLIER(Accounts[ACCT_ID]) &&
Accounts[CLI_START_DT] <= MonthEnd &&
COALESCE(Accounts[CLI_END_DT], DATE(9999,12,31)) >= MonthStart &&
Accounts[SA_START_DT] <= MonthEnd &&
COALESCE(Accounts[SA_END_DT], DATE(9999,12,31)) >= MonthStart
)
)

 

RETURN
IF(IsActive > 0, Balance)
)

 

Thankyou.

View solution in original post

13 REPLIES 13
v-sgandrathi
Community Support
Community Support

Hi @liveincolorado,

 

You’re on the right track by simplifying the goal to aggregated yearly stats and using tiered refresh logic, which is a solid way to handle a 5GB dataset.

However, the current SQL approach, creating monthly flags like active1_jan_2025 and repeating logic for each month and year will be hard to maintain and scale, especially over more than 10 years.

Instead of creating separate columns for each month, consider restructuring the output to return data at a month grain (row-based), such as acct_id, year, month, is_active, and total_balance. This method avoids repeating SQL for every month and keeps the dataset flexible.

For your refresh strategy, your idea of quarterly/monthly/daily splits fits well with incremental refresh or partitioning. You can still use time-bound SQL, but it’s better to parameterize it by year or date range rather than hardcoding each month.

Your use of window functions for balances is effective. You can expand this to create a monthly snapshot table (one row per account per month), which Power BI can easily aggregate to the yearly level.

 

Thank you.

liveincolorado
New Member

Need to rethink the goal of the report.  The focus is aggregated yearly stats with no need (at least for now) for any drill-through.  The SQL will be time-bound (never ideal), because the full dataset is 5gb.  Since the focus is on more recent data (after 2023), but access to more historical data (before 2023), then data refreshes for data before 2023 can be quarterly and data refreshes for 2023 -2025 can be monthly, and data for 2026 can be daily.  The following example is being tested for each year going back 10 years:


--ACCOUNTS
WITH ACTIVE AS (
SELECT DISTINCT

a.acct_id,

a.a_start,

a.a_end,

b.b_start,

b.b_end,

FROM accounts a

LEFT JOIN status b

ON a.acct_id = b.acct_id

WHERE a.type = 'current'

AND a.a_start <= DATE '2025-12-31'

AND (a.a_end >= DATE '2025-01-31'

)

 

SELECT * FROM (

SELECT

active.acct_id,

active.a_start,

active.a_end,

active.b_start,

active b_end,

/* FLAG 1 */
CASE WHEN EXISTS (

SELECT 1 FROM accounts a

WHERE a.type = 'current'

AND a.acct_id = active.acct_id

AND a.a_start <= DATE '2025-01-31'

AND (a.a_end >= DATE '2025-01-01 OR a.a_end IS NULL)
) THEN 'Yes' ELSE 'No' END AS active1_jan_2025

 

/ * REPEAT FOR EACH MONTH REMAINING: FEB - DEC */

 

/* FLAG 2 */

CASE WHEN EXISTS (

SELECT 1 FROM accounts a

WHERE a.type = 'current'

AND a.acct_id = active.acct_id

AND b.b_start <= DATE '2025-01-31'

AND b.b_end >= DATE '2025-01-01 OR b.b_end IS NULL)
) THEN 'Yes' ELSE 'No' END AS active2_jan_2025


/ * REPEAT FOR EACH MONTH REMAINING: FEB - DEC */

 

FROM ACTIVE

)

;

 

 


--BALANCES:

WITH ACCOUNTS AS (
SELECT DISTINCT a.acct_id
FROM accounts.account a
WHERE a.type = 'current'
AND a.start <= '2025-12-31'
AND (a.end >= '2025-01-01 OR a.end IS NULL)
),
JANBAL AS (
SELECT
b.acct_id,
SUM(total) AS total_jan_2025,
SUM(current) AS current_jan_2025
FROM amounts b
JOIN transactions c ON b.id = c.id
WHERE TRYN(c.create) <= DATE '2025-01-31'
GROUP BY b.acct_id

/* REPEAT FOR EACH MONTH REMAINING: FEB - DEC */

)

SELECT
acct_id,
janbal.tot_jan_2025,
janbal.cur_jan_2025,
/* REPEAT FOR EACH MONTH REMAINING: FEB - DEC */

FROM ACCOUNTS
LEFT JOIN JANBAL USING (ACCT_ID)
/* REPEAT FOR EACH MONTH REMAINING: FEB - DEC */

ORDER BY acct_id
;

v-sgandrathi
Community Support
Community Support

Hi @liveincolorado,

 

Thank you for the update.

It's great that you found a more efficient way by managing part of the logic in SQL, as that should help with performance.

If you need any more help with the data in Power BI, please let me know.

Thanks again.

v-sgandrathi
Community Support
Community Support

Hi @liveincolorado,

 

Thank you for sharing your Power Query solution, it's a thorough approach, and you've clearly considered the monthly expansion and carry-forward logic. However, I recommend caution, as creating a full account × month cross join and multiple transformations can result in a large dataset and may affect performance as your data grows. This could also increase model complexity.

In Power BI, it's typically more efficient to handle this logic in DAX measures rather than in Power Query. The DAX method dynamically evaluates active accounts by date context, manages open-ended dates with COALESCE, and retrieves the latest balance without generating all month combinations.

Your Power Query method is valid if you require a precomputed snapshot table, but it's best to limit the date range and avoid heavy pivoting to keep the model manageable. I suggest testing both approaches with your dataset to compare performance, refresh time, and usability before making a final decision.

 

Thank you.

Thank you.  It is too heavy a lift for the report, so we are going back to sql statements.  There is one for the accounts (with start and end dates) and then one for balances that aggregates "total this month" as 

SUM(month_total_balance) OVER (

PARTITION BY acct_id
ORDER BY monthly_balance

) as monthly total

I can work with this in PBI.

Hi @liveincolorado,

 

Thanks for the update. Moving the logic to SQL is a good idea when Power BI transformations get complex, especially with account and month expansions.

Using window functions for monthly totals per account is a solid approach. You could also create a monthly snapshot in SQL with active account flags and balances, so Power BI can use the dataset directly.

With the data shaped at the source, Power BI can handle slicing, aggregation, and visualization more efficiently. If you have any issues modeling this, let me know, I’m happy to help.

 

Thank you.

 

liveincolorado
New Member

Thank you!  I'll let you know how this works.  I've been playing with this:

let // Replace this with your actual source table step Source = YourSourceTableHere, // Ensure types TypeFixed = Table.TransformColumnTypes(Source, {{"acct_id", type text}, {"cli_start_dt", type date}, {"cli_end_dt", type nullable date}, {"sa_start_dt", type date}, {"sa_end_dt", type nullable date}, {"total_balance", type nullable number}, {"current_balance", type nullable number}}),

// Window parameters: set first month and number of months StartMonth = #date(2016, 4, 1), // set to desired first month (first day) Months = 123,

// Build global month list for the window MonthList = List.Transform({0..(Months-1)}, each Date.AddMonths(StartMonth, _)), MonthTable = Table.FromList(MonthList, Splitter.SplitByNothing(), {"MonthStart"}), MonthTable2 = Table.AddColumn(MonthTable, "YearMonth", each Date.ToText([MonthStart],"yyyy-MM")),

// Normalize null ends to a far future so open-ended accounts are treated as active if in window NormEnds = Table.TransformColumns(TypeFixed, {{"cli_end_dt", each if _ = null then #date(9999,12,31) else _, type date}, {"sa_end_dt", each if _ = null then #date(9999,12,31) else _, type date}}),

// Cross join accounts with month table => account x month (full window) AddKeysrc=Table.AddColumn(NormEnds, "Key", each 1), AddKeyMonths = Table.AddColumn(MonthTable2, "Key", each 1), Join = Table.NestedJoin(AddKeySrc, "Key", AddKeyMonths, "Key", "M", JoinKind.Inner), ExpandMonths = Table.ExpandTableColumn(Join, "M", {"MonthStart","YearMonth"}, {"MonthStart","YearMonth"}),

// Compute month end for comparisons AddMonthEnd = Table.AddColumn(ExpandMonths, "MonthEnd", each Date.EndOfMonth([MonthStart]), type date),

// Flags per month AddCLIFlag = Table.AddColumn(AddMonthEnd, "CLI_In_Month", each ([cli_start_dt] <= [MonthEnd] and [cli_end_dt] >= [MonthStart]), type logical), AddSAFlag = Table.AddColumn(AddCLIFlag, "SA_In_Month", each ([sa_start_dt] <= [MonthEnd] and [sa_end_dt] >= [MonthStart]), type logical), AddActive = Table.AddColumn(AddSAFlag, "IsActive", each ([CLI_In_Month] and [SA_In_Month]), type logical),

// Sort and carry-forward balances per acct across months (fill down) Sorted = Table.Sort(AddActive, {{"acct_id", Order.Ascending}, {"MonthStart", Order.Ascending}}), KeepCols = Table.SelectColumns(Sorted, {"acct_id","MonthStart","YearMonth","MonthEnd","CLI_In_Month","SA_In_Month","IsActive","total_balance","current_balance"}), Grouped = Table.Group(KeepCols, {"acct_id"}, {{"Rows", each Table.FillDown(Table.TransformColumnTypes(_, {{"total_balance", type nullable number}, {"current_balance", type nullable number}}), {"total_balance","current_balance"})}}), Expanded = Table.ExpandTableColumn(Grouped, "Rows", {"MonthStart","YearMonth","MonthEnd","CLI_In_Month","SA_In_Month","IsActive","total_balance","current_balance"}),

// Replace remaining null balances (leading nulls) with 0 ReplaceNulls = Table.ReplaceValue(Expanded, null, 0, Replacer.ReplaceValue, {"total_balance","current_balance"}),

// Pivot each measure into month-columns. We'll create three pivoted tables and join them back. SelectBase = Table.SelectColumns(ReplaceNulls, {"acct_id","YearMonth","CLI_In_Month","SA_In_Month","IsActive","total_balance","current_balance"}),

// Pivot CLI CLI = Table.SelectColumns(SelectBase, {"acct_id","YearMonth","CLI_In_Month"}), CLI_Pivot = Table.Pivot(Table.TransformColumnTypes(CLI, {{"YearMonth", type text}}), List.Distinct(CLI[YearMonth]), "YearMonth", "CLI_In_Month", List.Max),

// Pivot SA SA = Table.SelectColumns(SelectBase, {"acct_id","YearMonth","SA_In_Month"}), SA_Pivot = Table.Pivot(Table.TransformColumnTypes(SA, {{"YearMonth", type text}}), List.Distinct(SA[YearMonth]), "YearMonth", "SA_In_Month", List.Max),

// Pivot IsActive ACT = Table.SelectColumns(SelectBase, {"acct_id","YearMonth","IsActive"}), ACT_Pivot = Table.Pivot(Table.TransformColumnTypes(ACT, {{"YearMonth", type text}}), List.Distinct(ACT[YearMonth]), "YearMonth", "IsActive", List.Max),

// Pivot total_balance TB = Table.SelectColumns(SelectBase, {"acct_id","YearMonth","total_balance"}), TB_Pivot = Table.Pivot(Table.TransformColumnTypes(TB, {{"YearMonth", type text}}), List.Distinct(TB[YearMonth]), "YearMonth", "total_balance", List.Max),

// Pivot current_balance CB = Table.SelectColumns(SelectBase, {"acct_id","YearMonth","current_balance"}), CB_Pivot = Table.Pivot(Table.TransformColumnTypes(CB, {{"YearMonth", type text}}), List.Distinct(CB[YearMonth]), "YearMonth", "current_balance", List.Max),

// Merge all pivot tables on acct_id Merge1 = Table.NestedJoin(CLI_Pivot, "acct_id", SA_Pivot, "acct_id", "SA", JoinKind.LeftOuter), Merge2 = Table.ExpandTableColumn(Merge1, "SA", Table.ColumnNames(SA_Pivot)), Merge3 = Table.NestedJoin(Merge2, "acct_id", ACT_Pivot, "acct_id", "ACT", JoinKind.LeftOuter), Merge4 = Table.ExpandTableColumn(Merge3, "ACT", Table.ColumnNames(ACT_Pivot)), Merge5 = Table.NestedJoin(Merge4, "acct_id", TB_Pivot, "acct_id", "TB", JoinKind.LeftOuter), Merge6 = Table.ExpandTableColumn(Merge5, "TB", Table.ColumnNames(TB_Pivot)), Merge7 = Table.NestedJoin(Merge6, "acct_id", CB_Pivot, "acct_id", "CB", JoinKind.LeftOuter), Final = Table.ExpandTableColumn(Merge7, "CB", Table.ColumnNames(CB_Pivot)) in Final

Unfortunately, this produces the same error as my earlier attempts.  It is not counting the total active accounts in a month.  

Hi @liveincolorado,

 

Thank you for the update.

Total Balance =
VAR MonthStart = MIN('Date'[Date])
VAR MonthEnd = MAX('Date'[Date])

RETURN
SUMX(
VALUES(Accounts[ACCT_ID]),

VAR LatestDate =
CALCULATE(
MAX(Accounts[CLI_START_DT]),
FILTER(
Accounts,
Accounts[ACCT_ID] = EARLIER(Accounts[ACCT_ID]) &&
Accounts[CLI_START_DT] <= MonthEnd
)
)

 

VAR Balance =
CALCULATE(
MAX(Accounts[TOTAL_BALANCE]),
Accounts[CLI_START_DT] = LatestDate
)

 

VAR IsActive =
CALCULATE(
COUNTROWS(Accounts),
FILTER(
Accounts,
Accounts[ACCT_ID] = EARLIER(Accounts[ACCT_ID]) &&
Accounts[CLI_START_DT] <= MonthEnd &&
COALESCE(Accounts[CLI_END_DT], DATE(9999,12,31)) >= MonthStart &&
Accounts[SA_START_DT] <= MonthEnd &&
COALESCE(Accounts[SA_END_DT], DATE(9999,12,31)) >= MonthStart
)
)

 

RETURN
IF(IsActive > 0, Balance)
)

 

Thankyou.

Hi @liveincolorado,

 

We haven’t heard from you on the last response and was just checking back to see if your query was answered.
Otherwise, will respond back with the more details and we will try to help.

Thank you.

Hi @liveincolorado,

 

I wanted to follow up on our previous suggestions regarding the issue. We would love to hear back from you to ensure we can assist you further.

 

Thank you.

grazitti_sapna
Super User
Super User

Hi @liveincolorado,

 

I've created the sample .pbix file for you with DAX for all the three columns.

Active Accounts = 
VAR StartDate = MIN('Date'[Date])
VAR EndDate = MAX('Date'[Date])
RETURN
CALCULATE(
    DISTINCTCOUNT(Accounts[ACCT_ID]),
    FILTER(
        Accounts,
        
        -- CLI condition
        Accounts[CLI_START_DT] <= EndDate &&
        COALESCE(Accounts[CLI_END_DT], DATE(9999,12,31)) >= StartDate &&

        -- SA condition
        Accounts[SA_START_DT] <= EndDate &&
        COALESCE(Accounts[SA_END_DT], DATE(9999,12,31)) >= StartDate
    )
)
Total Balance = 
VAR EndDate = MAX('Date'[Date])
RETURN
SUMX(
    VALUES(Accounts[ACCT_ID]),
    
    VAR LatestRow =
        CALCULATE(
            MAX(Accounts[CLI_START_DT]),
            FILTER(
                Accounts,
                Accounts[ACCT_ID] = EARLIER(Accounts[ACCT_ID]) &&
                Accounts[CLI_START_DT] <= EndDate
            )
        )

    VAR Balance =
        CALCULATE(
            MAX(Accounts[TOTAL_BALANCE]),
            Accounts[CLI_START_DT] = LatestRow
        )

    VAR IsActive =
        CALCULATE(
            COUNTROWS(Accounts),
            FILTER(
                Accounts,
                Accounts[ACCT_ID] = EARLIER(Accounts[ACCT_ID]) &&
                Accounts[CLI_START_DT] <= EndDate &&
                COALESCE(Accounts[CLI_END_DT], DATE(9999,12,31)) >= EndDate &&
                Accounts[SA_START_DT] <= EndDate &&
                COALESCE(Accounts[SA_END_DT], DATE(9999,12,31)) >= EndDate
            )
        )

    RETURN IF(IsActive > 0, Balance)
)

 

Avg Payoff Balance = 
DIVIDE(
    [Total Balance],
    [Active Accounts]
)

 

Please see attached file.

 

🌟 I hope this solution helps you unlock your Power BI potential! If you found it helpful, click 'Mark as Solution' to guide others toward the answers they need.
💡 Love the effort? Drop the kudos! Your appreciation fuels community spirit and innovation.
🎖 As a proud SuperUser and Microsoft Partner, we’re here to empower your data journey and the Power BI Community at large.
🔗 Curious to explore more? [Discover here].
Let’s keep building smarter solutions together!

Hi @liveincolorado,

 

Did you try this solution? if it works kindly mark as accepted soltution, if not let me know the challange you are facing

Helpful resources

Announcements
April Power BI Update Carousel

Power BI Monthly Update - April 2026

Check out the April 2026 Power BI update to learn about new features.

Fabric SQL PBI Data Days

Data Days 2026 coming soon!

Sign up to receive a private message when registration opens and key events begin.

New to Fabric survey Carousel

New to Fabric Survey

If you have recently started exploring Fabric, we'd love to hear how it's going. Your feedback can help with product improvements.

Power BI DataViz World Championships carousel

Power BI DataViz World Championships - June 2026

A new Power BI DataViz World Championship is coming this June! Don't miss out on submitting your entry.