The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event: Join us in Stockholm, September 24-27, 2024.
Save €200 with code MSCUST on top of early bird pricing!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
Hi,
I'm trying to experiment with incremental refreshes, using our "Visit" data. "Visit" is a fact that includes not only historic visits (completed, cancelled, etc.) but also future dated "scheduled" visits.
I'd like to partition the data by encounter date (probably at the month level), but have rows updated using the update timestamp column available.
The steps I've gone through are:
1. Properly set up incremental refresh parameters
2. Set "Encounter date" to filter on the parameters
3. In the front end of desktop, told it to enable incremental refreshes
- Store 5 years of encounter dates
- Refresh rows where encounter dates in the last 5 years
- Detect changes using LastUpdateDate
The problem I'm hitting is that all future Encounter Dates are getting filtered out.
I may try to jury rig the parameters so that the RangeEnd doesn't really do anything - I'm hoping this prevents future encounter dates from getting filtered out.
Anyone have other ideas? I wish documentation on these incremental refreshes were better.
Thanks,
Scott
This seems to work:
= Table.SelectRows(sampletable, each
[datetimefield] >= RangeStart
and [datetimefield] < if Date.IsInCurrentMonth(RangeStart)
then Date.AddYears(RangeEnd,1)
else RangeEnd
)
To get this to work, I had to enclose the if statement with parenthesis:
= Table.SelectRows(sampletable, each
[datetimefield] >= RangeStart
and [datetimefield] < (if Date.IsInCurrentMonth(RangeStart)
then Date.AddYears(RangeEnd,1)
else RangeEnd)
)
Hi Scott, Any progress on this issue? By looking at your question I am assuming you are working with data out of your EMR. We have a similar issue where we are trying to find future visit dates for a Provider Availability report. Just wondering if you could share any possible solutions/workarounds. Thanks Dan
ok.. so a little bit of a hack.... until they give an effective date..
but if you have access to adjuat the source view..or can use native query and modify,
make a dummy column subtracting 100 years from your encounter date (more if you go that far back?)
and then have it use the dummy column for storing 110 years of data, and detect changes using lastupdatedate?
Hi @gebberryOffice @Anonymous
Did you manage to figure this out? I'm having the same issue but with budget numbers.
Thanks
In the documentation, like here, Microsoft mentions that they plan to add the ability to set an "effective date," other than "the current UTC date/time," from which the incremental refresh policy "steps backward" to determine the "last X periods" to be incrementally refreshed.
I wonder if that will mean that you will be able to set an effective date in the future, to solve this. For example, you could send your incremental refresh policty to "Refresh rows in the last 3 years..." and set an "effective date" of '2021-01-01 00:00:00'. Then, presumably (hopefully), it would create partitions for 2020, 2019 and 2018...and each incremental refresh would refresh all three of those partitions.
Hi SPowell42OSUWMC ,
"I may try to jury rig the parameters so that the RangeEnd doesn't really do anything - I'm hoping this prevents future encounter dates from getting filtered out."
<--- Could you share some sample data and clarify more details about your requirement?
Regards,
Jimmy Tao
Hi Jimmy, appreciate the question - and sorry for the long answer you're going to get
I can't really share data as it's HIPAA health care info, but the general scenario is that this is hospital inpatient and outpatient visit data. Visits data has a bit of complexity around "time" and incremental refreshes:
- Visits are often future dated (i.e. I've already scheduled a follup up appointment scheduled with my doctor for 12 months from now)
- Visits can often last a LONG time, for people who are very sick in a hospital. So it's hard for us to say things like "only update records whose appointment start date was within the last 30 days"...because we regularly have to go back and update records from a very long time ago. Someone in the hospital today can have been here for a long time.
All of the "Incremental update" videos / articles focus on doing something more straight forward, like "refresh the last 10 business days", because the back dating / updating of records is assumed to not go back a long time...but in our case they can / do.
Based on this, what I really want to tell PBI service to do is:
- partition based on something like "visit start date" - mainly because this should relatively equally distribute the data across partitions, NOT because it really has anything to do with limiting data for an incremental refresh
- update all records where the last update timestamp is from the last time you loaded successfully - which would get all changed records. But last update timestamp isn't really an appropriate date to use for "partitioning" data - because records don't spread uniformly across last update date (for example, if we get a system upgrade, we often have large chunks of records getting reprocessed with the same "last update timestamp")
To get this to work in PBI currently, I've set:
- Store rows where encounter date (aka visit start date) is within the last 5 years - but doing so seems to be preventing any future dated entries (for example, the doctor visit I have scheduled for a year from now) from being included in the data. I suspect strongly PBI service is putting a "where encounter date <= sysdate" kind of filter on when running the query, but it's hard for me to tell - is there anywhere for us to actually see the queries getting generated?
- Refresh rows where encounter date is within the last 5 years - because quite truthfully I just assume that any record can be updated at any point. This isn't financial data where you know that once a period closes it can't be written to again. There are no rules we can really apply to say how far back a record could be updated
- Detect data changes / only refresh when the maximum value of the "last update datetime" column changes. I hope (???) what this is doing is something like "select * from our_data_table where LastUpdateDate >= last time PBI service sucessfully loaded, but the documentation isn't very clear on exactly what this does
TL/DR version - we don't really have a great "partition by" date, and even if I choose one, it can't really be used to limit the refresh dataset - only the LastUpdateDate column can help with that.
So my idea was to code things in Power BI query so that RangeEnd is somehow referenced, but doesn't actually do anything (I might add 500 years to the RangeEnd value or something when putting in the data filter). I hope that this allows the incremental refresh to "work" from the standpoint that the service will allow it, but NOT filter out the future dated records
I hope this makes sense! I do appreciate your help - and it looks like you are from Microsoft, so if needed I could always open a ticket on this.
Thanks!
Scott
Hi Scott, Any progress on this issue? By looking at your question I am assuming you are working with data out of your EMR. We have a similar issue where we are trying to find future visit dates for a Provider Availability report. Just wondering if you could share any possible solutions/workarounds. Thanks Dan
Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.
Check out the August 2024 Power BI update to learn about new features.
Learn from experts, get hands-on experience, and win awesome prizes.
User | Count |
---|---|
50 | |
21 | |
12 | |
11 | |
10 |
User | Count |
---|---|
120 | |
32 | |
31 | |
21 | |
20 |