I have a Delta table that is updated hourly and transformation notebooks that run every 6 hours and work off change data feed results. Oddly, I am receiving an error message even though the transaction log files appear to be present. I am able to query all versions up to and including version 270. I noticed there are two checkpoints between now and version 269, but I don't believe that is cause for concern. Additionally, when I view the history for this table, I only see MERGE commands since that time (no VACUUM or other maintenance commands were issued).
I did not change retention settings, so I assume 30 days of history should be available (the default). I started receiving this error within 24 hours of the transaction log entry in question.
Below are a screenshot of the files available, the command I am attempting to run, the error message I received, and finally a screenshot of the table history.
Any ideas what went wrong or if I am not comprehending how delta table / change data feeds operate?
Screenshot:
Command:
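(The command itself was posted as a screenshot and is not reproduced here. As a representative sketch only, a CDF read of the kind described above might look like the following; the table name and starting version are taken from the error message below, and the exact options in the original command are unknown.)

```python
# Representative sketch only -- the actual command was in the screenshot.
# Table name and starting version come from the error message below.
df = (spark.read.format("delta")
      .option("readChangeFeed", "true")
      .option("startingVersion", 269)
      .table("PaymentManager_dbo_PaymentRegister"))
df.show()
```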
Error Message:
```
org.apache.spark.sql.delta.DeltaFileNotFoundException: [DELTA_TRUNCATED_TRANSACTION_LOG] abfss://adf33498-94b4-4b05-9610-b5011f17222e@onelake.dfs.fabric.microsoft.com/93c6ae21-8af8-4609-b3ab-24d3ad402a8a/Tables/PaymentManager_dbo_PaymentRegister/_delta_log/00000000000000000000.json: Unable to reconstruct state at version 269 as the transaction log has been truncated due to manual deletion or the log retention policy (delta.logRetentionDuration=30 days) and checkpoint retention policy (delta.checkpointRetentionDuration=2 days)
```
Screenshot of table History:
Hello @mhseko
Your error is not a bug in Delta Lake or CDF, but a consequence of how retention and log management work. Once the log or checkpoint files for a version are gone, you cannot reconstruct the table state or read its CDF for that version, regardless of whether recent log files are still visible or no `VACUUM` was ever run. To avoid this in the future, extract and store any CDF data you need to retain before the retention window expires, as shown in the sketch below.
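A minimal sketch of that pattern, assuming a notebook session; the archive table name and version tracking are hypothetical and should be adapted to your pipeline:

```python
# Minimal sketch: persist CDF output to a separate Delta table before the
# retention window lapses. "PaymentRegister_cdf_archive" and the version
# bookkeeping are hypothetical -- adapt them to your pipeline.
last_processed_version = 270  # e.g. persisted by your 6-hourly notebook run

changes = (spark.read.format("delta")
           .option("readChangeFeed", "true")
           .option("startingVersion", last_processed_version)
           .table("PaymentManager_dbo_PaymentRegister"))

(changes.write
        .format("delta")
        .mode("append")
        .saveAsTable("PaymentRegister_cdf_archive"))
```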
Delta log files have a default retention period of 30 days in Fabric, after which they are automatically deleted. This is not controlled by manual `VACUUM` commands; the deletion is managed by Fabric's background processes.
If you want to retain more history, set the retention properties (`delta.logRetentionDuration`, `delta.deletedFileRetentionDuration`) to a longer interval. These should persist unless the table is dropped and recreated.
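For example, from a notebook you could set them like this (the 90-day interval is illustrative, not a recommendation):

```python
# Sketch: extend retention on the table; 90 days is an illustrative value.
spark.sql("""
    ALTER TABLE PaymentManager_dbo_PaymentRegister
    SET TBLPROPERTIES (
        'delta.logRetentionDuration' = 'interval 90 days',
        'delta.deletedFileRetentionDuration' = 'interval 90 days'
    )
""")
```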
I have verified this with multiple Delta tables: a given transaction log file cannot be utilized if the preceding checkpoint file is missing. OK, I'll update my schedules accordingly. In general, the processes I have work fine; this issue cropped up for a table that had not been updated for an extended period of time.
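For anyone hitting the same thing, here is a quick sketch for checking which checkpoint and JSON commit files actually remain, assuming a Fabric notebook where `mssparkutils` is available; the abfss path is the one from the error message above:

```python
# Sketch: list what remains in _delta_log; assumes mssparkutils is available
# (Fabric/Synapse notebooks). The path is the one from the error message.
log_path = ("abfss://adf33498-94b4-4b05-9610-b5011f17222e"
            "@onelake.dfs.fabric.microsoft.com/93c6ae21-8af8-4609-b3ab-24d3ad402a8a"
            "/Tables/PaymentManager_dbo_PaymentRegister/_delta_log")

for f in sorted(mssparkutils.fs.ls(log_path), key=lambda x: x.name):
    # the earliest *.checkpoint.parquet bounds how far back state can be rebuilt
    print(f.name)
```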