What’s the best way to archive Workflow data?
Some customers need to keep Workflow runtime data for quite some time for audit purposes. I worked with a bank in the UK which wanted to keep everything in the Workflow tables for seven years! So here are a couple of tips for consideration.
Firstly, if you need to keep runtime data for completed workflows for more than a month or so, then you shouldn’t do it in your Workflow system; archive it somewhere else. If you leave large quantities of data in tables which are being hit frequently at runtime, then there will be a significant performance hit. I’ve seen a number of posts in various lists which say something like “I’ve got 2 million rows in my item attribute values table, we’ve never purged anything and the system is slow – what can I do?”
By the time you have this many records, purging the data will also take a long time (some customers have reported that it takes over a day on some systems!). This is a Catch-22 situation: you can’t purge to remove the data because it takes too long, and in the meantime lots more data is being written to the Workflow tables…
If you are archiving the data elsewhere, there are two different ways you can approach this – either push the data from the Workflow tables into the archive system, or you can pull it into the archive system from the Workflow system.
For a push, you would need to write triggers on each table that Workflow writes to, so that whenever data is inserted or updated, the change is replicated in the archive system. This would be very processor intensive, since you are effectively performing every write twice. I would not recommend this method.
For a pull, you need to write something that can run at the end of the day to copy everything from the Workflow system into the archive. This is significantly better for performance, since it can be scheduled to run when the system workload is low, and so should not impact the operation of the system. Once you have successfully archived the data, you should then purge the Workflow data. If the requirement is just to archive the runtime data (and not to archive changes to the workflow definition), then you can use the queries which are executed in the standard wfstat.sql script for any completed workflows to determine what information you need to keep.
My recommendation would be to take the second option, and pull the data from Workflow into your archive system. The archive can even be kept in the same database as the Workflow system, but in completely separate tables which are not used during regular operation, or it could be on a completely separate system, using something like a database link to connect to the other database. You could even write something to pull all the information from the runtime tables and create a payload which can be enqueued onto an Oracle Advanced Queue. From there you can determine which other environments pick up the message, or you could just leave the message on the queue for processing / viewing later.
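To give a feel for the payload idea, here is a small Python sketch that gathers one item and its attribute values into a single JSON document. Again, SQLite stands in for the real database, the table layout is a simplified assumption, and `build_payload` is a hypothetical helper of my own naming. Actually placing the message on an Advanced Queue is not shown; in Oracle that would be done through the `DBMS_AQ` package.

```python
import json
import sqlite3


def build_payload(conn, item_type, item_key):
    """Return a JSON string describing one workflow item and its attributes.

    Assumes simplified stand-in tables `wf_items` and
    `wf_item_attribute_values`; the real tables have more columns.
    """
    item = conn.execute(
        "SELECT begin_date, end_date FROM wf_items "
        "WHERE item_type = ? AND item_key = ?",
        (item_type, item_key)).fetchone()
    if item is None:
        raise LookupError(f"no such item: {item_type}/{item_key}")

    attrs = conn.execute(
        "SELECT name, text_value FROM wf_item_attribute_values "
        "WHERE item_type = ? AND item_key = ? ORDER BY name",
        (item_type, item_key)).fetchall()

    return json.dumps({
        "item_type": item_type,
        "item_key": item_key,
        "begin_date": item[0],
        "end_date": item[1],
        "attributes": dict(attrs),  # each row is a (name, value) pair
    }, sort_keys=True)
```

Packaging each completed item as one self-describing message means the consumer does not need to know the Workflow table structure at all, which is handy if the archive lives on a different platform.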
As ever, any comments or views on my suggestions are more than welcome!


