Service Level Agreement
According to the crack team of lawyers I keep on retainer, I should be clear about the Service Level Agreement in place.
- We do not commit to a maximum amount of downtime.
- We do not commit to a maximum time to recover from downtime.
That said, Jason’s mail is stored on this server, as well as his public resumé. Things do happen, but Jason’s motivated to keep downtime minimal and recovery quick.
Staffing: Just me, and my spare time. I try to handle outages immediately, and I’ve added several people to the access list at the datacenter just in case. Most issues can be handled via remote keyboard/monitor (I have that) and remote hands (the datacenter provides that).
Maintenance: We will try to do major maintenance during off-hours. However, Jason’s schedule may require that more routine maintenance be done at any time.
Monitoring: Both on-site monitoring (system health, several services) and remote monitoring (basic HTTP health check) are in place. The remote monitoring is tied to Jason’s pager. Additionally, several end users have their own monitoring set up, and in turn contact Jason and/or try to effect repairs directly if possible.
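For end users who want their own monitoring, here is a minimal sketch of a basic HTTP health check in Python. It is an illustration only, not the check actually running against this server: the function name `is_healthy`, the 10-second timeout, and the `opener` parameter (there so the check can be exercised without a network) are all assumptions.

```python
import urllib.error
import urllib.request


def is_healthy(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return True if `url` answers with a 2xx status within `timeout` seconds.

    `opener` defaults to urllib's urlopen; it is a parameter only so that a
    stand-in can be supplied when testing (an assumption of this sketch).
    """
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, refused connections and timeouts. urlopen also
        # raises HTTPError (a URLError subclass) for 4xx/5xx responses.
        return False
```

Run from cron or a similar scheduler on a machine outside the datacenter, a check like this only says whether a page came back; what to do on a failure (mail, page, retry) is left to whoever sets it up.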
Major upgrades: These typically *will* take a day of downtime, and they happen every couple of years. Typically this is done by either swapping the entire computer or, at minimum, swapping in a newly (or mostly) built main hard drive, syncing files, booting, and seeing what breaks.
The main system that /www/virt/ resides on is mirrored. If one of the two hard drives fails, there is a chance the system could require a reboot to recover.
Nightly backups are made to a second mirror. You’ll find this under /disk2/snapshots/. Each time we run an incremental backup, a new directory is made. We will keep as many of these backups as we can, typically at least 30 days’ worth. This gives you a choice of days to restore files from. Restoring files is self-serve.
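Since restores are self-serve, here is a rough sketch in Python of finding and copying back an older version of a file. It assumes that each directory under /disk2/snapshots/ contains a tree in which your file can be looked up by a relative path, and that the directory names sort in a useful order; neither is guaranteed, so look at the actual layout first. The function names `find_versions` and `restore` are made up for this example.

```python
import shutil
from pathlib import Path

SNAPSHOT_ROOT = Path("/disk2/snapshots")  # location given above


def find_versions(relative_path, snapshot_root=SNAPSHOT_ROOT):
    """Return every snapshot copy of `relative_path`, sorted by snapshot name.

    `relative_path` must not start with a slash. Assumes the file sits at the
    same relative location inside each snapshot directory (hypothetical layout).
    """
    root = Path(snapshot_root)
    versions = []
    for snapshot in sorted(p for p in root.iterdir() if p.is_dir()):
        candidate = snapshot / relative_path
        if candidate.is_file():
            versions.append(candidate)
    return versions


def restore(snapshot_copy, destination):
    """Copy one snapshot version to `destination`, keeping timestamps and mode."""
    destination = Path(destination)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(snapshot_copy, destination)
    return destination
```

Restoring to a new file name rather than on top of the live file makes it possible to compare the two before replacing anything.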
Approximately monthly, a copy of the nightly backups is taken off-site. This is in case of catastrophic failure (fire, earthquake, malicious hacker, etc.) where the primary system is completely gone. This off-site backup will be used to build a replacement system. The backup is encrypted, in case the hard drive is physically stolen or in case it needs to be replaced (and the old one destroyed).