Disaster Recovery, the things I learned the hard way
I've been working in the disaster recovery (DR) business for 6 years, so I thought I would put in writing a few things that have been the bane of the projects I've been assigned to, leading to failure, exhaustion and plain customer rejection of the DR solution.
The plagues of your DR
This filer you shall clean and archive
See this filer you have, with 7 TB of data and 4 million files?
Of those 4 million files, half at most, and probably far fewer, are needed on a daily basis.
When disaster strikes and you need accounting and invoicing back online ASAP, you will need about a tenth of it in the first 4 to 12 hours.
Do yourself, and the tool you use to recover, a favor:
- Archive directories of stale data into zip files
- Organize the data by year / month if possible, and make people in the company do the same, or keep an archive directory holding zip files of unused data.
AKA HAVE A DATA LIFECYCLE POLICY!
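To get a first idea of what could be archived, something like the following could be run against the share. It is only a rough sketch: the function names, the two-year threshold and the idea of judging a directory by its newest file are my own assumptions, not a rule, and modification times are not always reliable on a filer.

```python
import os
import time
from pathlib import Path


def newest_mtime(directory):
    """Return the newest modification time of any file under directory.

    Returns 0.0 when the tree contains no files, so empty directories
    are reported as stale too.
    """
    newest = 0.0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            try:
                newest = max(newest, os.stat(os.path.join(root, name)).st_mtime)
            except OSError:
                # File vanished or is unreadable: skip it.
                continue
    return newest


def stale_directories(share_root, max_age_days=730, now=None):
    """List top-level directories whose newest file is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for entry in sorted(Path(share_root).iterdir()):
        if entry.is_dir() and newest_mtime(entry) < cutoff:
            stale.append(entry.name)
    return stale
```

Calling `stale_directories("/path/to/share")` (a placeholder path) returns the names of the top-level directories that are candidates for zipping. Review the list with the people who own the data before archiving anything.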
This tool that uses MAC addresses for licensing you shall banish
In case of DR, you might end up in a slightly different environment than production.
This can happen for many reasons: the provider you chose is late delivering something; you chose a DRaaS solution built on a public cloud, where setting arbitrary MAC addresses is usually forbidden; the server used to be physical and is now running in a virtual environment...
Tools that rely on the MAC address of the network interfaces to check whether the licence is valid are going to be an issue, and you will need to contact the vendor to obtain a new licence...
Which is usually NOT a great experience, because Disaster Recovery is simply not a priority for many of them, and the response time of the sales people you will certainly be forwarded to can be a nightmare.
Which brings us to the next point ->
The name and contact information of your providers you shall keep up to date
You're in it up to the neck, and you are kindly informed that your contact at "provider of critical business solution X" has moved on and is no longer at the company.
Congratulations, you now have another rabbit hole to get out of, and this one involves talking to humans from outside the company.
Check in with the support teams of your solutions every 6 months, keep the list of contacts in a vault / file manager outside your systems, and review your support contracts at the same time.
Triggering a DR without a support contract is shaking hands with danger. Ask your support how they will help you set things back up in an emergency, and how much it might cost you.
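The 6-month review is easy to forget, so it can help to let a small script nag you. Here is a minimal sketch, assuming the contact list is kept as simple records; the field names (`provider`, `last_verified`, `contract_end`) and the 182-day interval are made up for the example.

```python
from datetime import date, timedelta

# Roughly six months; adjust to your own review cycle.
REVIEW_INTERVAL = timedelta(days=182)


def contacts_needing_attention(contacts, today):
    """Return (provider, reason) pairs for entries that need a human to act."""
    findings = []
    for entry in contacts:
        if today - entry["last_verified"] > REVIEW_INTERVAL:
            findings.append((entry["provider"], "contact not verified in 6 months"))
        if entry["contract_end"] < today:
            findings.append((entry["provider"], "support contract expired"))
    return findings


# Hypothetical sample data, for illustration only.
contacts = [
    {"provider": "ERP vendor", "last_verified": date(2023, 1, 10),
     "contract_end": date(2025, 12, 31)},
    {"provider": "Backup vendor", "last_verified": date(2024, 5, 2),
     "contract_end": date(2024, 3, 31)},
]

for provider, reason in contacts_needing_attention(contacts, date(2024, 6, 1)):
    print(f"{provider}: {reason}")
```

Keep the list itself, and whatever checks it, outside the systems that the DR is supposed to bring back.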
This huge partition you should break up
A 13 TB Exchange server on a 20 Mbps line is not a great recipe for backup and recovery. Neither is a 3 TB SQL dump of EDI data that has to be replayed onto your restarted MSSQL server.
The larger the partition, the more data ends up on it. These large systems also tend to host a monolithic database, whose recovery MUST be consistent, in a single atomic operation: you need all of the database's files before it will start up, and when each of them weighs a hundred gigabytes, that delays your recovery.
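To put a number on it, here is a back-of-the-envelope calculation of how long a full copy takes over a given line. It is a sketch under simplifying assumptions: decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second), a line used at full rate the whole time, and no compression or deduplication. The function name and the `efficiency` parameter are mine.

```python
def transfer_days(size_tb, link_mbps, efficiency=1.0):
    """Estimate the days needed to move size_tb terabytes over a link_mbps line.

    efficiency is the usable fraction of the line (1.0 = the whole line,
    all the time), which is optimistic in practice.
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 86400


# The 13 TB Exchange server on the 20 Mbps line
print(f"{transfer_days(13, 20):.1f} days")
```

Under those assumptions the 13 TB server needs about 60 days for a single full transfer, which is why splitting the data into smaller, independently recoverable pieces matters.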
Less is better
You always have to do more with less at your disposal. To help with that, give your problems to someone else:
- Move Exchange to Office 365 for non-critical people, if not for everyone (if you're European, yeah, keep Exchange for the VIPs, just reduce its size).
- Move the photos of the 2003 company gathering to cold cloud storage.
- When solutions like Azure workstation, Shadow and Citrix Cloud exist, ask yourself whether you still need to maintain a local VDI solution.
Your documentation you will keep up to date and protect
Yes, you need to include your internal wiki in your DR, and it must contain the instructions on how to restart and maintain your services. Create an inventory of EVERY. SINGLE. ONE. OF. THEM. If someone points out a gap in the documentation, add it to the todo.
Your secrets and passwords you will keep with you and synced
Use a password manager, ideally an externally hosted one... No, hosting it on premise is not necessarily more secure, and if it gets wiped out too when disaster strikes, you're in deep shit.