You’ve probably heard the unbelievable story about the man who accidentally deleted his entire company’s and customers’ data. More astonishing is that he also deleted all backups. It seems he coded an “rm –rf” into his script and included the backup filesystems that were mounted. The “rm” command means delete and the “–r” makes the deletion recursive, meaning it deletes the directory and everything underneath of it.
Any time you are going to be automating recursive deletions using rm, you need to double and triple check your code and test. In fact, I would quadruple check my work, take a break, and then return for the quadruple check with a fresh set of eyes. If possible when performing something so potentially destructive like this, ask for an independent set of eyes to read over your code, such as a trusted expert or technical forum. There is also no excuse for a lack of testing. Even if one does not have a test server, test the code on a small, unimportant filesystem before letting it loose on yours and your customers’ livelihood.
The other obvious problem is with regards to the backups of the data. I know tape may be considered old fashioned, but there is something to be said for the safety of having disconnected backup copies of important data. That being said, disk backups can certainly be utilized, but managed in a much safer way. If the backup filesystems really need to be available to query, they should not need to be writable. There should always be a copy of critical data that is not accessible in the same way active data is.
The “rm” command does not overwrite blocks on disk, so if no other steps were taken, the data could potentially still be on disk. All activity should be stopped immediately, though, and a reputable data recovery company should be engaged. The company may or may not be able to recover the data.
The bottom line is that one always needs to be careful. This does not just include checking and testing, but also setting up safeguards against human error. If backup data cannot be accessed in the first place, a mistake can’t possibly overwrite it. This life changing error could have been prevented by simple checks and restrictions.