Ever messed up a live site?

Did it take long to fix?

         

Habtom

5:45 am on Jul 15, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ever messed up a live site? I never did to a large extent, but the thought of it kills me.

DrDoc

6:43 am on Jul 15, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ever messed one up?

Well, I have messed up unimportant sites where it did not matter that I did the testing directly in the live environment. I have, however, fixed live sites (... that did matter) which have been messed up by someone else (who was subsequently fired).

Scary? Can be. The amount of "damage" heavily depends on what's been messed up.

Syntax error in a script? Pfft.
Massive DB update that went wrong? Eep!
Loss of order or customer information? Gaaah!

If you have great backups (which are up-to-date and which you know can be restored in a jiffy) it's not that big of a deal. If you don't ... it may very well be fatal.

A system is only as good as the amount of time and effort it takes to restore it when things go wrong.

The fix I talked about earlier ... that someone else messed up. Well, this happened to be an untested DB update. The guy should've tested and re-tested the update in a test environment before running it. To make matters worse, this happened on a quite busy ecommerce platform. Hundreds of thousands of orders were messed up. About two-thirds of these could immediately (within 5 minutes) be restored from last night's backup. The others should have been just as easy to restore if the guy had only backed up the DB right before running the stupid untested update. So ... "feel free to take the rest of the day off, while I am still digesting what just happened" ... and then off I went to save 35000 orders. I wouldn't trust anyone else with the fix. It had to be done right. And it had to be done manually. And it had to be done quickly. It took me about 5 hours. 5 long, stressful, horrific hours of frenetic manual DB fixing. In the end, all orders but 2 were restored by running handfuls of manual queries and digesting thousands of lines of server logs ... Horrible!
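The "back up right before the update" step can be scripted so it is hard to skip. Here is a minimal sketch, assuming SQLite and Python's standard sqlite3 module rather than whatever database that shop actually ran. The function name run_update_with_snapshot and its expected_rows parameter are made up for illustration.

```python
import sqlite3


def run_update_with_snapshot(conn, snapshot_conn, sql, params=(), expected_rows=None):
    """Snapshot the database, then run one UPDATE/DELETE inside a transaction.

    conn and snapshot_conn are open sqlite3 connections (hypothetical setup).
    If expected_rows is given and the statement touches a different number
    of rows, the change is rolled back. Returns the number of rows changed,
    or 0 if the change was rolled back.
    """
    # 1) Copy the live database first, so a restore point exists that was
    #    taken right before the change, not last night.
    conn.backup(snapshot_conn)

    # 2) Run the change. sqlite3 opens a transaction implicitly for
    #    UPDATE/DELETE, so nothing is permanent until commit().
    cur = conn.execute(sql, params)

    # 3) Sanity-check the blast radius before making it permanent.
    if expected_rows is not None and cur.rowcount != expected_rows:
        conn.rollback()
        return 0

    conn.commit()
    return cur.rowcount
```

Called with expected_rows=1 on an UPDATE that forgot its WHERE clause, this rolls back and returns 0 instead of rewriting every row. On a real server database the snapshot step would be that system's own dump or backup tool; the pattern (snapshot, change inside a transaction, check the row count, then commit) is the part that carries over.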

So ...
1) Don't mess it up in the first place
2) Know that if you ever do mess things up (which you will), you have sufficient, reliable, tested backups which will let you sleep comfortably
3) Have disaster recovery plans in place

Do that, and things will be just peachy!

wyweb

9:25 am on Jul 15, 2007 (gmt 0)



Ever messed up a live site?

Oh yeah, and not just one either. One little typo (read: One enormous typo) when changing DNS settings for a server move. Let's see.. 30 sites, was it? Something like that. All of my own plus about 10 that I host for others.

I don't just mess em up. I knock em offline.

Friday evening and I'm walking out the door a minute later. Computer free all weekend. No way to check or verify that the change was correct. Why check it though? I mean how hard is it to change nameservers? That's how complacent I was back then and it's come back to bite me more than once.

Never quite with that many teeth though.
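A check like the one that got skipped can be a few lines of script run before walking out the door. This is a rough sketch, not what anyone in this thread actually used: the function name find_dns_mismatches and the hostname-to-IP mapping are hypothetical, and it only uses socket.gethostbyname from Python's standard library. Keep in mind that resolver caches and TTLs can make a fresh change look wrong (or an old record look right) for a while.

```python
import socket


def find_dns_mismatches(expected, resolve=socket.gethostbyname):
    """Return {hostname: problem} for hosts that do not resolve as expected.

    expected maps each hostname to the IPv4 address it should have after
    the move. resolve is injectable so the check can be tried without
    touching the network.
    """
    problems = {}
    for host, want in expected.items():
        try:
            got = resolve(host)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            problems[host] = f"lookup failed: {exc}"
            continue
        if got != want:
            problems[host] = f"resolves to {got}, expected {want}"
    return problems
```

An empty result means every listed site resolved to the address you intended. Anything else is a reason not to leave for the weekend yet.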

rocknbil

11:19 am on Jul 16, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Early in my career . . . .

instead of

delete from table where id=12345

a slip of the finger, it's right next door . . .

delete from table where id-12345

Oddly enough, all the records less than 12345 went poof. Huh.

<eek>

Backups saved me, that day.
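Why the slip is so destructive: some engines accept a bare number as a WHERE condition and treat any nonzero value as true. In SQLite, for instance, `where id-12345` matches every row whose id is not 12345; other engines may reject the statement outright. Below is a small guard, sketched with Python's standard sqlite3 module. The name guarded_delete, the max_rows limit, and the orders table in the example are assumptions made up for illustration.

```python
import sqlite3


def guarded_delete(conn, sql, params=(), max_rows=1):
    """Run a DELETE, but roll back if it removes more than max_rows rows.

    conn is an open sqlite3 connection. The DELETE runs inside sqlite3's
    implicit transaction, so it can still be undone when the row count
    looks wrong.
    """
    cur = conn.execute(sql, params)
    if cur.rowcount > max_rows:
        conn.rollback()
        raise RuntimeError(
            f"refusing to delete {cur.rowcount} rows (limit {max_rows})"
        )
    conn.commit()
    return cur.rowcount
```

With this in place, `guarded_delete(conn, "DELETE FROM orders WHERE id-12345")` raises and leaves the table untouched, while `guarded_delete(conn, "DELETE FROM orders WHERE id=?", (12345,))` removes the single intended row. Passing the id as a parameter also means there is no `=` to fat-finger inside the value.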

httpwebwitch

2:37 pm on Jul 16, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



More times than I care to divulge.
When uptime and reliability are important, I do keep processes in place to prevent online damage (backups, staging servers, automated deployment, autopingers, etc).
However, many of my own projects are edited live on the server using Notepad++. Sometimes I'll edit offline then send them in via FTP; sometimes not. It depends on the situation.

The worst blunder in my career so far resulted in a loss of approx $150K. Though I like to boast, sorry, I can't take total credit for that one; it was a group effort...

I know one anecdote where a fellow programmer burned down a whole fleet of sites by deploying a script with one typo - a small 'k' instead of a capital 'K'. Fun!

Gibble

9:20 pm on Jul 16, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Invariably...though, the worst I can recall only took an hour of manual fixing to repair, and only happened because my boss told me to make a quick fix...which ultimately broke way more than it fixed.

So, while I was the one at the keyboard, I wasn't the one who said to make the change, nor had I worked there long enough to understand the system well enough to know what all to look for in verifying the change was correct.

Though, that didn't stop my boss from blaming me...

...gee, thanks, I've been here a week, how was I to know that there were other sites of ours consuming this web service, and your "fix" was going to break them.