Over the last few days, we’ve had storms.
That is, rain, and thunder, and wind.
This has affected many people, some worse than others, as the news stories plastered all over news.com.au show.
Last night, we went to bed, and the rain and wind were loud enough to keep me awake for some time. Just as I thought things would settle down a bit, the house shook, and I noticed light shining into the lounge room.
What had happened was that the network equipment had restarted, turning the monitors on and making the ADSL modem blink. This was due to a power blip. We've had a few of these over the last few days. The power drops for something like 15 ms, which causes a VERY quick flick in the lights and makes some (if not all) of the network equipment and computers start rebooting.
My machine, unfortunately, hasn't missed a single one, copping every power blip and therefore every reboot.
The router, on the other hand, which runs Tomato firmware (and which, I must say, works great for VoIP QoS), had 18+ days of uptime when I forced a reboot today to get the redial program running. I meant to do that the other day, when the connection didn't come back by itself.
But when I woke up this morning, I found a problem with one of my servers :(.
It was off after the outage, so I turned it back on. I opened a command prompt on my machine (after logging in again, of course, thanks to the power flicker) and started pinging the server's IP.
I waited, and waited, and got no response to the pings.
I tried Remote Desktop, just in case it had confused its network interfaces again and was blocking ping internally. Nope, it was definitely down.
No worries, I thought; probably something simple, like a BIOS prompt waiting to be dismissed, or something equally basic.
I connected a monitor to it, and what I saw was, well, a blank screen.
I turned it off and on again. It looked like it was going to boot: the BIOS stage went fine, and it seemed to be getting to the animated Microsoft screen.
Nope, it sat there on a blank screen. This was very confusing. I started following a troubleshooting process to determine the problem.
I spent several hours doing everything imaginable to get this machine to boot back into its OS.
I also had trouble getting it to boot from a Setup CD. However, it would boot from a system tools CD, a Windows 98 CD and an Ubuntu Live CD just fine.
It didn’t like an XP CD, and it didn’t like its own Windows Server 2003 eval CD.
Ripped out network cards, swapped CPU, swapped memory, no joy.
I got to the point where I was going to swap motherboards, though I still thought the motherboard couldn't be at fault: the Live CD worked fine, and Partition Commander read the file structure off the drives fine, so it simply couldn't be an IDE issue or a motherboard issue.
I was getting tired of it, testing everything I could find to isolate the issue. All of this ran from 11am to 5.30pm.
I tried one last thing. I thought: if it's an IDE issue that isn't affecting the CD drive but is affecting the HDD, a SeaTools test will surely show it.
So, I stuck the SeaTools CD in and attempted to boot from it, but it wouldn't start, due to a few scratches on the surface of the disc, I think.
I rebooted again, ready with another CD, but was too late to get the disc in before the BIOS prompt. Anyway… I was amazed. Windows was actually starting all by itself.
Only moments earlier (that is, every time I tried to get into Windows after a test), it hadn't worked; it would just freeze up. With the Windows XP or Server 2003 setup discs, it just locked up at "Setup is inspecting your hardware configuration". I googled as well, but found nothing useful, as the results for that point were all SATA / PATA related, and I don't have those drives in this server.
So, after all that testing, the problem doesn't seem to be a problem after all. And it wasn't a long-standing issue either, because the machine had come back fine after one of the earlier power blips.
Weird issue. It works now. It wasted a good 7 hours of my time. Very disappointed; no, more annoyed, as I had set today aside to do some more on OzVoIPStatus, and I really wanted to make some changes to the site today :(. It still doesn't make sense either. How can it just DO that, after it refused to start through all the reboots I tried earlier? The disk would light up for 1 second and then sit there, as if waiting to load the rest of the loader into memory.
It’s up now.
And in good news, the dedicated server is also running stable, waiting for me to finish migrating over to it so we can get operational on it. I've got some SIP providers on it now, and I should look at moving the tester over to it soon too, so that we can prove its stability. In fact, I might just duplicate the entire setup onto the box: if it's going to crash again, it should crash in the office and not back in the data centre. It really does feel so much more stable!
top - 00:10:59 up 1 day, 13:07,
It's getting closer and closer to the two-day mark, where it would normally crash (it was crashing roughly every 2 days).
Tomorrow, hopefully, I can get some more time in on OzVoIPStatus. Basically, I want to focus on making the site more presentable to new users. I suck at presentation (really, I do), so I think I might take a different road and make the site support templates. That way we can try a few different layouts and settle on a default that works for new users, plus something more advanced, or more "detailed", for advanced users.
I also really have to put more time into the site's features and get those finished. Essentially, we want to source more data directly from the providers themselves, so they can maintain their own records where possible, and I'll obviously make some methods for the larger or smaller providers that have yet to open their eyes to monitoring their servers. Managing 180+ VoIP providers is no easy job to do myself, and we've had just 39 plans in the database for some time now, which indicates to me that a user-provided data model isn't going to work (besides, that data still has to be validated, and corrected for those who think dollars are cents).
On that note, though, this post was meant to focus on the bad weather we've had. Amazingly, we haven't been affected as badly as others, some of whom still have no power, going into 24 hours without it. To describe that for someone who uses power regularly: ARE YOU INSANE?! That's just far too long to be without the connections to the internets and the power to the computers. I really dunno what I'd be doing, that bored. I suppose I still have the 2 UPSes in storage that I could connect up, which should give around 4 hours of power :)!
I do hope, for their sake, that they all get the power back and that this bad weather eases off soon enough (well, fill the dams up, and move along). I wouldn't want to be a business owner in Wallsend right now. Jared of Servers Australia has taken some pictures of his local town showing shops with smashed windows, likely a key target for looters to snatch anything salvageable from the floods they had. Take a look here: Servers Australia: Wallsend Wash.
My favourite thus far is the Jigsaw carpark, but there's also a bank that might have attracted the attention of some salvagers.
The news said we'd be getting some more, but I think the rain is easing off. They predicted more at 10pm; it's now... 12.22am, and it's just cold and quiet. On that note, so much for Global Warming: everything feels cold. (joke)
This is getting a bit long, so here's hoping for a sunny day soon for the mothers to do the washing, the SES to get some sleep, the lawns to make some progress, and the power to be restored to those in the dark.