Record-breaking uptime is over – 1003 days

Please, a moment of silence for one of the longest uptimes for an actively used server.

When we started many years ago and moved into an office, our first server was a white-box desktop. We scrambled to build it out of components we had… some memory from here, a motherboard from over there, and hard drives (software RAID) from who knows what. It was by no means anything comparable to our current arsenal made out of stacks of PowerEdge servers running vSphere. Anyway, we have moved a few times and it has faithfully followed us. It has occupied our current location for about 3 years.

The other day, it got jealous. Well actually, I think there was a sharp voltage drop when we plugged a 4U PowerEdge server into the UPS it was sharing. The high-quality components it’s made out of apparently showed their true colors this time, causing … wait for it … a reboot!

So now we’re back to 0… it’ll be a long journey. No one has committed to upgrading the critical software it holds, so it won’t be decommissioned anytime soon.

See you again in 2.747945205479452 years.
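
For the curious, that figure is just the lost uptime converted to years. A quick Python sketch, assuming a plain 365-day year (which is what the number above implies):

```python
# Convert the lost uptime into years.
UPTIME_DAYS = 1003
DAYS_PER_YEAR = 365  # assumption: leap days are ignored

years = UPTIME_DAYS / DAYS_PER_YEAR
print(f"{years:.3f} years until the record is matched again")
```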

Before and After the 4U server was plugged into the UPS. Ouch!

BEFORE 4U PowerEdge
LINEV    : 117.0 Volts
LOADPCT  :  23.9 Percent Load Capacity
BCHARGE  : 100.0 Percent
TIMELEFT :  85.0 Minutes
LASTXFER : Automatic or explicit self test

AFTER 4U PowerEdge
LINEV    : 113.7 Volts
LOADPCT  :  50.4 Percent Load Capacity
BCHARGE  : 100.0 Percent
TIMELEFT :  39.0 Minutes
LASTXFER : Unacceptable line voltage changes
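
If you want to compare two snapshots like these without eyeballing them, here is a rough Python sketch. It assumes the status text is in the "KEY : value unit" layout shown above; the function names are my own and not part of any UPS tool.

```python
def parse_status(text):
    """Parse 'KEY : value' lines into a dict mapping key -> value string."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that have no colon
            status[key.strip()] = value.strip()
    return status


def leading_number(value):
    """Return the numeric part of a value such as '117.0 Volts'."""
    return float(value.split()[0])


BEFORE = """\
LINEV    : 117.0 Volts
LOADPCT  :  23.9 Percent Load Capacity
BCHARGE  : 100.0 Percent
TIMELEFT :  85.0 Minutes
LASTXFER : Automatic or explicit self test
"""

AFTER = """\
LINEV    : 113.7 Volts
LOADPCT  :  50.4 Percent Load Capacity
BCHARGE  : 100.0 Percent
TIMELEFT :  39.0 Minutes
LASTXFER : Unacceptable line voltage changes
"""

before, after = parse_status(BEFORE), parse_status(AFTER)

# Difference (after minus before) for the numeric fields that changed.
deltas = {
    key: leading_number(after[key]) - leading_number(before[key])
    for key in ("LINEV", "LOADPCT", "TIMELEFT")
}

for key, delta in deltas.items():
    print(f"{key}: {before[key]} -> {after[key]} ({delta:+.1f})")
print(f"LASTXFER: {before['LASTXFER']} -> {after['LASTXFER']}")
```

The load roughly doubled and the estimated runtime dropped by more than half, which lines up with the new reason for the last transfer.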

Troubleshooting/Debugging BSOD errors

What happens when you get a Blue Screen of Death (BSOD)?  I’m sure almost everyone just says something like “____ Microsoft!”  Unfortunately, most of the time, you would just be using Microsoft as a scapegoat.  Why?  According to Microsoft and other gurus, about 70-80% of crashes are caused by third-party drivers.  Yep, all those great toys you have hooked up to your computer and the software that controls them are most likely responsible.

I have probably just blown your mind, or you are full of skepticism.  Hopefully these debugging techniques can make you a believer….

Step 1:  Disable auto-reboot on a crash

Step 2:  Configure Windows to write a memory dump (kernel or complete) rather than a mini crash dump.  This will allow you to get more information from the dumps.

Step 3:  Install the Debugging Tools for Windows (WinDbg)

Step 4:  Set the symbol path so symbols are downloaded automatically from the Microsoft symbol server (WinDbg->File->Symbol File Path->“srv*C:\symbols*http://msdl.microsoft.com/download/symbols”, or set the _NT_SYMBOL_PATH environment variable to the same value)

Step 5:  Open the crash dump file located in C:\Windows (MEMORY.DMP) or C:\Windows\Minidump

Step 6:  Run “!analyze -v” to get the list of drivers in the stack text.  If the driver points to one of the Windows core system files (ntoskrnl.exe, win32k.sys, etc.), then you probably have to dig a little deeper.

Step 7:  Additional helpful debug commands to run to find the culprit

kv – Displays the stack of the current thread.  Use it when the automatic analysis looks like a misdiagnosis.  Look for suspicious drivers.

lm kv – Shows version information (dates, etc.) for the currently loaded drivers, so you know which ones to find updates for.

!vm – Check pool usage (if close to maximum, then it’s a leaky driver)

!thread – Shows the current thread, including its IRP list

!process 0 0 – Summary-level display of processes during the crash

!irp <irp from IRP List from !thread> – Shows the drivers associated with that IRP (it’s a hint about what to investigate)

!poolused (pool tagging needs to be enabled on XP and earlier) – Shows pool usage by tag.  Use with the Strings utility to find which driver owns a tag.

!deadlock – Shows deadlock information collected by Driver Verifier
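
Steps 4 through 7 can also be scripted. The sketch below is in Python and only builds the command line for cdb, the console debugger that ships alongside WinDbg in the Debugging Tools for Windows. It assumes cdb is on the PATH, that C:\symbols is the local symbol cache, and that the dump is in the default MEMORY.DMP location; the function names are my own. Adjust the paths and command list to taste.

```python
SYMBOL_SERVER = "http://msdl.microsoft.com/download/symbols"


def symbol_path(cache_dir, server=SYMBOL_SERVER):
    """Build a symbol path of the form srv*<local cache>*<symbol server>."""
    return f"srv*{cache_dir}*{server}"


def build_cdb_command(dump_file, cache_dir=r"C:\symbols",
                      commands=("!analyze -v", "kv", "lm kv", "!vm")):
    """Return an argument list that opens a dump, runs the commands, and quits.

    -z names the dump file, -y sets the symbol path, and -c passes the
    debugger commands to run at startup (separated by semicolons).
    """
    script = "; ".join(list(commands) + ["q"])  # 'q' exits the debugger
    return ["cdb", "-z", dump_file, "-y", symbol_path(cache_dir), "-c", script]


# Assumed dump location; a minidump under C:\Windows\Minidump works too.
cmd = build_cdb_command(r"C:\Windows\MEMORY.DMP")
print(cmd)
```

On the Windows machine, passing `cmd` to `subprocess.run` would run the analysis and print the debugger output to the console.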

 

 

Debugging mode (F8) – Use when no crash dump is created.  You need to connect from another system running WinDbg over USB (modify boot.ini) or serial.

Windbg – File->Kernel Debug

Debug -> Break to connect to crashed system

.dump (saves dump information)

 

Hung system troubleshooting (computer freeze)

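– Use crash on Ctrl+Scroll Lock (the CrashOnCtrlScroll registry setting): hold the right Ctrl key and press Scroll Lock twice to force a crash dump

– On multiprocessor systems, check the other processors too

lm kv m <driver name from stack>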
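
The registry side of this post (Step 1, Step 2, and the crash-on-keyboard setting) can be captured in a .reg file. Here is a Python sketch that only generates the file text. Two assumptions to flag: it picks a kernel memory dump (CrashDumpEnabled = 2; 1 would be a complete dump), and it enables CrashOnCtrlScroll for both PS/2 (i8042prt) and USB (kbdhid) keyboards. Review the output before importing it, and expect to reboot for it to take effect.

```python
SERVICES = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services"
CRASH_CONTROL = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl"

SETTINGS = {
    CRASH_CONTROL: {
        "AutoReboot": 0,        # Step 1: do not reboot automatically on a crash
        "CrashDumpEnabled": 2,  # Step 2: kernel memory dump (assumed choice)
    },
    # Crash on right Ctrl + Scroll Lock (pressed twice), PS/2 keyboards
    SERVICES + r"\i8042prt\Parameters": {"CrashOnCtrlScroll": 1},
    # Same setting for USB keyboards
    SERVICES + r"\kbdhid\Parameters": {"CrashOnCtrlScroll": 1},
}


def render_reg(settings):
    """Render a {key: {value name: DWORD}} mapping as .reg file text."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key, values in settings.items():
        lines.append(f"[{key}]")
        for name, dword in values.items():
            lines.append(f'"{name}"=dword:{dword:08x}')
        lines.append("")  # blank line between keys
    return "\n".join(lines)


reg_text = render_reg(SETTINGS)
print(reg_text)
```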

Help for Asterisk AA50 including issues, how to rebuild compact flash filesystem, and workarounds

First, I would like to say that the AA50 is not a recommended product.  Actually, I think it's the opposite.  I would recommend an analog phone with a voicemail recorder before I would recommend one of these things.  Why do I have such harsh feelings towards it?  Well, the support personnel are unable to realize that a PBX has major issues if it reboots randomly and prevents you from leaving voicemails or getting voice prompts.  I even tried to make them understand by explaining that the problem is not with an advanced or unsupported feature, but with one that's critical to the basic intended functionality of the device itself.  The response I got was "It's not meant to be used as a full PBX".  Secondly, they told me the issues are being worked on, but they haven't figured them out yet.  Uhh… my support ticket was created about a year ago!  Their response: "Do you know how hard it is to rewrite a firmware?"  I'm a very patient and understanding person, but if you fail to recognize a critical issue with a product at such a simple level, I feel my point will never be accepted.  Just imagine if Toyota took a year to fix their brake problems or said the cars weren't supposed to be fully used that way….

I'm proud to do Digium's job for everyone by providing the community with a work-around and documenting what I've learned.  Hope this helps others.  As for the AA50, I will never buy anything solely and directly made by Digium again.  Buy Sangoma and use open-source Asterisk.

Background: http://www.keycruncher.com/blog/2009/11/02/digium-confirms-major-issues-with-aa50-voip-appliance-spotaneous-reboots-and-memory-card-write-lock-a-review/

Symptoms:

  1. The system reboots randomly and frequently
  2. The system frequently loses access to the compact flash filesystem, thus no voicemails, voice menu prompts, or even backups.
  3. The system prevents you from deleting voicemails due to the issue with Symptom 2.

Detailed Description:

Basically, the reasons are:  memory leak(s) (Symptom 1) and memory card write-locks (Symptoms 2 and 3)

Work-around:

Create an automated job that reboots the system every 24 hours.

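  1. Create a script (reboot-24hrs.sh) in /etc/config (use this directory because it's backed up to the local storage, not flash storage)
    #!/bin/sh
    # Wait 24 hours (86400 seconds) after startup
    sleep 86400
    /bin/asterisk -rx
    # Reboot the appliance
    reboot
  2. Edit /etc/config/rc.local and add /etc/config/reboot-24hrs.sh & so the script is started in the background on every boot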

What if you wanted to rebuild your compact flash card?  The answer is simple.  Run the scripts below in order.

  • For reference, the appliance on startup (/etc/rc) mounts the compact flash using this command:  "mount -t ext3 /dev/hda1 /var/lib/asterisk/sounds"
  1. /sbin/create_sounds (Formats the compact flash memory card and creates the proper sounds directory.  It also downloads the files from the Internet)
  2. /sbin/update_tz (Downloads time zone files from the Internet)
  3. /sbin/update_phoneprov (Downloads phone provisioning files from the Internet)

A useful print server configuration tool

Have you ever wanted to make a backup of all your printers, their shares, their permissions, and the drivers on your print server?  Well, Microsoft has a very useful tool (Print Migrator 3.1) that does this.  Furthermore, it also does restores!  I couldn’t believe my eyes either!  It’s great for when you need to set up redundant print server configurations or when you are migrating print servers!

Here it is:

http://www.microsoft.com/WindowsServer2003/techinfo/overview/printmigrator3.1.mspx

Malware, Spyware, Scareware – How to detect and prevent infection…

What is malware and how do I get it?

Generally speaking, malware is malicious software designed to infiltrate a computer system without the owner knowingly allowing it to.  Its intent is to perform devious acts on or using your computer.  Scareware in particular refers to programs that generate misleading alerts and false detections in order to convince users to purchase illegitimate security software.

Additional Malware Info

What are the symptoms?

Pop-ups, website redirection, network configuration changes, unresponsive computer, etc…

Information regarding Antivirus2009 Malware

Information regarding Internet Security 2010

How did I get it?

It usually comes from emails, websites, pirated software downloads, P2P applications, fake video codecs, software exploits (e.g., Acrobat), etc… The typical scenario is a pop-up that asks you to download and install something.  Once the download and install happens, the malware takes over the computer.

How do I protect myself?

  1. We still live in a world where humans can usually make the best decisions.  This means user training is one of the best methods to prevent infections.  Below is a list of things to train users on that doesn’t require a lot of time.
    • Users should be a little paranoid and skeptical when it comes to reading the emails they receive, especially emails requesting actions to be taken. If it sounds important, take the time to read and verify it carefully!
    • Users should make sure they have an SSL connection when making transactions online or logging into banking sites.
    • Exercise caution with e-mail and files received from unknown sources, or received unexpectedly from known sources.  If the email is from someone they know, make sure it has relevant content specific to that person (e.g., writing style, context of message, etc.)
    • Users should know that a pop-up can sometimes be made to look like a Windows error message. Recognizing legitimate software interfaces can help (antivirus software, Windows Security Center, Windows Defender, anti-malware software)
    • Don’t download random software from the Internet until you know it has a valid homepage and user base (look for software reviews for it). Once that’s verified, make sure you download directly from the vendor’s website.
    • Users should understand how the HOSTS file can be abused to send a legitimate website address to the wrong website.
    • Users should understand that a text link can have a different URL embedded.
    • Don’t install software unless you were intentionally trying to.
  2. Keep Windows and your browser software up-to-date by downloading and applying security updates.
  3. Use an active and updated antivirus and anti-malware application that detects harmful websites, files, and emails. There are many applications out there that are free. Some highly recommended ones are Webroot, Spybot Search and Destroy, MalwareBytes, SuperAntispyware, and PC Tools Spyware Doctor.
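
To show what "a text link can have a different URL embedded" looks like in practice, here is a small Python sketch using only the standard library. It flags links whose visible text is itself a URL but whose real destination is a different host. The class name and the sample HTML (including the .example domains) are made up for illustration, and links whose text is not a full http(s) URL are simply skipped.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


def host_of(url):
    """Return the hostname of an http(s) URL, or None if it is not one."""
    parsed = urlparse(url.strip())
    if parsed.scheme in ("http", "https"):
        return parsed.hostname
    return None


class LinkAuditor(HTMLParser):
    """Collect links whose displayed URL and actual href point to different hosts."""

    def __init__(self):
        super().__init__()
        self._href = None     # href of the <a> tag currently open, if any
        self._text = []       # visible text collected inside that tag
        self.mismatches = []  # (displayed text, real href) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            shown, actual = host_of(text), host_of(self._href)
            if shown and actual and shown != actual:
                self.mismatches.append((text, self._href))
            self._href = None


# Made-up email body: the first link shows one site but goes to another.
SAMPLE = (
    '<p>Please verify your account at '
    '<a href="http://evil.example/login">http://www.mybank.example/</a> '
    'or read the <a href="http://www.mybank.example/help">Help</a> page.</p>'
)

auditor = LinkAuditor()
auditor.feed(SAMPLE)
for text, href in auditor.mismatches:
    print(f"Displayed: {text}  Actually goes to: {href}")
```

Hovering over a link and reading the status bar does the same check by eye, which is the habit worth teaching.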

Removal Tips:

  1. Boot into SAFE MODE. It will give you a more effective platform to work with.
  2. The key is to get the system to allow you to install anti-malware software with the latest updates to slowly remove the programs.
  3. Fixing infections and rebooting often will get you further along in the removal process.
  4. There is no perfect anti-malware software, so you should run scans with multiple anti-malware programs to make sure all malware is removed.
  5. Can’t run/install software due to access permissions – This is usually due to software restrictions in your local security policy, or malicious group policy entries for software restrictions in your registry.
  6. Can’t browse websites or weird website redirections – Check the Internet Explorer proxy settings. 95% of the time, it shouldn’t be using a proxy. Also, make sure your HOSTS file doesn’t have malicious entries in it.
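
For the HOSTS file check, here is a rough Python sketch that lists every active entry other than localhost so you can review them by hand. On Windows the file normally lives at C:\Windows\System32\drivers\etc\hosts. The sample content, addresses, and function name below are made up for illustration. Note that entries pointing at 127.0.0.1 are listed too, since malware sometimes uses them to block antivirus update sites.

```python
def hosts_entries_to_review(text, allowed_names=("localhost",)):
    """Return (address, hostname) pairs for active entries not in allowed_names."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            if name.lower() not in allowed_names:
                entries.append((address, name))
    return entries


# Made-up HOSTS content for illustration.
SAMPLE_HOSTS = """\
# Copyright (c) Microsoft Corp.
127.0.0.1       localhost
::1             localhost
203.0.113.10    www.mybank.example    # redirects the bank to another server
127.0.0.1       update.antivirus.example
"""

flagged = hosts_entries_to_review(SAMPLE_HOSTS)
for address, name in flagged:
    print(f"Review: {name} -> {address}")
```

To run it against a real machine, read the file with `open(path).read()` and pass the text in. Anything it prints that you did not add yourself deserves a closer look.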