Continuing Computer Problems

I’ve got a hardware problem of some kind.

I put in a brand-new terabyte drive and installed Fedora 11 on it. As I was doing some package updates, the screen froze, just as it did yesterday, when all the trouble started. When I tried to reboot, it didn’t even recognize the drive as a system disk. I ran the rescue disk on it, and fsck said the boot sector was clean (I don’t know if that means there are no problems, or that it has no data).

I did a full memory check overnight last night, so I don’t think it’s that. Maybe Carl had the right idea about a bad video card, but how would that cause it to trash the hard drive? Any other ideas?
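fsck checks filesystem structures, not the health of the disk hardware, so a clean fsck result doesn’t say much about whether the drive itself is failing. The drive’s own health counters are reported through SMART, which the smartmontools package prints with `smartctl -A /dev/sdX`. Below is a minimal Python sketch that scans that attribute table for the counters most often tied to bad sectors (reallocated, pending, and offline-uncorrectable sectors). The column layout is assumed from typical smartctl output, the helper names are hypothetical, and the sample table is made-up illustration data, not output from this machine.

```python
# Minimal sketch: flag bad-sector counters in the attribute table printed by
# smartmontools' `smartctl -A /dev/sdX`. The column layout is an ASSUMPTION
# based on typical output; check it against your own smartctl output.

WATCH = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}


def parse_attributes(text):
    """Map ATTRIBUTE_NAME -> integer raw value for each data row of the table."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows start with a numeric ID and have at least 10 columns:
        # ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(parts) >= 10 and parts[0].isdigit():
            raw = parts[9]
            if raw.isdigit():
                attrs[parts[1]] = int(raw)
    return attrs


def warning_signs(text):
    """Return the watched attributes whose raw count is non-zero."""
    attrs = parse_attributes(text)
    return {name: attrs[name] for name in sorted(WATCH) if attrs.get(name, 0) > 0}


# Hypothetical sample table, for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   098   098   036    Pre-fail  Always       -       112
194 Temperature_Celsius     0x0022   035   045   000    Old_age   Always       -       35
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

if __name__ == "__main__":
    # Prints {'Current_Pending_Sector': 8, 'Reallocated_Sector_Ct': 112}
    print(warning_signs(SAMPLE))
```

Non-zero and growing values for these counters are generally read as a drive that is remapping or failing to read sectors, which would fit a disk that fsck passes but that still misbehaves.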

15 thoughts on “Continuing Computer Problems”

  1. That’s a weird one.

    Jerry Pournelle would say: check your cables & connections.

    I once had a bad HD controller in an old ABIT motherboard – drives would work on the second cable socket, but not the first.

    Also had a bad KVM switch earlier this year that randomly wouldn’t let my input through to the computer. The computer would eventually go to sleep & just appear to be hung (that was hard to diagnose).

  2. If your HD controller is on your motherboard (and most are), it might be the problem. But don’t overlook software problems.

    When was the last time you ran scans from anti-spyware and at least two antivirus programs? (There are online ones you can run.)

    Lastly, power supply problems can cause almost any problem. If your machine is older, it could just be giving up, or your machine could be clogged with dust and dirt or have a bad fan.

    All of this can be looked at and figured out by process of elimination, such as swapping out the video card. But if you use this machine a lot and it is older, it is cheaper in the long run to just replace it.

    Not to push a brand, but I have 8 Lenovo computers (in my family and extended family). This is the company that took over IBM’s PC division a few years ago. We have had no problems to speak of, and their customer support has been satisfactory.

    Just my two cents (with a background in large-systems hardware with Big Blue)

    Papa Ray
    West Texas

  3. Bad mobo. A failing motherboard will cause intermittent random errors right up until catastrophic failure, and it can cause many a hair to be pulled. Fortunately, I’ve got Rogaine. I would find the BIOS jumper on the motherboard and reset it back to defaults. While you are looking for the BIOS jumper, check all the capacitors and see if any are swollen. A blown cap is a sure sign of failure. If you can’t find the BIOS jumper, use the defaults option in the BIOS setup itself. If that doesn’t help, go through and disable all the caches and advanced power management options. Your BIOS may even have a fail-safe option, which will kill most performance optimizations.

    If the video adapter were going bad, you’d likely, though not always, see artifacts and glitches on the screen while the system was trying to POST. If you see problems before the operating system even has a chance to start loading, then you know it is a hardware problem. In fact, a bad video card or mobo will intermittently cause the system to fail to boot altogether. In other words, you press the power button and the power and fans come on, but otherwise nothing happens. Sometimes I just reboot a computer over and over again to see if I can catch it doing something fishy during the bootstrap process.

    Just to make life interesting, a bad power supply can make your head spin. If you have a multimeter, you can jumper the power-on lead (PS_ON, usually the green wire) to a ground (black) wire on the main block connector that goes to the mobo. This forces the power supply to come on as soon as you plug it in. Then you can test your 5 V and 12 V rails to make certain they are in spec. It is not uncommon for a cheap PSU to be 0.5 V low to begin with, so further degradation can quickly lead to power starvation. You can google how to test your PSU and find many a guide on the overclocker forums.

  4. I think the screen freezing is spurious: you’re seeing the whole machine freeze up for some reason.

    What’s the provenance of the box? An old spare lying around? It could be a bad motherboard (or, more specifically, the SATA controller on the board), a flaky power supply, or bad cabling, as Ed says.

    This is going to be one of those obscure problems, where you’ve ruled out the hard drive, and probably the memory (though I don’t fully trust the memtest program). I suppose you have one clue in that you’ve seen this happen when you’re doing updates, which tends to have some disk i/o associated with it, and your drive is borked.

  5. I actually think your instinct may be correct about the video card — we had this exact symptom with a Fedora machine at work and it turned out to be a video card problem.

    Did you try a UBCD (Ultimate Boot CD)?

  6. “I suppose you have one clue in that you’ve seen this happen when you’re doing updates, which tends to have some disk i/o associated with it, and your drive is borked.”

    The first time it happened, it had nothing to do with updates. I was just trying to move a sentence in OpenOffice Writer.

    And it’s a BRAND NEW DRIVE.

  7. From what I’ve read of your troubles, it could be either the power supply, motherboard, or video card. I’ve had all of them die at one time or another in the past with all kinds of weird symptoms that point the finger at something else.

    If you can swap those parts to test, I would; otherwise, I’d do some testing to try to figure out which is the likely culprit. If you ever get screen artifacts, or just a plain black screen on boot, that’s likely the video card. If the system is unreliable and it’s totally unpredictable how long it will last between crashes, that’s likely the power supply. If it is unreliable and the crashes come progressively closer to POST (which is what I think you’ve been describing), I’d put the motherboard as the top suspect.

    As anecdotal evidence supporting that, more than half the motherboards that have died on me did so in early or midsummer, as the heat and humidity went up and pushed a marginal capacitor or voltage regulator over the edge. Video card and power supply failures haven’t had any such pattern in my experience.

  8. Most live Linux CDs have a copy of MEMTEST on them. Try booting that and running it overnight.

    There’s also a “CPUTEST” floating around the net somewhere, but I don’t remember which recovery CD I saw it on.

  9. I ran a memory test Monday night from the installation CD, with no problems shown. I’m ninety percent convinced that it’s the failing hard drive that I originally backed up my data to. I’m going to run without it today and see what happens. If that’s the problem, it makes my data recovery problem all the worse.

  10. Dennis Wingo wrote (July 7th, 2009 at 8:38 pm):

    “I would still bet on the memory. It has all of the signs.”

    You’d probably be getting error messages indicating memory block address read/write problems. Or the BIOS would keep reporting that the amount of system memory has changed every time you boot up. I have seen bad memory pass diags (hell, just yesterday, in fact), but you almost always get BSODs or kernel fault check errors in the O/S once it actually starts using that memory.

    Total lockups are usually a bad motherboard.

    Although, yes, I have seen a bad hard drive lock up a computer before, even as a slave drive. Hell, even as an external USB drive. I have resurrected several drives by swapping the controller board on the hard drive itself with the one from a drive of the same make and model. Usually in this case, though, the BIOS just outright refuses to acknowledge that a drive is plugged into the system at all. Another time I had to scavenge data off a bad disk, and it would copy about 100 megs or so, and then the drive would make a “PEEWWWWwwwww” noise as the spindle motor suddenly cut out and the system locked up. Oddly enough, the system would suddenly start responding again when I physically disconnected the power on the back of the bad drive and plugged it back in. I’d copy another couple hundred megs of data and then have to rinse and repeat.

    Any obviously loud clicks, clacks, or crunching sounds coming from the HDD indicate a head crash and are a sure sign of failure. Most drives that I have seen lock up a system make their head-crash clacking sounds immediately before the event.

  11. The drive hasn’t been making any obvious noise. All I know is that after I accidentally made it part of the Linux installation yesterday morning, it was reporting that it was failing, with bad sectors. And when I tried to do an install with it still attached, the machine locked up. I installed without it late last night, and I’ve been using the machine all morning with no problems so far.

  12. Just to add to all the possibilities so far: it could be dirty power to the box. I had a problem much like this some time ago – random system hangs, random reboots, that sort of thing – and eventually found that there was a loose connection in the mains socket.

    You might be able to test this by attaching a UPS, which usually provides cleaner power as well.

  13. As a rule, when something goes wrong with an existing system, it’s whatever you changed on it last. As you just installed a new HD, I’d say it was a bad drive. Replace it and try again. A drive crash could cause the symptoms you describe, and the drive could still test clean because it’s an intermittent problem on the controller.

    I’ve never been fond of using large hard drives as backup because there are so many ways for the hardware on these to go bad, and then you have to send it to a recovery facility to pull the data. An external tape backup unit is still the most reliable method available.
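The rail check suggested in comment 3 comes down to simple arithmetic: a reading is in spec if it falls inside a tolerance window around the nominal voltage. The sketch below assumes the ±5% window commonly cited for the +5 V and +12 V rails in the ATX power supply guidelines; check the figures for your own supply. The function names and the sample readings are hypothetical, not measurements from the machine in the post.

```python
# Minimal sketch: compare multimeter readings with an ASSUMED +/-5% tolerance
# window around the nominal +5 V and +12 V rail voltages.

NOMINAL = {"+5V": 5.0, "+12V": 12.0}  # nominal rail voltages, in volts
TOLERANCE = 0.05                      # assumed +/-5%; confirm for your PSU


def rail_limits(nominal, tolerance=TOLERANCE):
    """Return the (low, high) acceptable voltages for a rail."""
    return nominal * (1 - tolerance), nominal * (1 + tolerance)


def check_rail(rail, measured, tolerance=TOLERANCE):
    """Return True if the measured voltage lies inside the rail's window."""
    low, high = rail_limits(NOMINAL[rail], tolerance)
    return low <= measured <= high


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    readings = {"+5V": 4.90, "+12V": 11.50}
    for rail, volts in readings.items():
        low, high = rail_limits(NOMINAL[rail])
        status = "OK" if check_rail(rail, volts) else "OUT OF SPEC"
        print(f"{rail}: {volts:.2f} V, allowed {low:.2f}-{high:.2f} V, "
              f"headroom {volts - low:.2f} V -> {status}")
```

With the assumed ±5% window, the +12 V rail must stay between 11.4 V and 12.6 V, so a supply that starts half a volt low (11.5 V) still passes but with only about 0.1 V to spare. That is why a little further degradation can tip a cheap PSU out of spec.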
