Computer Problem

I’m posting this from Patricia’s machine, because mine died in its sleep last night. When I got up, it had reset for some reason, and was ready to boot. But it wouldn’t. It just spins forever. I tried reverting to previous kernels, but still no joy. It even fails when booting to rescue it. Not sure how to even start to diagnose it. I could try booting her machine with it (very similar hardware), to see if it’s a software issue, but I’d have to figure out how to edit the fstab, because it overmounts a separate drive as /home, and her machine wouldn’t have it.
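For the fstab edit, a minimal sketch of one way to do it, assuming the system drive's root is mounted somewhere you can reach from another boot (the `/mnt/sysroot` path in the usage note is a hypothetical mount point). The helper name `comment_out_home` is made up for illustration; it only prints a modified copy and never touches the original file.

```shell
#!/bin/sh
# Print a copy of an fstab with any entry mounted at /home commented out.
# Assumption: the fstab path is passed as $1; nothing is modified in place.
comment_out_home() {
    awk '
        # Leave blank lines and existing comments alone.
        /^[ \t]*(#|$)/ { print; next }
        # Field 2 of an fstab entry is the mount point.
        $2 == "/home" { print "# " $0; next }
        { print }
    ' "$1"
}
```

Something like `comment_out_home /mnt/sysroot/etc/fstab > /tmp/fstab.new` would let you review the result before copying it into place. A gentler alternative is to leave the line in and add the `nofail` mount option to it, so a missing drive does not stop the boot.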

Anyway, fun times.

[Update a while later]

Definitely not a software problem; I tried booting it from the drive I use to boot my laptop, and got the same issue. I looked at the hardware monitor in the BIOS, and I’m not seeing any issues. The CPU temperature seems fine. And the CPU seems to be working well enough to get to the boot menu. I suspect it is a motherboard issue (though it could be memory). I should see if it will boot with memtest.

[Tuesday-morning update]

Per comments, in the BIOS setup, I am seeing all three drives (the SSD that the OS lives on, the hard drive that is /home, and the SSD that Windows lives on for the VM), 32G of RAM. All looks fine from that standpoint. Next step is to find a stick and put memtest on it.

[Update a while later]

OK, I’m posting this from the machine. It seemed to boot fine from a Fedora 33 live USB (i.e., it doesn’t seem to be a hardware problem). Now to figure out why it won’t boot from its own drive, or from the other Fedora drive I tried. Any diagnostic suggestions?

[Update a few minutes later]

So I ran fsck on /dev/sda (my system drive). It said there was a dirty bit set on the first partition, but other than that it found no problems. But it still won’t boot.

[Afternoon update]

So, using the instructions on this page, I tried reinstalling grub2. On the last command, I got this error message:

****************************************

[root@localhost-live /]# grub2-install /dev/sda
Installing for i386-pc platform.
Unknown device "/dev/sda1": No such device
Unknown device "/dev/sda1": No such device
Unknown device "/dev/sda1": No such device
Unknown device "/dev/sda2": No such device
Unknown device "/dev/sda1": No such device
Unknown device "/dev/sda2": No such device
Unknown device "/dev/sda3": No such device
Unknown device "/dev/sda1": No such device
Unknown device "/dev/sda1": No such device
Unknown device "/dev/sda1": No such device
Unknown device "/dev/sda1": No such device
grub2-install: warning: ../grub-core/partmap/gpt.c:190:this GPT partition label contains no BIOS Boot Partition; embedding won't be possible.
grub2-install: warning: Embedding is not possible. GRUB can only be installed in this setup by using blocklists. However, blocklists are UNRELIABLE and their use is discouraged..
grub2-install: error: will not proceed with blocklists.

************************************************************

[root@localhost-live /]# ls /dev/sda1
/dev/sda1

So I don’t know why it’s having trouble knowing the device (sda1 is my boot partition). Any ideas?

[Update a few minutes later]

Wait a minute. Why is it installing for an i386 platform? This is a Ryzen. [Off looking up man on grub2-install]

Weird. It says the default platform is the one that the installer is running on. I’m pretty sure that this live USB is x86_64.

Here is the boot partition:

EFI grub2 mach_kernel System
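On the i386 question: `i386-pc` is GRUB's name for the legacy BIOS target, not a statement about the CPU, and grub2-install tends to pick it when the running system was not started through UEFI. A quick sketch of how to check which way the live USB booted, relying on the fact that `/sys/firmware/efi` only exists on a UEFI boot. The function name and its optional root argument are hypothetical, added only so the check can be pointed at a test directory.

```shell
#!/bin/sh
# Report whether the running system was started via UEFI or legacy BIOS.
# /sys/firmware/efi exists only when the kernel was booted through UEFI.
# $1 (optional, hypothetical): alternate root to inspect instead of "/".
boot_mode() {
    root="${1:-}"
    if [ -d "$root/sys/firmware/efi" ]; then
        echo "UEFI"
    else
        echo "BIOS"
    fi
}
```

Running `boot_mode` with no argument on the live system prints one of the two words. If it says BIOS while the disk is laid out for EFI, that mismatch would explain both the `i386-pc` line and the complaint about a missing BIOS Boot Partition.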

[Update a while later]

OK, weirder and weirder. I’m following the instructions on this page (just the last one to reinstall grub and shim), but when I do the dnf, it times out getting to the repositories. And I can’t ping Google. Yet I’m able to web surf. Riddle me that, Batman.

56 thoughts on “Computer Problem”

  1. Hrm… The failure of rescue booting is troubling. Have you tried swapping the primary hard drive with another computer’s HD? If that doesn’t work, I’d say you have a dead motherboard. I just bought one for a Ryzen 5 3600 for about $100.00, though I haven’t swapped it out yet. In my case a USB host controller died and took out half my USB ports.

      1. Mine is an ASRock B450M/ac, which is what my computer came with. I stayed with it to avoid any mechanical or cabling issues swapping it out. I would’ve already done so, but with all the political chaos going on, I don’t want to risk any downtime when my machine is still running fine, absent half the USB ports being dead.

  2. Rand,

    Haven’t had a problem like the one you describe in a very long time, but…you might consider swapping out the power supply, since they typically have separate (read: fail-able) power leads for different items embedded in or connected to the MB.

    1. Or at least look up your power supply online and see if you can get cable voltages and use a DMM on the leads. Most motherboard cable configs are standardized around a few voltages. Too lazy to look it up, but you can, should be straightforward to find.

  3. Do you have a bootable Linux USB flash drive? Have you tried that? Those can come in handy, esp. the ones that allow you to run fsck on your hard drive.

    Could be memory. Memtest will help with that. Or if you don’t have too many DIMMs try removing all but the proper ‘first’ one and then swapping it if still no joy.

    And then there is my all-time favorite motherboard trasher. Try replacing the battery on the TOY clock. A dead one was known to trigger sad Macs on the ol’ Quadra 605s, and the Mac dealers at the time would be only too happy to sell you a $400 motherboard ($125 from the mail-order warehousers) to replace a $12 battery. Let’s hope your nearly new Ryzen 5 3600x doesn’t suffer from that. I don’t think it should.

    1. BTW you are at your quota for the year Rand. Unlike Medicare I have a reverse deductible aka a cap. Will help you again after Jan. 1. 🙂

    2. If the BIOS comes up with the correct date and time and not the one it had when you last shut it off, it’s likely not the TOY clock battery. But I had to throw that out there because, well, because all you MAC bigots… I was once one too…. 🙂

        1. Seriously, folks. The BIOS coin battery is something to check — it is cheap enough to just replace.

          I’ll be here all week. Try the veal . . .

    3. I second the recommendation for booting a live Linux thumb drive. I had an update kill grub one time and I fixed it with a boot repair live usb drive. I also recommend having up-to-date kernel versions of Parted Magic and Clonezilla on hand. They’re great for testing what’s alive and what’s not.

      1. I, for one, find these computer-problem threads very educational. I’ve learned a lot from them – both the posts and the comments.

        I have no useful knowledge on this particular problem, so all I can say is, best of luck, and please do post an update if/when an answer is found.

  4. Since it will POST and you can bring up the config screen, it’s probably a disk problem. Look in the config for the disks and see if they show up. A fix that works at least half the time is to simply disconnect and re-seat ALL of the cables on both ends. You may have to fiddle with the boot order to get it to boot to USB. There are several good diagnostic disk images around, but I’m not current enough to make a recommendation. If re-seating the connections doesn’t fix it, it’s probably a dead or corrupted disk.

      1. If you aren’t using an SSD and it turns out to be a dead disk, now might be the time to consider a switch.

  5. Did you try

    Swapping out the sata cables.
    Different sata port.
    Swapping out the power supply.
    Failsafe bios settings.

  6. [Tuesday afternoon update] Sounds like a configuration problem post crash that is prohibiting a reboot. Are you running a RAID array? If so make sure you have all your settings set up correctly.

    A good way to check the configuration of your system disk is to boot up on your USB flash drive and then mount your system disk so you can inspect all your configuration settings. One possible problem I’ve read about is that a mis-configured swap partition can cause a boot hang. If you start a boot and are patient (several minutes), see what kind of messages pop out of the kernel. That would be most helpful.

    1. Also make sure your BIOS settings are correct for your system disk. Are you supposed to be set up for UEFI boot or something else? Boot order in the BIOS correct? Etc.

    2. I’m not running a RAID array. I don’t think it’s a BIOS issue. It has no problem loading grub. But when I select a kernel (or it defaults to boot it), it does the spinning Fedora thing ad infinitum, and doesn’t boot. There are no error messages when I boot from the USB, and I’ve done nothing to the BIOS other than telling it to prioritize USB for booting.

      1. It sounds like a grub configuration problem, or possibly what Grub is pointing to is corrupt. The latter seems unlikely as you have problems with multiple kernels. I suggest trying booting a live boot repair usb and then letting it reinstall grub for you.
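On seeing what the kernel says: the spinning Fedora logo hides the boot messages, and it is normally controlled by the `rhgb` and `quiet` words on the kernel command line. A small sketch of the edit, written as a function that takes a command line string and prints it without those two words. The function name is made up, and the sample command line in the usage note is illustrative, not taken from this machine.

```shell
#!/bin/sh
# Print a kernel command line with the "rhgb" and "quiet" options removed,
# so that boot messages are shown instead of the graphical spinner.
# $1: the original command line as a single string.
verbose_cmdline() {
    out=""
    for word in $1; do
        case "$word" in
            rhgb|quiet) ;;                    # drop the options that hide messages
            *) out="${out:+$out }$word" ;;    # keep everything else, space-separated
        esac
    done
    printf '%s\n' "$out"
}
```

For example, `verbose_cmdline 'root=UUID=abcd ro rhgb quiet'` prints `root=UUID=abcd ro`. The same edit can be made by hand for a single boot: press `e` on the grub menu entry, delete `rhgb quiet` from the line starting with `linux`, and press Ctrl-X to boot.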

  7. From a (very) quick review of people having similar issues it sounds like you are missing a BIOS boot partition on your disk. I don’t know how you made your system disk. But you might want to check your partition table to make sure that this partition exists and then re-try the grub2-install. Or that the existing boot partition is marked for BIOS boot. HTH

    1. This is the only boot partition:

      EFI grub2 mach_kernel System

      What should I look for? And what changed? It’s been booting fine ever since I built the machine. I didn’t make any changes to anything Sunday night.

      1. That seems really odd for your configuration. But maybe I’m behind the times. I would have expected your boot partition to say BIOS grub2 linux_kernel System. AFAIK Fedora/RedHat is still Linux, which is not Mach. Mach is for GNU Hurd. Try some other settings for your boot partition. I have no idea why it is saying this.

        1. Some different switches on grub2 install? I don’t know if it tries to partition a boot partition on your disk as part of what it does, but if it does, it seems the defaults are wrong for Linux.

          1. Speculating that you may have a bad (i.e. failed) sector in your boot partition. But you need to fix the boot partition for BIOS/Linux or perhaps EFI/Linux and try grub install again. What are you using for partition software?
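One way to settle what kind of boot partition the disk actually has is to look at the GPT partition type GUIDs rather than the directory listing. The sketch below labels the two types that matter here, assuming input in the two-column form that `lsblk -rno NAME,PARTTYPE /dev/sda` prints. The function name is hypothetical; the two GUIDs are the standard ones for a BIOS boot partition and an EFI System Partition.

```shell
#!/bin/sh
# Read "NAME PARTTYPE" lines on stdin and label the boot-related GPT types.
# Anything that is neither a BIOS boot nor an EFI System partition is "other".
label_parttypes() {
    while read -r name guid; do
        # GUIDs may be printed in either case; compare in lower case.
        lower=$(printf '%s' "$guid" | tr 'A-Z' 'a-z')
        case "$lower" in
            21686148-6449-6e6f-744e-656564454649) kind="BIOS boot" ;;
            c12a7328-f81f-11d2-ba4b-00a0c93ec93b) kind="EFI System" ;;
            "") kind="none" ;;                # whole-disk row has no partition type
            *) kind="other" ;;
        esac
        printf '%s %s\n' "$name" "$kind"
    done
}
```

Usage would be `lsblk -rno NAME,PARTTYPE /dev/sda | label_parttypes`. If sda1 comes back as "EFI System" and nothing is "BIOS boot", the disk is laid out for a UEFI boot, and a BIOS-style grub2-install would be expected to complain the way it did.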

  8. OK, weirder and weirder. I’m following the instructions on this page (just the last one to reinstall grub and shim), but when I do the dnf, it times out getting to the repositories. And I can’t ping Google. Yet I’m able to web surf. Riddle me that, Batman.

    It’s actually very simple. The Internet hates you Rand.
    Try again a little later. Was ‘this page’ supposed to be a link?

      1. Did you try the steps to move resolv.conf and replace it with a DNS nameserver of 9.9.9.9 or some other one known to work?

        Also, I thought the suggestion of going with hard-wired Internet instead of WiFi was wise.

          1. That is bizarre. Maybe the automatic network config (is this DHCP off an ISP router?) is redirecting you to a bad proxy that only does HTTP?

            Can you do the network settings manually?
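If name lookup does turn out to be the culprit, the resolv.conf swap suggested above could look roughly like this. It is a sketch under assumptions: the function name is made up, the default of 9.9.9.9 is just the server mentioned in this thread, and on Fedora 33 `/etc/resolv.conf` is normally a symlink managed by systemd-resolved, so the change may be undone by a restart of the network services.

```shell
#!/bin/sh
# Replace a resolv.conf with a single-nameserver version, keeping a backup.
# $1: path to the file (pass /etc/resolv.conf, as root, for the real thing).
# $2: nameserver IP address; defaults to 9.9.9.9.
set_nameserver() {
    file="$1"
    server="${2:-9.9.9.9}"
    # Move the old file (or symlink) aside as FILE.bak before writing a new one.
    if [ -e "$file" ] || [ -L "$file" ]; then
        mv "$file" "$file.bak"
    fi
    printf 'nameserver %s\n' "$server" > "$file"
}
```

After `set_nameserver /etc/resolv.conf`, retrying the ping and the dnf command shows whether DNS was the problem. Moving `/etc/resolv.conf.bak` back restores the original setup.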

  9. Would anyone happen to know whether Fedora auto-updates?

    The reason I ask is that Rand’s issue reminds me of when a client of mine needed my help to fix a Windows-based machine, which was suddenly unable to correctly see its ethernet. The issue turned out to be a windows update, as he had his machine set to allow updates automatically (never a good idea), and the “update” couldn’t see the on-motherboard ethernet. I took the easy/fast way out (he’s a day-trader, and needed access urgently) and installed an ethernet card.

    So, though it’s a very different OS, my first suspicion would be an update (assuming Fedora does that, and assuming Rand has it set to do it automatically). Just a guess.

      1. Your system disk could have been corrupted long before Sunday night. You wouldn’t have noticed it until your system was forced to reboot.

  10. If you can’t resolve the boot partition issue you can always just grab a new or spare disk and re-install on it, then once you’re back you can mount the old disk and copy any files you need to preserve off it. This can save a ton of time and frustration. My advice, grab a copy of CentOS 7 or (RH7 w/support) install it and stay away from Fedora.

    1. There is no data on the OS SSD; I keep that on a hard drive that I overmount on /home. I’m just trying to preserve this one because it’s a PITA to reinstall all the packages I use from a new install.

      1. It’s time consuming but shouldn’t be a PITA. How’s your scripting kung fu? You can install your add-on packages from a script you write and maintain on a USB flash drive. After a new install copy it over to your new system disk and run from there.

        It’s all a matter of trade-offs, usually time.
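A rough sketch of the kind of script meant here, assuming you keep a plain text file of package names, one per line, on the flash drive. The file name `packages.txt` and the function name are hypothetical. The function only prints the dnf command so it can be checked before anything is installed.

```shell
#!/bin/sh
# Build a "dnf install" command from a package list file.
# $1: text file with one package name per line; blank lines and lines
#     starting with # are ignored. The command is printed, not executed.
build_install_cmd() {
    awk '
        /^[ \t]*(#|$)/ { next }              # skip comments and blank lines
        { pkgs = pkgs " " $1 }               # first word on the line is the name
        END { if (pkgs != "") print "dnf install -y" pkgs }
    ' "$1"
}
```

Running `build_install_cmd packages.txt` shows the command; once it looks right, `sudo sh -c "$(build_install_cmd packages.txt)"` runs it on the fresh install. Keeping the list up to date as packages get added is the part that takes discipline.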
