Strange Computer Problem

March 4, 2021 Rand Simberg 57 Comments

When I started using the machine this morning, it seemed to be running like molasses in January. I tried rebooting, and it took forever to boot, and then wouldn’t let me log in. I fired up a clean Fedora from a stick, and fscked my drives. The /home hard drive had a lot of errors on it, which got fixed, but there was no problem with the SSD where my OS resides. Then I rebooted. It took a long time, but finally came up. Everything continues to load and run slowly. Nothing seems to be bogging down the CPU, and there is plenty of free memory. Any ideas what the problem could be?

[Update a while later]

OK, I rebooted without mounting the hard drive. It seems to be running fine now. So I guess I need a new drive.

[Update early afternoon]

Well, this is fun. It won’t boot with the drive mounted, so I’m back to the Fedora on a stick, but I can’t find the logical volume where my fstab is to tell it not to mount the drive.

[Evening update]

I got the new drive, and started to dd the data from the old drive to it. The process died after about 2.7G, with an “i/o error.” How screwed am I?

[Update a couple minutes later]

I’m trying again, with a conv=noerror flag. I may not get everything, but hopefully most of it.
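For anyone following along: as GNU dd is usually described, conv=noerror on its own keeps going past a read error but writes nothing for the block that failed, so every later block lands at a shifted offset in the copy. The common advice is conv=noerror,sync, which pads the failed block with zeros so offsets stay put. The toy Python model below only illustrates that difference; it is not dd, and copy_blocks and its parameters are hypothetical names.

```python
# Toy model of how dd lays out its output when some input blocks cannot
# be read. Hypothetical helper; it does not call dd or touch any disk.
BLOCK = 4  # tiny block size so the output is easy to read


def copy_blocks(blocks, bad, pad_on_error):
    """Copy equal-size blocks, handling unreadable ones.

    blocks: list of bytes objects, each BLOCK bytes long (the "source")
    bad: set of block indices that fail to read
    pad_on_error: True models conv=noerror,sync (write a zero-filled block
        for each failure); False models conv=noerror alone (write nothing).
    """
    out = bytearray()
    for index, data in enumerate(blocks):
        if index in bad:
            if pad_on_error:
                out += bytes(BLOCK)  # BLOCK zero bytes keep later offsets aligned
            continue
        out += data
    return bytes(out)


source = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
unreadable = {1}

shifted = copy_blocks(source, unreadable, pad_on_error=False)
padded = copy_blocks(source, unreadable, pad_on_error=True)

print(shifted)  # b'AAAACCCCDDDD': "CCCC" now starts at offset 4 instead of 8
print(padded)   # b'AAAA\x00\x00\x00\x00CCCCDDDD': offsets match the source
```

One side effect of the padding: the whole input block is replaced with zeros, so with a large bs= a single bad sector can blank out far more than 512 bytes. That is one reason a tool like ddrescue, which retries and narrows down bad areas, is usually preferred on a failing disk.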

[Friday-morning update]

Well, it’s copying at 11MB/s. At that rate, it’s about a third of the way through, and won’t be done until tomorrow. I’m glad it wasn’t bigger…
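As a sanity check on that estimate, here is the arithmetic in Python. It assumes the source is the roughly 2000 GB partition that ddrescue later reports, decimal units (1 MB/s = 10**6 bytes per second), and a constant 11 MB/s; hours_to_copy is just an illustrative helper.

```python
# Rough time-to-finish for a sector-by-sector copy at a constant rate.
# Assumes decimal units: 1 GB = 10**9 bytes, 1 MB/s = 10**6 bytes/s.


def hours_to_copy(total_bytes: int, done_bytes: int, rate_bytes_per_s: float) -> float:
    """Hours needed to copy whatever is left at a fixed rate."""
    return (total_bytes - done_bytes) / rate_bytes_per_s / 3600


TOTAL = 2000 * 10**9  # assumed size of the partition being copied
RATE = 11 * 10**6     # the observed 11 MB/s

whole = hours_to_copy(TOTAL, 0, RATE)
left = hours_to_copy(TOTAL, TOTAL // 3, RATE)  # "about a third of the way through"

print(f"whole device: {whole:.1f} h")    # whole device: 50.5 h
print(f"two-thirds left: {left:.1f} h")  # two-thirds left: 33.7 h
```

Roughly a day and a half still to go is consistent with "won’t be done until tomorrow." Note that the whole partition has to be read, however much of it actually holds files.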

[Friday-afternoon update]

So, after moving about 833GB, the process slowed to a crawl, so I gave up on it, and am going to try ddrescue. I bought another drive to write the image to, but when I try to partition it, I get this message:

(parted) mkpart
Partition name? []? gpt
File system type? [ext2]? ext4
Start? 1.048
End? 1800000
Warning: You requested a partition from 1048kB to 1800GB (sectors
2046..3515625000).
The closest location we can manage is 1048kB to 1048kB (sectors 2047..2047).
Is this still acceptable to you?

It’s a 2-terabyte drive. What’s going on? (And yes, I do have the correct drive selected, /dev/sde).

[Update a few minutes later]

Never mind, I found the problem.

[Update]

OK, WTF now?

[root@localhost-live /]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 1.8G 1 loop /run/media/liveuser/disk
loop1 7:1 0 7.5G 1 loop
├─live-rw 253:0 0 7.5G 0 dm /
└─live-base 253:1 0 7.5G 1 dm
loop2 7:2 0 32G 0 loop
└─live-rw 253:0 0 7.5G 0 dm /
sda 8:0 0 232.9G 0 disk
├─sda1 8:1 0 600M 0 part
├─sda2 8:2 0 1G 0 part
└─sda3 8:3 0 230G 0 part
├─fedora_localhost--live-home00
│ 253:2 0 10G 0 lvm
└─fedora_localhost--live-root00
253:3 0 220G 0 lvm
sdb 8:16 0 1.8T 0 disk
└─sdb1 8:17 0 1.8T 0 part
sdc 8:32 0 55.9G 0 disk
sdd 8:48 0 1.8T 0 disk
└─sdd1 8:49 0 1.8T 0 part
sde 8:64 0 1.8T 0 disk
└─sde1 8:65 0 1.6T 0 part /mnt
sdf 8:80 1 14.9G 0 disk
├─sdf1 8:81 1 1.9G 0 part /run/initramfs/live
├─sdf2 8:82 1 10.9M 0 part
└─sdf3 8:83 1 22.9M 0 part
zram0 252:0 0 4G 0 disk [SWAP]

[root@localhost-live /]# umount /mnt

[root@localhost-live /]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 1.8G 1 loop /run/media/liveuser/disk
loop1 7:1 0 7.5G 1 loop
├─live-rw 253:0 0 7.5G 0 dm /
└─live-base 253:1 0 7.5G 1 dm
loop2 7:2 0 32G 0 loop
└─live-rw 253:0 0 7.5G 0 dm /
sda 8:0 0 232.9G 0 disk
├─sda1 8:1 0 600M 0 part
├─sda2 8:2 0 1G 0 part
└─sda3 8:3 0 230G 0 part
├─fedora_localhost--live-home00
│ 253:2 0 10G 0 lvm
└─fedora_localhost--live-root00
253:3 0 220G 0 lvm
sdb 8:16 0 1.8T 0 disk
└─sdb1 8:17 0 1.8T 0 part
sdc 8:32 0 55.9G 0 disk
sdd 8:48 0 1.8T 0 disk
└─sdd1 8:49 0 1.8T 0 part
sde 8:64 0 1.8T 0 disk
└─sde1 8:65 0 1.6T 0 part
sdf 8:80 1 14.9G 0 disk
├─sdf1 8:81 1 1.9G 0 part /run/initramfs/live
├─sdf2 8:82 1 10.9M 0 part
└─sdf3 8:83 1 22.9M 0 part
zram0 252:0 0 4G 0 disk [SWAP]

[root@localhost-live /]# mount /dev/sde1 /mnt

[root@localhost-live /]# ddrescue -d /dev/sdb1 /mnt/test.img /mnt/test.logfile
GNU ddrescue 1.25
Press Ctrl-C to interrupt
Initial status (read from mapfile)
rescued: 0 B, tried: 0 B, bad-sector: 0 B, bad areas: 0

Current status
ipos: 0 B, non-trimmed: 0 B, current rate: 0 B/s
opos: 0 B, non-scraped: 0 B, average rate: 0 B/s
non-tried: 2000 GB, bad-sector: 0 B, error rate: 0 B/s
rescued: 0 B, bad areas: 0, run time: 0s
pct rescued: 0.00%, read errors: 0, remaining time: n/a
time since last successful read: n/a
Copying non-tried blocks… Pass 1 (forwards)

ddrescue: Error writing mapfile ‘/mnt/test.logfile’: No space left on device
Fix the problem and press ENTER to retry,
or E+ENTER for an emergency save and exit,
or Q+ENTER to abort.

************************************************

lsblk says it’s got 1.6 Terabytes. I just partitioned it. How can there be no space left on the device?

[Update a while later]

Yes, I forgot to format after petitioning…

[Saturday-morning update]

OK, so what does this mean?

[root@localhost-live /]# ddrescue -d /dev/sdb1 /mnt/test.img /mnt/test.logfile
GNU ddrescue 1.25
Press Ctrl-C to interrupt
ipos: 1784 GB, non-trimmed: 43778 kB, current rate: 14680 kB/s
opos: 1784 GB, non-scraped: 0 B, average rate: 46776 kB/s
non-tried: 229778 MB, bad-sector: 0 B, error rate: 0 B/s
rescued: 1770 GB, bad areas: 0, run time: 10h 30m 51s
pct rescued: 88.51%, read errors: 668, remaining time: 1h 10m
time since last successful read: n/a
Copying non-tried blocks… Pass 1 (forwards)
ddrescue: Write error: No space left on device

**************************************************

So it rescued 88.51%. What does that mean, in terms of actual data recovery? It says it rescued 1770 GB, but I’m sure I didn’t actually have that much data (it was probably less than a terabyte). And why is there “no space left on device”?
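Both questions come down to the fact that ddrescue copies sectors, not files. A minimal sketch of the arithmetic, assuming the destination partition was created with the parted bounds shown earlier (1.048 MB to 1,800,000 MB; parted’s MB are decimal) and that the source is the 2000 GB partition ddrescue reported. An image of a whole partition is as large as the partition, free space included, so it cannot fit on a smaller one, and the percentage is a share of sectors rather than of files.

```python
# Why a 2000 GB partition image cannot fit on this destination partition.
# Assumed inputs: parted bounds from the mkpart session, ddrescue's sizes.
GB = 10**9
TIB = 2**40  # lsblk's "T" is a binary unit (tebibyte)

source = 2000 * GB                                  # ddrescue: 2000 GB to copy
dest_partition = 1_800_000 * 10**6 - 1_048 * 10**3  # parted: 1.048 MB .. 1800000 MB

print(f"destination as lsblk shows it: {dest_partition / TIB:.1f}T")  # 1.6T
print(f"shortfall: {(source - dest_partition) / GB:.0f} GB")         # 200 GB
print(f"1770 GB rescued of 2000 GB: {1770 * GB / source:.2%}")       # 88.50%
```

The write failed at about 1784 GB, a little under the partition size, which is roughly consistent with some space going to filesystem metadata. ddrescue’s own 88.51% differs slightly because its displayed sizes are rounded.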

[Monday-morning update]

Here is the final result of copying it to one of the new hard drives:

ipos: 1772 GB, non-trimmed: 57184 kB, current rate: 180 kB/s
ipos: 1986 GB, non-trimmed: 0 B, current rate: 40448 B/s
opos: 1986 GB, non-scraped: 16249 kB, average rate: 18850 kB/s
non-tried: 0 B, bad-sector: 1029 kB, error rate: 0 B/s
rescued: 2000 GB, bad areas: 2010, run time: 1d 5h 28m
pct rescued: 99.99%, read errors: 3885, remaining time: 35m

Not sure what that means in terms of data integrity, but I’m now backing up the drive to the other new drive, after which I’ll e2fsck it, then try mounting it. It’s moving the data pretty briskly, and says it will be done in about three hours.
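One way to put those numbers in perspective is to total the mapfile by block status. A minimal sketch, assuming the mapfile layout described in the GNU ddrescue manual: comment lines start with '#', the first data line holds the current position and status, and each later line is "pos size status" in hex, with status '?', '*', '/', '-' or '+'. The summarize helper and the sample mapfile are hypothetical, not real output from this recovery.

```python
from collections import Counter

# Block status characters as described in the ddrescue manual.
STATUS_NAMES = {
    "?": "non-tried",
    "*": "non-trimmed",
    "/": "non-scraped",
    "-": "bad-sector",
    "+": "rescued",
}


def summarize(mapfile_text: str) -> Counter:
    """Total bytes per block status in a ddrescue mapfile."""
    totals = Counter()
    seen_status_line = False
    for line in mapfile_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if not seen_status_line:
            seen_status_line = True  # current_pos current_status [current_pass]
            continue
        _pos, size, status = line.split()[:3]
        totals[STATUS_NAMES.get(status, status)] += int(size, 16)
    return totals


# Hand-made sample, not a real mapfile from this disk.
SAMPLE = """\
# Mapfile. Created by GNU ddrescue version 1.25
# current_pos  current_status  current_pass
0x00003000     +               1
#      pos        size  status
0x00000000  0x00002000  +
0x00002000  0x00000200  -
0x00002200  0x00000E00  +
"""

totals = summarize(SAMPLE)
for name, nbytes in totals.items():
    print(f"{name}: {nbytes} bytes")
# rescued: 11776 bytes
# bad-sector: 512 bytes
```

Run against the real mapfile (/mnt/test.logfile in the ddrescue command above), the bad-sector total would be expected to line up with the 1029 kB in the final status, plus whatever is still non-scraped. That is a tiny fraction of 2000 GB; only files or filesystem structures that happen to overlap those areas are affected, which the e2fsck run should start to reveal.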

57 thoughts on “Strange Computer Problem”

David Spain says:

March 4, 2021 at 12:20 PM

Power cycle? Maybe something went south in the SSD firmware.
1. David Spain says:
  
  March 4, 2021 at 12:32 PM
  
  Or the hard drive firmware more likely.
  1. David Spain says:
    
    March 4, 2021 at 12:40 PM
    
    You might also have a marginal hard drive. I recommend an immediate backup, just in case. That could take forever unless you have it shadowed somewhere. If the power cycle doesn’t help, or makes it worse, it’s time to consider swapping out the /dev/sdX that hosts /home….
    
    The reason I focus on /home is that modern ext3 and ext4 file systems do journaling. After a complete reboot, you should not be seeing fsck errors on those file systems unless there is some underlying hardware issue. Assuming a clean reboot that you didn’t interrupt…
    1. Rand Simberg says:
      
      March 4, 2021 at 12:48 PM
      
      That does seem to be the problem. I hope I can back it up.
2. Rand Simberg says:
  
  March 4, 2021 at 12:39 PM
  
  I did shut down and restart.
  1. David Spain says:
    
    March 4, 2021 at 12:41 PM
    
    Shutdown doesn’t imply power cycle. Did you power cycle?
    1. David Spain says:
      
      March 4, 2021 at 1:23 PM
      
      In Linux lingo a shutdown merely means the kernel has entered a halt loop. On older timesharing systems a shutdown implied dropping into single user mode as root with the kernel still running. A shutdown with restart means some piece of software in the kernel causes an unmaskable CPU interrupt to trigger a reboot. None of these scenarios cycles power to either the CPU or the motherboard, which leaves asynchronous state machines and/or bus-driven microcontrollers to their own mystical ways and affectations.
David Spain says:

March 4, 2021 at 12:20 PM

How was the speed when you booted from stick?
1. Rand Simberg says:
  
  March 4, 2021 at 12:38 PM
  
  It was fine.
  1. David Spain says:
    
    March 4, 2021 at 12:42 PM
    
    Assuming a power cycle, see my comment about the disk hosting /home.
David Spain says:

March 4, 2021 at 12:53 PM

A guide for doing disk forensics under Linux. HTH

https://fedoraproject.org/wiki/Smartctl
1. Rand Simberg says:
  
  March 4, 2021 at 1:37 PM
  
  Using that tool, the disk self-reported that it was fine. But if I mount it at boot, the machine doesn’t want to let me log in…
  1. David Spain says:
    
    March 4, 2021 at 2:27 PM
    
    smartctl is far from perfect. Its biggest usefulness is watching the reported error counts to see whether they are incrementing frequently. Sometimes you can get a result by running the long test and then checking whether the error log shows bigger numbers than it had before. Usually the self-test won’t fail; you have to watch the behavior of the error log. It’s far from foolproof, but it is one way of conducting a non-destructive test. Regrettably, with Linux you have to look for symptoms. It won’t tell you outright.
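To make that "watch the error counts" approach concrete: save the output of smartctl -A /dev/sdX at two different times and compare the raw values. A minimal sketch, assuming the usual ATA attribute table layout (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE); the helper names and the sample numbers are hypothetical.

```python
import re


def raw_values(smartctl_a_output: str) -> dict:
    """Map attribute name -> leading integer of its RAW_VALUE column."""
    values = {}
    for line in smartctl_a_output.splitlines():
        parts = line.split(maxsplit=9)
        # Attribute rows start with a numeric ID and have 10 columns.
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        match = re.match(r"\d+", parts[9])  # RAW_VALUE may carry extra text
        if match:
            values[parts[1]] = int(match.group())
    return values


def increased(before: str, after: str) -> dict:
    """Attributes whose raw value went up between two snapshots."""
    old, new = raw_values(before), raw_values(after)
    return {name: (old[name], new[name]) for name in new
            if name in old and new[name] > old[name]}


# Hypothetical snapshots, not output from the disk in this post.
BEFORE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       3
"""
AFTER = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       8
"""

print(increased(BEFORE, AFTER))  # {'Current_Pending_Sector': (3, 8)}
```

Rising pending or reallocated sector counts are commonly treated as a sign of a surface going bad, even while the overall health assessment still says PASSED.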
    1. Rand Simberg says:
      
      March 4, 2021 at 2:34 PM
      
      Well, now the disk management tool in Gnome is telling me that the disk is self reporting as failing. I have another drive I want to back it up to, but it has two partitions on it: a boot and the OS. I deleted the boot partition, but the rest is still calling itself /dev/sd2, when I want it to be /dev/sd1. Any way to fix this, short of blowing away the other partition with all its existing data (it’s an old backup)? If I can’t renumber it, the boot will fail when it tries to mount it.
      1. David Spain says:
        
        March 4, 2021 at 2:54 PM
        
        See:
        https://superuser.com/questions/393613/how-to-renumber-a-partition#915412
        
        You can renumber the partition table by entering the ‘x’ command to enter ‘expert’ mode.
        
        Before you do the ‘w’, make sure you’ve written down or copied all the info from the first -l, so you can put the partition table back to something useful if you mess it up.
        
        I haven’t done this in a few years, so I’m reluctant to give you too much advice here, lest I get it wrong.
    2. David Spain says:
      
      March 4, 2021 at 2:35 PM
      
      The smartctl self-test is driven by the disk vendor’s firmware. And frankly it appears most vendors can’t be bothered with a decent self-test, if they bother at all. Something even guides like Tom’s Hardware ought to make an issue of, but hey, $/bit is über alles!
      1. Rand Simberg says:
        
        March 4, 2021 at 2:49 PM
        
        I see that Best Buy has a Western Digital 2 Terabyte for fifty bucks, so I’ll just back up to a new drive.
      2. David Spain says:
        
        March 4, 2021 at 2:56 PM
        
        +42
      3. George Turner says:
        
        March 4, 2021 at 3:03 PM
        
        I found a vintage IBM PC 70 MB full-height drive on Ebay for $8.50. Proven, reliable, and offering plenty of storage space for documents, recipes, and games. But they want $14+ for shipping, and I’m not sure your PC can mount a full-height drive. So you’re better off with the 2 Terabyte one.
      4. David Spain says:
        
        March 4, 2021 at 5:16 PM
        
        Sometimes we just have to make do with less.
David Spain says:

March 4, 2021 at 5:33 PM

Try dd again with the arguments used for disk to disk in this example:

https://www.thegeekdiary.com/how-to-backup-linux-os-using-dd-command/
Ed Minchau says:

March 4, 2021 at 5:36 PM

You have more problems with Linux in a year than I do with Windows in a decade.
1. David Spain says:
  
  March 4, 2021 at 5:40 PM
  
  To be fair this is a hardware issue.
2. Rand Simberg says:
  
  March 4, 2021 at 5:44 PM
  
  This would have happened regardless of the OS.
Sigivald says:

March 5, 2021 at 8:46 AM

I got the new drive, and started to dd the data from the old drive to it. The process died after about 2.7G, with an “i/o error.” How screwed am I?

You’re “this is why we make backups” screwed, at least potentially.

But it’s a useful lesson: “if it ain’t backed up, it’s already gone” is the one true way to approach data.
1. David Spain says:
  
  March 5, 2021 at 9:32 AM
  
  True, perhaps an investment in a network drive might be in order?
  1. David Spain says:
    
    March 5, 2021 at 9:35 AM
    
    Or invest in another WD disk and set the two up in a RAID array with one being the shadow disk. Either way.
David Spain says:

March 5, 2021 at 9:47 AM

Well, it’s copying at 11MB/s. At that rate, it’s about a third of the way through, and won’t be done until tomorrow. I’m glad it wasn’t bigger…

Assuming you are copying from rotational media, this sustained slow rate is also telling. Rather than a bad block or sector, it sounds like a sense amplifier is having great difficulty, which makes multiple accesses necessary to pull the data off the recalcitrant disk. That means you can only pull data off at a fraction of the rate of your I/O bus, whether SATA or something else. Good luck, but most of all be patient!
MichiCanuck says:

March 5, 2021 at 9:56 AM

After a painful experience with a failed drive, I set up an old refurbished computer to continually back up important parts of my home directory. I set up network drives and use fwbackup (a front end for rsync). Works great.

I also like to always have a relatively up-to-date PartedMagic drive (with Clonezilla and Ghost 4 Linux) for cloning drives. Clonezilla makes more compact images, but it’s more fussy. Given your disk errors, the more brute-force G4L would likely be more appropriate. It’s like a front end for dd, but it’s easy to use.

Finally, it makes sense to have a copy of Boot Repair on hand, but you know that.
1. David Spain says:
  
  March 5, 2021 at 11:02 AM
  
  +42
Peter Monta says:

March 5, 2021 at 1:31 PM

Try the ddrescue tool if the dd copy doesn’t work out. It has a more aggressive retry policy and also logs the status of each copied region.
George Turner says:

March 5, 2021 at 5:35 PM

Well, generally a drive is partitioned and then formatted before it is useable, the old “FORMAT C:” solution to many a software issue. ^_^
Rand Simberg says:

March 5, 2021 at 6:54 PM

It is distinctly possible that I didn’t format the drive after creating the partition. I’ll try again in the morning.
1. David Spain says:
  
  March 6, 2021 at 4:36 AM
  
  It’s events like these that make me happy I’m on the East Coast and fast asleep…. *_*
Michael S. Kelly says:

March 6, 2021 at 4:40 AM

“Yes, I forgot to format after petitioning…”

You have to “petition” your computer?

Um, I think you would be much better off with Windows 10, or even with some incarnation of an Apple OS: Jaguar, Leopard, or even Ferrari, and, especially, Blue Steel.
1. David Spain says:
  
  March 6, 2021 at 4:49 AM
  
  When you can pry my Linux from my cold, dead, hands…. 🙂
  
  FYI Apple OS *is* essentially Linux, under the hood….
  Drop into the command line interpreter and poke around if you need proof… like % cat /proc/cpuinfo for example…
  1. David Spain says:
    
    March 6, 2021 at 4:52 AM
    
    Been that way since System X (System 10) I believe…
    I once had the privilege of programming directly on System 6. It was fun and educational to see the lineage of Andy Hertzfeld’s work. Nicely done Andy…
  2. David Spain says:
    
    March 6, 2021 at 4:57 AM
    
    Or even better: % uname -a
    heh…
    1. David Spain says:
      
      March 6, 2021 at 10:18 AM
      
      Erm I mean: GNU/Linux sorry rms….
David Spain says:

March 6, 2021 at 4:46 AM

Um, save the drive that had the 833GB transfer, just in case. Your source disk sounds like it is really, really on its way out. Did you get enough via the first transfer that all key files were recovered? Just sayin’, this might be the best you can do. I highly recommend a backup strategy. And of course my Space Cadet training says that I have to ask the following question: If you have one, why not just restore from it? Not trying for the salt->wound, just trying to save you time…. There’s a little bit of the Doc McCoy in me that says a little pain is good for the soul….
1. David Spain says:
  
  March 6, 2021 at 4:47 AM
  
  cp -r to selectively restore what is most important might be your friend here, just sayin….
David Spain says:

March 6, 2021 at 8:07 AM

And why is there “no space left on device”?
Did you mount the output device? The path name you gave for the output file is suspicious. Usually when you mount a drive you have to give it a mount point in /mnt, like /mnt/newdrive/mydirectory/test.img

But why back up to an image file? Don’t you just want to go disk to disk? If you are copying directly, can’t you just give it /dev/sda1 (or wherever your new drive is configured) as the output file instead?
1. David Spain says:
  
  March 6, 2021 at 8:09 AM
  
  $ info ddrescue
2. Rand Simberg says:
  
  March 6, 2021 at 8:17 AM
  
  I was following directions here. Yes, I mounted a spanking new partitioned and formatted drive to /mnt. That’s where the *.img file went.
  1. David Spain says:
    
    March 6, 2021 at 9:03 AM
    
    That’s right, I forgot, you flipped on the benefit of coffee again.
    Well, maybe you are caffeine immune? In that case, you live in CA, so go outside (every day is warm and sunny in CA, right?), run around the house three times (in both directions, for a total of six), THEN proceed with this…. ^_^
    
    I was following directions here.
    
    Jump ahead in those instructions to the section called “Cloning directly to a new disk”. This is what you need to do. Creating a dot img file is fine for when you are just doing backups, but you need a backup AND restore. To paraphrase DEVO: “backup is what you’ll get but cloning is what you need. The disk you want”. In fact playing DEVO in the background while doing this seems quite appropriate. (I know, I know you despise DEVO, ’cause of its Kent State genesis and all that…).
David Spain says:

March 6, 2021 at 8:14 AM

See this link for disk to disk recovery. But first pour yourself a big cup of Joe:

https://datarecovery.com/rd/how-to-clone-hard-disks-with-ddrescue/
1. Rand Simberg says:
  
  March 6, 2021 at 8:18 AM
  
  I hate coffee, and it has no discernible effect on me.
Peter Monta says:

March 6, 2021 at 12:45 PM

ddrescue copies the entire device, so the destination device must be equal to or larger than the source. You mention you’re using less than 1 TB, but that’s a filesystem thing. ddrescue wants to copy the entire 1.8 TB verbatim. Like dd, it is copying sectors, not files.
1. David Spain says:
  
  March 6, 2021 at 1:02 PM
  
  Yup, also put the log file someplace else, like /root.
David Spain says:

March 6, 2021 at 2:38 PM

A suggestion: In future posts about computer problems (and there WILL be such posts) it would make your blog easier to follow if you put the output of troublesome console sessions down here in the comments and just stick a one paragraph summary in the OP with a reference to comments for details.
1. Rand Simberg says:
  
  March 6, 2021 at 2:56 PM
  
  I’ve added a “more” tag.
2. David Spain says:
  
  March 8, 2021 at 5:55 AM
  
  I meant to add that short, one-paragraph summaries bumped into the OP are fine until they run to multiple pages, at which point you can put them behind a “more” tag. It’s nice to be able to follow your progress. Good Luck!
David Spain says:

March 7, 2021 at 12:28 PM

So…. How’d you make out? All fixed?
1. Rand Simberg says:
  
  March 7, 2021 at 12:42 PM
  
  It’s still working on it. About 77% recovered, with estimates to completion ranging from a couple hours to a month, depending on current speed… Once I get it off the bad drive, I’ll probably back it up, and then do a file-system repair.
  1. George Turner says:
    
    March 7, 2021 at 1:12 PM
    
    I have a pair of 4 or 5 TB external USB drives which I should really make more periodic use of for full system backups. They’re cheap and if I traveled much they’d come in really handy for keeping all my data at hand.
    1. David Spain says:
      
      March 8, 2021 at 6:03 AM
      
      Since he now has two drives of the same size, he can shadow one in an internal RAID array to leverage the speed of his I/O bus. Switchover ought to be painless and no restore is needed, although both drives age at the same rate. It’s nice to occasionally swap each one out with a newer drive to stagger their ages. Drives are like tires, heh. Or go all SSD; the aging characteristics and data look good so far, if you can afford them. Staggering drive ages is still a good idea, even with SSDs that have no moving parts.
Pingback: More Computer Fun | Transterrestrial Musings


Biting Commentary about Infinity…and Beyond!