Linux – fsck has been running for more than 30 days on 30TB ext4 partition, can’t mount

filesystems fsck linux

I have a 30TB partition on disks in a hardware RAID 5 array.
LVM is on top and the filesystem is ext4. (It is 99.9% full of data.)
I wanted to add another 20TB and resize the partition and filesystem.
Before resizing, the resize tool insisted on running fsck first.
fsck ran for more than a week before I canceled it, but afterwards I was unable to mount the partition because fsck was required first.
So I started it again.

fsck.ext4 -v -C 0 /dev/vgname/lvname
e2fsck 1.44.3 (10-July-2018)
Superblock has an invalid journal (inode 8).
Clear<y>? yes
*** journal has been deleted ***

Resize inode not valid.  Recreate<y>? yes

Since then 31 days have passed and it is still running, keeping one CPU core at 100%.

When looking at it with strace, this is what I see:

strace -p 3174
strace: Process 3174 attached
strace: [ Process PID=3174 runs in x32 mode. ]
strace: [ Process PID=3174 runs in 64 bit mode. ]
pread64(4, "\375\210\372\374\360\10\375=$\375\254\221\375\334\361\375l?\376?U\376\24?\376\27\351\375:\305\375\217"..., 4096, 2447145635840) = 4096
mremap(0x7fa5e3565000, 208764928, 208769024, MREMAP_MAYMOVE) = 0x7fa5e3565000
pread64(4, "\0\305\7\0\321\376\377q\367\377Q\364\377\371\361\377H\355\377\323\346\377\271\337\377\275\332\377J\326\377\16"..., 4096, 1724118507520) = 4096
pread64(4, "x\377\371p\377_b\377\177W\377\35[\377\223N\377\226[\377&h\377QS\377\203O\377sT\377"..., 4096, 3443764559872) = 4096
pread64(4, "\377\263\371\377\375\355\377\363\6\0\367\356\377\326\21\0\350\353\377?\30\0\242\345\377\375\26\0|\344\377D"..., 4096, 6956990242816) = 4096
pread64(4, "\0\3201\273\0\24)\273\0\34=\273\0\336/\273\0\316/\273\0\3167\273\0\220*\273\0\3569\273"..., 4096, 8609803698176) = 4096
pread64(4, "o\f\257\205\16\377=\20\367\270\21\376\312\22\252R\0234\227\23\242\303\23\234\343\23Z\376\23LI\24"..., 4096, 1755810463744) = 4096
mremap(0x7fa5e3565000, 208769024, 208773120, MREMAP_MAYMOVE) = 0x7fa5e3565000
pread64(4, "\22\0\\\2\0\347\352\377\347\303\377?\250\3776\224\377Ht\377\17W\377\245G\377\5G\377}[\377"..., 4096, 14672424988672) = 4096
mremap(0x7fa5e3565000, 208773120, 208777216, MREMAP_MAYMOVE) = 0x7fa5e3565000
pread64(4, "\255\2\202)#m\22\5N\244F\210\221\20+.\21\5\352\306\344\220\25\3567\250\16\323\2\247P\352"..., 4096, 16981972766720) = 4096
mremap(0x7fa5e3565000, 208777216, 208781312, MREMAP_MAYMOVE) = 0x7fa5e3565000
pread64(4, "M\0\205N\0KO\0\4P\0\221P\0)Q\0\336Q\0\204R\0SS\0\tT\0\371T\0"..., 4096, 833004105728) = 4096

A new line appears only every 30-60 seconds.
Can anyone give me a clue about what is happening? Should I keep waiting, or what can be done to access the data again?
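
For anyone trying to read a trace like the one above, here is a minimal sketch in Python that pulls out the numbers that matter: the byte offset of each pread64 call and how much each mremap call grows the buffer. The names summarize, PREAD_RE, MREMAP_RE and SAMPLE are made up for this illustration, and the sample only contains lines from the trace in this question. It does not tell you which e2fsck pass is running; it only makes it easier to see that the reads are single 4 KiB blocks at widely separated offsets on the device, and that a buffer of roughly 200 MB is being grown one page at a time.

```python
import re

# Lines copied from the strace output above; two of the pread64 read
# buffers are shortened to "..." to keep the sample compact.
SAMPLE = r'''
pread64(4, "..."..., 4096, 2447145635840) = 4096
mremap(0x7fa5e3565000, 208764928, 208769024, MREMAP_MAYMOVE) = 0x7fa5e3565000
pread64(4, "..."..., 4096, 16981972766720) = 4096
mremap(0x7fa5e3565000, 208777216, 208781312, MREMAP_MAYMOVE) = 0x7fa5e3565000
pread64(4, "M\0\205N\0KO\0\4P\0\221P\0)Q\0\336Q\0\204R\0SS\0\tT\0\371T\0"..., 4096, 833004105728) = 4096
'''

# pread64(fd, "<buffer>"..., length, offset) = result
# The greedy ".*" skips the quoted buffer, whatever it contains.
PREAD_RE = re.compile(r'^pread64\((\d+), .*, (\d+), (\d+)\) = (-?\d+)')
# mremap(old_address, old_size, new_size, flags) = new_address
MREMAP_RE = re.compile(r'^mremap\(0x[0-9a-f]+, (\d+), (\d+), ')

TIB = 1024 ** 4


def summarize(text):
    """Return ([(length, offset), ...], [growth_in_bytes, ...]) for a trace."""
    reads = []
    growth = []
    for line in text.splitlines():
        m = PREAD_RE.match(line)
        if m:
            reads.append((int(m.group(2)), int(m.group(3))))
            continue
        m = MREMAP_RE.match(line)
        if m:
            old_size, new_size = int(m.group(1)), int(m.group(2))
            growth.append(new_size - old_size)
    return reads, growth


if __name__ == "__main__":
    reads, growth = summarize(SAMPLE)
    for length, offset in reads:
        print(f"read {length} bytes at {offset / TIB:.2f} TiB")
    print("mremap growth per call (bytes):", growth)
```

To use it on a longer capture, save the strace output to a file and pass its contents to summarize instead of SAMPLE.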

Best Answer

Thank you all for the suggestions. The disk was already unmounted before I ran fsck. After I got the suggestion from antony_sebastian, I logged in to the server to try it, resumed my screen session, and found that fsck was waiting for input. Surprisingly, after 33 days of checking, it had finished processing the 30TB disk. After I answered 'yes' to all the fixable issues, the data was back, although everything had been moved under "lost+found" and the folder names at the root of the directory tree were lost. Other than that, the data was intact and fine.

Thanks for the suggestions and help, all!
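
A note for readers who end up in the same situation: as far as I know, e2fsck reconnects orphaned files and directories under lost+found with names made of "#" followed by the inode number, so the original top-level names have to be worked out from what each directory contains. Below is a small Python sketch, not anything from the original poster, that lists the top-level entries of such a directory with their type and number of direct children as a starting point for renaming. The function name inventory and the demo directory layout are hypothetical.

```python
import os
import tempfile


def inventory(lost_found):
    """List top-level entries of a lost+found directory.

    Returns sorted (name, kind, child_count) tuples, where kind is "dir" or
    "file" and child_count is the number of direct children of a directory.
    """
    rows = []
    with os.scandir(lost_found) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                rows.append((entry.name, "dir", len(os.listdir(entry.path))))
            else:
                rows.append((entry.name, "file", 0))
    return sorted(rows)


if __name__ == "__main__":
    # Demo on a throwaway directory; the "#<number>" names are made up.
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "#12", "projects"))
        open(os.path.join(tmp, "#34"), "w").close()
        for name, kind, count in inventory(tmp):
            print(name, kind, count)
```

On a real system you would call inventory with the path to lost+found on the mounted filesystem, usually as root, and then look inside each directory to decide what it used to be called.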
