Windows 7 – Troubleshooting Unresponsive System and Potential HD Failure

Tags: freeze, hard drive, smart, windows, windows 7

On a fully updated Win7 x64 system, every so often the system stalls for a minute or so. This has been going on for a couple of months now. By stalling I mean the mouse responds and I can move windows around, but any open window of any program turns whitish when I select it, and new programs will not open. It doesn't matter what kind of program it is. When the stall ends, all the clicks I made (opening new programs, for example) take effect.

Nothing shows up consistently (as in every time this happens) in the event log. Today though I was able to find something, but it doesn't reveal much other than the "system was unresponsive". It's a 7009 for "A timeout was reached (30000 milliseconds) while waiting for the Windows Error Reporting Service service to connect."

It doesn't matter whether I have any USB devices plugged in or not. I've run Microsoft Security Essentials and Malwarebytes.

While the machine is unresponsive, I've noticed that Drive D (the other partition on the single internal HD in this laptop) is displayed like this in Explorer. This never occurs with Drive C or any other drive on the machine. (Screenshot: how drive D shows up in Explorer)

SMART report for the physical drive: (Screenshot: SMART report)

Read benchmark by HD Tune 5 Pro, probably the most telling piece of the puzzle. Isn't this alone enough to see there is a problem with the drive, regardless of whether the unresponsiveness is caused by such purported problem? (Screenshot: read benchmark by HD Tune 5 Pro)

Here is a short hardware report:

Computer:      LENOVO ThinkPad T520
CPU:           Intel Core i5-2520M (Sandy Bridge-MB SV, J1)
               2500 MHz (25.00x100.0) @ 797 MHz (8.00x99.7)
Motherboard:   LENOVO 423946U
Chipset:       Intel QM67 (Cougar Point) [B3]
Memory:        8192 MBytes @ 664 MHz, 9.0-9-9-24
               - 4096 MB PC10600 DDR3 SDRAM - Samsung M471B5273CH0-CH9
               - 4096 MB PC10600 DDR3 SDRAM - Patriot Memory (PDP Systems) PSD34G13332S
Graphics:      Intel Sandy Bridge-MB GT2+ - Integrated Graphics Controller [D2/J1/Q0] [Lenovo]
               Intel HD Graphics 3000 (Sandy Bridge GT2+), 3937912 KB 
Drive:         ST320LT007, 312.6 GB, Serial ATA 3Gb/s
Sound:         Intel Cougar Point PCH - High Definition Audio Controller [B2]
Network:       Intel 82579LM (Lewisville) Gigabit Ethernet Controller
Network:       Intel Centrino Advanced-N 6205 AGN 2x2 HMC
OS:            Microsoft Windows 7 Professional (x64) Build 7601

The drive is less than 1 year old. Do I have a defective drive? Seagate Tools diag says there is nothing wrong with the drive…

UPDATE: I noticed that the Windows Error Reporting Service entered the running state and then the stopped state, and the gap between the two events was exactly 2 minutes. Which error it was trying to report, I don't know. I checked the "Reliability Monitor" and it shows no errors to be reported. I've disabled the Windows Error Reporting Service to see if the problem stops.

Best Answer

Based on the new information you have provided, I can say that there is in fact no problem at all. Then why does the drive “go offline” for anywhere from a few seconds to three minutes after you suspend the guest OS? Because, as you said, the HDD LED stays lit the whole time the drive is unresponsive: the drive is being heavily used.

What is happening is that when you finish using VMWare and want to sleep the guest OS, you use the standby or hibernation feature instead of shutting down. This causes VMWare to copy the contents of the VM’s RAM to disk so that it can resume where it left off without having to boot up all over again. Depending on how much memory you have assigned to the VM and how much was being used, this can mean that VMWare has to write quite a lot of data (gigabytes) to disk.
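As a rough sanity check on the timescale, the arithmetic can be sketched in Python. The RAM sizes and throughput figures below are assumptions chosen for illustration, not measurements of this drive, and `suspend_write_seconds` is a hypothetical helper name.

```python
def suspend_write_seconds(ram_gb, throughput_mb_s):
    """Seconds needed to write ram_gb gigabytes at a sustained throughput_mb_s."""
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    return ram_gb * 1024 / throughput_mb_s


# Assumed figures: 2 GB or 4 GB of guest RAM; 80 MB/s for an otherwise idle
# laptop hard drive, 25 MB/s when the same drive is also serving other requests.
for ram_gb in (2, 4):
    for mb_s in (80, 25):
        secs = suspend_write_seconds(ram_gb, mb_s)
        print(f"{ram_gb} GB at {mb_s} MB/s: about {secs:.0f} s")
```

Under these assumed numbers, 4 GB at 25 MB/s comes out at roughly 164 seconds, which is the same order of magnitude as the stalls described.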

When VMWare copies the memory to disk, the drive becomes more or less unresponsive to new disk operations until the current ones (writing the RAM to a file) have finished. As a result, when you open My Computer, Windows tries to refresh the drive information but cannot read the drive, because all those write commands are already queued ahead of it. It therefore leaves the drive's entry empty, looking like it’s offline, until it can manage to slip in those read requests (between VMWare’s write operations).

If you open the drive in Explorer, you will see that either it will not open it at all for a while, or it will open it and flash the address bar with a green progress bar like it does whenever there is a lengthy file operation (like searching for thousands of files).

In summary, there is nothing surprising or mysterious about this situation. If instead of putting a VMWare guest OS into standby, you had just manually copied a giant file to the drive, the results would be exactly the same.

So what can you do to fix it? Aside from changing to a faster drive (or using an internal one if D: is external), your best bet is to defragment the drive. If D: is very fragmented, then when VMWare tries to flush the RAM to disk, the drive will thrash around a lot while writing chunks of the giant file to different areas (of course this assumes it’s not an SSD, and if D: is still a partition on the same ST320LT007 drive as C:, then it’s not).

If you defragment the drive (assuming that there is sufficient free space), then the system can write the RAM file with only a few file operations in large swaths (e.g., write 1GB of data at cluster X) instead of many, many little operations (write 1MB here, write 245.18MB there, 4KB here, another 18.1MB somewhere else…). Then suspending the VM will finish much faster and the drive will be more responsive.
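The effect of fragmentation can be illustrated with a deliberately simplified cost model: total time is the transfer time plus one head repositioning per fragment. The default throughput (60 MB/s) and the 15 ms per-fragment penalty are assumed values for a laptop hard drive, not figures measured on this machine, and `write_seconds` is a hypothetical helper.

```python
def write_seconds(total_mb, fragments, throughput_mb_s=60.0, seek_ms=15.0):
    """Estimate the time to write total_mb split across a number of fragments.

    Simplified model: sequential transfer time plus one seek per fragment.
    """
    transfer = total_mb / throughput_mb_s
    seeking = fragments * seek_ms / 1000.0
    return transfer + seeking


# The same 1 GB file written as one contiguous run versus 10,000 scattered pieces.
contiguous = write_seconds(1024, fragments=1)
scattered = write_seconds(1024, fragments=10_000)
print(f"contiguous: {contiguous:.1f} s, scattered: {scattered:.1f} s")
```

With these assumptions the scattered write takes about 167 seconds against about 17 seconds for the contiguous one. A real drive reorders and caches requests, so treat this only as an indication of why fewer, larger writes finish sooner.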

To find out exactly what the access is that is causing the drive to be active and busy, you can use a tool like Process Monitor. Run it and click the class-filters to select only the file-class filter as seen below.

Now you can see what files and folders are being accessed. Make sure to memorize the hotkey to start and stop activity capturing (Ctrl+E) so that you can stop it once it starts flooding with what is likely to be the disk operations from VMWare.

Screenshot of Process Monitor with only file class filter active
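If the capture is too large to read by eye, one option is to save it from Process Monitor as CSV and count the writes per process and file. The sketch below assumes the export has the default "Process Name", "Operation" and "Path" columns. The sample rows, process names and paths are made up for illustration, and `busiest_writers` is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def busiest_writers(csv_file, top=5):
    """Count WriteFile events per (process, path) in a Process Monitor CSV export."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        if row.get("Operation") == "WriteFile":
            counts[(row.get("Process Name"), row.get("Path"))] += 1
    return counts.most_common(top)


# Made-up sample standing in for a real export. For a saved file, something like
# open("capture.csv", newline="", encoding="utf-8-sig") could be passed instead.
sample = io.StringIO(r'''"Time of Day","Process Name","PID","Operation","Path","Result","Detail"
"10:00:01","vmware-vmx.exe","4321","WriteFile","D:\VMs\guest.vmem","SUCCESS",""
"10:00:01","vmware-vmx.exe","4321","WriteFile","D:\VMs\guest.vmem","SUCCESS",""
"10:00:02","explorer.exe","1234","ReadFile","D:\","SUCCESS",""
"10:00:02","vmware-vmx.exe","4321","WriteFile","D:\VMs\guest.vmss","SUCCESS",""
''')

for (process, path), n in busiest_writers(sample):
    print(f"{n:6d}  {process}  {path}")
```

The process and file at the top of the list are the most likely source of the disk activity during a stall.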
