echo 0 > /proc/sys/kernel/hung_task_timeout_secs
This only silences the warning; it has no other effect. Any value above zero causes the message to be issued whenever a task stays blocked for that many seconds.
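To see what the setting currently means on a given machine, something like the following minimal sketch could be used. The helper names (describe_hung_task_timeout, current_setting) are made up for illustration; only the /proc path comes from the command above.

```python
from pathlib import Path

# Path taken from the command above; everything else here is illustrative.
HUNG_TASK_PATH = Path("/proc/sys/kernel/hung_task_timeout_secs")


def describe_hung_task_timeout(raw: str) -> str:
    """Turn the raw file content into a human-readable description."""
    secs = int(raw.strip())
    if secs == 0:
        # 0 switches the check off; it does not fix whatever blocks the task.
        return "hung-task warnings disabled"
    return f"warn when a task stays blocked for more than {secs} s"


def current_setting() -> str:
    """Read the live value, or report that it is unavailable."""
    try:
        return describe_hung_task_timeout(HUNG_TASK_PATH.read_text())
    except (OSError, ValueError):
        # The file is missing on non-Linux systems and on kernels built
        # without the hung-task detector.
        return "setting not available on this system"


if __name__ == "__main__":
    print(current_setting())
```

Reading the file does not need root; writing to it does.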
The warning indicates a problem with the system. In my experience it means that the process has been blocked in kernel space for at least 120 seconds, usually because it is starved of disk I/O. This can be caused by heavy swapping when too much memory is in use, e.g. if you have a heavy webserver load and have configured too many Apache child processes for your system. In your case it may just be that too many MySQL processes are competing for memory and disk I/O.
It can also happen if the underlying storage system is not performing well, e.g. if you have a SAN which is overloaded, or if there are soft errors on a disk which cause a lot of retries. Whenever a task has to wait a long time for its I/O commands to complete, this warning may be issued.
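While the problem is happening, it can help to see which processes are currently stuck in uninterruptible sleep (state D in ps and /proc), since that is the state the hung-task check looks at. A rough sketch, assuming the usual "pid (comm) state ..." layout of /proc/<pid>/stat; the function names are invented for this example, and it only looks at each process's main thread:

```python
from pathlib import Path


def parse_stat(line: str) -> tuple[int, str, str]:
    """Extract (pid, comm, state) from one /proc/<pid>/stat line."""
    # comm may itself contain spaces or parentheses, so split on the LAST ')'.
    left, right = line.rsplit(")", 1)
    pid_str, comm = left.split("(", 1)
    state = right.split()[0]
    return int(pid_str), comm, state


def blocked_tasks(stat_lines):
    """Return (pid, comm) for every entry in uninterruptible sleep ('D')."""
    return [
        (pid, comm)
        for pid, comm, state in map(parse_stat, stat_lines)
        if state == "D"
    ]


def read_all_stat_lines():
    """Collect the stat line of every process currently visible in /proc."""
    lines = []
    for path in Path("/proc").glob("[0-9]*/stat"):
        try:
            lines.append(path.read_text())
        except OSError:
            continue  # the process exited while we were scanning
    return lines


if __name__ == "__main__":
    for pid, comm in blocked_tasks(read_all_stat_lines()):
        print(pid, comm)
```

A process that shows up here briefly is normal; one that stays in the list for minutes is a candidate for the warning.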
Q1: “It” is wake_up. It wakes up all tasks that are waiting for the disk data. If they weren't waiting for that data, they wouldn't be waiting on that queue.
Q2: I'm not sure I understand the question. Each wake queue entry contains a pointer to the task. try_to_wake_up receives a pointer to the task that it's supposed to wake up. It is called once per function.
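The relationship between the two functions can be pictured with a toy model. This is plain Python, not kernel code: the names wake_up and try_to_wake_up are borrowed from the answer, while Task, WaitQueue and the state strings are invented here, and details of the real kernel (exclusive waiters, locking, who removes the entry) are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    state: str = "sleeping"


def try_to_wake_up(task: Task) -> bool:
    """Wake a single task; report whether it was actually asleep."""
    if task.state != "sleeping":
        return False
    task.state = "runnable"
    return True


@dataclass
class WaitQueue:
    # Each entry is a reference to the task that is waiting on this queue.
    entries: list = field(default_factory=list)

    def add(self, task: Task) -> None:
        task.state = "sleeping"
        self.entries.append(task)

    def wake_up(self) -> int:
        """Walk the queue and hand every waiting task to try_to_wake_up."""
        woken = sum(try_to_wake_up(task) for task in self.entries)
        self.entries.clear()  # simplification: the toy queue empties itself
        return woken
```

In this model, wake_up is the loop over the queue and try_to_wake_up is what gets applied to each task found on it.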
Q3: There are lots of wait queues. There's one for every event that can happen. The disk driver sets up a wait queue for each request to the disk. For example, when the filesystem driver wants the content of a certain disk block, it asks the disk driver for that block, and the request's wait queue starts out with the task that made the filesystem request. Other entries may be added to the wait queue if another request for the same block comes in while this one is still outstanding.
When an interrupt happens, the disk driver determines which disk has data available from the information passed by the hardware and looks up the data structure that contains the kernel data for this disk to find which request was to be filled. In this data structure, among others, are the location where the data is to be written and the corresponding wake queue indicating what to do next.
Q4: The process makes a system call, let's say to read a file. This triggers some code in the filesystem driver which determines that the data needs to be loaded from the disk. That code makes a request to the disk driver and adds the calling process to the request's wait queue. (There are actually more layers than that, but you get the idea.) When the disk read completes, the wait queue event triggers, and the process is thus removed from the disk's wait queue. The code triggered by the wait queue event is a function supplied by the filesystem driver, which copies the data to the process's memory and causes the read system call to return.
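The whole path from Q3 and Q4 can be sketched as a small simulation. Everything here (ToyDisk, ToyProcess, fs_read, the block number) is hypothetical and only mirrors the description above: one wait queue per outstanding request, a second reader of the same block joining the existing queue, and a filesystem-supplied callback that delivers the data and unblocks the caller.

```python
class ToyDisk:
    def __init__(self, blocks):
        self.blocks = blocks   # block number -> bytes stored "on disk"
        self.pending = {}      # block number -> wait queue (list of callbacks)

    def request(self, block, on_complete):
        """Queue a read; True means a new request had to be started."""
        is_new = block not in self.pending
        self.pending.setdefault(block, []).append(on_complete)
        return is_new

    def interrupt(self, block):
        """The hardware reports that `block` arrived: run and drop its waiters."""
        data = self.blocks[block]
        waiters = self.pending.pop(block, [])
        for callback in waiters:
            callback(data)
        return len(waiters)


class ToyProcess:
    def __init__(self, name):
        self.name = name
        self.state = "running"
        self.buffer = None


def fs_read(disk, proc, block):
    """Filesystem side of a read(): block the caller until the data arrives."""
    proc.state = "blocked"

    def complete(data):
        # Supplied by the "filesystem": copy the data, let read() return.
        proc.buffer = data
        proc.state = "running"

    return disk.request(block, complete)
```

For example, two processes reading block 7 end up on the same queue, and a single call to interrupt(7) unblocks both. If that interrupt never comes, or comes very late, both stay blocked, which is the situation the hung-task warning reports.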
Best Answer
If a task is blocked, it waits for resources to become available again.
In your case there was probably either an I/O problem or contention on the disk, or your system load was so high that there was not enough CPU power available to finish the job in time.
I have seen this error from cron when it tries to start a job at a very busy time.