Linux – In which thread does the block driver issue commands to the block device

block-device io linux-kernel

I'm currently reading the book "Understanding the Linux Kernel". As I understand it, a block I/O request flows like this: user-space call => VFS call => I/O scheduler call => block device driver call.

What I want to know is where the block device driver issues commands to the block device (hard disk, etc.). Does it issue them in a dedicated, scheduled kernel thread, or in the user process thread where the block I/O request began?

Best Answer

If you are not doing synchronous I/O, the process doing the write just creates transactions for the kjournald thread to dequeue. (Note: I'm using an ext3 file system mounted with data=ordered; the traces were taken on Linux 3.0-rc7.)

We can see what happens by putting a breakpoint on the I/O scheduler's elevator_dispatch_fn method, for example deadline_dispatch_requests for the deadline I/O scheduler.

There are two ways the device queue will run:

Either from the kjournald thread (the ext3 file system is mounted with the default commit=5, so the thread is scheduled to run every 5 seconds):

#0  deadline_dispatch_requests (q=, force=0) at block/deadline-iosched.c:246
#1  __elv_next_request (q=<optimized out>) at block/blk.h:86
#2  blk_peek_request (q=q@entry=) at block/blk-core.c:1829
#3  scsi_request_fn (q=) at drivers/scsi/scsi_lib.c:1511
#4  __blk_run_queue (q=) at block/blk-core.c:305
#5  queue_unplugged (q=, depth=1, from_schedule=<optimized out>) at block/blk-core.c:2673
#6  blk_flush_plug_list (plug=plug@entry=, from_schedule=from_schedule@entry=false) at block/blk-core.c:2755
#7  blk_finish_plug (plug=plug@entry=) at block/blk-core.c:2762
#8  journal_commit_transaction (journal=journal@entry=) at fs/jbd/commit.c:412
#9  kjournald (arg=) at fs/jbd/journal.c:152
#10 kthread (_create=) at kernel/kthread.c:96
#11 kernel_thread_helper () at arch/x86/kernel/entry_64.S:1161
#12 ?? ()

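Or from interrupt (softirq) context, when a previous request completes; in this trace the block softirq runs on exit from a timer interrupt: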

#0  deadline_dispatch_requests (q=, force=0) at block/deadline-iosched.c:246
#1  __elv_next_request (q=<optimized out>) at block/blk.h:86
#2  blk_peek_request (q=q@entry=) at block/blk-core.c:1829
#3  scsi_request_fn (q=) at drivers/scsi/scsi_lib.c:1511
#4  __blk_run_queue (q=) at block/blk-core.c:305
#5  blk_run_queue (q=q@entry=) at block/blk-core.c:339
#6  scsi_run_queue (q=q@entry=) at drivers/scsi/scsi_lib.c:449
#7  scsi_next_command (cmd=cmd@entry=) at drivers/scsi/scsi_lib.c:502
#8  scsi_end_request (requeue=<optimized out>, bytes=<optimized out>, error=<optimized out>, cmd=) at drivers/scsi/scsi_lib.c:574
#9  scsi_io_completion (cmd=cmd@entry=, good_bytes=<optimized out>) at drivers/scsi/scsi_lib.c:822
#10 scsi_finish_command (cmd=cmd@entry=) at drivers/scsi/scsi.c:847
#11 scsi_softirq_done (rq=<optimized out>) at drivers/scsi/scsi_lib.c:1456
#12 blk_done_softirq (h=<optimized out>) at block/blk-softirq.c:34
#13 __do_softirq () at kernel/softirq.c:238
#14 call_softirq () at arch/x86/kernel/entry_64.S:1210
#15 do_softirq () at arch/x86/kernel/irq_64.c:80
#16 invoke_softirq () at kernel/softirq.c:325
#17 irq_exit () at kernel/softirq.c:340
#18 smp_apic_timer_interrupt (regs=<optimized out>) at arch/x86/kernel/apic/apic.c:862
#19 <signal handler called>
#20 irq_stack_union ()

If you are doing synchronous writes, the request_fn method runs directly within the write system call, in the context of the calling process, as seen below:

#0  deadline_dispatch_requests (q=, force=0) at block/deadline-iosched.c:246
#1  __elv_next_request (q=<optimized out>) at block/blk.h:86
#2  blk_peek_request (q=q@entry=) at block/blk-core.c:1829
#3  scsi_request_fn (q=) at drivers/scsi/scsi_lib.c:1511
#4  __blk_run_queue (q=) at block/blk-core.c:305
#5  queue_unplugged (q=, depth=1, from_schedule=<optimized out>) at block/blk-core.c:2673
#6  blk_flush_plug_list (plug=<optimized out>, from_schedule=from_schedule@entry=false) at block/blk-core.c:2755
#7  blk_flush_plug (tsk=<optimized out>) at include/linux/blkdev.h:880
#8  io_schedule () at kernel/sched.c:5669
#9  sleep_on_page (word=<optimized out>) at mm/filemap.c:182
#10 __wait_on_bit (wq=, q=q@entry=, action=action@entry=<sleep_on_page>, mode=mode@entry=2) at kernel/wait.c:202
#11 wait_on_page_bit (page=page@entry=, bit_nr=bit_nr@entry=13) at mm/filemap.c:571
#12 wait_on_page_writeback (page=) at include/linux/pagemap.h:394
#13 filemap_fdatawait_range (mapping=mapping@entry=, start_byte=start_byte@entry=0, end_byte=end_byte@entry=511) at mm/filemap.c:292
#14 filemap_write_and_wait_range (mapping=, lstart=0, lend=511) at mm/filemap.c:371
#15 filemap_write_and_wait_range (mapping=mapping@entry=, lstart=lstart@entry=0, lend=lend@entry=511) at mm/filemap.c:378
#16 vfs_fsync_range (file=, start=start@entry=0, end=end@entry=511, datasync=0) at fs/sync.c:176
#17 generic_write_sync (file=file@entry=, pos=pos@entry=0, count=count@entry=512) at fs/sync.c:242
#18 generic_file_aio_write (iocb=<optimized out>, iov=, nr_segs=1, pos=<optimized out>) at mm/filemap.c:2614
#19 do_sync_write (filp=, buf=<optimized out>, len=<optimized out>, ppos=) at fs/read_write.c:348
#20 vfs_write (file=file@entry=, buf=buf@entry="", count=<optimized out>, count@entry=512, pos=pos@entry=) at fs/read_write.c:377
#21 sys_write (fd=<optimized out>, buf="", count=512) at fs/read_write.c:429
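
For anyone who wants to reproduce the two cases from user space, here is a minimal sketch. It assumes the synchronous trace above came from a file opened with O_SYNC (the original does not say how the file was opened; the generic_write_sync frame and count=512 are merely consistent with that). The helper name write_block and the 512-byte length are illustrative choices, not something taken from the traces.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define WRITE_LEN 512  /* matches count=512 in the sys_write frame above */

/*
 * Hypothetical helper: write one 512-byte block to `path`.
 *
 * use_osync == 0: plain buffered write; write() normally returns once the
 *                 data is in the page cache, and the I/O is issued later
 *                 (for example by kjournald on ext3).
 * use_osync != 0: the file is opened with O_SYNC, so write() does not
 *                 return until the data has been written out; this is the
 *                 assumed setup for the synchronous trace.
 *
 * Returns the number of bytes written, or -1 on error.
 */
ssize_t write_block(const char *path, int use_osync)
{
    char buf[WRITE_LEN];
    int flags = O_WRONLY | O_CREAT | O_TRUNC;
    ssize_t n;
    int fd;

    assert(path != NULL);
    memset(buf, 'x', sizeof(buf));

    if (use_osync)
        flags |= O_SYNC;

    fd = open(path, flags, 0644);
    if (fd < 0)
        return -1;

    n = write(fd, buf, sizeof(buf));

    if (close(fd) != 0)
        return -1;

    return n;
}
```

Calling write_block(path, 1) on a file that lives on the traced file system, with the breakpoint on deadline_dispatch_requests set, should hit the breakpoint from inside the write system call. With write_block(path, 0) the breakpoint is expected to fire later, from kjournald or from softirq context. This describes the 3.0-era request queue traced here; later kernels with the multi-queue block layer may behave differently.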