I'm working on a homework assignment for an Introduction to Operating Systems class and am having quite a bit of fun, but also some confusion. I'm working on piping at the moment; my code is below.
Originally, my code looked like this:
```c
// Child process - write
if (fork() == 0) {
    fprintf(stderr, "Child\r\n");
    close(1);
    dup(p[1]);
    close(p[0]);
    close(p[1]);
    runcmd(pcmd->left);
// Parent process - read
} else {
    wait(0);
    close(0);
    dup(p[0]);
    close(p[0]);
    close(p[1]);
    fprintf(stderr, "Parent\r\n");
    runcmd(pcmd->right);
}
```
My reasoning was that the parent would wait until the child terminated, then read from the pipe, and that would be it. I posted this code on our class discussion page, and my instructor told me that there were several issues with it, one of which was:
- The parent process could hang infinitely if the child was running over a long enough input that it blocks the pipe.
He mentioned that the correct implementation (with regard to `wc`) would therefore be to use a blocking read, which waits on the pipe until data is available and then keeps reading until the pipe has been closed.
I tried looking for some way to "read" from the pipe as soon as it had data in it, but was unsure how to go about it. In the end, to avoid the possibility of waiting forever on a blocked pipe, I had the parent and child run simultaneously, but then the reading process might terminate first and not read all the data before the writer has finished. How would I go about addressing this issue? Here is my current version:
```c
int p[2];
pipe(p);
// Child process - read
if (fork() == 0) {
    fprintf(stderr, "Start child\r\n");
    close(0);
    dup(p[0]);
    close(p[0]);
    close(p[1]);
    fprintf(stderr, "Child\r\n");
    runcmd(pcmd->right);
// Parent process - write
} else {
    fprintf(stderr, "Start parent\r\n");
    close(1);
    dup(p[1]);
    close(p[0]);
    close(p[1]);
    fprintf(stderr, "Parent\r\n");
    runcmd(pcmd->left);
}
```
Edit: I also tried the `read` command, but was unsure how to actually use it, since it requires a buffer and the expected size to read (?). I'm not sure how to determine either of those when you don't know the size of the incoming data.
Best Answer
Piping is simple. You’re making it hard on yourself by jumping into the pool at the deep end. (Or perhaps it’s your instructor’s fault for not guiding you better.)
To become more comfortable with pipes, I suggest that you write two trivially simple programs:
- One that just writes some text to the standard output and exits. It can be something simple: “The quick brown fox jumps over the lazy dog.”, “Lorem ipsum dolor sit amet, consectetur adipiscing elit, …”, a short string (maybe even a single character) repeated many times, whatever you want. Use `printf`, `write`, `fprintf(stdout, …)`, or whatever other function(s) you like.

  To test this program, just run it from a shell prompt. It should display the chosen text and exit (return you to your shell prompt).
- And one that just reads text from the standard input and writes it to the standard output. Use `getc`, `fgets`, `read`, or whatever other function(s) you like. Exit when you get end-of-file. Check the man page for whatever function you use to see how it indicates end-of-file.

  To test this program, create a text file (called something like `jon_file.txt`) and put some text into it. You can do this quickly by saying something like `echo "Hello world" > jon_file.txt`, or you can use an editor. Then type `prog2 < jon_file.txt`. It should display the contents of the file and exit (return you to your shell prompt).

Don’t call `pipe`, `dup`, or anything fancy, not even `open` or `close`. (Do include whatever debugging and/or auditing code you want, to make sure you understand what is happening when.) Then run `prog1 | prog2`. If you’ve done it correctly, you’ll get the output you expect.

Now try to “break” it by adding `sleep` calls to the programs. If you break it, let me know how you did it. It should be almost impossible: unless you make one program (or both) sleep for longer than you’re willing to sit and wait, you’ll always get `prog2` to output all the data that `prog1` writes.

And in case the above example doesn’t make it clear: having the parent and child (or, in general, the processes on both sides of a pipe) run “simultaneously” is the right thing to do.¹ The reading program won’t “terminate first” just because there is no data in the pipe at the moment. As you should have learned from the above exercise, if a program tries to read from a pipe that currently has no data in it, the `read` system call makes the program wait until data arrive. The reading program won’t terminate until there is no data left in the pipe and no more coming, ever.² (At that point, `read` returns 0, which means end-of-file.) The “no more data coming, ever” condition is indicated by the writing program closing the pipe (or exiting, which is equivalent, because all of a process’s open file descriptors are closed when it exits). More precisely, every process that holds a copy of the pipe’s write end must close it, which is why both sides of your code close `p[1]`.

I don’t understand why you’re sweating the `read` system call at this point. Then again, if you don’t know how to use it yet, that confirms my suspicion that your instructor is presenting material out of logical order. (I assume that you mean the `read` system call and not the `read` command.) The only way your program makes sense is if `runcmd(pcmd->right)` is something that reads from standard input by some method (like our `prog2` program, above). It looks like your program is just doing the job of the shell: setting up the pipes, and then letting the programs run. At that level, there’s no reason for your program (to the extent that you have shown it to us) to do any I/O (reading or writing).

__________
¹ Related reading: “In what order do piped commands run?”
² Of course this is an oversimplification. As you will learn soon, if you haven’t already, you can design the reading program to terminate when there is no data in the pipe at the moment, but that’s not the default behavior. Or you can design the reading program to terminate under any number of other conditions, e.g., if it reads a `q` from the pipe. Or it could be killed by a signal, etc.

I’m looking back at this answer six months later, and I see that I really didn’t address the entire question; I covered the second half, but not the first. So, continuing from the above,
- Modify the first program to write a lot of data to stdout: at least 100,000 (10⁵) or 102,400 (2¹⁰ × 10²) characters. Also, if you haven’t already done this, modify it to write some ongoing status information to stderr. This can be something very simple; e.g., one “`.`” to stderr for every 1000 (or 1024) characters written to stdout, and “`!\n`” to stderr when it’s done.

  To test this, run `prog1 > /dev/null`. If you followed my suggestion (above), you should see 100 dots (`.`), followed by `!` and a newline. If you don’t have any calls to `sleep()` or other time-consuming functions in `prog1`, this output should come fairly quickly.

  Then run `prog1 | wc -c`. It should display your stderr status information, as mentioned above, followed by `100000` or `102400` or however many bytes you wrote to stdout. (This will be the output from `wc -c`, reporting how many bytes it read from its stdin, i.e., the pipe.)

- Modify the second program to `sleep` 10 or 20 seconds before it starts reading.

  To test this, run `prog2 < jon_file.txt` again. It should pause for the amount of time you specified in your `sleep()`, and then display the contents of the file and exit (return you to your shell prompt).

Now run `prog1 | prog2 > /dev/null`. But before you do that, you might want to try to guess what will happen.

︙
︙
︙
I expect that it will print some dots (maybe 8, maybe 64 or 65, maybe some other number), then pause, and then print the rest of the dots and the `!`. This is because `prog1` can start writing immediately, even if `prog2` isn’t reading yet. The pipe can hold the data until `prog2` is ready to start reading, but only up to a point: the pipe has a buffering limit. This may be 8000 (or 8192), 64000 (or 65536), or some other number. When the pipe is full, the system forces `prog1` to wait. When `prog2` starts reading, it drains the pipe; this makes room for the pipe to hold more data, and so `prog1` is allowed to start writing again.

If you don’t see the above behavior at first, try increasing the numbers: 200,000 bytes, 30 seconds, etc.
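If you’d rather see the buffering limit as a number, one way to measure it is to fill a pipe that nobody reads from, with the write end switched to non-blocking mode, so that a full pipe makes `write` fail with `EAGAIN` instead of sleeping forever. This is a minimal sketch, assuming a POSIX system; `pipe_capacity` is a hypothetical helper name, not a standard function. The value it reports is system-dependent; on Linux, the pipe(7) man page documents a default capacity of 16 pages (65,536 bytes with 4096-byte pages) since Linux 2.6.11.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not a standard API): returns how many bytes an
 * empty pipe accepts before a write would block, or -1 on error. */
long pipe_capacity(void)
{
    int p[2];
    if (pipe(p) == -1)
        return -1;

    /* Nobody ever reads from this pipe, so a normal (blocking) write to
     * a full pipe would sleep forever. O_NONBLOCK makes it fail with
     * EAGAIN instead. */
    int flags = fcntl(p[1], F_GETFL);
    if (flags == -1 || fcntl(p[1], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    const char byte = 'x';
    long total = 0;
    int err = 0;
    for (;;) {
        /* One byte per write, so the count is not rounded to a chunk size. */
        ssize_t n = write(p[1], &byte, 1);
        if (n == 1) {
            total++;
        } else if (errno != EINTR) {
            err = errno;
            break;
        }
    }

    close(p[0]);
    close(p[1]);
    /* EAGAIN (or EWOULDBLOCK) means "the pipe is full", which is the
     * expected way out of the loop; anything else is a real error. */
    return (err == EAGAIN || err == EWOULDBLOCK) ? total : -1;
}
```

Calling it from a small `main` and printing the result shows the limit on your system; that is roughly how many bytes `prog1` can write before it is forced to wait for `prog2`.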
So your teacher was partly right when he criticized the first draft of your program. (Or, perhaps, he was exactly right, and you misquoted him.) As you understand, that version of the program waited for the `runcmd(pcmd->left)` program (the pipe writer) to finish, and only then started `runcmd(pcmd->right)` (the pipe reader). But what if the left program outputs 100,000 bytes? It will fill the pipe and then wait until it can write some more. But it won’t be able to write more until “somebody” reads from the pipe and drains the storage buffer. And the main program won’t start the pipe reader until the pipe writer has finished. Everybody is waiting for somebody else to do something, which they won’t do until the first guy has done something. (“I’ll give you the jewel as soon as you give me the money.” / “No, I’ll give you the money after you give me the jewel.”) So, yeah; bottom line: if data stopped moving through the pipe because it was full and no process was reading from it, then both processes would hang forever.

This sort of situation is known casually, culturally, as a Catch-22. In computer science, it is formally called a deadlock, informally called a deadly embrace.
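For reference, here is a minimal sketch of a pipe setup that can’t deadlock this way. It assumes both sides are external programs that can be started with `execvp`; the names `run_pipeline`, `spawn`, and `exited_ok` are hypothetical, and in your shell the two `execvp` calls would correspond to `runcmd(pcmd->left)` and `runcmd(pcmd->right)`. It starts both the writer and the reader before waiting for either, closes both pipe ends in the parent so the reader can see end-of-file, and only then waits for both children.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that makes `fd` its descriptor
 * `target` (stdin or stdout), closes both original pipe ends, and
 * execs argv. Returns the child's pid, or -1 if fork() failed. */
static pid_t spawn(char *const argv[], int fd, int target, const int p[2])
{
    pid_t pid = fork();
    if (pid == 0) {
        if (dup2(fd, target) == -1)
            _exit(127);
        close(p[0]);
        close(p[1]);
        execvp(argv[0], argv);
        perror(argv[0]); /* only reached if execvp failed */
        _exit(127);
    }
    return pid;
}

/* Hypothetical helper: wait for one child; true if it exited with status 0. */
static int exited_ok(pid_t pid)
{
    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return 0;
    }
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* Hypothetical: run "left | right"; returns 0 if both sides exited with
 * status 0, otherwise -1. */
int run_pipeline(char *const left[], char *const right[])
{
    int p[2];
    if (pipe(p) == -1)
        return -1;

    /* Start BOTH sides before waiting for either: the reader drains the
     * pipe while the writer fills it, so a full pipe only pauses the
     * writer instead of hanging it forever. */
    pid_t writer = spawn(left, p[1], STDOUT_FILENO, p);
    pid_t reader = (writer == -1) ? -1 : spawn(right, p[0], STDIN_FILENO, p);

    /* The parent must close both ends as well: as long as any process
     * still holds p[1] open, the reader's read() never returns 0 (EOF). */
    close(p[0]);
    close(p[1]);

    int ok = 1;
    if (writer == -1 || !exited_ok(writer))
        ok = 0;
    if (reader == -1 || !exited_ok(reader))
        ok = 0;
    return ok ? 0 : -1;
}
```

With this structure, a pipeline such as `head -c 200000 /dev/zero | wc -c` completes, because `wc` drains the pipe while `head` is still writing. With the wait-first structure of your first draft, the same pipeline would hang as soon as the pipe filled up.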