Streaming Data to REST Service Using Unix Pipes

curlpipewget

Based on an answer to another question I am using curl to stream the stdout of one process as the entity of a POST request:

myDataGeneratingApp \
| curl -H "Content-Type: application/json" -H "Transfer-Encoding: chunked" -X POST -d @- http://localhost:12000

Unfortunately, curl is waiting for EOF from the stdout before it begins sending the data. I know this because I can run my application stand-alone and data comes out to the console immediately but when I pipe to curl there is a significant delay before the service begins receiving data.

How can I use curl to stream data immediately as it becomes available from the standard out of the application? If not possible in curl then is there another solution (e.g. wget)?

Best Answer

Looking through the curl code transfer.c it seems that the program is able to repackage request data (from curl to the server) using the chunking protocol, where each chunk of data is prefixed by the length of the chunk in ascii hexadecimal, and suffixed by \r\n.

It seems the way to make it use this in a streaming way, after connecting to the server is with -T -. Consider this example:

for i in $(seq 5)
do date
   sleep 1
done | 
dd conv=block cbs=512 |
strace -t -e sendto,read -o /tmp/e \
 curl --trace-ascii - \
 -H "Transfer-Encoding: chunked" \
 -H "Content-Type: application/json" \
 -X POST -T -  http://localhost/...

This script sends 5 blocks of data, each beginning with the date and padded to 512 bytes by dd, to a pipe, where strace runs curl -T - to read the pipe. In the terminal we can see

== Info: Connected to localhost (::1) port 80 (#0)
=> Send header, 169 bytes (0xa9)
0000: POST /... HTTP/1.1
001e: Host: localhost
002f: User-Agent: curl/7.47.1
0048: Accept: */*
0055: Transfer-Encoding: chunked
0071: Content-Type: application/json
0091: Expect: 100-continue
00a7: 
<= Recv header, 23 bytes (0x17)
0000: HTTP/1.1 100 Continue

which shows the connection, and the headers sent. In particular curl has not provided a Content-length: header, but an Expect: header to which the server (apache) has replied Continue. Immediately after comes the first 512 bytes (200 in hex) of data:

=> Send data, 519 bytes (0x207)
0000: 200
0005: Fri Sep 14 15:58:15 CEST 2018                                   
0045:                                                                 
0085:                                                                 
00c5:                                                                 
0105:                                                                 
0145:                                                                 
0185:                                                                 
01c5:                                                                 
=> Send data, 519 bytes (0x207)

Looking in the strace output file we see each timestamped read from the pipe, and sendto write to the connection:

16:00:00 read(0, "Fri Sep 14 16:00:00 CEST 2018   "..., 16372) = 512
16:00:00 sendto(3, "200\r\nFri Sep 14 16:00:00 CEST 20"..., 519, ...) = 519
16:00:00 read(0, "Fri Sep 14 16:00:01 CEST 2018   "..., 16372) = 512
16:00:01 sendto(3, "200\r\nFri Sep 14 16:00:01 CEST 20"..., 519, ...) = 519
16:00:01 read(0, "Fri Sep 14 16:00:02 CEST 2018   "..., 16372) = 512
16:00:02 sendto(3, "200\r\nFri Sep 14 16:00:02 CEST 20"..., 519, ...) = 519
16:00:02 read(0, "Fri Sep 14 16:00:03 CEST 2018   "..., 16372) = 512
16:00:03 sendto(3, "200\r\nFri Sep 14 16:00:03 CEST 20"..., 519, ...) = 519
16:00:03 read(0, "Fri Sep 14 16:00:04 CEST 2018   "..., 16372) = 512
16:00:04 sendto(3, "200\r\nFri Sep 14 16:00:04 CEST 20"..., 519, ...) = 519
16:00:04 read(0, "", 16372)             = 0
16:00:05 sendto(3, "0\r\n\r\n", 5, ...) = 5

As you can see they are spaced out by 1 second, showing that the data is being sent as it is being received. You must just have at least 512 bytes to send, as the data is being read by fread().

Related Question