Any command-line, generic HTTP proxy (like Squid)?

command-line, http, proxy, squid

I can easily use Netcat (or Socat) to capture traffic between my browser and a specific host:port.

But on Linux, is there a command-line counterpart to a Squid-like HTTP proxy that I can use to capture traffic between my HTTP client (a browser or a command-line program) and any arbitrary host:port?

Best Answer

Both Perl and Python (and probably Ruby as well) have simple kits that you can use to quickly build simple HTTP proxies.

In Perl, use HTTP::Proxy. Here's the 3-line example from the documentation. Add filters to filter, log or rewrite requests or responses; see the documentation for examples.

use HTTP::Proxy;
my $proxy = HTTP::Proxy->new( port => 3128 );
$proxy->start;

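In Python, use SimpleHTTPServer (http.server in Python 3). Here's some sample code lightly adapted from effbot, written here for Python 3. Adapt the do_GET method (or others) to filter, log or rewrite requests or responses.

import socketserver
import http.server
import urllib.request
class Proxy(http.server.SimpleHTTPRequestHandler):
    def do_GET(self):
        # A proxy client sends the absolute URL, so self.path is the full URL to fetch.
        with urllib.request.urlopen(self.path) as upstream:
            self.send_response(upstream.status)  # relay the upstream status code
            self.send_header("Content-Type", upstream.headers.get("Content-Type", "application/octet-stream"))
            self.end_headers()
            self.copyfile(upstream, self.wfile)  # stream the body back to the client
httpd = socketserver.ForkingTCPServer(('', 3128), Proxy)
httpd.serve_forever()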
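
As a rough illustration of the "adapt do_GET to log" suggestion, here is a minimal Python 3 sketch. The names LoggingProxy and serve are made up for this example, and it makes simplifying assumptions: only GET is handled, the whole body is buffered in memory, and upstream error responses (4xx/5xx) are not relayed.

```python
import http.server
import socketserver
import sys
import urllib.request


class LoggingProxy(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler: logs each GET to stderr, then relays it."""

    # An empty ProxyHandler stops urllib from reading proxy settings out of
    # the environment, which could otherwise point back at this proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

    def do_GET(self):
        # Proxy clients put the absolute URL in the request line,
        # so self.path is the full URL to fetch.
        sys.stderr.write(f">>> {self.requestline}\n{self.headers}")
        with self.opener.open(self.path, timeout=10) as upstream:
            body = upstream.read()
            sys.stderr.write(
                f"<<< {upstream.status} {upstream.reason} ({len(body)} bytes)\n"
            )
            self.send_response(upstream.status)
            self.send_header(
                "Content-Type",
                upstream.headers.get("Content-Type", "application/octet-stream"),
            )
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)


def serve(port=3128):
    """Run the logging proxy until interrupted."""
    with socketserver.ThreadingTCPServer(("", port), LoggingProxy) as httpd:
        httpd.serve_forever()
```

Calling serve() and pointing the HTTP client's proxy setting at port 3128 should print one ">>>" block per request and one "<<<" line per response on stderr.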

Related Solutions

How to make netcat use an existing HTTP proxy

Netcat is not a specialized HTTP client. For Netcat, connecting through a proxy server therefore means creating a TCP connection through that server, which is why it expects a SOCKS or HTTPS proxy. The proxy address is given with the -x argument and the protocol is selected with -X:

 -X proxy_protocol
         Requests that nc should use the specified protocol when talking
         to the proxy server.  Supported protocols are “4” (SOCKS v.4),
         “5” (SOCKS v.5) and “connect” (HTTPS proxy).  If the protocol is
         not specified, SOCKS version 5 is used.

connect specifies a method for creating SSL (HTTPS) connections through a proxy server. Since the proxy is not the other endpoint and the connection is encrypted end to end, a CONNECT request lets you tunnel a point-to-point connection through an HTTP proxy (if the proxy allows it). (I might be glossing over details here, but that's not the important point anyway; look up "HTTP CONNECT tunneling" for details.)

So, to connect to your webserver using a proxy, you'll have to do what the web browser would do - talk to the proxy:

$ nc squid-proxy 3128
GET http://webserver/sample HTTP/1.0
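
The same exchange can be sketched in Python with a plain socket. This is only an illustration: build_proxy_get and fetch_via_proxy are hypothetical helper names, and the sketch assumes the proxy closes the connection after an HTTP/1.0 response.

```python
import socket


def build_proxy_get(url: str) -> bytes:
    """Build the request a client sends to a forward proxy.

    Unlike a request sent straight to a web server, the request line
    carries the absolute URL, not just the path.
    """
    return f"GET {url} HTTP/1.0\r\n\r\n".encode("ascii")


def fetch_via_proxy(proxy_host: str, proxy_port: int, url: str) -> bytes:
    """Send the request to the proxy and return the raw response bytes."""
    with socket.create_connection((proxy_host, proxy_port), timeout=10) as sock:
        sock.sendall(build_proxy_get(url))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the peer closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

With the host names from the nc example, fetch_via_proxy("squid-proxy", 3128, "http://webserver/sample") would return the status line, headers and body as one byte string.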

(That question has similarities to this one; I don't know whether proxychains is of use here.)

Addendum: A browser using an ordinary HTTP proxy, e.g. Squid (as I know it), does more or less what the example illustrated, as Netcat can show you. After starting nc as shown below, I configured Firefox to use 127.0.0.1 port 8080 as its proxy and tried to open Google; this was the output (minus a cookie):

$ nc -l 8080
GET http://google.com/ HTTP/1.1
Host: google.com
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:6.0.2) Gecko/20100101 Firefox/6.0.2
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
DNT: 1
Proxy-Connection: keep-alive

By behaving this way too, you can use Netcat to access an HTTP server through the HTTP proxy. Now, what should happen if you try to access an HTTPS web server? The browser surely should not reveal the traffic to anyone in the middle, so a direct connection is needed, and this is where CONNECT comes into play. When I start nc -l 8080 again and try to access, say, https://google.com with the proxy set to 127.0.0.1:8080, this is what comes out:

CONNECT google.com:443 HTTP/1.1
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:6.0.2) Gecko/20100101 Firefox/6.0.2
Proxy-Connection: keep-alive
Host: google.com

You see, the CONNECT request asks the server for a direct connection to google.com, port 443 (https). Now, what does this request do?

$ nc -X connect -x 127.0.0.1:8080 google.com 443

The output from the nc -l 8080 instance:

CONNECT google.com:443 HTTP/1.0

So it creates the direct connection the same way. However, as this can of course be exploited for almost anything (using, for example, corkscrew), CONNECT requests are usually restricted to the obvious ports only.
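
For illustration, the CONNECT handshake can be sketched in Python as follows. The helper names are invented for this example, and the sketch assumes the proxy answers with a normal HTTP status line followed by a blank line; any 2xx status is treated as success.

```python
import socket


def build_connect_request(host: str, port: int) -> bytes:
    """Build the CONNECT request, as nc -X connect sends it."""
    return f"CONNECT {host}:{port} HTTP/1.0\r\n\r\n".encode("ascii")


def parse_status_code(status_line: bytes) -> int:
    """Extract the status code from e.g. b'HTTP/1.0 200 Connection established'."""
    parts = status_line.split(None, 2)
    if len(parts) < 2 or not parts[1].isdigit():
        raise ValueError(f"not an HTTP status line: {status_line!r}")
    return int(parts[1])


def open_tunnel(proxy_host: str, proxy_port: int, host: str, port: int) -> socket.socket:
    """Ask the proxy for a tunnel and return the connected socket."""
    sock = socket.create_connection((proxy_host, proxy_port), timeout=10)
    try:
        sock.sendall(build_connect_request(host, port))
        reply = b""
        # Read one byte at a time so that no tunnelled data is consumed
        # along with the proxy's reply headers.
        while not reply.endswith(b"\r\n\r\n"):
            data = sock.recv(1)
            if not data:
                break
            reply += data
        code = parse_status_code(reply.split(b"\r\n", 1)[0])
        if not 200 <= code < 300:
            raise ConnectionError(f"proxy refused CONNECT: {code}")
    except Exception:
        sock.close()
        raise
    return sock
```

After open_tunnel returns, everything written to the socket goes to the target host unmodified, which is why a TLS handshake can then run end to end through the proxy.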

Generic HTTP server that just dumps POST requests

Simple core command-line tools like nc and socat do not seem able to handle the HTTP-specific details (chunking, transfer encodings, etc.). As a result, they may behave unexpectedly compared to a real web server. So my first thought is to share the quickest way I know of setting up a tiny web server and making it do just what you want: dump all output.

The shortest I could come up with using Python Tornado:

#!/usr/bin/env python

import tornado.ioloop
import tornado.web
import pprint

class MyDumpHandler(tornado.web.RequestHandler):
    def post(self):
        pprint.pprint(self.request)
        pprint.pprint(self.request.body)

if __name__ == "__main__":
    tornado.web.Application([(r"/.*", MyDumpHandler),]).listen(8080)
    tornado.ioloop.IOLoop.instance().start()

Replace the pprint lines to output only the specific fields you need, for example self.request.body or self.request.headers. The example above listens on port 8080, on all interfaces.

Alternatives to this are plenty. web.py, Bottle, etc.
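
One more alternative, sketched here with only the Python 3 standard library (http.server), which the list above does not mention. DumpHandler and make_server are hypothetical names, and the sketch assumes the client sends a Content-Length header; chunked request bodies are not handled.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class DumpHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that prints every POST it receives."""

    def do_POST(self):
        # Read exactly as many body bytes as the client announced.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        print(self.requestline)
        print(self.headers)
        print(body)
        # Answer with an empty 200 so the client does not hang.
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()


def make_server(port: int = 8080) -> HTTPServer:
    """Create a server on all interfaces; call serve_forever() to run it."""
    return HTTPServer(("", port), DumpHandler)
```

Running make_server().serve_forever() listens on port 8080, like the Tornado example.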

(I'm quite Python oriented, sorry)

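If you don't like its way of outputting, just run it anyway and try tcpdump like this:

tcpdump -A -i lo 'tcp[32:4] = 0x504f5354'

to see a real raw dump of all HTTP POST requests. The constant 0x504f5354 is the ASCII encoding of "POST", -A prints each packet's payload as ASCII, and offset 32 assumes a 32-byte TCP header (20 bytes plus 12 bytes of options, which is typical when TCP timestamps are enabled). Alternatively, just run Wireshark.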
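
The hex constant in such a filter is just the first four ASCII bytes of the request, so it can be derived for other methods too. A small sketch follows; tcpdump_method_filter is a made-up helper, and the default offset of 32 is the same TCP header length assumption as in the command above.

```python
def tcpdump_method_filter(method: str = "POST", tcp_header_len: int = 32) -> str:
    """Return a tcpdump expression matching payloads that start with `method`."""
    # Use the first four bytes; shorter methods are padded with the space
    # that follows them in the request line (e.g. "GET ").
    first4 = method.encode("ascii").ljust(4, b" ")[:4]
    return f"tcp[{tcp_header_len}:4] = 0x{first4.hex()}"


print(tcpdump_method_filter("POST"))  # tcp[32:4] = 0x504f5354
print(tcpdump_method_filter("GET"))   # tcp[32:4] = 0x47455420
```

If the packets on your interface carry no TCP options, pass tcp_header_len=20 instead.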
