When I connect from a "client" program to it, typically I would specify the IP address + port of the target or server system.
Yes, correct.
But what port would the client be using?
The client usually uses a random port. More precisely, for TCP to work, the only requirement is that the combination of source address, source port, destination address and destination port is unique, because this combination is used to keep track of TCP connections. So in principle the OS could just increment the source port number for each new connection. Many OSes used to do exactly this, but it made certain kinds of attacks easier, because an attacker could predict the next port number. So most modern OSes now pick source ports at random from a range of "ephemeral" ports.
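To see the OS pick the source port, here is a minimal sketch using Python's standard `socket` module over the loopback interface (`ephemeral_port_demo` is just an illustrative name). The client never calls `bind()`, so the OS assigns its source port during `connect()`; the exact number varies by OS and by run.

```python
import socket

def ephemeral_port_demo():
    """Connect to a local listener without choosing a source port.

    Returns (client_local_port, port_seen_by_server)."""
    # Server side: listen on loopback; port 0 asks the OS for any free port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))
        server.listen()
        server_port = server.getsockname()[1]

        # Client side: no bind() call, so the OS assigns an ephemeral source port.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.connect(("127.0.0.1", server_port))
            conn, peer = server.accept()
            with conn:
                return client.getsockname()[1], peer[1]

if __name__ == "__main__":
    local_port, seen_by_server = ephemeral_port_demo()
    print(f"client source port: {local_port}, server sees: {seen_by_server}")
```

Both numbers match: the port the OS chose for the client is exactly what the server sees as the peer's port.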
And how does the server know which port to connect back to the client on?
The server doesn't open a new connection back to the client; it replies over the same connection. A TCP packet contains both the destination and the source port, so each side knows both port numbers. See e.g. the diagram for the data inside a TCP packet on http://en.wikipedia.org/wiki/Transmission_Control_Protocol .
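As a rough illustration of where those numbers live, the sketch below reads the two ports from the first four bytes of a raw TCP header, where they are stored as 16-bit big-endian integers. The header bytes are made up for the example, and `tcp_ports` is a hypothetical helper name. A reply from the server carries the same two numbers in swapped positions.

```python
import struct

def tcp_ports(segment: bytes) -> tuple[int, int]:
    """Return (source_port, destination_port) from the start of a raw TCP header."""
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    # The first 4 bytes are two 16-bit fields in network (big-endian) order:
    # source port, then destination port.
    src_port, dst_port = struct.unpack("!HH", segment[:4])
    return src_port, dst_port

# Hypothetical header: source port 51515 -> destination port 80, other 16 bytes zeroed.
example = struct.pack("!HH", 51515, 80) + bytes(16)
print(tcp_ports(example))  # (51515, 80)
```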
Then, extending this to a specific protocol, say FTP (typically port 21), can I change it such that the server uses port 69, but the client uses port 100?
Usually you can configure a server to use any port you choose (though this depends on the individual server application), so you could configure the FTP server to use port 69. Client programs generally don't let you configure the source port, although at the sockets API level a client can bind() to a specific local port before connecting. The same goes for any other protocol such as RDP.
At any rate, why would you want to change the client port?
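If you really do need a fixed client port, the sockets API allows it: call `bind()` before `connect()`. A minimal Python sketch, with `connect_from_port` as a hypothetical helper name:

```python
import socket

def connect_from_port(server_addr, source_port):
    """Open a TCP connection to server_addr from a caller-chosen local (source) port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Allow binding even if an earlier connection from this port is still in TIME_WAIT.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        # bind() before connect() fixes the source port instead of letting the OS pick one.
        sock.bind(("", source_port))
        sock.connect(server_addr)
    except OSError:
        sock.close()
        raise
    return sock
```

For the question's example, that would be something like `connect_from_port(("server-address", 69), 100)`. On Unix-like systems, though, binding to a port below 1024 (such as 100) normally requires administrator privileges, and the call fails if something else is already using that local port.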
Why would a port number ever be used to tell what kind of application protocol the data carries, when there's no absolute guarantee?
Because guessing is a terrible way to run things, and there is no way to stop, for example, a malicious sender from sending the wrong thing anyway. So it helps when everyone is playing nice, and doesn't make anything worse.
To my understanding, there are no restrictions on what kind of application data you send over a port (it's just a suggestion).
Correct. In fact, it isn't even a suggestion, just an agreement that a lot of people happen to share.
Plus, isn't the protocol already identified somewhere in the packet for this purpose?
No. At least, not at the level that the port usually indicates: the IP header tells you what sort of higher-level protocol is being carried (e.g. TCP, UDP), but nothing tells you what the content of that is (e.g. HTTP, SMTP).
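To make the distinction concrete, this sketch reads the 1-byte Protocol field of an IPv4 header, which is the only "what's inside" label IP provides. The header is hand-built with documentation addresses and a zero checksum, only a few protocol numbers are listed, and `ipv4_payload_protocol` is a hypothetical helper name. Nothing in these fields could say HTTP or SMTP.

```python
import struct

# Protocol numbers (assigned by IANA) carried in the IPv4 header's 1-byte Protocol field.
IP_PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}

def ipv4_payload_protocol(header: bytes) -> str:
    """Name the transport protocol that an IPv4 header says its payload uses."""
    if len(header) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    proto = header[9]  # the Protocol field sits at byte offset 9
    return IP_PROTOCOLS.get(proto, f"protocol {proto}")

# Hypothetical header, 192.0.2.1 -> 192.0.2.2. Fields in order: version/IHL, TOS,
# total length, ID, flags/fragment offset, TTL, protocol, checksum, source, destination.
sample_header = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 40, 0, 0, 64, 6, 0,
    bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2]),
)
print(ipv4_payload_protocol(sample_header))  # TCP
```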
Also, what happens to the data if you send HTTP or some other kind of protocol to a destination of port 25 (which expects SMTP)?
TCP just passes the data to the application layer, which can do anything to it that it wants. Most of the time, you just get errors. Sometimes you get exploitable security holes.
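A toy sketch of what "the application can do anything it wants" looks like in the common case: the hypothetical `reply_to_line` function below is loosely modelled on an SMTP server's command handling (it is not a real mail server), and it rejects an HTTP request line with SMTP's 500 "command unrecognized" reply code.

```python
# Hypothetical, simplified command handler loosely modelled on SMTP (RFC 5321).
SMTP_VERBS = {"HELO", "EHLO", "MAIL", "RCPT", "DATA", "RSET", "NOOP", "QUIT", "VRFY"}

def reply_to_line(raw: bytes) -> bytes:
    """Return the reply a toy SMTP-style server might send for one received line."""
    # TCP delivered raw bytes; only here does anyone try to interpret them.
    line = raw.decode("ascii", errors="replace").strip()
    verb = line.split(" ", 1)[0].upper()
    if verb in SMTP_VERBS:
        # Recognised verbs get a generic 250; a real server answers each command differently.
        return b"250 OK\r\n"
    # 500 is SMTP's "syntax error, command unrecognized" reply code.
    return b"500 Syntax error, command unrecognized\r\n"

print(reply_to_line(b"GET / HTTP/1.1\r\n"))
```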
Occasionally you get nice behaviour for incorrect clients, like the plain-text HTTP errors that some HTTPS servers will give when you connect to the port without using SSL.
Third, what happens to the data if you send it to a port that isn't bound to any program, and therefore not being listened to?
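You normally get an error back from the receiving system: for TCP, a reset (RST) segment, which the client reports as "connection refused"; for UDP, an ICMP "port unreachable" message. Technically, the receiver could do anything it pleased (a firewall, for instance, may silently drop the packet), but in practice, that is what happens.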
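A minimal Python sketch of what a TCP client observes, assuming a connection to the local machine: a closed port shows up as `ConnectionRefusedError` (the RST), while a silently dropped packet shows up as a timeout. `probe` is a hypothetical helper name, and the 5-second timeout is an arbitrary choice.

```python
import socket

def probe(port, host="127.0.0.1", timeout=5.0):
    """Try a TCP connection and report what came back."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The remote TCP stack answered with a reset: nothing is listening there.
        return "refused"
    except socket.timeout:
        # No answer at all, e.g. a firewall silently dropped the packet.
        return "no response"
```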
Finally, if a port can only be bound to a single program, how can multiple programs that depend on incoming HTTP data be running on my computer at the same time?
When your browser makes an HTTP connection to a remote server, it uses a random local port and talks to the well-known port (80 or 443) on the remote server. In this case the set of source address, source port, destination address and destination port is unique for each distinct outbound connection. (Technically, the local port alone doesn't have to be unique, just as in the server case; only the whole set does.)
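A small sketch of this, assuming Python's `socket` module and a loopback listener standing in for a web server's port 80: two connections to the same server port end up with different local ports, so their four-tuples differ. `two_connections_demo` and `four_tuple` are hypothetical helper names.

```python
import socket

def four_tuple(sock):
    """(source address, source port, destination address, destination port) of a connected socket."""
    src_addr, src_port = sock.getsockname()
    dst_addr, dst_port = sock.getpeername()
    return src_addr, src_port, dst_addr, dst_port

def two_connections_demo():
    """Open two connections to the same listening port and return their four-tuples."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # stand-in for a web server's well-known port
        server.listen()
        addr = server.getsockname()
        with socket.create_connection(addr) as a, socket.create_connection(addr) as b:
            return four_tuple(a), four_tuple(b)

if __name__ == "__main__":
    for t in two_connections_demo():
        print(t)
```

The destination half of each tuple is identical; only the OS-chosen source port tells the two connections apart.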
On the server side, when you listen, only one process can accept new connections on a port (in Unix / BSD sockets), but it can pass each established connection to other processes to service. Because each set of addresses and ports is unique, traffic can be routed to the right connection.
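A minimal sketch of this single-acceptor pattern in Python: one listening socket, one `accept()` loop, and each established connection handed to a worker. For brevity it uses threads rather than separate processes (a Unix server might instead `fork()` after `accept()`); the upper-casing echo service and the names `serve` and `handle` are purely illustrative.

```python
import socket
import threading

def handle(conn):
    """Serve one established connection: echo back whatever arrives, upper-cased."""
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())

def serve(server, n_connections):
    """Accept connections on one listening socket and hand each to a worker thread."""
    workers = []
    for _ in range(n_connections):
        conn, _peer = server.accept()  # only this loop accepts on the port
        t = threading.Thread(target=handle, args=(conn,))
        t.start()
        workers.append(t)
    for t in workers:
        t.join()
```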
Further to Hello71's answer, it might help to visualise a port by thinking about the structure of an address in a packet (a packet being a unit of data passed around a network). TCP is an example of a transport layer protocol that uses ports, and it is commonly used over IP.
So IP has two addressing components: the source IP and the destination IP. TCP adds to this a source port and a destination port. It is the ports that enable the receiving machine to differentiate traffic destined for the same IP address. For example, if you have a server that receives both web requests and email on a single IP address, then you need to determine which application should receive the data: the email service or the web service. So if a single user made a web request and an email request to the same server, both packets would carry the same source and destination IP addresses, but different destination ports.
The web service owns port 80 and the email service owns port 25 - they "listen" on their respective ports, which enables the traffic to end up in the right place.
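Purely as an illustration (a real TCP stack is not written this way), the delivery decision amounts to a lookup keyed by destination port. The addresses below are from the documentation ranges and the packets, the `LISTENERS` table and the `deliver` function are all hypothetical.

```python
# Hypothetical table of which service "listens" on which destination port.
LISTENERS = {80: "web service", 25: "email service"}

def deliver(dst_port: int) -> str:
    """Pick the receiving application by destination port."""
    return LISTENERS.get(dst_port, "no listener: connection refused")

# Same source and destination IPs; only the destination port differs.
packets = [
    {"src": ("198.51.100.7", 50123), "dst": ("203.0.113.10", 80)},
    {"src": ("198.51.100.7", 50124), "dst": ("203.0.113.10", 25)},
]
for p in packets:
    print(p["src"], "->", p["dst"], "delivered to", deliver(p["dst"][1]))
```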
The source port is "ephemeral", in that it is made up at the time the connection is opened. However, it still serves a useful purpose: it enables both ends of the connection to keep track of separate conversations. Consider our user sending two simultaneous web requests: both go to destination port 80 on the server, but each comes from a different source port.
This lets the web service know that these are separate requests. Also, the return traffic from the web server (the web pages) is sent back to the respective source ports, which lets the browser know which request the server is responding to.
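A toy sketch of that bookkeeping, with hypothetical addresses and paths: the server's reply swaps the source and destination (address, port) pairs, so the reply's destination port is the original request's source port, which identifies the request. In practice the OS's TCP stack does this matching per socket, and the browser just reads from each socket.

```python
def reply_addresses(request):
    """The server answers by swapping the source and destination (address, port) pairs."""
    return {"src": request["dst"], "dst": request["src"]}

# Two hypothetical simultaneous requests to the same web server port.
sent_requests = [
    {"src": ("198.51.100.7", 50123), "dst": ("203.0.113.10", 80), "what": "GET /index.html"},
    {"src": ("198.51.100.7", 50124), "dst": ("203.0.113.10", 80), "what": "GET /logo.png"},
]
# Client-side lookup keyed by the local (source) port each request used.
by_local_port = {r["src"][1]: r["what"] for r in sent_requests}

for r in sent_requests:
    reply = reply_addresses(r)
    print("reply to port", reply["dst"][1], "answers", by_local_port[reply["dst"][1]])
```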
Note that all of this refers to port numbers. From a TCP/IP perspective, the actual data being moved across these ports could be anything. TCP/IP doesn't care about, or have any awareness of, applications, so if you had web traffic on port 25 and email on port 80, it would be none the wiser.
It is up to the sending and receiving applications to ensure the data has the right structure, and this is where application protocols come in. HTTP is an example of an application protocol that web browsers use to communicate with web servers. It is a well-defined protocol that ensures a browser can send requests to any web server and that the web server will understand them and respond sensibly. What its definition doesn't include is anything about how packets get from A to B: that is the responsibility of the lower layers (the transport, internet and link layers).
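To illustrate the split between layers, this sketch speaks HTTP by hand over a plain TCP socket: the application builds the request text, and TCP just carries the bytes. It assumes Python's standard `http.server` as a stand-in web server on a loopback port chosen by the OS; `HelloHandler` and `raw_http_get` are hypothetical names. Nothing about TCP would stop you running the same server on port 25.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Minimal handler: answers every GET with a short plain-text body."""

    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet

def raw_http_get(host, port, path="/"):
    """Build the HTTP request text ourselves; TCP only carries the bytes."""
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    with socket.create_connection((host, port)) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # HTTP/1.0: the server closes the connection after responding
                break
            chunks.append(chunk)
    return b"".join(chunks)

if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    response = raw_http_get("127.0.0.1", server.server_address[1])
    print(response.split(b"\r\n", 1)[0])  # the HTTP status line
    server.shutdown()
    server.server_close()
```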