A Simple Socket Tale
Winter 2007. The road is crammed with cars because it is time to leave the office; naturally, it is already 19:00. I looked out of the window, again, depressed by the fact that I would be stuck in traffic for at least 30 minutes, stressed, listening to my electronic music. Trying to figure out what to do in an almost empty office, I dropped by a corner where I could still see a monitor spitting light.
As a system admin, I was always curious about what those smart engineers from the R&D department were so attached to.
The senior engineer seemed stuck and frustrated, debugging a function he had written for a PLC driver. I walked up to his desk and sat on an empty chair, hoping he would not feel I was invading his space, and also hoping I would not break his chain of thought. He looked exhausted.
After a bit of chit-chat about how bad the traffic was, I was curious about what was keeping him so engaged, so I asked him directly.
He could not figure out why the firmware was receiving a specific stream of data when the client was clearly sending a different input.
So he started walking me through the problem. I was more than eager to listen.
As soon as he started, a wall in my knowledge broke instantly… hold on a second, how can you see the byte stream in a program you are writing? This was a whole new universe for me, since the furthest I had gone in seeing traffic was tcpdump… that was my horizon… so I instantly became fascinated by the next layer :)
I told him that I had no idea how I could even help him, since I did not know what a socket was and had zero understanding of how software could manage connections. After my comment, I expected a laugh or even a “Well, it is fine, maybe another day we can look at this…”, but no. He said: well, let me put something together real quick so you will understand it. I was thrilled by his kindness, and it motivated me even more to help him out however I could.
He opened vi and started writing straight out of his brain… my fascination blasted!
In this article I am extending his kindness to any reader: I will cover what he did to teach me, writing a simple program to help me understand how sockets work.
If you are like me in that cold 2007, keep reading; you may find it interesting!
The best way to learn is by example, so I am going to write a very simple server program in C that accepts TCP connections. Why C? Because I want to. Period.
The idea is that a simple client, like netcat (nc), connects and sends a message such as “Hola, caracola” (a playful rhyming Spanish greeting, literally “hello, conch shell”), and the server is ready to receive that stream of data.
The engineer was debugging, so he had implemented a small library he called ‘cagada’ (or something like that; cagada means dump in Spanish), as he was observing raw memory as hex bytes for the specific debugging he was performing. So the first goal was to actually see what was above the network layer and what his daemon was getting.
He opened his fancy Anjuta editor in GNOME with a blank document and threw down the first line of code:
void cagada(const unsigned char *data_buffer, const unsigned int length)
To start with, he needed a small function that receives two parameters. data_buffer is a pointer to the start of the data he wanted to dump, which was expected to be a sequence of unsigned characters (bytes); this will make more sense later. He declared it const because he did not want the function to modify any of the data. It's good practice to use const when you don't intend to modify the input parameters: it makes your code more readable and prevents accidental modifications (learned that day!)
For length, the meaning was obvious: it specifies the length of the data buffer. The function uses this parameter to iterate through the buffer, processing each byte in the specified range. Like data_buffer, it is declared const to indicate that the function won't modify its value (since it is passed by value, this only protects the function's local copy). Together, these parameters allow the cagada function to operate on any block of memory, given the starting address of the data (data_buffer) and its length (length). This flexibility makes the function reusable for different scenarios and data sets.
Fine, so far I could understand why both parameters were needed. Next he declared some auxiliary variables. As his intent was to iterate over the data buffer, I expected at the bare minimum a couple of them to serve as trivial indexes.
Another variable joined the mix: byte, an unsigned char. That made sense: the unsigned char type in C represents small non-negative integers, from 0 to 255 (2^8 - 1) on virtually every platform, which exactly matches the range of values a byte can hold.
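Why unsigned char rather than plain char? A small sketch, with hypothetical helper names that are not part of the original program, assuming 8-bit bytes. Plain char may be signed (it is with GCC on x86), so a byte such as 0xf0 read through a plain char can come out negative, while through unsigned char it always stays in the 0 to 255 range that range checks on bytes rely on.

```c
#include <stddef.h>

/* Hypothetical helper: read byte i of a buffer through unsigned char,
   so the result is always in 0..255 (assuming 8-bit bytes). */
int byte_as_unsigned(const void *buf, size_t i)
{
    const unsigned char *p = buf;
    return p[i];
}

/* Hypothetical helper: the same byte read through plain char. Where
   char is signed (e.g. GCC on x86), 0xf0 comes back as -16. */
int byte_as_plain_char(const void *buf, size_t i)
{
    const char *p = buf;
    return p[i];
}
```

For example, given unsigned char data[] = {0x41, 0xf0}, byte_as_unsigned(data, 1) is 240 everywhere, while byte_as_plain_char(data, 1) depends on the platform's char signedness.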
unsigned int i, j;
unsigned char byte;
After the declarations, we started to put some meat on it.
for(i=0; i < length; i++){
    byte = data_buffer[i];
    printf("%02x ", data_buffer[i]); /* two hex digits per byte */
We wanted to print the value of data_buffer[i] to the console and observe what came out. The format specifier “%02x” prints the value as a two-digit hexadecimal number, with a leading zero when needed (10 becomes 0a).
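The formatting of a single byte can be checked on its own. A minimal sketch, using a hypothetical hex_byte() helper that writes into a string with snprintf instead of printing:

```c
#include <stdio.h>

/* Hypothetical helper: format one byte the way the dump does, as two
   lowercase hex digits padded with a leading zero ("%02x").
   out must have room for at least 3 chars (2 digits + '\0'). */
void hex_byte(unsigned char b, char out[3])
{
    snprintf(out, 3, "%02x", (unsigned int)b);
}
```

With this helper, 10 becomes "0a", 0x4c becomes "4c" and 255 becomes "ff": always exactly two characters, which is what keeps the dump's columns aligned.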
    if(((i%16)==15) || (i==length-1))
After every 16 bytes, or upon reaching the last byte of the buffer (index length-1, since the loop stops before length), we print the ASCII representation of the bytes on that line. Non-printable characters are represented as dots (.).
The condition (i % 16) == 15 prints the bytes in groups of 16. After printing the hexadecimal representation of 16 bytes, he added a separator (---) and then printed the ASCII representation of those bytes. This grouping simply made it easier for a human reader to interpret and analyze the data, which is especially important when dealing with large amounts of binary data, as we will see later.
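The line-break logic is easier to reason about in isolation. A small sketch with hypothetical helper names, assuming 16 bytes per line as in the text; note that inside a loop bounded by i < length, the last index is length - 1, so that is what the end-of-buffer check must compare against.

```c
/* Hypothetical helper: non-zero when byte i closes a line, either
   because it is the 16th byte of a row (i % 16 == 15) or because it
   is the last byte of a buffer of the given length. */
int ends_line(unsigned int i, unsigned int length)
{
    return (i % 16 == 15) || (i == length - 1);
}

/* Hypothetical helper: how many hex columns are missing on a short
   final line, so the ASCII column can be padded into alignment
   (each hex column takes 3 characters, "xx "). */
unsigned int missing_columns(unsigned int i)
{
    return 15 - (i % 16);
}
```

For a 63-byte message, ends_line() fires at indexes 15, 31, 47 and 62, and the last line is one column short, so it needs 3 spaces of padding before the separator.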
My attention was caught by the next conditional: these hardcoded values did not make any sense to me. He kept hitting the keyboard almost without thinking.
if((byte > 31) && (byte < 127))
    printf("%c", byte);
else
    printf(".");
He began explaining:
Together, byte > 31 and byte < 127 check whether the byte falls in the printable ASCII range. ASCII values below 32 are control characters (non-printable), and 127 (DEL) is a control character too, so only values 32 through 126 are printed as characters. For the purpose of the exercise, that was all we cared about.
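Here is that decision as a standalone function, a sketch with a hypothetical name. In the default "C" locale, isprint() from <ctype.h> should give the same answer for these byte values; the explicit range is kept here because it mirrors the original code.

```c
/* Hypothetical helper: return the byte itself if it is printable
   ASCII (32 ' ' through 126 '~'), otherwise a dot, like the ASCII
   column of the dump. 127 (DEL) is a control character too. */
char printable_or_dot(unsigned char b)
{
    return (b > 31 && b < 127) ? (char)b : '.';
}
```

So 'A' stays 'A', a newline (10) becomes '.', and so do 127 and any byte from 128 upwards.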
Here you can see the full function.
/* cagada.h */
#ifndef CAGADA_H
#define CAGADA_H

#include <stdio.h>

/* Print length bytes starting at data_buffer as a hex dump, 16 bytes
   per line, followed by their printable ASCII characters. */
void cagada(const unsigned char *data_buffer, const unsigned int length) {
    unsigned int i, j;
    unsigned char byte;
    for(i=0; i < length; i++) {
        byte = data_buffer[i];
        printf("%02x ", data_buffer[i]);
        /* end of a 16-byte row, or the last byte of the buffer */
        if(((i%16)==15) || (i==length-1)) {
            /* pad a short last row: each missing byte is 3 chars ("xx ") */
            for(j=0; j<15-(i%16); j++)
                printf("   ");
            printf("--- ");
            /* ASCII column: printable characters, dots for the rest */
            for(j=(i-(i%16)); j<=i; j++) {
                byte = data_buffer[j];
                if((byte > 31) && (byte < 127))
                    printf("%c", byte);
                else
                    printf(".");
            }
            printf("\n");
        }
    }
}

#endif /* CAGADA_H */
So, more or less, I got it: it is a simple iteration with a bit of decoration to make it human readable, with the idea that cagada is called wherever needed from the main program we wanted to inspect. After saving the file, he modified his main program to include it. Obviously, I do not remember the exact code and this is more or less coming from the bottom of my brain, but I do remember the fundamentals. For the sake of the example, I am going to write a small server program that uses cagada to see it in action.
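First, let’s include our cagada library and define the port for our server. Note the quotes (" ") instead of angle brackets (<>): with quotes, the compiler looks for the file in the same directory as the source file first, before the system include paths.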
#include "cagada.h"
#define PORT 8888
Below is the main function, where the main fun happens. We set up a socket first using the socket() function. Up to this point we never mentioned the type of socket: of course it is a TCP/IP socket, its protocol family is PF_INET for IPv4 (why IPv6, if no one uses it?), ah yeah! and its type is SOCK_STREAM, the reliable byte stream that TCP provides.
int main(void) {
int sockfd, new_sockfd;
struct sockaddr_in host_addr, client_addr;
socklen_t sin_size;
int recv_length=1;
char buffer[1024];
if ((sockfd = socket(PF_INET, SOCK_STREAM, 0)) == -1)
    perror("in socket");
Once our socket was created, we wanted to give it some personality by using setsockopt().
int yes = 1; /* option value: non-zero enables the option */
if (setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(int)) == -1)
    perror("setting socket option SO_REUSEADDR");
To reuse the address, we set SO_REUSEADDR, which avoids a port clash when we need the same address/port combination again. This usually happens when the program was not closed properly, or was simply restarted, and we try to run it again (or another program tries to use the port): the old connection can linger in the TIME_WAIT state, and without this option bind() fails because the address is marked as already in use. SO_REUSEADDR allows the socket to bind to the local address even while it is still in TIME_WAIT.
SOL_SOCKET is a constant that represents the socket option level. When used with setsockopt (or getsockopt), the level parameter specifies the protocol level at which the option resides, and SOL_SOCKET indicates that the option lives at the socket level itself. Options defined at this level are generic and apply to various socket types, including the previously mentioned SO_REUSEADDR!
So, can you guess what yes, set to 1, is for? Exactly! To enable the option. setsockopt() expects a pointer to the option value, which is why we pass &yes rather than a bare 1.
And last but not least, sizeof(int) gives the size of the option value being passed, which is the size of an integer. It ensures that the correct number of bytes are read from the memory location pointed to by the previous parameter (&yes).
There are many other options defined in sys/socket.h if you are curious.
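The socket creation and option setting can be wrapped up and verified with getsockopt(). A sketch with hypothetical helper names, assuming a POSIX system; the error handling closes the descriptor and reports -1 instead of carrying on.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create an IPv4 TCP socket (SOCK_STREAM) and
   enable SO_REUSEADDR. Returns the descriptor, or -1 on error. */
int make_reusable_tcp_socket(void)
{
    int fd = socket(PF_INET, SOCK_STREAM, 0);
    if (fd == -1) {
        perror("socket");
        return -1;
    }
    int yes = 1; /* non-zero means "enable" */
    /* setsockopt reads sizeof(yes) bytes starting at &yes */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes)) == -1) {
        perror("setsockopt SO_REUSEADDR");
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: read an int-valued socket-level option back.
   Returns the option value, or -1 if getsockopt fails. */
int get_int_option(int fd, int option)
{
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, SOL_SOCKET, option, &value, &len) == -1)
        return -1;
    return value;
}
```

After make_reusable_tcp_socket(), get_int_option(fd, SO_REUSEADDR) should be positive and get_int_option(fd, SO_TYPE) should equal SOCK_STREAM.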
host_addr.sin_family = AF_INET;
host_addr.sin_port = htons(PORT);
host_addr.sin_addr.s_addr = 0; /* INADDR_ANY: all interfaces */
memset(&(host_addr.sin_zero), '\0', 8);
host_addr holds the address structure used in the bind() call; as with the socket, we use the AF_INET (IPv4) family.
htons stands for “host to network short”: it converts the port number from host byte order to network byte order. Network byte order is the order in which bytes are transmitted over the network (most significant byte first, also called big-endian). This conversion is what keeps the code portable across architectures with different byte orders; on a little-endian x86 machine it actually swaps the two bytes.
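To see what htons() does to our port, here is a sketch with a hypothetical port_to_wire() helper that copies the converted value into two bytes, exactly as they would travel on the wire.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a port number in 2 bytes in network byte
   order (most significant byte first), whatever the host's own byte
   order is. */
void port_to_wire(uint16_t port, unsigned char out[2])
{
    uint16_t net = htons(port);
    memcpy(out, &net, sizeof(net)); /* copy the in-memory bytes */
}
```

8888 is 0x22b8, so port_to_wire(8888, b) yields b[0] == 0x22 and b[1] == 0xb8 on any machine. A little-endian host would store the plain value as b8 22, which is exactly the mismatch htons() hides, and ntohs() reverses it on the way back.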
s_addr sets the IP address of the socket. In our case it is set to 0 (the value of INADDR_ANY), meaning that the socket will be bound to all available network interfaces on the host: it listens on any available IP address, written as 0.0.0.0.
memset(&(host_addr.sin_zero), '\0', 8);
memset initializes the remaining bytes of the struct sockaddr_in that are not explicitly set. The sin_zero field exists to pad the structure to the same size as struct sockaddr, the generic type used by socket calls, and memset sets these padding bytes to zero ('\0').
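A common variant zeroes the whole structure before filling in the fields, which clears sin_zero without hardcoding its size. A sketch with a hypothetical init_any_addr() helper; htonl(INADDR_ANY) is numerically 0, the same value the server assigns directly.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: prepare a sockaddr_in for bind() that listens
   on every interface (INADDR_ANY, i.e. 0.0.0.0) at the given port.
   Zeroing the whole struct first also clears sin_zero. */
void init_any_addr(struct sockaddr_in *addr, uint16_t port)
{
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);              /* network byte order */
    addr->sin_addr.s_addr = htonl(INADDR_ANY); /* numerically 0 */
}
```

After init_any_addr(&a, 8888), inet_ntoa(a.sin_addr) renders the address as "0.0.0.0" and ntohs(a.sin_port) gives back 8888.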
Once the host address struct is defined and cleared, we move on to spark the magic: bind!
if (bind(sockfd, (struct sockaddr *)&host_addr, sizeof(struct sockaddr)) == -1)
    perror("binding to socket");
if (listen(sockfd, 5) == -1)
    perror("listening on socket");
These two blocks of code are crucial steps in setting up a server socket.
bind() attaches the socket to the specific IP address and port passed in host_addr. After binding successfully, the server calls listen(), and from then on it is ready to accept incoming connections.
listen() marks the socket as passive and lets the kernel queue up a limited number of pending connections (the backlog, 5 here). When that queue is full, further connection attempts may be refused or left waiting until the server has accepted some of the pending ones.
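The socket, bind and listen steps fit in one helper. A sketch with hypothetical names (listen_on() and bound_port() are not from the original program) that stops at the first failure instead of carrying on with a broken socket. Binding to port 0 asks the kernel for any free port, which getsockname() then reveals; that is handy for a quick test.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: socket + bind + listen on 0.0.0.0:port with the
   given backlog. Returns the listening descriptor, or -1 on error. */
int listen_on(unsigned short port, int backlog)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) {
        perror("socket");
        return -1;
    }
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        listen(fd, backlog) == -1) {
        perror("bind/listen");
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: ask the kernel which port the socket is bound
   to. Returns the port in host byte order, or -1 on error. */
int bound_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    if (getsockname(fd, (struct sockaddr *)&addr, &len) == -1)
        return -1;
    return ntohs(addr.sin_port);
}
```

listen_on(8888, 5) would reproduce the server's setup; listen_on(0, 5) followed by bound_port() shows which ephemeral port the kernel picked.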
At this very moment our server is almost ready! Now we need to start a loop and wait for connections…
while(1){
    sin_size = sizeof(struct sockaddr_in);
    new_sockfd = accept(sockfd, (struct sockaddr *)&client_addr, &sin_size);
    if(new_sockfd == -1) {
        perror("accepting connection");
        continue; /* skip the failed attempt, wait for the next client */
    }
    printf("server: got connection from %s port %d\n",
           inet_ntoa(client_addr.sin_addr), ntohs(client_addr.sin_port));
    send(new_sockfd, "Connected!\n\n", 12, 0); /* 12 chars, without the trailing '\0' */
    /* receive first, then dump only the bytes that actually arrived */
    while((recv_length = recv(new_sockfd, buffer, 1024, 0)) > 0) {
        printf("RECV: %d bytes\n", recv_length);
        cagada((const unsigned char *)buffer, recv_length);
    }
    close(new_sockfd);
}
First, accept (surprise!) accepts a new incoming connection on the listening socket. sin_size must hold the size of the client's address structure before the call (accept updates it with the actual size), and new_sockfd is the file descriptor for the newly accepted connection. If accept returns -1, there was an error accepting the connection, and perror prints an error message. If the connection is accepted successfully, the server proceeds to print information about the client's address and port.
Next, to handle the connection, inet_ntoa converts the binary representation of the client's IP address to a human-readable string, and ntohs converts the client's port number from network byte order back to host byte order. After all, we are humans at 19:30 in the evening looking at streams of data!
Finally, the server prints the client's IP address and port and sends a simple message (“Connected!\n\n”) to the client.
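The same information can be produced without inet_ntoa(), which returns a pointer to a shared static buffer. A sketch with a hypothetical describe_peer() helper that swaps in inet_ntop(), which writes into a buffer supplied by the caller:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>

/* Hypothetical helper: render a peer address as "IP port N", the same
   information the server prints. Returns the snprintf result (length
   of the full string), or -1 if the address cannot be converted. */
int describe_peer(const struct sockaddr_in *peer, char *out, size_t outlen)
{
    char ip[INET_ADDRSTRLEN];
    if (inet_ntop(AF_INET, &peer->sin_addr, ip, sizeof(ip)) == NULL)
        return -1;
    /* sin_port is in network byte order; ntohs() converts it back */
    return snprintf(out, outlen, "%s port %u", ip,
                    (unsigned int)ntohs(peer->sin_port));
}
```

For a peer at 127.0.0.1 with port 40804 (as in the run below), describe_peer() produces "127.0.0.1 port 40804".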
Then the server enters a loop to continuously receive data from the client, passing the buffer to our debugging function, cagada.
As long as recv_length is greater than 0, the server prints the number of bytes received and hands the data to cagada for debugging.
The loop ends when recv returns 0, indicating that the client has closed the connection (or -1, indicating an error).
And finally, as with any other file descriptor, it is good practice to close it!
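The receive loop's contract can be exercised without any network at all. A sketch with a hypothetical drain() helper: receiving first and using only the n bytes recv() reported avoids ever looking at buffer contents that were never filled.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: read until the peer closes. recv() returns the
   number of bytes read, 0 when the peer has closed the connection, or
   -1 on error. Only the n bytes that arrived are used (here counted;
   a dump function would receive (buffer, n)). */
int drain(int fd, size_t *total)
{
    char buffer[1024];
    ssize_t n;
    *total = 0;
    while ((n = recv(fd, buffer, sizeof(buffer), 0)) > 0) {
        printf("RECV: %zd bytes\n", n);
        *total += (size_t)n;
    }
    return n == 0 ? 0 : -1; /* 0: orderly close, -1: error */
}
```

For a local test, socketpair(AF_UNIX, SOCK_STREAM, 0, fds) gives two connected stream sockets: send “Hola, caracola” (14 bytes) on one, shut down its writing side, and drain() on the other should count 14 bytes and return 0.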
Full code can be seen at the bottom!
Now, let’s see what it looks like in action, shall we?
Let’s compile and run it!
tony@kitt:~/testing/server$ gcc server.c
tony@kitt:~/testing/server$ ./a.out
Once the server is running, we can use a simple netcat client to send a random string and see what goes through the app.
tony@kitt:~/testing/server$ nc 0.0.0.0 8888
Connecting
La histora es nuestra, y la hacen los pueblos. Salvador Allende
Here is the dump on the server.
tony@kitt:~/testing/server$ ./a.out
server: got connection from 127.0.0.1 port 40804
RECV: 1 bytes
f0 RECV: 63 bytes
4c 61 20 68 69 73 74 6f 72 61 20 65 73 20 6e 75 | La histora es nu
65 73 74 61 2c 20 79 20 6c 61 20 68 61 63 65 6e | estra, y la hace
20 6c 6f 73 20 70 75 65 62 6c 6f 73 2e 20 53 61 | n los pueblos. S
As we conclude this exploration into the realms of socket programming, let’s remember the importance of curiosity and shared knowledge in our journey as developers. The cold winter of 2007 witnessed not only frustrated commutes but also the ignition of a curiosity that led to a deeper understanding of the intricate workings of sockets.
Whether you are a seasoned developer or someone finding their way through the intricate world of programming, embrace the spirit of exploration and continuous learning. As we navigate the streams of data, let the fascination for understanding the underlying mechanisms propel us forward.
In the words of Salvador Allende, “La historia es nuestra, y la hacen los pueblos” (History is ours, and it is made by the people). Likewise, the history of our programming journey is shaped by the knowledge we seek, share, and pass on to future generations of developers. Happy coding!
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include "cagada.h"

#define PORT 8888

int main(void) {
    int sockfd, new_sockfd;
    struct sockaddr_in host_addr, client_addr;
    socklen_t sin_size;
    int recv_length;
    int yes = 1; /* option value for SO_REUSEADDR: non-zero enables it */
    char buffer[1024];
    const char *greeting = "Connected!\n\n";

    if ((sockfd = socket(PF_INET, SOCK_STREAM, 0)) == -1) {
        perror("in socket");
        exit(EXIT_FAILURE);
    }
    if (setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(int)) == -1) {
        perror("setting socket option SO_REUSEADDR");
        exit(EXIT_FAILURE);
    }

    host_addr.sin_family = AF_INET;
    host_addr.sin_port = htons(PORT);       /* port in network byte order */
    host_addr.sin_addr.s_addr = 0;          /* INADDR_ANY: all interfaces */
    memset(&(host_addr.sin_zero), '\0', 8); /* clear the padding */

    if (bind(sockfd, (struct sockaddr *)&host_addr, sizeof(struct sockaddr)) == -1) {
        perror("binding to socket");
        exit(EXIT_FAILURE);
    }
    if (listen(sockfd, 5) == -1) {
        perror("listening on socket");
        exit(EXIT_FAILURE);
    }

    while (1) {
        sin_size = sizeof(struct sockaddr_in);
        new_sockfd = accept(sockfd, (struct sockaddr *)&client_addr, &sin_size);
        if (new_sockfd == -1) {
            perror("accepting connection");
            continue; /* wait for the next client */
        }
        printf("server: got connection from %s port %d\n",
               inet_ntoa(client_addr.sin_addr), ntohs(client_addr.sin_port));
        send(new_sockfd, greeting, strlen(greeting), 0);

        /* receive first, then dump only the bytes that actually arrived;
           recv returns 0 when the client closes, -1 on error */
        while ((recv_length = recv(new_sockfd, buffer, sizeof(buffer), 0)) > 0) {
            printf("RECV: %d bytes\n", recv_length);
            cagada((const unsigned char *)buffer, recv_length);
        }
        close(new_sockfd);
    }
    return 0;
}