Hello guys, I'm back with part 2 of the "How Things Work" series! If you haven't already, check out part 1 here.
To start things off, we'll list some functional and non-functional requirements for our server.
Functional Requirements
- The server should be able to accept connections.
- The server should be able to return the appropriate response.
For the sake of simplicity, we'll return the id of the process that responds to the request. That brings us to another question: how will we design the server?
We'll use a hybrid approach (again, I encourage you to check out the previous blog post linked above to understand what I'm about to do): a mix of the Reactor pattern and pre-forking. Nginx uses this approach: a master process forks worker processes, and each worker runs its own event loop. This lets it scale to a very large number of concurrent connections.
Non Functional Requirements
- The server should be able to handle lots of concurrent connections and respond quickly.
Creating the bare bones
Let's start off by creating the bare bones of our design: a file named server.rb that will hold all of our main server logic.
We'll build the pre-forking part first, before we dive into the reactor part. The workflow is as follows:
- The main server process creates a listening socket.
- The main server process forks a configurable number of child processes (this is where the pre-forking actually happens).
- Each child process accepts connections on the shared socket and handles them independently.
- The main server process keeps an eye on the child processes.
- The main server won't accept any connections; the forked processes will.
The children inherit the file descriptors of the parent process, so they all share the same listening socket. The kernel then hands each incoming connection on that socket to one of the processes waiting to accept it, which effectively load balances the connections across the children.
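To see the shared-socket idea in isolation, here is a minimal standalone sketch, assuming a Unix system where fork is available. preforked_pids is a hypothetical helper, not part of our server: it listens on an ephemeral port (port 0 lets the OS pick one), forks a few children that each accept exactly one connection on the socket they inherited, and then connects once per child to see which pids answer.

```ruby
require 'socket'

# Hypothetical helper: every child accepts one connection on the listening
# socket it inherited from the parent, replies with its pid, and exits.
def preforked_pids(workers = 3)
  server = TCPServer.new('127.0.0.1', 0)   # port 0: the OS picks a free port
  port = server.addr[1]

  children = workers.times.map do
    fork do
      client = server.accept               # same socket object, inherited via fork
      client.puts(Process.pid)
      client.close
      exit!(0)                             # leave the child without running at_exit hooks
    end
  end

  # The parent never calls accept; it only acts as a client here
  answered = workers.times.map do
    sock = TCPSocket.new('127.0.0.1', port)
    pid = sock.gets.to_i
    sock.close
    pid
  end

  children.each { |pid| Process.wait(pid) } # reap the children
  server.close
  [children, answered]
end

children, answered = preforked_pids(3)
puts "forked:   #{children.sort.inspect}"
puts "answered: #{answered.sort.inspect}"
```

Every pid that answers belongs to one of the children, never to the parent, because only the children call accept.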
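```ruby
require 'socket'

class PreForking
  # Message delimiter. Despite its name, this is a plain newline ("\n"), not "\r\n"
  CRLF = "\n"

  def initialize(port = 3000)
    @socket = TCPServer.new(port)
  end

  # Write a message to the current client, followed by the delimiter
  def respond(message)
    @client.write(message)
    @client.write(CRLF)
  end

  # Read from the current client up to and including the delimiter;
  # returns nil once the client has closed the connection
  def gets
    @client.gets(CRLF)
  end
```
⬇️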
First we add a class called PreForking with a constructor that takes a port parameter, 3000 by default; every new instance of this class is a different server. Then come 2 methods. respond takes a message, writes it back to the client, and then writes the delimiter defined as CRLF. gets does the opposite: it reads from the client up to that delimiter. The delimiter tells the reader where a message stops. Because we read streams of data, we need an agreed-upon value (CRLF) to know where to stop reading; for example, "I love pizza\nhello world\n" gets split into "I love pizza" and "hello world". Let's carry on.
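Here is a small standalone sketch of that splitting. It uses StringIO as a stand-in for the client socket (an assumption made only so the example runs without a network), since both are read as a plain stream of bytes; DELIMITER plays the role of CRLF.

```ruby
require 'stringio'

DELIMITER = "\n"

# StringIO stands in for the client socket: both are read as a stream of bytes
stream = StringIO.new("I love pizza\nhello world\n")

messages = []
# gets(DELIMITER) reads up to and including the delimiter and returns nil at the
# end of the stream, just like @client.gets(CRLF) in the server
while (line = stream.gets(DELIMITER))
  messages << line.chomp(DELIMITER)
end

p messages # => ["I love pizza", "hello world"]
```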
- The ⬇️ at the end of a snippet indicates that the class or scope isn't finished yet; personally, I think this works better than dropping in one big chunk of code.
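```ruby
  # Number of pre-forked worker processes
  CONCURRENCY = 4

  def run
    child_pids = []
    CONCURRENCY.times do
      child_pids << spawn_child
    end

    # On CTRL + C, forward the INT signal to every child, then exit
    trap(:INT) {
      child_pids.each do |cpid|
        begin
          Process.kill(:INT, cpid)
        rescue Errno::ESRCH
          # That child is already gone
        end
      end
      exit
    }

    # Block until any child exits, then replace it
    loop do
      pid = Process.wait
      $stderr.puts "Process #{pid} quit unexpectedly"
      child_pids.delete(pid)
      child_pids << spawn_child
    end
  end
```
⬇️
The snippet above simply does 2 things: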
- Spawns the child processes.
- Monitors the child processes and respawns any that die.
We defined a constant CONCURRENCY with a value of 4; this means there will be 4 pre-forked processes, so the main server is going to have 4 kids, basically.
Then we head to the run method. It spawns the child processes via the spawn_child method (next part) and stores their ids in an array. It also forwards any INT (interrupt) signal to the children and then exits, so if you start the server from a terminal and press CTRL + C, which sends an INT signal to the main server, the main server forwards that signal to kill all of its children.
Finally, it enters an infinite loop that spends nearly all of its time blocked on its first line, because Process.wait is a blocking call: it returns only when a child process dies, and its return value is the dead child's id. That's how we find out that a child died. We then print a message to stderr, remove the dead id from the array, and spawn a replacement.
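The monitor loop can be tried on its own with a minimal sketch like the one below. supervise is a hypothetical helper, its children "crash" on purpose by exiting immediately, and max_restarts is an assumption added only so the demo terminates instead of looping forever like run does.

```ruby
# Hypothetical demo of the monitor loop in run: every child exits right away,
# and the parent replaces it. max_restarts bounds the loop so the demo ends.
def supervise(workers:, max_restarts:)
  spawn_child = -> { fork { exit!(1) } }   # stand-in for the real spawn_child
  child_pids = Array.new(workers) { spawn_child.call }
  dead = []

  max_restarts.times do
    pid = Process.wait                      # blocks until some child exits, returns its pid
    dead << pid
    child_pids.delete(pid)
    child_pids << spawn_child.call          # keep the pool at `workers` processes
  end

  child_pids.each { |pid| Process.wait(pid) } # reap the survivors before returning
  [dead, child_pids]
end

dead, alive = supervise(workers: 4, max_restarts: 6)
puts "replaced #{dead.size} children, #{alive.size} still tracked"
# prints "replaced 6 children, 4 still tracked"
```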
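```ruby
  def spawn_child
    fork do
      loop do
        # Block until a client connects to the shared listening socket
        @client = @socket.accept
        respond Process.pid

        # Serve this client until it closes the connection
        loop do
          request = gets
          if request
            respond Process.pid
          else
            # gets returned nil: the client sent EOF
            @client.close
            break
          end
        end
      end
    end
  end
end
```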
The final method in the pre-forking puzzle is spawn_child. It forks a process from the main process (in the parent, fork returns the child's id, which is what run collects in child_pids), and the child goes into an infinite loop. It blocks at @socket.accept waiting for a client to connect; once a client connects, accept returns the client's socket, which we store in @client, and we immediately respond with the id of the process handling the connection. After that, it enters a second loop dedicated to the current connection, where it listens for data coming from this client and answers each line with its process id. When the client closes the connection (sends EOF), gets returns nil, so we close the connection and break out of this loop to start accepting new connections again.
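The per-connection loop can be exercised without a network using the sketch below. OneClientAtATime.serve is a hypothetical stand-in with the same shape as the inner loop of spawn_child, UNIXSocket.pair provides two already connected sockets, and a thread plays the worker only for the sake of the demo. While serve is running, nothing else can be accepted, which is exactly the limitation discussed next.

```ruby
require 'socket'

module OneClientAtATime
  DELIMITER = "\n"

  # Greet with the pid, answer every line with the pid, and close once gets
  # returns nil (the client sent EOF). Nobody else is served meanwhile.
  def self.serve(client)
    client.write("#{Process.pid}#{DELIMITER}")
    while client.gets(DELIMITER)
      client.write("#{Process.pid}#{DELIMITER}")
    end
    client.close
  end
end

# Two connected sockets, so no port is needed for the demo
server_side, client_side = UNIXSocket.pair
worker = Thread.new { OneClientAtATime.serve(server_side) }

client_side.write("first\nsecond\n")
client_side.close_write            # sends EOF to the server side
replies = client_side.read.lines   # greeting plus one reply per line
worker.join
client_side.close

p replies.size # => 3
```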
That's all that's needed to prefork and start the server, but there are a couple of things we can do better:
- Each child blocks on a single connection and only goes back to accepting new connections once that connection ends. So with our current design we can serve at most 4 connections at once, one per process. This is where the reactor pattern comes in clutch: with it, we can use every pre-forked process to the fullest. Each process will be able to handle not only an ongoing request but also new connections, as well as other accepted connections that are ready to be read from or written to. Let's see what we can do here.
Each pre-forked process will do the following:
- The server monitors the listening socket for incoming connections.
- Upon receiving a new connection, it adds it to the list of sockets to monitor.
- The server now monitors the active connection as well as the listening socket.
- Upon being notified that the active connection is readable, the server reads a chunk of data from that connection and dispatches the relevant callback.
- Upon being notified that the active connection is still readable, the server reads another chunk and dispatches the callback again.
- The server receives another new connection; it adds that to the list of sockets to monitor.
- The server is notified that the first connection is ready for writing, so the response is written out on that connection.
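The heart of those steps is "wait on many IOs at once, then dispatch only the ready ones". Here is a toy sketch of that idea. ToyReactor and its methods are hypothetical names made up for illustration, pipes stand in for client sockets, and unlike the real server it only handles reads.

```ruby
# A toy reactor: callbacks are registered per IO, a single IO.select call waits
# on all of them, and only the IOs that are ready get their callback dispatched.
class ToyReactor
  CHUNK_SIZE = 1024

  def initialize
    @callbacks = {}
  end

  def on_readable(io, &callback)
    @callbacks[io] = callback
  end

  # One round of the event loop; returns the IOs that were dispatched
  def tick(timeout = 1)
    ready, = IO.select(@callbacks.keys, nil, nil, timeout)
    return [] if ready.nil?   # timed out, nothing was ready

    ready.each { |io| @callbacks[io].call(io.read_nonblock(CHUNK_SIZE)) }
    ready
  end
end

quiet_r, _quiet_w = IO.pipe   # keep the write end open so quiet_r never hits EOF
busy_r, busy_w = IO.pipe
received = []

reactor = ToyReactor.new
reactor.on_readable(quiet_r) { |data| received << [:quiet, data] }
reactor.on_readable(busy_r)  { |data| received << [:busy, data] }

busy_w.write("ping")
dispatched = reactor.tick
p received # => [[:busy, "ping"]]
```

Only the pipe that actually had data gets its callback; the quiet one costs nothing while it waits, which is what lets one process juggle many connections.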
We'll add a new class called Connection. It keeps each connection separate, with its own methods and variables (such as its own request and response buffers).
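```ruby
class Connection
  CRLF = "\n"

  attr_reader :client

  def initialize(io)
    @client = io
    # Per-connection buffers; +"" gives unfrozen strings we can append to
    @request, @response = +"", +""
  end

  # Called with every chunk of data read from this client
  def on_data(data)
    @request << data
    if @request.end_with?(CRLF)
      # Request is completed.
      respond Process.pid
      @request = +""
    end
  end

  def respond(message)
    # to_s is required: Process.pid is an Integer, and Integer + String raises TypeError
    @response << message.to_s + CRLF
    # Write what can be written immediately,
    # the rest will be retried the next time the socket is writable
    on_writable
  end

  def on_writable
    bytes = client.write_nonblock(@response, exception: false)
    # :wait_writable means the kernel couldn't take any bytes right now
    return if bytes == :wait_writable
    # Drop the bytes that were written, keep the rest for later
    @response.slice!(0, bytes)
  end

  def monitor_for_reading?
    true
  end

  def monitor_for_writing?
    !@response.empty?
  end
end
```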
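on_data is what lets a request arrive in pieces. The sketch below isolates that buffering logic in a hypothetical RequestBuffer class (not part of the server) that hands the finished request to a block instead of responding with the pid.

```ruby
# Hypothetical stand-in for Connection#on_data: chunks are appended to a buffer,
# and the request is handed to the block only once the buffer ends with the delimiter.
class RequestBuffer
  DELIMITER = "\n"

  def initialize(&on_request)
    @request = +""            # unfrozen, growable string
    @on_request = on_request
  end

  def on_data(data)
    @request << data
    return unless @request.end_with?(DELIMITER)

    @on_request.call(@request.chomp(DELIMITER))
    @request = +""
  end
end

requests = []
buffer = RequestBuffer.new { |req| requests << req }

buffer.on_data("what is your p")   # partial request: nothing happens yet
buffer.on_data("id?\n")            # the delimiter arrives: the request is complete
p requests # => ["what is your pid?"]
```

Like the original on_data, it only checks the end of the buffer, so two requests arriving in one chunk, such as "a\nb\n", are treated as a single request.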
I'll try my best to explain this next part. We'll update our spawn_child method as follows:
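```ruby
  # Bytes to read per read_nonblock call (value chosen for illustration)
  CHUNK_SIZE = 4 * 1024

  def spawn_child
    fork do
      # Connections owned by this child, keyed by file descriptor number
      @handles = {}
      loop do
        to_read = @handles.values.select(&:monitor_for_reading?).map(&:client)
        to_write = @handles.values.select(&:monitor_for_writing?).map(&:client)
        # Block until the listening socket or any client socket is ready
        readables, writables = IO.select(to_read + [@socket], to_write)

        readables.each do |socket|
          if socket == @socket
            # Another child may have accepted this client first, so don't block here
            io = @socket.accept_nonblock(exception: false)
            next if io == :wait_readable
            @handles[io.fileno] = Connection.new(io)
          else
            connection = @handles[socket.fileno]
            begin
              data = socket.read_nonblock(CHUNK_SIZE)
              connection.on_data(data)
            rescue Errno::EAGAIN
              # No data after all; select will tell us when there is some
            rescue EOFError, Errno::ECONNRESET
              # The client closed the connection: forget it and close our end
              @handles.delete(socket.fileno)
              socket.close
            end
          end
        end

        writables.each do |socket|
          next if socket.closed? # it may have been closed while reading above
          @handles[socket.fileno].on_writable
        end
      end
    end
  end
```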
Here's what the method above does. The instance variable @handles holds all the connections that belong to this forked process, keyed by file descriptor number. On every iteration we go over them and check which ones we need to monitor for reading and which for writing. Then we call IO.select, Ruby's interface to select(2), which blocks until at least one of the given sockets is ready for reading or writing. Nowadays system calls like epoll(7) and poll(2) are preferred, because they scale to far more sockets than select can handle.
Readables
Select blocks until at least one socket is ready for reading or writing. Whenever a socket is ready for reading, we check whether it's the server's listening socket or a client socket. A pending connection makes the listening socket count as readable for select(2), because accept can then complete without blocking. If it's the server's socket, we accept the connection, create a new Connection object for the client, and store it in the @handles hash, with the socket's file descriptor number as the key and the connection as the value. Otherwise it has to be a client socket with something to read. A normal read blocks while the client isn't sending anything, so we use read_nonblock instead, which never blocks: it reads up to CHUNK_SIZE bytes, and if there's nothing to read yet it raises Errno::EAGAIN, which we rescue and simply move on. If it did read data, we call on_data with the chunk, which appends it to the @request buffer and checks whether the request is complete or more data is still on its way. If the client closes the connection, read_nonblock raises EOFError and we delete its entry from the hash, as if the connection never happened.
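Both facts the readable branch relies on can be checked with a small standalone sketch. The loopback address and ephemeral port 0 are assumptions for the demo; accept_nonblock is the non-blocking counterpart of accept, which returns :wait_readable (with exception: false) instead of blocking when no client is queued.

```ruby
require 'socket'

# A listening socket becomes "readable" once a client is waiting to be accepted,
# and accept_nonblock returns :wait_readable instead of blocking when none is.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]

before = server.accept_nonblock(exception: false)   # nobody has connected yet
client = TCPSocket.new('127.0.0.1', port)

readable, = IO.select([server], nil, nil, 1)         # wait up to 1 second
accepted = server.accept_nonblock(exception: false)  # now a client is queued

p before                 # => :wait_readable
p readable == [server]   # => true
p accepted.class         # => TCPSocket

[accepted, client, server].each(&:close)
```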
Writables
Whenever a socket is ready to be written to, we look up its connection in the @handles hash and call on_writable. It passes the @response buffer to write_nonblock, which writes as much as the kernel will take right now without blocking, and then slices the written bytes off the front of @response. Anything left stays buffered; since monitor_for_writing? is true while @response isn't empty, select keeps watching that socket and we write the rest once it's ready again.
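The partial write that on_writable has to cope with is easy to provoke with the sketch below. drain_once is a hypothetical helper mirroring on_writable, the socket pair stands in for a client that never reads, and the 10 MB response is an assumption chosen to be far larger than a typical socket buffer.

```ruby
require 'socket'

# Mirrors on_writable: write what the kernel accepts right now, keep the rest.
# drain_once is a hypothetical helper, not part of the server.
def drain_once(io, buffer)
  written = io.write_nonblock(buffer, exception: false)
  return 0 if written == :wait_writable   # socket buffer full: keep everything
  buffer.slice!(0, written)               # drop what was sent, keep the rest
  written
end

writer, _reader = UNIXSocket.pair   # nobody reads from _reader, so its buffer fills up
response = "x" * 10_000_000         # far larger than a typical socket buffer

sent = drain_once(writer, response)
puts "sent #{sent} bytes, #{response.bytesize} still buffered"
```

The bytes still in the buffer are exactly what monitor_for_writing? keeps the connection in select's write set for.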
Finalizing
What we did in this article was combine 2 different server architectures into a hybrid model that improves scalability and lets more requests come through. This idea comes from the book Working With TCP Sockets, but I wanted to try coding it myself to see how everything fits together. I highly recommend this book; even if you don't know Ruby, you'll benefit a lot. If you made it this far, thanks for reading, and I hope you learned something today. Till next time!