
Threads; read-it

You might have heard the following from your senior developers:

Use threads for I/O stuff, it will be faster

Some seasoned dev that you love

And yes, they are absolutely right. You might have heard the following as well:

Nodejs is a single threaded, non blocking event driven, I/O optimised javascript runtime

Someone from the internet

Or something like this about Python:

GIL in python makes threads very inefficient


And you’re like

This article will dump everything that we know about threads for you to get started. So buckle up and get ready to level up in the art of fast performance.

In the computing world, a thread is the smallest unit of execution that can be managed by whatever operating system you're running on. Your CPU likes to keep moving through the different kinds of tasks you've asked it to do: make a network request, open a file, read a file, write something, check whether the network response is back, and so on. And you obviously don't like waiting, so your CPU takes turns doing all these operations. It fires off that request you've asked for from reddit, but it doesn't just sit there waiting for the result to come back; it opens the file you've asked for, checks back to see whether the reddit page is ready yet, and so on and so forth. If you consider yourself a visual learner, here's the simplest form of a diagram to explain threads.

(Diagram: Thread (computing) - Wikipedia)

Before we go further, let's do a quick fact check.

  • Anything that you do on your computer must belong to a process
  • A process must have at least one thread
  • A process can have as many threads as the operating system allows

Now, say you're running a program that reads a file from disk and writes random data to another file at the same time, and the two tasks have no correlation to each other. Your process (assuming it is not using multiprocessing) will in reality do a single task at a time and switch between them as it deems efficient. It would read a few bytes from the disk, then write some bytes, then get back to reading, and this keeps happening until execution stops. So why are threads good for I/O tasks? Well, when you're making a network request, you are essentially waiting for the response to come back, and you could be doing other things while you wait; that's where threads come into play. You can use threads to make multiple requests and wait on the responses alternately, or open multiple files concurrently. The possibilities feel endless, right up until you hit the caveats.
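To make this concrete, here's a minimal Python sketch of the "wait on multiple things at once" idea. The names are ours, and `time.sleep` stands in for network latency (the actual request code isn't the point):

```python
import threading
import time

def fake_request(results, i):
    # time.sleep stands in for waiting on a network response
    time.sleep(0.3)
    results[i] = f"response {i}"

def fetch_all(n):
    results = [None] * n
    threads = [threading.Thread(target=fake_request, args=(results, i))
               for i in range(n)]
    for t in threads:
        t.start()   # all the "requests" are now in flight at once
    for t in threads:
        t.join()    # wait for every response to arrive
    return results

start = time.monotonic()
responses = fetch_all(5)
elapsed = time.monotonic() - start
# the five 0.3 s "requests" overlap, so this takes roughly 0.3 s, not 1.5 s
print(responses)
```

Run sequentially, those five waits would cost about 1.5 seconds; threaded, the waiting overlaps and the whole thing finishes in roughly the time of one request.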

Okay, let's see why we need to be careful with threads when using them, and how the Python GIL is both a blessing and a curse at the same time. To me, this helps a lot in understanding how threads behave in a real-life example.

You have a global variable, and you want two threads that each add n numbers to the global variable, one by one. Let's look at some C++ code.

#include <iostream>
#include <thread>

int X = 0;

void fun()
{
  for (int i = 0; i < 50000; i++)
    X++; // read, add one, write back -- with no synchronization
}

int main()
{
  std::thread thread1(fun);
  std::thread thread2(fun);
  thread1.join();
  thread2.join();
  std::cout << X << std::endl;
  return 0;
}

What is the final value of X? Make a guess; we'll wait.

We will compile and execute this and show you the answer.

I bet you did not expect such randomness from our simple code. The issue here is that these threads run pretty much independently: while thread 1 is reading the value, adding one to it, and writing it back, thread 2 is doing the same at the same time. Both threads end up holding different versions of the value at the same memory location, both write it back, and the cycle continues, so many of the increments are simply lost. Now, the same code, but in Python.

import threading

X = 0

def fun():
    global X
    for _ in range(50000):
        X += 1

thread_1 = threading.Thread(target=fun)
thread_2 = threading.Thread(target=fun)

thread_1.start()
thread_2.start()
thread_1.join()
thread_2.join()

print(X)


In Python, the result is always the same, and this is where the infamous GIL comes into play. The GIL, or global interpreter lock, never lets two threads execute Python bytecode at the same time: thread 1 runs only while thread 2 is paused, and vice versa. The two threads never truly run in parallel, so this simple counter comes out as expected.
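If you want the Python counter to be correct by construction rather than by the GIL's grace, the standard move is an explicit lock. A minimal sketch (the structure is ours, not from the article's code):

```python
import threading

X = 0
lock = threading.Lock()

def fun():
    global X
    for _ in range(50000):
        with lock:  # serialize the read-modify-write explicitly
            X += 1

threads = [threading.Thread(target=fun) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(X)  # 100000
```

The same idea fixes the C++ version too: guard the increment with a `std::mutex` and the lost updates disappear, at the cost of the threads taking turns on that one line.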

So we understand what threads are, we know how different languages deal with threads, and we know the caveats of using them. You probably want to evolve as a developer and use something that's a bit lighter on the OS and a little less obtrusive to use. You probably want to know how Nodejs deals with so much whilst running on a single thread. It's auto-magic, really. Dorjoy Chowdhury will take over from here to take you scuba diving into the world of “asynchronicity“.

Asynchronous-city in Nodejs

Nodejs is single-threaded, yet at the same time it's asynchronous, and it also spawns 4 threads by default which it uses for some stuff when needed. You can also spawn worker_threads yourself if needed, of course. So, how does Nodejs actually do I/O asynchronously, and what does it need those 4 default threads for?

The ride will be a little bumpy from now on and you might be overwhelmed, we urge you to hang tight.

Let's start off with some UNIX (or Linux) terminology. Unix tries to give us the abstraction of different “types” of I/O through file descriptors (a file descriptor is just a small integer, scoped to a process). For example, a socket, a pipe, a FIFO, a terminal, or a regular file, when “opened”, returns us a file descriptor so that we can do I/O on it. In fact, if you ever want to check what file descriptors your process has, just list /proc/<pid>/fd for your process id and you will see the file descriptors that have been created.
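You can poke at this from Python too. A Linux-only sketch (it reads `/proc/self/fd`, which doesn't exist on macOS):

```python
import os
import tempfile

# on Linux, /proc/self/fd lists this process's open file descriptors;
# 0, 1 and 2 are stdin, stdout and stderr
fds = os.listdir("/proc/self/fd")
print(sorted(int(fd) for fd in fds)[:3])  # [0, 1, 2]

# opening a file makes a new descriptor appear in the list
tmp = tempfile.TemporaryFile()
print(str(tmp.fileno()) in os.listdir("/proc/self/fd"))  # True
```

Every open socket, pipe, and file your process holds shows up in that directory the same way.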

For all these different types of files, we use the same read() and write() system calls provided by the kernel, and these are blocking calls by default unless you explicitly add the O_NONBLOCK flag to the open() system call. Something like

int fd = open(pathname, O_RDONLY | O_NONBLOCK);

If we want to monitor a large number of file descriptors to see whether we can read from or write to them, we could open all of those files/fds (for network I/O, “file” basically means socket) with the O_NONBLOCK flag and write a loop that keeps trying to read or write on each of them. But whenever an operation would block, the kernel will not block; instead it returns the EAGAIN error and expects us to try again. So we keep iterating, keep attempting operations that aren't ready, and keep failing: a busy-wait, kind of a conundrum, aye! The kernel can't schedule our process off the CPU the way it normally does for a blocking call, so we are simply wasting CPU execution time spinning. What if we turn the problem onto itself?
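You can watch EAGAIN happen from Python, where it surfaces as `BlockingIOError`. A small sketch using a pipe:

```python
import os
import errno

r, w = os.pipe()
os.set_blocking(r, False)  # same effect as opening with O_NONBLOCK

got_eagain = False
try:
    os.read(r, 1024)       # nothing has been written yet
except BlockingIOError as e:
    # instead of blocking, the kernel says "try again"
    got_eagain = (e.errno == errno.EAGAIN)
print(got_eagain)          # True

os.write(w, b"hi")
data = os.read(r, 1024)    # now there is data, so the read succeeds
print(data)                # b'hi'
```

Looping on that `try`/`except` until data arrives is exactly the busy-wait described above: correct, but a waste of CPU.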

We can leave the work of monitoring all these files to the kernel (assuming the OS provides such APIs), and then we don't need to write separate threads for every isolated bit of I/O in our programs just to gain performance. Unix provides several APIs that monitor a given set of file descriptors and tell us when I/O is possible on any, some, or all of them. So the basic idea is: use the OS-provided API to wait for I/O readiness, and when we see that I/O is possible, do the rest of the execution (running the relevant callback, i.e. your javascript code that does something with that I/O result, like parsing JSON).

These APIs are select(), poll(), and epoll (epoll is Linux-specific; macOS has kqueue, Windows has its own mechanism), and they are what provide the non-blocking I/O behavior of programs like Nodejs. select() and poll() came first, and their implementations are limited and don't scale well (select(), for example, caps the number of watched file descriptors at 1024) and are not fast enough for large fd counts; epoll is both fast and scales well. Nodejs uses libuv (which is multi-platform) to implement its event loop (doing things asynchronously), and on Linux libuv uses epoll. Nginx also does single-threaded non-blocking I/O using epoll, whereas Apache classically uses one thread per client with blocking I/O.
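Python's `selectors` module wraps this same family of APIs (epoll on Linux, kqueue on macOS), so we can sketch the "monitor many fds, act on the ready ones" idea without writing C. Two pipes stand in for two sockets:

```python
import os
import selectors

sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on macOS

# two pipes stand in for two sockets/fifos we want to monitor
r1, w1 = os.pipe()
r2, w2 = os.pipe()
sel.register(r1, selectors.EVENT_READ, data="pipe one")
sel.register(r2, selectors.EVENT_READ, data="pipe two")

os.write(w2, b"hello")        # only pipe two becomes readable

ready = sel.select(timeout=1)  # like epoll_wait(): blocks until something is ready
for key, _ in ready:
    print(key.data, os.read(key.fd, 1024))  # pipe two b'hello'
```

One call waits on both descriptors at once and hands back only the ready one; this loop over `ready` is, in miniature, an event loop dispatching callbacks.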

Let's try to emulate what nodejs/libuv does, in bare-bones form, with some C.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/epoll.h> // epoll_create1(), epoll_ctl(), epoll_wait() | kqueue on macOS

#define MAX_EVENTS 5
#define READ_SIZE 1024

int main(int argc, char *argv[])
{
  int event_count, i;
  ssize_t bytes_read;
  char read_buffer[READ_SIZE + 1];
  struct epoll_event event, events[MAX_EVENTS];

  // create an epoll instance
  int epoll_fd = epoll_create1(0);

  // open the FIFO whose name was given as argument
  int fifo = open(argv[1], O_RDWR);
  event.events = EPOLLIN; // we are interested in "readable" events
  event.data.fd = fifo;

  // register the FIFO with the epoll instance
  if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, fifo, &event))
  {
    fprintf(stderr, "Failed to add file descriptor to epoll\n");
    return 1;
  }

  while (1) // running forever
  {
    printf("\nePolling for input...\n");
    // wait until an event is available at the fd, or 30000 ms (30 s) pass
    event_count = epoll_wait(epoll_fd, events, MAX_EVENTS, 30000);
    printf("%d ready events\n", event_count);
    for (i = 0; i < event_count; i++)
    {
      printf("Reading file descriptor '%d'\n", events[i].data.fd);
      bytes_read = read(events[i].data.fd, read_buffer, READ_SIZE);
      printf("Total %zd bytes read.\n\n", bytes_read);
      read_buffer[bytes_read] = '\0';
      printf("%s\n", read_buffer);
      // here you can execute your "callbacks"
      // essentially this is your javascript function compiled into
      // something easier to understand for your machine
    }
  }

  return 0;
}

Go ahead and compile and execute that. Do keep in mind that this is Linux-only and will not work on BSD-based systems like macOS (there you'd use kqueue instead).

Here, from the same directory, on one terminal I first create a FIFO using the mkfifo command. Then I run the example program with the name of the FIFO I just created. After the program starts, on the other terminal, I just write to the FIFO (using echo and ls) and we see the output in the first terminal. You can do this for any number of sockets, pipes, FIFOs, or other epollable files!

You can add as many “epollable” file descriptors as you want using epoll_ctl(), and whenever any of them become ready for I/O according to the flags you registered them with (EPOLLIN, EPOLLOUT), epoll_wait() returns and the ready descriptors are in the events array. You can monitor sockets, FIFOs, and pipes with epoll, but not regular files (the ones you read from or write to on disk): epoll_ctl() on a regular file descriptor fails with an EPERM error. This kind of makes sense. Sockets and pipes are such that one process reads from them while another writes to them, so “readable” is a meaningful state to wait for. Regular files are different in that they are always considered ready for reading or writing; a read on a regular file never blocks waiting for another process to write something first. (Technically, you could “block” a read on a regular file by creating a “mandatory lock” on the bytes you are trying to read.) But “always ready” doesn't mean reading or writing a file isn't time-consuming (disks are slow), so it still would not make sense to read or write a regular file on the main thread. So, how could this also be done in some kind of asynchronous way? Well, there is a Unix trick: create a pipe in the main thread and add the read end of the pipe to epoll. Then create a thread that actually reads from the file and writes the bytes it read to the write end of the pipe. Because the read end of the pipe was added to the epoll instance, epoll_wait() returns when the read end becomes readable. Remember how Nodejs has 4 default threads? This is one of the things Nodejs uses those threads for. Other use cases include DNS resolving (the OS doesn't provide an asynchronous way to resolve DNS, so Nodejs uses a thread for it) and compute-heavy things like the crypto module.

The reason for using threads in these cases is, of course, to avoid blocking the main thread for any indefinite amount of time.
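The pipe trick can be sketched in Python (the file and all the names here are ours): a worker thread does the blocking file read and funnels the bytes through a pipe, while the main thread only ever waits on the selector.

```python
import os
import selectors
import tempfile
import threading

# a regular file that we want to read "asynchronously"
tmp = tempfile.NamedTemporaryFile(mode="w", delete=False)
tmp.write("file contents")
tmp.close()

r, w = os.pipe()

def worker():
    # the blocking disk read happens off the main thread...
    with open(tmp.name, "rb") as f:
        data = f.read()
    os.write(w, data)  # ...and the result is funneled through the pipe

threading.Thread(target=worker).start()

# the main thread only waits on the (epollable) read end of the pipe
sel = selectors.DefaultSelector()
sel.register(r, selectors.EVENT_READ)
sel.select(timeout=5)

result = os.read(r, 1024)
print(result)  # b'file contents'
os.unlink(tmp.name)
```

The main thread never touches the disk; as far as the event loop is concerned, the file read is just one more readable descriptor, exactly like a socket.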

So basically, the way Nodejs gives us an asynchronous way of doing things is by using the asynchronous APIs provided by the OS in a smart way, so that the main thread never blocks and we don't need to manage threads ourselves.

Nodejs prefers the epoll mechanism over threads, but if you need to, for some weird reason (we don't judge), you can use worker_threads too. The issue with everything we've tried to discuss until now is that it's not available as a single source of truth, so take everything with a tiny grain of salt. Have fun, learn to grow, grow to learn.

Md Sakibul Alam