Improve Linux Real-time Performance in Multicore Devices with Light-Weight Threading
With light-weight threading, Linux comes closer to the goal of operating as well as traditional RTOS even in the most serious telecom/networking applications.
By Michael Christofferson, Enea
There has been much focus in the last decade on improving Linux real-time performance and behavior, most notably through the PREEMPT_RT Linux real-time extensions. More recently, there has been much work on Linux user-space solutions for multicore devices that enable direct access from user space to the underlying hardware, thereby avoiding the overhead of involving the Linux kernel in user-space applications. These user-space extensions (and there are several) have been driven primarily by the telecom/networking market for high-performance IP packet processing, which wants so-called “bare metal” implementations wherein a Linux user-space application in a multicore device can mimic the performance of an “OS-less” solution, namely a simple run-to-completion polling loop on each core for packet processing. While that goal has essentially been met, the solution still serves only a very special use case.
Are there other use cases that demand performance improvements not completely addressed by the above? If so, what are these use cases, and are there further Linux real-time improvements that can be applied? The answer is “yes,” with Linux user-space light-weight threading (LWT). So let’s examine the issues with respect to real-time Linux, and how light-weight threading can be a solution for some applications. The focus here is on telecom, networking, and general communications applications, the markets on which Enea focuses its technology. But overall, light-weight threading could benefit many markets.
Real-time Linux and the Problems It Solves
Over the last 10 years, Linux has made some significant improvements in real-time performance and behavior that address a wide range of applications. These are summarized as follows:
Perhaps the most notable achievement among the real-time extensions for Linux, the PREEMPT_RT package addresses a particularly nasty problem in Linux for multicore devices, namely “interrupt latency.” Servicing an interrupt in the Linux kernel carries very high overhead before the event or data is passed to the user-space application. This overhead tends to delay other interrupts, increasing the overall latency, measured from the time the interrupt occurs to the time the receiver of the interrupt’s information can begin processing it. Likewise, there are many so-called “critical sections” in the Linux kernel wherein interrupts are disabled via spin locks. The overall interrupt latency of the standard Linux kernel does not meet the most serious interrupt latency requirements of many real-time applications, especially in radio access networks (mobile) and mobile core infrastructure, which demand worst-case interrupt latency in the range of 20-30 microseconds. The same applies to many other markets. In a nutshell, PREEMPT_RT solves this problem by:
- Transforming all device-driver interrupt handlers into schedulable threads, so that Linux kernel interrupt-level processing is minimal and new interrupts can be serviced without waiting for the previous interrupt handling to complete. Interrupt handling then becomes priority driven, with the highest-priority handlers completed first, according to the priorities the user assigns.
- Transforming nearly all the busy-waiting (“dead space”) spin locks in the Linux kernel into mutexes, which allow other kernel threads to run instead of spinning on the lock.
Basically, PREEMPT_RT has had some success in reducing overall interrupt latency to very high-performance real-time standards, and this helps a great many Linux applications. Which ones? Read on.
User-space Linux Adaptations
As mentioned above, there has recently been much work on Linux user-space applications. The idea is to let user-space applications, where Linux users concentrate their effort on their own value-add, avoid the overhead of the Linux kernel for some specific device and interrupt interactions. Linux has a model that provides strong separation between user-space applications and the kernel, wherein all user-space operations, including threads, always go through the Linux kernel to process their I/O requests. This gives Linux its robust behavior and characteristics. But even with PREEMPT_RT, Linux falls short for applications with very high data-processing performance requirements, because a Linux kernel context switch is always needed to access the hardware. User-space Linux implementations give the application direct access to hardware and interrupts without involving the Linux kernel, with a tremendous gain in performance. But this gain is realized only in very I/O-intensive environments. Most Linux user-space adaptations focus on single-threaded applications, like high-performance packet processing, wherein only one thread under Linux is used to emulate “OS-less” performance in multicore devices.
The Multi-Threading Issue
What is missing from this survey of real-time Linux solutions is a serious examination of the usefulness of multi-threading in real-time embedded applications. Long before Linux came along – in fact, in the early 1980s – there arose the need for embedded real-time operating systems (RTOS) designed for low-latency, high-throughput, seriously real-time applications. The OS landscape has changed but the requirements have not. These RTOS solutions featured the kinds of performance, behavior and characteristics that Linux has been trying to catch up with for the last 10+ years. This is not a pitch for the return of the RTOS, as good as they were. The overall value of Linux in real-time embedded solutions, in terms of portability, a vast ecosystem of applications and device support, and general support, is unmatched by any RTOS. There are two real questions:
- Why is multi-threading important?
- If multi-threading is important, then how do we add RTOS multi-threading performance, behavior and characteristics to Linux so that we can raise the bar? The key is to understand the Linux multi-threading implementation versus RTOS, and then see what can be done.
Why is Multi-threading Important?
Multi-threading requirements for real-time systems arose over 30 years ago, as software designers faced complex problems that could not be solved with single-threaded designs. These problems required a single application to have multiple tasks, perhaps some computational and some I/O-driven, but all closely coupled in terms of the overall execution of the application. Multiple tasks in a closely coupled environment mean that CPU time must be shared among them for the CPU to be used effectively. In many such applications, some operations had to block while waiting for an I/O event or a communication from another application. So simple executives arose that could handle multiple threads, with thread blocking and low-latency communication among threads.
Not all real-time applications require significant multi-threading support, and this article does not attempt to categorize them all. But clearly among the applications that do require it are those implementing any kind of complex protocol that induces “wait-states,” i.e., points where the application must wait for a response or an event before it can proceed. Until that response or event arrives, the application should cede control of the CPU to allow other, similar threads to run.
So perhaps the above tutorial sounds simple to many of you. The important thing to note is that many, many providers of mobile infrastructure and core network equipment have come to the conclusion that while Linux is the choice for current or future systems, Linux as currently constituted does not quite measure up. Why not?
Linux Multi-threading with PTHREADS
Pthreads was created by the Portable Operating System Interface (POSIX) initiative from IEEE to address the high-performance, multi-threading problem in Unix, and was hence adopted by Linux, which was, in its earliest form, a portable Unix implementation for enterprise and is now used for embedded as well.
The pthreads model was created to address the problem of the original Unix Fork/Join model for creation of Unix “child” processes. The Unix process model is very heavy weight, as it involves creation (and potential deletion) of whole memory-protected environments as well as an execution mode. A lighter-weight model for multiple threads under Unix was needed; hence pthreads.
But the Unix (and hence Linux) model was designed for complete separation of the kernel and user-space applications, one of its advantages in protection, security and reliability over other implementations, including RTOS, over the last 10 or so years. In essence, this means that every pthread in Linux user space is mirrored by a Linux kernel thread for all or most Linux system calls, and especially for device-driver access from user space. But user space is where virtually all embedded Linux real-time applications reside, as OEMs build out their products without GPL contamination. So in every case, the use of pthreads involves invoking the Linux kernel, adding overhead beyond what a native implementation might incur.
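The one-to-one mirroring described above can be observed from user space. The following is a minimal sketch in Python, chosen only for brevity. It assumes CPython 3.8 or later on Linux, where each `threading.Thread` is backed by a pthread and `threading.get_native_id()` returns the thread ID assigned by the kernel. The helper name `collect_native_ids` is hypothetical.

```python
import threading


def collect_native_ids(n_threads=4):
    """Start n_threads OS-backed threads and record each one's kernel thread ID."""
    ids = []
    lock = threading.Lock()
    barrier = threading.Barrier(n_threads)

    def worker():
        # Wait until every thread exists at the same time, so that the
        # kernel cannot hand a finished thread's ID to a later one.
        barrier.wait()
        with lock:
            ids.append(threading.get_native_id())

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ids


native_ids = collect_native_ids()
print(len(set(native_ids)), "distinct kernel thread IDs for",
      len(native_ids), "user-space threads")
```

Each thread reports its own kernel thread ID, which illustrates the point made above: the kernel treats every pthread as a separate scheduling entity.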
But wait a second, you say. What about the Linux real-time extensions mentioned above? Well, PREEMPT_RT addresses many issues inside the Linux kernel with respect to responsiveness, but it doesn’t really address multi-threading. User-space Linux implementations address the device driver/interrupt performance issues, but they don’t really address the multi-threading issue either. Linux real-time containers address some of this, but real-time containers are simply a user-space Linux virtualization technique above standard Linux that doesn’t really address the fundamental multi-threading issue.
Light-Weight Threading (LWT) – The Real Solution for Complex Linux Applications
There are many light-weight threading models that have been proposed for Linux, but none of them have really caught on. Why? Because most of these are not very robust. What is really needed for the next-generation Linux solutions that involve complex multi-threading applications is a completely new Linux model for user-space Linux applications. This solution, called Linux light-weight threading (LWT), is outlined below (Figure 1). Put a high-performance, low-overhead, multi-threading scheduler in Linux user space, over a single pthread. Why?
- Pthread overhead
- Processes and pthreads are the only scheduling entities that Linux knows.
- The LWT pthread is simply a Linux code execution context for the permanently running pthread. The pthread never gets suspended, as the user-space scheduler maintains control, except in power-save scenarios, which are outside the scope of this article.
- This user-space scheduler will run and operate exactly as some of the traditional RTOS high-performance, low-latency implementations without any involvement of the Linux kernel.
- The implementation takes advantage of the new user-space Linux implementations for direct hardware access. Again, no Linux kernel involvement.
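To make the idea concrete, here is a toy sketch of a user-space scheduler that multiplexes several light-weight threads over the single OS thread that calls `run()`. It is not Enea's implementation: Python generators stand in for the register-level context switches a native LWT would perform, and the names `LwtScheduler`, `spawn`, and the `("send", ...)` / `("recv",)` request tuples are assumptions made for illustration.

```python
from collections import deque


class LwtScheduler:
    """Cooperative scheduler: many light-weight threads on one OS thread."""

    def __init__(self):
        self.ready = deque()   # runnable entries: (name, value delivered on resume)
        self.threads = {}      # name -> generator (the light-weight thread)
        self.mailbox = {}      # name -> deque of messages not yet received
        self.blocked = set()   # names waiting in a receive
        self.switches = 0      # count of user-space context switches

    def spawn(self, name, func):
        self.threads[name] = func(name)
        self.mailbox[name] = deque()
        self.ready.append((name, None))

    def _send(self, dest, msg):
        if dest in self.blocked:
            # Receiver was waiting: hand over the message and wake it.
            self.blocked.discard(dest)
            self.ready.append((dest, msg))
        else:
            self.mailbox[dest].append(msg)

    def run(self):
        while self.ready:
            name, value = self.ready.popleft()
            self.switches += 1
            try:
                request = self.threads[name].send(value)
            except StopIteration:
                del self.threads[name]          # thread finished
                continue
            if request[0] == "send":
                _, dest, msg = request
                self._send(dest, msg)
                self.ready.append((name, None))
            elif request[0] == "recv" and not self.mailbox[name]:
                self.blocked.add(name)          # wait-state: cede the CPU
            elif request[0] == "recv":
                self.ready.append((name, self.mailbox[name].popleft()))
            else:
                self.ready.append((name, None))  # plain voluntary yield


log = []


def client(name):
    for i in range(3):
        yield ("send", "server", (name, i))     # issue a request
        reply = yield ("recv",)                 # block until the reply arrives
        log.append((name, reply))
    yield ("send", "server", (name, None))      # tell the server to stop


def server(name):
    while True:
        sender, payload = yield ("recv",)
        if payload is None:
            return
        yield ("send", sender, payload * 10)


sched = LwtScheduler()
sched.spawn("server", server)
sched.spawn("client", client)
sched.run()
print(log)  # prints [('client', 0), ('client', 10), ('client', 20)]
```

A thread that asks to receive from an empty mailbox is simply left off the ready queue until a message arrives, so blocking and waking are ordinary queue operations inside one OS thread.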
An LWT solution as described above will deliver dramatic performance increases in any Linux real-time application. Enea has built prototypes of the LWT described above that show over 10x the performance of Linux pthreads in scheduler overhead, specifically with regard to context switching and inter-thread messaging/communications latency.
But above and beyond scheduling performance and inter-thread communications, what should an LWT solution bring? There is more to the LWT concept than just superiority over Linux pthreads in performance (Figure 2). What about the robustness of the solution? The following additional constructs, time-honored in RTOS real-time solutions, are also needed in Linux:
- Deterministic scheduling
- Low scheduling overhead – cheap context switches
- Low inter-thread signaling overhead
- Cheap thread creation
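Deterministic scheduling, the first item above, typically means that the highest-priority ready thread always runs next and that threads of equal priority run in a fixed order. A minimal sketch under those assumptions follows. The class `PriorityRunQueue`, the function `run_by_priority`, and the convention that a lower number means higher priority are hypothetical choices, not taken from any particular RTOS.

```python
import heapq
from itertools import count


class PriorityRunQueue:
    """Ready queue: lowest number is highest priority, FIFO among equals."""

    def __init__(self):
        self._heap = []
        self._seq = count()    # increasing tie-breaker keeps equal priorities FIFO

    def push(self, priority, thread):
        heapq.heappush(self._heap, (priority, next(self._seq), thread))

    def pop(self):
        priority, _, thread = heapq.heappop(self._heap)
        return priority, thread

    def __bool__(self):
        return bool(self._heap)


def run_by_priority(threads):
    """Run (priority, name, generator) triples and return the execution trace."""
    queue = PriorityRunQueue()
    for priority, name, gen in threads:
        queue.push(priority, (name, gen))
    trace = []
    while queue:
        priority, (name, gen) = queue.pop()
        try:
            step = next(gen)           # run the thread until it yields
        except StopIteration:
            continue                   # thread finished, drop it
        trace.append((name, step))
        queue.push(priority, (name, gen))
    return trace


def worker(steps):
    for i in range(steps):
        yield i


trace = run_by_priority([
    (2, "logger", worker(1)),
    (0, "radio", worker(2)),
    (1, "timer_a", worker(2)),
    (1, "timer_b", worker(2)),
])
print(trace)
```

With these inputs the `radio` thread runs to completion first, the two equal-priority timers then alternate in the order they were queued, and `logger` runs last. The same inputs always produce the same trace.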
Architectural View of the Linux Light-Weight Threading Model on a Multicore Device
The architectural view of an LWT implementation is as follows. A Linux process, with its whole shared memory space, may span many cores of a multicore device. For maximum efficiency, the LWT model calls for a single pthread within a Linux process to be locked to a core, but that is not strictly required. Because the LWT scheduler is bound to a pthread, a pthread that is not locked to a core can migrate, together with its LWT scheduler, to any core that Linux SMP desires.
Figure 1: Architectural view of the Linux light-weight threading model on a multicore device
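Locking the pthread that hosts the scheduler to a core might look like the following sketch. It assumes Linux, where Python exposes the affinity system calls as `os.sched_getaffinity` and `os.sched_setaffinity`; a native implementation would more likely call `pthread_setaffinity_np` from C. The helper name `pin_to_one_core` and the choice of the lowest-numbered allowed core are arbitrary.

```python
import os


def pin_to_one_core():
    """Pin the calling thread to one allowed core; return the core, or None."""
    if not hasattr(os, "sched_setaffinity"):
        return None                       # affinity calls are not available here
    allowed = os.sched_getaffinity(0)     # cores the caller may currently use
    core = min(allowed)                   # arbitrary pick for this sketch
    os.sched_setaffinity(0, {core})       # 0 means "the caller"
    return core


core = pin_to_one_core()
if core is None:
    print("CPU affinity is not supported on this platform")
else:
    print("scheduler thread pinned to core", core,
          "affinity is now", os.sched_getaffinity(0))
```

Once pinned, the thread no longer migrates between cores, so a user-space scheduler running inside it keeps its cache state on that core.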
Efficient light-weight threading (LWT) is the next frontier in Linux real-time performance and behavior. Again, not all real-time applications need a powerful LWT-like solution. But some could benefit from Linux light-weight threading, the next-generation Linux real-time extension, especially those in telecom/networking that need some of the complex networking protocols in radio access networks or the mobile infrastructure core/edge, or those in other markets that have similar real-time requirements. Again, the entire history of real-time embedded Linux has been an effort to prove that Linux can operate as well as the traditional RTOS solutions. Linux has made some strides, but from this author’s perspective, Linux in the most serious telecom/networking applications is not quite there yet. But perhaps with Linux light-weight threading we are getting closer to the goal. In conclusion, one focus of the Linux real-time embedded industry is on solutions for the hardest real-time applications. This goal is depicted in the following graphic:
Figure 2: Light-weight threading and Linux concept – best of Linux and RTOS
Mr. Christofferson, Enea director of product marketing, has over 30 years’ experience in software development for deeply embedded telecom or networking systems. He spent the first 8 years of his career in the defense industry in SIGINT/COMINT systems. That was followed by 9 years in the telecom market working with such technologies as packet switching, SS7, SONET, fiber in the loop and DSL. For the past 16 years, Mr. Christofferson worked in product management, marketing and business development for leading industry RTOS, embedded development tools and middleware providers such as Microtec, Mentor Graphics and now Enea, for whom he has served since 1998.