Be Driven
  Device Drivers in the BeOS
     
    Kernel Scheduler

Drawing heavily from the following two articles:

The Kernel Scheduler and Real-Time Threads
Be Newsletter, Issue 37, August 21, 1996
By Cyril Meurillon

and

Be Newsletter, Volume III, Issue 18; May 5, 1999
High-Resolution Timing, Revisited
By Jeff Bush

This chapter will be reworked in the future, rather than remaining a plain copy of these articles.

Kernel Scheduler and Thread Priorities

Not much has been disclosed about the internals of the kernel scheduler. It's time to explain in detail how it works, so that real-time application programmers know what they can expect from it and can make better use of it. In this article, I discuss the kernel scheduler as it has been implemented since Developer Release 7.

The scheduler distinguishes between two classes of threads:
"time-sharing" and "real-time."

The thread's priority level determines which category it falls into. The priority range 1-99 is used for time-sharing threads; 100-120 is used for real-time
threads. Within each class, threads are further differentiated using their relative priorities.

A real-time thread is executed as soon as it's ready. If more than one real-time thread is ready at the same time, the thread with the higher priority is executed first. The thread is guaranteed to run without being preempted (except by a real-time thread with a higher priority) until it invokes a blocking
system call (semaphore, ports, snooze, I/O, and so on).

A time-sharing thread is executed only if there are no real-time threads in the ready queue. When this condition is satisfied, a time-sharing thread is elected in a probabilistic manner based on its priority, using a logarithmic scale of base 2.

For example, a thread with a priority of 8 is four times more likely to be elected than a thread with a priority of 6. A new time-sharing thread is elected regularly, once every "scheduler quantum" (currently, every 3 milliseconds).
Because the BeBox is a multiprocessor machine, things are actually a bit more complex. But the rules described still apply:

No time-sharing thread is allowed to stand in the way of a real-time thread. For example, if CPU 1 is executing real-time thread A, it's possible for a time-sharing thread B to run on CPU 2. (Note that on a single-processor system, B would have to wait until A blocks.) If real-time thread C wakes up, it
would immediately interrupt B on CPU 2 and run there, without interfering with A.

If you look at the header file OS.h (in DR8), you will see some predefined priorities. With each real-time priority, there's an associated "maximum latency" (described in the comment next to each priority constant). This latency is the maximum time that a thread can afford to sit in the ready queue without being executed. Note that these maximum latencies are *not*
enforced by the scheduler -- these are guidelines, not guarantees. As a developer, you can use the latency guidelines to find the appropriate priority for your real-time thread.

In the real-time class:

B_REAL_TIME_PRIORITY (120) -- latency 0.5-1 ms. Used for the most demanding real-time tasks, such as audio. We currently use this one for the threads in the chain feeding the DAC.

B_URGENT_PRIORITY (110) -- latency 5-10 ms. Used for slightly less demanding tasks. We use it for MIDI.

B_REAL_TIME_DISPLAY_PRIORITY (100) -- latency 10-30 ms. We use this one for the app_server threads poller and mouser, in charge of reading the keyboard and displaying the cursor.

In the time-sharing class:

B_URGENT_DISPLAY_PRIORITY (20) We don't use this one yet. You shouldn't use it either.

B_DISPLAY_PRIORITY (15) This is the priority of window threads, where updates take place.

B_NORMAL_PRIORITY (10) This is the priority of all other system-created threads.

B_LOW_PRIORITY (5) This is for threads that run in the background, without disturbing any other threads.

When designing a real-time application, analyze its real-time constraints. If the maximum tolerable latency falls below 0.5 milliseconds, then the BeBox isn't (yet) the right platform for that application. However, for anything higher than that, you should pick the priority with the matching latency.
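To make this concrete, here is a minimal sketch of running latency-sensitive work in a real-time thread. spawn_thread(), resume_thread(), acquire_sem() and B_REAL_TIME_PRIORITY come from OS.h; the feeder function and its names are hypothetical.

#include <OS.h>

/* Hypothetical real-time feeder thread: it blocks on a semaphore and
   does only time-critical work when woken. */
static int32
audio_feeder(void *data)
{
    sem_id buffer_ready = *(sem_id *)data;

    /* Buffers are assumed to have been allocated (and locked, if
       necessary) by the spawning, time-sharing thread. */
    while (acquire_sem(buffer_ready) == B_OK) {
        /* ...feed the next pre-allocated buffer to the DAC... */
    }
    return 0;
}

status_t
start_feeder(sem_id *buffer_ready)
{
    thread_id tid = spawn_thread(audio_feeder, "audio feeder",
                                 B_REAL_TIME_PRIORITY, buffer_ready);
    if (tid < 0)
        return tid;
    return resume_thread(tid);
}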

Real-time threads must limit themselves to operations that truly require real-time attention. A healthy respect for this rule is essential to the proper behavior of the system. Memory allocation, for example, should be done by a normal priority time-sharing thread. The real-time thread should only access buffers that have already been allocated (and locked in memory, if necessary).
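As a sketch of what that pre-allocation might look like, a normal-priority thread could set up a locked buffer with create_area() and B_FULL_LOCK (both standard OS.h facilities); the function and variable names here are mine.

#include <OS.h>

static void *sBuffer = NULL;

area_id
alloc_locked_buffer(size_t size)
{
    /* Areas are allocated in whole pages. */
    size_t rounded = (size + B_PAGE_SIZE - 1) & ~(size_t)(B_PAGE_SIZE - 1);

    /* B_FULL_LOCK wires the pages down, so the real-time thread never
       takes a page fault when it touches the buffer later. */
    return create_area("rt buffer", &sBuffer, B_ANY_ADDRESS,
                       rounded, B_FULL_LOCK, B_READ_AREA | B_WRITE_AREA);
}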

Another factor to keep in mind when composing your real-time thread is whether a particular function call will "lock" the underlying system resource or component. Most of the components of the BeOS(TM) are reentrant, so calls to those components can be used in a real-time application. But some components aren't reentrant; a function that accesses such a component automatically locks it while it's in use.

This can degrade performance. For example, the file system (in its current state) may lock the I/O system for long periods. This can be a problem for certain real-time applications, as we cannot guarantee reasonable latency for disk I/O. We're quite aware of these limitations and are constantly working to eliminate them.

If you run into any problem while designing or programming your real-time application, or if you have a question, you can contact me at cyril@be.com.

High-Resolution Timing

The sample code
<ftp://ftp.be.com/pub/samples/drivers/pcspeaker.zip> demonstrates just how fine-grained you can get. The code is a driver for the built-in PC speaker. If you have a file in mono 8-bit unsigned 20k raw format, you can cat it directly to /dev/audio/pcspeaker to play it. In lieu of that, I've included a 'sine' program that writes a sine wave to stdout, which can be redirected to this device. This was tested on a 333Mhz Pentium II. On slower machines, you may have to tweak the values to keep the machine usable.

The PC speaker is a primitive piece of hardware. It basically has two states: on and off. You can drive it from a programmable rate generator, or you can directly modify the state by setting a bit. The classic way to mimic analog signals with this type of speaker is to pulse it very quickly -- faster than the speaker cone can physically move. The amount of time that current is applied to the output vs. the amount of time that it is not applied is controlled by the amplitude of the input signal, causing the speaker position to approximate the actual waveform. The sample driver will use this technique if it is compiled with the PLAY_IN_BACKGROUND macro set to 0. The downside to this approach is that it requires constant attention from the CPU. In order to get any kind of sound quality, you have to shut off interrupts, rendering the machine unusable. This is clearly undesirable.
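The following is not the driver's code, just an illustration of the pulse-width idea: within each fixed pulse period (a 60-microsecond period is assumed here), the speaker is driven "on" for a fraction of the period proportional to the sample's amplitude.

#include <OS.h>

#define PULSE_PERIOD_US 60   /* assumed, illustrative pulse period */

/* How long the speaker should be "on" during one period for a given
   8-bit unsigned sample (0 = silent, 255 = fully on). */
static bigtime_t
on_time_for_sample(uint8 sample)
{
    return ((bigtime_t)sample * PULSE_PERIOD_US) / 255;
}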

The sample code takes a rather unique approach to this problem. It programs a timer to wake it up periodically so it can move the speaker cone. Other threads continue to run normally, but at defined intervals, an interrupt occurs and the speaker driver code is executed (at interrupt level) to control the speaker.

To get any quality, the interrupts need to occur frequently -- around every 30-60 us. Note that, although the machine is still usable, interrupting this frequently cuts down performance quite a bit. This is a rather extreme case, but it does illustrate how fine-grained you can get with kernel timers. You should also note that the standard user space synchronization primitives use this same underlying mechanism, so you can now get very accurate delays using system calls such as snooze(), read_port_etc(), or acquire_sem_etc(). You don't need to write a device driver to get accurate timing.
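For example, a user-space program can now rely on snooze() for microsecond-scale delays; the 50-microsecond figure below is just an illustration.

#include <OS.h>

void
short_delay(void)
{
    /* Backed by the same kernel timers, so even very short delays
       are now reasonably accurate. */
    snooze(50);   /* about 50 microseconds */
}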

From the driver level, the timers are programmed by using a new API added for Genki. The declarations for these functions are in KernelExport.h.

Programming a timer in a device driver basically involves calling the add_timer() function, like so:

 ret = add_timer((timer *)&my_timer,
                 handler_function, time,
                 B_ONE_SHOT_ABSOLUTE_TIMER);

The first parameter is a pointer to a timer structure. You can add your own parameters to this by defining your own structure and making the kernel's timer struct its first element, for example:

 struct my_timer {
     timer kernel_timer;
     long my_variable;
     /* ...any other per-timer fields... */
 };

The second parameter to add_timer() is a hook function to be called when the timer expires. It has the form:

int32 timer_hook(timer *t);

Note that the timer struct that you originally passed to add_timer() is also passed into this function, so you can access the elements you've added to that struct from the interrupt handler.

The third parameter is the time that this interrupt should occur. How this is interpreted is determined by the fourth parameter. There are three basic modes that a timer understands:

B_ONE_SHOT_ABSOLUTE_TIMER, as the name implies, is a single interrupt that occurs at a specific system time.

B_ONE_SHOT_RELATIVE_TIMER is similar, but allows you to specify an interval rather than an actual time.

B_PERIODIC_TIMER will cause the callback to be called repeatedly at a given interval.
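Putting the pieces together, here is a minimal sketch of a driver arming and cancelling a one-shot timer. add_timer(), cancel_timer(), the timer struct and the mode constants are from KernelExport.h; the surrounding structure, hook and the 50-microsecond delay are illustrative.

#include <KernelExport.h>

struct speaker_timer {
    timer  kernel_timer;   /* must be the first member */
    int32  fire_count;     /* example of a field of our own */
};

static struct speaker_timer sTimer;

static int32
speaker_timer_hook(timer *t)
{
    /* Because the kernel timer is the first member, the pointer we
       get back is also a pointer to our enclosing struct. */
    struct speaker_timer *st = (struct speaker_timer *)t;
    st->fire_count++;

    /* The hook runs at interrupt level. */
    return B_HANDLED_INTERRUPT;
}

status_t
arm_speaker_timer(void)
{
    sTimer.fire_count = 0;
    /* Fire once, 50 microseconds from now. */
    return add_timer(&sTimer.kernel_timer, speaker_timer_hook,
                     50, B_ONE_SHOT_RELATIVE_TIMER);
}

void
disarm_speaker_timer(void)
{
    cancel_timer(&sTimer.kernel_timer);
}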

When playing in background mode, the speaker driver determines the next time the speaker cone has to move and programs a timer for that time. If that time is less than 20 us away, it doesn't actually program the timer, because there is some overhead to handling an interrupt: if you program a timer that's too short, you may end up wasting time and being late for the next event you want to handle. Likewise, when it sets up the interrupt, it sets it a little early to compensate for interrupt latency. A macro called SPIN determines whether the code will spin in a loop when it is early for an event, or just move the speaker cone early. In the case of the speaker driver, which is obviously not a high-fidelity component, this isn't really necessary. In fact, since these interrupts occur so frequently, machine performance is degraded significantly when it behaves like this. In drivers where timing is critical, however, spinning like this is a way to be accurate.
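A sketch of that compensation logic, under assumed thresholds: system_time(), spin() and add_timer() are kernel exports, while the constants and names here are illustrative.

#include <KernelExport.h>

#define MIN_TIMER_US  20   /* below this, timer overhead outweighs the gain */
#define EARLY_WAKE_US  5   /* assumed allowance for interrupt latency */

static void
schedule_next_event(timer *t, timer_hook hook, bigtime_t event_time)
{
    bigtime_t now = system_time();

    if (event_time - now < MIN_TIMER_US) {
        /* Too close to bother with an interrupt: spin until the event
           time (if it hasn't already passed) and handle it right here. */
        if (event_time > now)
            spin(event_time - now);
        hook(t);
        return;
    }

    /* Arm the timer a little early; the hook can spin for the final
       few microseconds if it needs to be exact. */
    add_timer(t, hook, event_time - EARLY_WAKE_US,
              B_ONE_SHOT_ABSOLUTE_TIMER);
}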

A quick note about the implementation of timers. You may be familiar with the way timing is implemented on many operating systems (including BeOS before Genki). Normally, an operating system sets up a periodic interrupt that occurs every couple of milliseconds. At that time, the timeout queue is checked to see if there are expired timeouts. As a consequence, you can't get a dependable resolution of less than the interrupt period. In Genki, the kernel dynamically programs a hardware timer for the next thread to wake or sleep, allowing much finer resolutions. This, coupled with the preemptive nature of the kernel, is what allows the driver to accurately set timers for 30-60 us.

When it finishes playing a chunk of data, the interrupt handler releases a semaphore that the calling thread is waiting on. Normally, when a semaphore is released, the scheduler is invoked to wake up threads that may have been waiting for it. As you're probably aware, rescheduling from within an interrupt handler is very bad, so driver writers must use the flag B_DO_NOT_RESCHEDULE when releasing a semaphore from an interrupt handler. This, by itself, was a limitation, however, because it meant that a thread waiting on some device driver call had to wait until the scheduler was invoked again (for some unrelated reason) before it would run. That could be several milliseconds later. Generally, you want the thread to run as soon as possible.

In Genki, an interrupt handler can now return B_INVOKE_SCHEDULER, which causes the scheduler to be invoked immediately after your interrupt handler returns. If you have released semaphores from an interrupt handler, you can return this value to make sure that waiting threads are woken up promptly. This is especially useful for drivers where low-latency access to hardware is important.
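As a sketch of this pattern: release_sem_etc(), B_DO_NOT_RESCHEDULE and B_INVOKE_SCHEDULER are the real kernel facilities; the handler and semaphore below are illustrative.

#include <KernelExport.h>

static sem_id gDoneSem;   /* created elsewhere with create_sem() */

static int32
buffer_done_handler(void *data)
{
    (void)data;

    /* Never reschedule from inside the handler itself... */
    release_sem_etc(gDoneSem, 1, B_DO_NOT_RESCHEDULE);

    /* ...but ask for the scheduler to run right after we return,
       so the waiting thread wakes up with minimal latency. */
    return B_INVOKE_SCHEDULER;
}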

As you can see, there are a number of very useful facilities for low-latency device access and high-precision timing. In addition, many of the generic user-level synchronization primitives have become much more accurate.



For Those Requiring Timing Guidance Pre-4.5 Release...

Be Newsletter, Volume II, Issue 27; July 8, 1998
Outsmarting the Scheduler
By Ficus Kirkpatrick

 


The Communal Be Documentation Site
1999 - bedriven.miffy.org