David McCracken
Programming (updated 2018.09.07)
I have programmed in many languages and systems. Metrics can be ascertained for all of these but most programmers won’t know what they mean. This is especially true with my style of programming. In all languages I try to realize polymorphism by moving anything that can be declarative into tables. Frequently the resulting program is radically non-redundant, more flexible and much smaller than typical programs. Without a functionally equivalent typical program for comparison, it is difficult to understand what the metrics mean. For example, is my table-driven single statement (Abbott CD3200-Algorithm Transformation) trivial compared to the 2000-statement, 16-level deep sequence of nested functions that it replaces? Nevertheless, metrics can tell something about the experience that informs and motivates my current methods.
I don’t routinely run statistics on my code but when others have done so, my code is highly rated, primarily because I comment more than most programmers. More revealing than the quantity is the high quality, as indicated by universal praise from other programmers who have to read my code. My code also gets good grades for its unusually low cyclomatic complexity, the result of my separation of procedural and declarative components. Any other statistics don’t necessarily imply quality. Class size is meaningless but the number of methods and the number of statements per method can be meaningful because the superfluous setter-getter paradigm produces many methods with a small number of statements. Class count and amount of work done by class methods as opposed to non-class functions strongly depend on the application.
I have selected for analysis only relatively large C/C++ programs that I have written entirely myself. This leaves out some significant work, such as Hitachi 747 and Abbott CD3200, where, as software lead, it was my responsibility not only to originate nearly everything but also to pass it on to others as soon as possible. For the purpose of this analysis I have excluded all source files, such as headers and libraries, that I didn’t write, even if they are essential to the program. The shortest summary of the metrics of these programs by project is:
Project | Files | Lines | Statements | Functions | Classes |
Abbott Instrument Dev Sys (C, C++) | 285 | 100,908 | 38,632 | 1,442 | 51 |
IDT USB dev/demo (C++, C) | 67 | 24,184 | 9,935 | 350 | 11 |
Elo WinCE driver and apps (C++) | 104 | 22,587 | 9,530 | 334 | 90 |
Dataman (Win MFC C++) | 59 | 20,958 | 8,387 | 267 | 34 |
Elo XP Utilities (Win32 C++) | 26 | 8,218 | 3,912 | 147 | 10 |
Total | 541 | 176,855 | 70,396 | 2,540 | 196 |
I have used the SourceMonitor program from campwoodsw to analyze my source code. I have configured it to not include blanks in line count. I have selected a subset of its metrics and abbreviated the names to fit into small tables. They are:
Files | source (c, cpp, h) file count |
Lines | total number of non-blank lines (in all source files) |
Stmnts | total number of statements |
%Br | percentage of statements that cause a break in the execution path |
%Cmt | ostensibly the percentage of commented statements but block comments are included |
Cls | total number of classes defined in the source files |
M/C | average number of methods per class |
S/M | average number of statements per class method |
Max Comp | highest complexity (only number of execution paths) of any one method or function. |
Avg Comp | average complexity |
Funcs | total number of functions |
For suites of cooperating programs, metric totals may have some value. This is especially true of my WinCE driver group for Elo. These programs are all C++ and comprise a unified package. The serial and USB drivers are built as distinct programs but are statically linked to the agnostic driver base, which provides 75% of each driver’s total code. To report totals, Files, Lines, Stmnts, and Funcs are simply summed; for %Br, %Cmt, M/C, S/M, and Avg Comp I compute an average of each program’s metric weighted by its statement count.
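As a sketch of that weighting (the struct and field names here are only illustrative, not SourceMonitor's), the totals for the ratio metrics are computed like this:

    #include <vector>
    #include <cstddef>

    // Illustrative only: weight each program's ratio metric (e.g. %Cmt) by its
    // statement count, then divide by the total statement count.
    struct ProgMetrics { double stmnts; double pctCmt; };

    double weightedPctCmt( const std::vector<ProgMetrics> &programs )
    {
        double sum = 0, totalStmnts = 0;
        for( std::size_t i = 0; i < programs.size(); ++i ) {
            sum         += programs[i].pctCmt * programs[i].stmnts;
            totalStmnts += programs[i].stmnts;
        }
        return totalStmnts != 0 ? sum / totalStmnts : 0;
    }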
Abbott Instrument Development System Page
Files | Lines | Stmnts | %Br | %Cmt | Cls | M/C | S/M | Max comp | Max depth | Avg depth | Avg comp | Funcs |
165 | 60,606 | 21,282 | 23.2 | 57.5 | 51 | 4.47 | 4.8 | 287 | 9 | 1.76 | 5.78 | 808 |
Files | Lines | Stmnts | %Br | %Cmt | Funcs | Avg S/F/C | Max comp | Max depth | Avg depth | Avg comp |
11 | 4,763 | 1,423 | 19.6 | 54.7 | 68 | 25.5 | 48 | 6 | 1.42 | 5.38 |
Files | Lines | Stmnts | %Br | %Cmt | Funcs | Avg S/F/C | Max comp | Max depth | Avg depth | Avg comp |
65 | 20,464 | 8,741 | 25.5 | 50.9 | 315 | 31.5 | 72 | 9+ | 1.72 | 7.09 |
Files | Lines | Stmnts | %Br | %Cmt | Funcs | Avg S/F/C | Max comp | Max depth | Avg depth | Avg comp |
44 | 15,075 | 7,186 | 23.7 | 41.9 | 251 | 32.4 | 95 | 9+ | 1.72 | 7.44 |
IDT USB Development/Demo Kit Page
This comprises two programs, a C++ Win32 GUI IDE and firmware written in C and
some ASM for an ARM-based NXP LPC1342 controller. The two programs cooperate to
perform advanced DSP touch input analysis and are essentially one program.
Files | Lines | Stmnts | %Br | %Cmt | Cls | M/C | S/M | Max comp | Max depth | Avg depth | Avg comp | Funcs |
38 | 15,779 | 6,648 | 26.2 | 47.0 | 11 | 7.30 | 3.6 | 106 | 9 | 1.94 | 6.49 | 211 |
Files | Lines | Stmnts | %Br | %Cmt | Funcs | Avg S/F/C | Max comp | Max depth | Avg depth | Avg comp |
29 | 15,779 | 3,287 | 18.4 | 50.0 | 139 | 16.1 | 61 | 9 | 1.45 | 4.07 |
Elo WinCE Unified Driver Page
DRIVER GROUP. EloDev is the base class library, statically linked into the
EloUsb and EloSer driver programs. nvCal is a single file included separately
in the two drivers. EloBeep is an optional driver library component. EloCmn and
EloApi are shared by drivers and applications.
Project | Files | Lines | Stmnts | %Br | %Cmt | Cls | M/C | S/M | Max comp | Avg comp | Funcs |
EloDev | 22 | 5,549 | 2,173 | 23.6 | 53.8 | 9 | 15.88 | 8.7 | 48 | 4.01 | 37 |
EloUsb | 15 | 2,443 | 869 | 16.3 | 55.7 | 8 | 6.86 | 2.3 | 18 | 2.63 | 41 |
EloSer | 18 | 2,247 | 943 | 17.0 | 49.6 | 8 | 6.29 | 5.5 | 30 | 3.37 | 26 |
nvCal | 1 | 361 | 168 | 13.1 | 50.1 | 1 | 3.00 | 1.0 | 13 | 3.70 | 7 |
EloBeep | 2 | 236 | 85 | 31.8 | 55.1 | 0 | 0 | 0 | 9 | 4.00 | 7 |
EloCmn | 15 | 1,834 | 829 | 1.2 | 39.7 | 54 | 4.10 | 1.5 | 5 | 1.11 | 22 |
EloApi | 2 | 950 | 307 | 18.2 | 47.3 | 1 | 1.31 | 5.8 | 13 | 2.54 | 16 |
Totals | 75 | 13,620 | 5,374 | 17.3 | 50.73 | 81 | 5.85 | 3.1 | 48 | 3.13 | 156 |
ELO WINCE DRIVER SUPPORT APPLICATIONS. EloVa provides screen calibration. EloTalk is essentially an IDE, providing many utilities for OEM developers in one program. EloCpl is a Windows control panel. EloConScript is a script player combining a scripting language that I designed with Elo’s standard touchscreen configuration language. I have computed totals for these because they are part of the cooperative suite but they are fully independent, unlike the driver group.
Project | Files | Lines | Stmnts | %Br | %Cmt | Cls | M/C | S/M | Max comp | Avg comp | Funcs |
EloVa | 8 | 2,312 | 1,039 | 25.2 | 42.0 | 0 | 2.00 | 4.8 | 41 | 6.83 | 36 |
EloTalk | 15 | 6,136 | 2,906 | 27.6 | 42.3 | 9 | 2.14 | 3.4 | 56 | 5.35 | 132 |
EloCpl | 5 | 360 | 138 | 34.1 | 56.4 | 0 | 0 | 0 | 17 | 5.71 | 7 |
EloConScript | 1 | 159 | 73 | 19.2 | 49.1 | 0 | 0 | 0 | 10 | 6.67 | 3 |
Totals | 29 | 8,967 | 4,156 | 27 | 43 | 9 | 2.1 | 1.6 | 56 | 5.8 | 178 |
Dataman Flash Screencast
I released this as freeware. It is a cytometric big data analytics tool. It
reads FCS (Flow Cytometry Standard) data files and displays user-selected
combinations as histograms or scatter-plots. I originally wrote this for Win32
API but ported to MFC to add floating dockable toolbars. All code provided by
MFC is excluded from these metrics. The relatively high M/C and S/M values are
motivated by the application, not by MFC. The code changed very little when I
ported to MFC.
Files | Lines | Stmnts | %Br | %Cmt | Cls | M/C | S/M | Max comp | Max depth | Avg depth | Avg comp | Funcs |
59 | 20,958 | 8,387 | 21.5 | 47.5 | 34 | 11.96 | 9.5 | 107 | 9 | 1.62 | 4.32 | 267 |
ELO Configuration Programs Page
EloUpdater and EloStudio are both C++ Win32 API programs that can be invoked in
non-GUI mode for interactive command line or scripted operation.
Files | Lines | Stmnts | %Br | %Cmt | Cls | M/C | S/M | Max comp | Max depth | Avg depth | Avg comp | Funcs |
14 | 3,694 | 1,985 | 24.1 | 28.9 | 6 | 2.20 | 2.5 | 64 | 8 | 1.81 | 5.97 | 88 |
Files | Lines | Stmnts | %Br | %Cmt | Cls | M/C | S/M | Max comp | Max depth | Avg depth | Avg comp | Funcs |
12 | 4,524 | 1,927 | 15.5 | 36.6 | 4 | 1.50 | 2.0 | 69 | 9+ | 1.29 | 5.47 | 59 |
The most important determinant of productivity is the amount of time spent waiting vs. thinking, developing, and testing ideas. Minimizing repetitive tasks is the most effective way to improve this ratio. For a programmer, this means minimizing build time and scripting anything and everything.
When I took over Elo’s WinCE program I inherited a one-hour build time. The productive code development and test time averaged only three minutes. Even though I was under extreme pressure to solve a customer problem and the code changes appeared simple, I refused to work on the code for three days while I figured out how to reduce the build to 15 minutes. This afforded me 30 coding cycles per day instead of only eight. By the end of my fourth day, I had achieved only 30 cycles vs. the 32 that I would have achieved without my up front effort. But on the fifth day I had achieved 60 cycles vs. 40. When I later reduced the build time to two minutes, the three-minute coding effort dominated the cycle, effecting a quantum change in the development process to essentially continuous problem solving.
There is no consensus on the general definition of scripting. Everyone would agree that bash and bat are scripting languages for Linux and Windows and any “program” written in them is a script. Most would agree that Perl and Python are scripting languages but they are not only more portable but also more abstract and generic. Being interpreted instead of compiled doesn’t make a program a script; Java byte codes are interpreted but the same program can also be compiled.
I define scripts as programs that automate computer user operations, typically procedures that the user would otherwise do interactively. I mix languages in my scripting. For example, when I started as the software lead on Abbott’s CD3200 I initially wrote instructions for programmers to follow to properly use the PVCS version control system to avoid stepping on each other’s toes. But then I realized that, as long as they were directly interacting with PVCS they would not change their own procedures. However, by scripting everything they needed I could make their job easier even while enforcing my new rules. This could not be done without generic string processing, which bat can’t do. We were using Polymake to build our programs and it offered much of the capability of Perl (but in its own language) so I used it for my scripts. For missing capabilities, I wrote bat scripts and OS-specific compiled C programs. The programmer would invoke a bat script, which would invoke a Polymake script, which might invoke other bat scripts and my compiled utilities.
Windows bat and Linux bash have odd syntax due to their evolution from more primitive forms (in the case of bash, from three different languages). However, I use them when I can because they are likely to be present in a consistent form in any computer. This is always true for bat. It is true for bash in Linux distributions but, notably, not the Android kernel, which has only the very primitive ash shell.
Windows bat is much less capable than other languages in current use. However,
it is not as limited as commonly believed. Its lack of general string
processing is significant but it can parse strings in the limited but important
context of files and directories. For example, in my Elo WinCE release
scripting system, the statement
for /F "tokens=1,2" %%p in (..\Specific\projList) do (...
parses the project names from the name/version tuples listed in the projList file, e.g.
eloUsb 1.25
eloSer 1.22
eloTalk 1.10
eloVa 2.40
The release scripting system comprises a couple of dozen bat files totaling more than two thousand lines of code. It automates the entire process from checking out all sources from version control, building multiple programs, libraries, and drivers for four different CPUs in three versions of the OS, and packaging everything into a distributable file. It could be argued that such complexity should not be handled with bat but my partitioning into independent domains significantly reduced the complexity and, coincidentally, created opportunities to share domain-specific scripts with customers, for which bat is a good choice.
I write scripts whenever it will save me time in the long run. I spent a month developing the Elo WinCE release scripting system but I replaced a two-week interactive task with a two-hour walkaway. I often write small scripts that just make my routine work easier. For example, in both Windows and Linux I often work at a command line. Linux does everything it can to prevent root from working in a GUI. I find it distracting to have to type long path names to change the directory, especially to return to one recently visited. Windows and Linux both have the simple change directory command cd. This is easy to use but provides no assistance. They also have pushd and popd, which can simplify retracing the directory change path, but it is hard to remember to use pushd instead of cd and they have no forward retrace, which I want almost as often as backward. I would also like to be able to select a directory at random from the record. An additional problem in Windows is that the command window title doesn't show the current directory, which is very annoying when several command windows are minimized.
To solve the directory path problem in Linux, I wrote a bash script that performs all of the functions of cd, pushd, and popd plus my improvements. The central feature is a circular path list, which enables forward as well as backward traversal. The startup bashrc (e.g. Fedora /etc/bashrc, Ubuntu /etc/bash.bashrc) invokes my cdlist script, which only defines functions, one of which is called cd. Subsequently invoking cd executes this function. It accepts a few arguments that would be illegal for the intrinsic cd. For all other command lines, it silently pushes the current directory onto its own list and then uses builtin cd to invoke the intrinsic version. The arguments -b and -f step backward and forward through the circular path list. The argument -s displays the paths in a numbered list for single-key random selection.
if _%1 == _? (
    echo D replaces CD, PUSHD, POPD
    echo D makes directory title
    echo D / is popd
    echo D dir is pushd dir
    set /p dummy=Press Enter to close
    goto done
)
if not _%1 == _ (
    if not _%1 == _. (
        if _%1 == _/ (popd) else (pushd %*)
    )
)
title %CD%
:done
My bash solution does everything I wanted in Linux, but Windows bat scripting is too primitive for the circular path list. However, a very simple script addresses the window title and usage complexity problems. In Windows it is not possible to override cd, but I named the script d.bat (as in directory) so it is easy to remember, and it replaces cd, pushd, and popd. With no arguments, d sets the window title. The argument "/" invokes popd; any other argument passes the command line to pushd. In all cases, the window title is assigned the destination path.
The purpose of object-oriented programming is to make programs that are easier to maintain. Object-oriented languages help by providing standard syntax for many OO concepts but they are not required for nor do they automatically produce object-oriented programs. I wrote my first OO program, an in-circuit emulator IDE, in 6502 assembly language. As I describe in Hybrid Tool For Universal Microprocessor Development published in Computer Design, I implemented polymorphic functions using target CPU-agnostic code with indirectly addressed (i.e. “this”) CPU-specific data tables (i.e. objects). In contrast, the common practice of “modernizing” an old and clearly not OO program by redefining every global as a class with trivial setter and getter methods does not miraculously transform the program.
Classes are at the core of OO programming but most programmers don’t really design classes; they derive application-specific classes from existing ones often defined by a framework, such as STL, Android, or iOS. This makes code reuse simple, reliable, and predictable but it does not yield the game changing breakthroughs that are possible with object-oriented analysis. For example (BM-Hitachi 747-Error Reporting) I solved Hitachi’s immediate “buzz off” problem by recognizing that it was symptomatic of an endemic problem that required a comprehensive OO solution. My generic alarm class enabled divorcing control domains from presentation. Not only did this solve the thorny buzz off problem but also many persistent user complaints, some of which I didn’t even know about until users started asking me how I knew that they had been pestering Hitachi for years to solve them. I wrote that program in C.
Object-oriented design is not necessarily limited to single programs. Abbott’s instrument control languages were at once both promiscuous and rigid because they treated each specific device as an independent fully-realized entity. I redesigned the entire system around the concept that every device was an instance of an abstract class. When I also redefined major operations as table-driven procedures, these too became classes. These changes allowed the scripting language, compiler, and execution units’ firmware to only have to know how to deal with a relatively small number of classes in order to handle an infinite variety of objects without redesign. See Abbott Instrument Development System>Scripting System. Despite the obvious object orientation of my new design, I wrote only the user interface program in C++. I wrote the script compiler in C with Bison-BNF grammar parser and FLEX-RE scanner. I wrote the instrument configuration compiler (with my own ad hoc parser) in C and execution unit firmware in C and 68K ASM.
Object orientation can be useful in declarative program design. In particular, inheritance can simplify defining variations without restricting more complete definitions. This is especially evident in my IDT development/demo configuration language, which allows unrestricted combinations of characteristics to be named and used directly or as base classes for other combinations. Recursive multiple inheritance with unrestricted override of individual characteristics enables named combinations to be reused even when they are not entirely appropriate, reducing the proliferation of nearly identical objects.
My Elo WinCE release package polymorphic scripting system supports a very large combinatorial space by defining mutually virtual domains. Without this object-oriented design, each release would have to be fully defined either by an enormously complex script or by a less complex script in combination with a BLOB database manager. Both of these would be far more difficult to implement and maintain and would not provide any support for automation within any single domain. My polymorphic design creates domain-specific scripts that are useful independently of the release process.
Some elements of C++ are good in all contexts. For example, the member access syntax of inheritance is always better than that of aggregation. Other elements are good when used properly and can be very bad otherwise. Most of these are hiding mechanisms, which a good programmer may use to ensure the integrity of a class but a bad programmer will use to hide their mess. I have seen many programs written by other people and have encountered relatively few of the kinds of problems, such as uninitialized objects and misuse of data by ignorant functions, that Stroustrup claims to be endemic. The worst and most widespread problem is cut-and-paste coding at all levels, from repetition within single statements to nearly identical complete functions. At least when this can’t be hidden there is some hope of seeing and correcting it but if hidden by access and virtual functions, it will never get fixed.
I define access functions to member data only if simple access is not possible or alternative implementations are reasonable, for example a bit- or byte-array for boolean collections or an aliased structure-array (providing efficient array and safer structure access).
Virtual functions are the poorest mechanism for polymorphism and I use them only when I can’t define an equivalent algorithm or table-driven computation. For Abbott’s CD3200 I transformed a 2000-statement, 16-level deep sequence of nested functions into one statement with a small table. The object-oriented approach would have transformed this into a 3000-statement class with virtual functions. In many of the situations where virtual functions might be useful I need object- rather than class-specific functions. In my Hitachi 747 user interface I define each screen with an array of structures, each defining an element, including element-specific as well as class-specific functions to call in response to user action on that element.
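The following sketch shows the general shape of that technique; the structure fields and handler names are hypothetical, not the actual 747 code.

    /* Hypothetical sketch of a table-driven screen definition. Each element
       carries a class-specific handler and, optionally, an element-specific
       one, so behavior is bound per object rather than per class. */
    typedef struct ScreenElement {
        int         x, y;                                     /* position */
        const char *label;
        void      (*classAction)( struct ScreenElement * );   /* shared by all elements of its kind */
        void      (*elementAction)( struct ScreenElement * ); /* unique to this one element */
    } ScreenElement;

    static void editNumber( ScreenElement * ) {}   /* class-specific behavior */
    static void startRun( ScreenElement * )   {}   /* element-specific behavior */

    static ScreenElement mainScreen[] = {
        { 2, 4, "Sample count", editNumber, 0        },
        { 2, 6, "RUN",          editNumber, startRun },
    };

    /* Dispatcher: called on user action; invokes whichever handlers are defined. */
    static void onUserAction( ScreenElement *el )
    {
        if( el->classAction )   el->classAction( el );
        if( el->elementAction ) el->elementAction( el );
    }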
Templates have often disappointed me. Even when I have a situation that calls for generics, I often find that the template mechanism is too constrained. For example, in my Elo WinCE touchscreen driver API, functions that make sense to the application are created from structure-based primitives. It is impossible to predict all of the application-specific functions that OEM developers might want but they fall into a limited number of generic patterns. Templatizing these patterns would simplify the effort of creating useful functions. However, templates can’t make a generic constructor or anything involving a variable number of parameters. I used macros, which not only provided the required flexibility but also were simpler and easier for the developers to understand than templates.
I define constructors and destructors only when they serve some purpose beyond the red herring notion of rampant uninitialized data use. Stroustrup recommends defining constructors even for classes with a single intrinsic data memory to ensure initialization. Passing such an object to a function requires its construction. Instead of simply pushing a literal onto the stack, the program has to allocate heap memory, assign a value, and then read this value back to push it onto the stack. This incurs a horrible run-time penalty but it doesn’t interfere with good program design. This is not always the case. A type with an explicitly declared constructor cannot be used in a statically initialized table. This prevents separating the procedural and declarative aspects of a program, which is a far more effective means of achieving polymorphism than anything suggested by C++. For example, at Hitachi a programmer who did not know any assembly language was able to diagnose an error in one of my assembly language drivers by spotting an inconsistency in the table pattern. See BM/Hitachi 747: Communication and Coupling and my “Dr. Dobb’s Journal” article Software Partitioning for Multitasking Communication.
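A minimal illustration of that restriction, with hypothetical types and under the pre-C++11 rules that applied to this work:

    // A plain aggregate can populate a static, compile-time table; declare an
    // explicit constructor and (without C++11 constexpr) the same table must
    // instead be built procedurally at run time.
    struct Command {                    // aggregate: no user-declared constructor
        const char *name;
        int         code;
        void      (*handler)( int );
    };

    static void doStart( int ) {}       // illustrative handlers
    static void doStop( int )  {}

    static const Command cmdTable[] = { // purely declarative: data, not procedure
        { "start", 1, doStart },
        { "stop",  2, doStop  },
    };

    // struct Command { Command( const char *n, int c, void (*h)(int) ); ... };
    // With that constructor declared, the aggregate initializer above is no
    // longer legal; the declarative table turns back into procedural code.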
Many programmers use exceptions excessively, even within a single function, apparently to avoid using goto. Exceptions are complex and incur a significant run-time penalty. Extra information must be put on the stack. Only classes can be thrown. These classes must be defined. Other programmers must examine these definitions. When a class is thrown, it must first be constructed. When the catcher goes out of scope the thrown object must be destroyed. There is a reason for this complexity. Exceptions are not intended as routine control flow mechanisms but as a means of reporting errors from domains that don’t know the error handling context, for example from a library function. This is how I use them.
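A minimal sketch of that usage; the class and function names are illustrative, not taken from my libraries:

    #include <string>

    // Thrown by library code that cannot know how the application wants to
    // handle an unrecoverable communication failure.
    class LinkFailure {
    public:
        int         code;
        std::string what;
        LinkFailure( int c, const std::string &w ) : code( c ), what( w ) {}
    };

    int readReply( int channel )          // library function
    {
        int status = -1;                  // pretend the hardware read failed
        if( status < 0 )
            throw LinkFailure( status, "reply timeout" );
        return status;
    }

    void applicationTask()
    {
        try {
            readReply( 0 );
        }
        catch( const LinkFailure &e ) {   // only the application knows whether
            /* retry, report, or shut down, guided by e.code and e.what */
        }
    }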
I have designed hundreds of general and application-specific classes. Most of these are too large to show as examples and snippets would be meaningless. However, some short examples address issues that I consider important, such as whether wrapping a global in a trivial class could ever be justified (it can), when are explicit constructors and destructors of real value, and appropriate use of setters and getters.
class CriticalSection : public CRITICAL_SECTION {
public:
    CriticalSection( void )  { InitializeCriticalSection( this ); }
    ~CriticalSection( void ) { DeleteCriticalSection( this ); }
    void enter( void )       { EnterCriticalSection( this ); }
    void leave( void )       { LeaveCriticalSection( this ); }
};

class Semaphore {
public:
    HANDLE hSem;
    Semaphore( long initialCnt = 0, long maxCnt = 32000, LPCTSTR name = 0 )
        { hSem = CreateSemaphore( 0, initialCnt, maxCnt, name ); }
    ~Semaphore() { CloseHandle( hSem ); }
    operator++() { ReleaseSemaphore( hSem, 1, 0 ); }
};

enum {
    REGKEY_NOCREATE = 9876,
    REGKEY_NONVOL   = REG_OPTION_NON_VOLATILE,
    REGKEY_VOLATILE = REG_OPTION_VOLATILE
};

class RegKey {
public:
    HKEY key;
    RegKey( LPCWSTR name, DWORD create = REGKEY_NOCREATE );
    ~RegKey() { RegCloseKey( key ); }
private:
    bool open( LPCWSTR name, DWORD create = REGKEY_NOCREATE )
    {
        if( RegOpenKeyEx( HKEY_LOCAL_MACHINE,        // HKEY    hKey
                          name,                      // LPCWSTR lpSubKey
                          0,                         // DWORD   ulOptions
                          KEY_ALL_ACCESS,            // REGSAM  samDesired
                          &key                       // PHKEY   phkResult
            ) == ERROR_SUCCESS
            || create != REGKEY_NOCREATE
               && RegCreateKeyEx( HKEY_LOCAL_MACHINE, // HKEY    hKey
                                  name,               // LPCWSTR lpSubKey
                                  ...
               ) == ERROR_SUCCESS )
            return true;
        else {
            key = 0;
            return false;
        }
    }
};
I define classes for simple objects that require run-time initialization and destruction even if they need no other methods. These examples are specific to Win32. They may seem trivial but they reduce repetitive coding and prevent the common problem of failing to initialize or destroy the objects. The RegKey class requires the application to define a constructor. Typically this provides a default key name, for example the driver's registry path, but accepts an overriding argument. If a specialized key is appropriate, the constructor simply calls open, providing only that key name. Otherwise, these classes are very easy to use. A simple declaration creates and initializes the object and the object is automatically destroyed when it goes out of scope, for example:
RegKey::RegKey( LPCWSTR name )
{
    open( name == 0 ? KEYNAME_TOUCH_DRIVER : name );
}

CriticalSection csApi;
Semaphore       semChan;
RegKey          rk( "MyKeyName" );
class CalStore {
public:
    USHORT mCalStore;   // Registry e.g. "CalStore" = dword:33
                        // low nibble = src, high = dest.
                        // 1 = controller, 2 = Registry, 3 = both.
    bool wantCalReadNv( void )   { return ( mCalStore & CALSTORE_READNV   ) != 0; }
    bool wantCalReadReg( void )  { return ( mCalStore & CALSTORE_READREG  ) != 0; }
    bool wantCalWriteNv( void )  { return ( mCalStore & CALSTORE_WRITENV  ) != 0; }
    bool wantCalWriteReg( void ) { return ( mCalStore & CALSTORE_WRITEREG ) != 0; }
    bool wantCalRegFlush( void ) { return ( mCalStore & CALSTORE_FLUSHREG ) != 0; }
};
I define setters and getters for member data for which there are legitimate alternative realizations. This includes nearly all cases of bit-mapped values or potential booleans. Merging these into traditional C bit fields serves no purpose if the motivation is just to save memory, but being able to perform group operations, such as clearing or testing, with a single intrinsic instruction can significantly improve performance while reducing code size. My Elo WinCE touchscreen driver API provides a function to commit calibration changes to non-volatile memory, which can be the registry, the controller, or both. A single intrinsic value is bit-mapped for this purpose. It is wrapped in class CalStore, which has only setter and getter methods. Superficially it resembles a pseudo-objectified global but the purpose is very different and legitimate.
template< class T > void TqSort( T *list, int cnt )
{
    T   pivot;
    int left;
    int right;

    pivot = list[ cnt / 2 ];
    list[ cnt / 2 ] = list[ 0 ];
    left = 0;
    right = cnt - 1;
    while( 1 ) {
        for( ; list[ right ] >= pivot ; --right ) {
            if( left >= right )
                goto done;
        }
        list[ left++ ] = list[ right ];
        for( ; list[ left ] <= pivot ; ++left ) {
            if( left >= right )
                goto done;
        }
        list[ right-- ] = list[ left ];
    }
done:
    list[ left ] = pivot;
    if( left > 1 )
        TqSort( list, left );
    if( left < cnt - 2 )
        TqSort( list + left + 1, cnt - left - 1 );
}
I do occasionally define templates, for example my template version of quicksort. The standard quicksort library function calls a comparison function. This enables it to handle any type but at considerable run-time expense, especially on RISC and other deep-pipeline CPUs. If the type being sorted is intrinsic, a type-specific version is considerably faster. Most implementations of quicksort seem to be copied from K&R. That version recurses one more level than necessary and performs useless swaps (particularly in the loop when index = last). My template version doesn't have these errors. In fact, it doesn't do any swapping. I simultaneously sort the lower and upper sublists, freeing each destination for copying without swapping. This alone increases the speed by 30%. Overall, the template versions are more than twice as fast as the standard. See the template quicksorts on my Algorithms page.
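Usage is an ordinary function call; sorting an intrinsic type instantiates a version whose comparisons compile inline:

    // The compiler generates a type-specific TqSort<int>; the comparisons are
    // ordinary inline integer compares, not calls through a function pointer.
    int readings[] = { 42, 7, 19, 3, 88, 3, 56 };
    TqSort( readings, (int)( sizeof readings / sizeof readings[ 0 ] ) );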
Elo WinCE: Unified Driver: Object-Oriented API
Most of my classes do real work and are not simply cosmetic shells.
This is a counter-example. For my Elo touchscreen control API for WinCE or XP,
I created a small set of primitive real functions based on structure rather
than application. This was far more capable than the architecture it replaced
and ten times smaller but very arcane for users (OEM programmers). It is
essentially an API assembly language. To make it easier to use I designed shell
classes to translate application-oriented commands into the appropriate
primitives. I also created generic classes to simplify creating new shell
classes. These function similarly to templates but do some things that
templates can’t do.
Elo WinCE Touchscreen Driver Class illustrates two interesting features. One is the use of multiple inheritance to inject independent classes into a class stack. These classes are in separate domains and should not be exposed to each other. While it is possible to achieve this using an elaborate scheme of protected members, multiple inheritance is simpler. In this particular case, it also enables some of the classes to be used independently in another context. The other notable feature is the use of pure virtual functions to, in effect, interleave specialization. The class stack is specialized only at the last derived class but a few generic operations require a specialized instance of certain functions. Without virtual functions a significant portion of the generic class would have to be prematurely specialized.
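A hypothetical reduction of those two features (these class names are not from the driver): two independent domain classes are injected by multiple inheritance, and a pure virtual function lets the generic base perform a complete operation while deferring one step to the final specialization.

    // RegAccess and PacketLog are independent domains injected into the stack
    // by multiple inheritance; sendPacket is specialized only at the last
    // derived class, yet the generic calibrate() can use it.
    class RegAccess {
    public:
        int readSetting( int id ) { return id; }      // placeholder
    };

    class PacketLog {
    public:
        void log( const char * ) {}                   // placeholder
    };

    class TouchBase : public RegAccess, public PacketLog {
    public:
        void calibrate()                  // generic operation...
        {
            log( "calibrate" );
            sendPacket( 0x31 );           // ...with one specialized step
        }
        virtual ~TouchBase() {}
    protected:
        virtual void sendPacket( int cmd ) = 0;   // supplied only by the last
    };                                            // derived class

    class UsbTouch : public TouchBase {           // the final specialization
    protected:
        virtual void sendPacket( int ) { /* USB-specific transfer */ }
    };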
Communication DLL header and Communication DLL code comprise the communication interface library for my Abbott communication driver. The driver comprises two parts, a kernel level (SYS) driver and an application level DLL. I create a ring buffer in non-pageable (kernel) memory that is also mapped into the application’s data space, enabling a very efficient zero-copy remote DMA but without normal protection. The DLL is needed not only because communication management is too complex for ordinary applications but also because it would be dangerous to allow them to directly manipulate the unprotected memory. The API is presented to applications by several classes, which automate and control all operations. To open a non-standard device communication channel, Windows applications normally are required to know device details and to communicate through the IoCtl interface. This library replaces all of that. The driver supports multiple simultaneous applications over one communication link and multiple types of links. An application can suggest its preferred means of communicating but if another application has already opened a different link, that one will be used instead. Constructors and destructors are essential. Applications don’t even know whether they are the first to open communication or the last to close. The code also shows a legitimate use of exceptions. Communication methods called by an application throw an exception on unrecoverable communication failure. The ThrowAbort class enumerates and describes the failure.
Multitasking always involves some degree of resource sharing. This is most often data but it may also be a facility, such as I/O hardware or display system. Sharing always involves, at some level, temporary exclusive access by one task. Tasks must cooperate in this because the hardware that can guarantee exclusive access is very limited. The memory management units of advanced CPUs can be configured to control access based on privilege level but this is used only to prevent application access to kernel resources. There is no other general means of controlling access to a memory range. However, most CPUs have certain instructions that can provide brief exclusive access to one location. Cooperating tasks can agree to use this as the access key for anything. The only other exclusion mechanism exists in the specific, but common, case where the tasks can block each other either by disabling interrupts (spin lock if multi-core) or freezing the thread dispatcher.
Some programmers believe that a critical section somehow guards a region of code. It doesn’t guard anything. It is just an in-process mutex. A mutex has larger scope and can be shared by multiple processes and the kernel. We don’t know what a mutex actually is; only what it does. It can be read and written in an indivisible (atomic) action. CISC single-core CPUs traditionally have had uninterruptible RMW (read-modify-write) instructions, which can be used to implement a mutex. CISC multi-core CPUs typically require two instructions, a lock followed by the RMW instruction. Some older CPUs have quite sophisticated RMW instructions. In one atomic operation, Motorola 68020’s CAS instruction compares a memory location to a register and, if they differ, writes a new value into the location. Most have atomic increment, decrement, clear, and exchange. Some newer CISC and most RISC CPUs, the PowerPC being a notable exception, have few or no atomic RMW instructions. For example, Renesas RX200/600 (CISC) has only one, XCHG. XCHG reads a location in memory and unconditionally writes a new value. This can easily implement a mutex but nothing more sophisticated. ARM (RISC) SWP is the same thing but the thumb instruction set, which ARM now promotes, doesn’t support this instruction. For mutexes, ARM now suggests simulating atomic operations using the exclusive load/store instructions LDREX/STREX or bit operations in the bit-band alias region (Arm Semaphores). Both of these are multi-step non-atomic operations, which fail some important test cases.
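An unconditional exchange is nevertheless enough to build a mutex. A minimal sketch, using std::atomic::exchange as a portable stand-in for XCHG/SWP:

    #include <atomic>

    // Sketch only: the exchange atomically writes 1 and returns the previous
    // value. Whoever reads back 0 owns the lock; everyone else spins.
    class SpinMutex {
        std::atomic<int> locked;
    public:
        SpinMutex() : locked( 0 ) {}
        void acquire() { while( locked.exchange( 1 ) != 0 ) { /* spin or yield */ } }
        void release() { locked.store( 0 ); }
    };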
A mutex doesn’t do any real work but is required for multiple processes to reliably share a resource as equals. A mutex may not be needed for asymmetric access, such as a producer-consumer buffer, arguably the most common example of resource sharing. Exclusive access in this case is needed to ensure data coherency. If the producer overwrites an existing message while the consumer is still reading it, the consumer may see portions of both messages as one. If the buffer holds only one message, a single non-atomic flag can control access. When the producer has a new message and sees the flag clear, it inserts the message and sets the flag. When the consumer sees the flag set, it processes the message and then clears the flag. A single-message buffer performs poorly in real situations but two of them, i.e. ping-pong, may be adequate. However, unless all messages are the same length, a ring buffer performs much better because it lets the producer insert as many messages as the buffer can hold minus the ones that the consumer hasn’t yet processed.
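A sketch of that single-flag buffer follows. It uses an atomic flag with release/acquire ordering so the example is also safe on multi-core hardware; on the single-core arrangements described here a plain flag is enough.

    #include <atomic>
    #include <cstring>

    struct OneMsgBuffer {
        std::atomic<int> full;          // 0 = empty, 1 = holds a message
        char             msg[ 256 ];
        OneMsgBuffer() : full( 0 ) {}
    };

    bool produce( OneMsgBuffer &b, const char *text, size_t len )   // producer side
    {
        if( b.full.load( std::memory_order_acquire ) )
            return false;                                 // consumer still owns the buffer
        std::memcpy( b.msg, text, len );
        b.full.store( 1, std::memory_order_release );     // hand it to the consumer
        return true;
    }

    bool consume( OneMsgBuffer &b, char *out, size_t len )          // consumer side
    {
        if( !b.full.load( std::memory_order_acquire ) )
            return false;
        std::memcpy( out, b.msg, len );
        b.full.store( 0, std::memory_order_release );     // hand the buffer back
        return true;
    }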
Ring buffer access is controlled not by a boolean but by two pointers (or indices): the producer’s insertion point and the consumer’s extraction point. Basically, the producer does not insert messages past the extraction point and the consumer does not try to read messages beyond the insertion point. Implementation details vary and corner cases, particularly wrap-around, can be tricky. However, pointer manipulation is similar to the single-buffer access flag. The insertion point is producer-write and consumer-read, while the extraction point is consumer-write and producer-read. Real atomic operations are not needed (assuming the pointers are atomically written).
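A minimal single-producer/single-consumer sketch along those lines, using indices rather than pointers; atomics with release/acquire ordering stand in for "atomically written":

    #include <atomic>
    #include <cstddef>

    template< size_t N >
    class Ring {
        char                buf[ N ];
        std::atomic<size_t> in;    // written by producer, read by consumer
        std::atomic<size_t> out;   // written by consumer, read by producer
    public:
        Ring() : in( 0 ), out( 0 ) {}

        bool put( char c )                          // producer
        {
            size_t i    = in.load( std::memory_order_relaxed );
            size_t next = ( i + 1 ) % N;
            if( next == out.load( std::memory_order_acquire ) )
                return false;                       // full: would overrun consumer
            buf[ i ] = c;
            in.store( next, std::memory_order_release );
            return true;
        }
        bool get( char &c )                         // consumer
        {
            size_t o = out.load( std::memory_order_relaxed );
            if( o == in.load( std::memory_order_acquire ) )
                return false;                       // empty: nothing to read
            c = buf[ o ];
            out.store( ( o + 1 ) % N, std::memory_order_release );
            return true;
        }
    };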
A ring buffer with multiple consumers or producers requires more complex access management. My Abbott Instrument Development communication system supports remote DMA (RDMA) communication between multiple applications and a clinical instrument. One ring buffer is used for all applications to receive messages. My driver maps the kernel memory ring buffer into each application’s process space. The (kernel driver) producer inserts messages contiguously, as in an ordinary ring buffer, but different processes’ messages may be interleaved. The producer effects content-based routing by tagging each message with consumer-specific information and making a linked list of each consumer’s messages. Consumers iterate over their own linked list, ignoring other messages. If the consumer is sleeping, the driver signals it to wake up.
Producer-consumer coordination is too complicated for applications to implement individually. A shared DLL handles the application side. To get its next message, a consumer calls the RxClient class method getMsg (in the DLL). The non-atomic extraction/insertion pointer technique is not feasible here. Instead, a message counter, incremented by the producer and decremented by the consumer, provides the information that the two need to ensure data coherency. The RMW increment and decrement operations must be atomic. This is similar to a semaphore, but semaphores are not explicitly decremented and are more expensive than a simple thread-safe counter. The producer’s instruction is automatically safe because an application consumer can never interrupt the kernel driver. In contrast, the driver can interrupt an application. If the decrement is atomic there will be no problem. This cannot be guaranteed in C++ but it can be in ASM. My AtomicDec16At macro does it in this case. This macro is one of several that I have defined using RMW instructions that are uninterruptible and, therefore, atomic for single core. The macros include a lock prefix in case of multi-core execution. The getMsg function also uses AtomicOr16At and AtomicAnd16At to set and clear status flags, which, among other things, tell the producer whether the consumer is sleeping. These are not part of data coherency control. Code Excerpts.
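The following sketch shows only the counter idea, with std::atomic standing in for my ASM macros; the structure and function names are illustrative, not the driver's.

    #include <atomic>

    // The producer increments the count after inserting and linking a message;
    // the consumer decrements it after extracting one. A nonzero count tells
    // the consumer there is work and tells the producer how much of the ring
    // is still owned by consumers.
    struct ClientQueue {
        std::atomic<short> msgCount;   // pending messages for this client
        std::atomic<short> flags;      // e.g. "consumer sleeping" status bits
        ClientQueue() : msgCount( 0 ), flags( 0 ) {}
    };

    void producerInsert( ClientQueue &q )    // kernel side, after linking the message
    {
        q.msgCount.fetch_add( 1 );           // atomic increment
        if( q.flags.load() & 0x1 )           // consumer sleeping?
            { /* signal it to wake */ }
    }

    bool consumerTake( ClientQueue &q )      // application side (getMsg path)
    {
        if( q.msgCount.load() == 0 )
            return false;                    // nothing pending; caller may sleep
        /* ...process the next message in this client's linked list... */
        q.msgCount.fetch_sub( 1 );           // atomic decrement
        return true;
    }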
With no data copying, no ring transitions, and no mutex/semaphore overhead, this driver system is two to ten times faster (and more efficient) than any standard mechanism in Windows or Unix/Linux, especially when heavily loaded. However, it is complicated and breaches the kernel-application boundary. In this case, performance is very important but, if it were the only motivation, it would have to be essential to justify this approach. However, there are other compelling reasons. For example, the system has to support multiple link types. At least two of these are proprietary to Abbott and require custom kernel drivers. In fact, another product group has used my driver for this reason alone. Absent compelling reasons, it is better to use standard mechanisms, like fileIO, pipes, fifos, memory-mapped files, or sockets, all provided natively or by libraries in Linux and Windows.
A mutex is a simple brute force means of granting exclusive access (assuming cooperating processes). Using it with a large resource for an extended time can degrade performance. More precise mechanisms may deliver much better performance by protecting only the minimum necessary. Ring buffer insertion/extraction controls, for example, take into account that the producer can safely write into any portion of the buffer that doesn’t contain messages not yet processed by the consumer. With one producer and one consumer, non-atomic insertion and extraction pointers suffice. In a more complex situation, a semaphore or thread-safe counter can be useful. As with a mutex, multiple processes can safely share these objects but their integer content conveys more information than the mutex’s boolean. If this information can be made useful, as in my Abbott communication driver, it may afford much better performance than brute force full resource control using a mutex.
Most new CPUs don’t even offer an atomic increment, much less the complex (and very useful) CAS instruction of the 68020. However, arbitrarily complex access mechanisms can be made thread-safe by guarding with a mutex. Essentially, guarded access to a large resource is derived from a thread-safe complex access mechanism, which is derived from a mutex. A monitor is literally such a class stack.
In one of my Elo touchscreen XP support applications a producer thread passes fixed-size touch event messages to a consumer thread. The two threads have the same priority and the consumer will generally keep pace with the producer. Some buffering is needed but not the complexity of a ring or even ping-pong. A thread-safe counter would suffice for guarding data coherency but there are additional requirements. When there are no messages, the consumer thread needs to sleep and be subsequently awakened when a message is inserted. A semaphore could do all of this. But it could not meet the last requirement, that the producer sleep when the buffer is full and subsequently be awakened when the consumer removes a message. I designed a monitor-like class to provide this functionality. BufferCounter class code
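A sketch of that behavior (not the actual BufferCounter class): a bounded counter that blocks the consumer when empty and the producer when full, waking each when the other changes the count.

    #include <mutex>
    #include <condition_variable>

    class BoundedCounter {
        std::mutex              m;
        std::condition_variable changed;
        int                     count;
        const int               limit;
    public:
        explicit BoundedCounter( int max ) : count( 0 ), limit( max ) {}

        void producerAdd()                   // called after inserting a message
        {
            std::unique_lock<std::mutex> lk( m );
            changed.wait( lk, [this]{ return count < limit; } );  // sleep if full
            ++count;
            changed.notify_all();            // wake a sleeping consumer
        }
        void consumerRemove()                // called before processing a message
        {
            std::unique_lock<std::mutex> lk( m );
            changed.wait( lk, [this]{ return count > 0; } );      // sleep if empty
            --count;
            changed.notify_all();            // wake a sleeping producer
        }
    };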
Guarding a continuous block of memory, such as the unprocessed messages in a buffer, doesn’t require the full power of a monitor. A monitor can guard the coherency of a set of discontinuous resources. I needed this capability in the Hitachi 747. Result data and instrument status information nearly continuously streamed into the controller via GPIB. In my controller program DMA moved this data into main memory, leaving most of the CPU bandwidth for other tasks. The analyzer ran on a rigid schedule and would not allow its GPIB output to be paused. Consequently, the controller CPU could never be given exclusive access to the buffer. Further, the data contained several discontinuous coherency sets. For example, the instrument status block contained the count of test result blocks located in another area. If the count from one communication cycle were used to guide reading the results from a previous cycle with a different count (usually the case) the data could easily be random. To avoid this, I implemented a monitor-like access mechanism. To read any coherency sensitive group, the application calls a function, passing a list of source (GPIB buffer) and destination (elsewhere in main memory) addresses. This function monitors the DMA DAR (destination address register) to move the data as a coherent set, either following behind the DAR or temporarily configuring DMA to prematurely but briefly pause before overrunning portions of the set not yet transferred. I implemented this in X86 assembly language. See BM/Hitachi 747: Communication and Coupling and my “Dr. Dobb’s Journal” article Software Partitioning for Multitasking Communication.
A story, perhaps true, tells that when Microsoft first introduced threads in Windows, a version of Office was written with more than 100 simultaneous threads. The story doesn’t tell what they expected to accomplish but presumably they thought the program would magically run faster. That is not why threads were invented and not what they will do unless the CPU has many cores, the OS supports symmetric multiprocessing (SMP) of threads, and the threads have useful work to do simultaneously. In many cases, DMA, DSP instructions or coprocessors, or vectorized instructions yield much greater speed improvements.
The primary purpose of threads is to make a program more responsive or robust. For example, if the user interface never directly calls a blocking process then the user can always intervene to abort a stall. If the program has several independent lightweight jobs to do, threads provide an easy way to make these apparently simultaneous. Threads are not suited to hard real-time processing alone but can be useful in combination with interrupts; an ISR performs only very timely work and defers the rest to a support thread.
I first used threads in my Hitachi 747 program. Hitachi had not asked for a multi-tasking program. They had experimented with Unix and knew that its application task-switching overhead would overwhelm a computer used for controlling a machine, and early versions of Windows appeared to be similar. I wanted to use DOS anyway. I intended to deliver high performance with device drivers, for which Unix and Windows at the time required rebooting for every code change. As I explain in my Dr. Dobb’s Journal article, I invented lightweight threads for DOS. For fast but flexible tasking I paired the non-real-time threads with real-time ISRs written in assembly language. In most programs that pair ISRs with threads, strategic decisions are reactive, carried out by the thread, which is activated by the ISR. I implemented anticipatory strategy by designing table-driven ISRs. By changing table entries, a thread can tell its ISR partner what it should do if it is invoked under various circumstances.
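A hypothetical sketch of the idea, not the 747 code: the thread writes its strategy into a table ahead of time, and the ISR only dispatches through it.

    // The support thread decides in advance what the ISR should do for each
    // event and records that policy in a table; the ISR merely looks it up.
    enum { EV_RX_BYTE, EV_TX_EMPTY, EV_LINE_ERROR, EV_COUNT };

    typedef void (*IsrAction)( int event );

    static void actionQueueByte( int )  { /* buffer the byte */ }
    static void actionIgnore( int )     { /* do nothing this time */ }
    static void actionAbortBlock( int ) { /* cancel the current transfer */ }

    // Written by the thread, read by the ISR.
    static volatile IsrAction isrPlan[ EV_COUNT ] = {
        actionQueueByte,     // EV_RX_BYTE
        actionIgnore,        // EV_TX_EMPTY
        actionAbortBlock,    // EV_LINE_ERROR
    };

    void commIsr( int event )            // real-time side: no decisions, just dispatch
    {
        isrPlan[ event ]( event );
    }

    void commThread()                    // non-real-time side: anticipatory strategy
    {
        // e.g. once a transfer completes, tell the ISR to ignore TX interrupts
        isrPlan[ EV_TX_EMPTY ] = actionIgnore;
    }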
Most of my programs interact with external systems beyond their direct control, potentially creating blocking conditions. For these, I always create at least two threads with one guaranteed to never block and always respond to a request to abort. Although this is usually the user interface thread, it isn’t always. For example, in my IDT controller, which has no user interface, it is the USB communication thread. In this case, a very cheap non-preemptive round-robin task loop “schedules” the threads and can’t guarantee that the USB thread will be periodically invoked. However, USB input from the host triggers an ISR, which checks the health of the USB thread and clears blocks if necessary.
In my Abbott Instrument Development communication system, when an application invokes RxClient::getMsg it can request blocking or immediate return if there are no messages for it. A blocking call should be made from a thread. The block may end on a new message, communication failure or user request (from the unblocked UI thread) to abort. The library contains all of the communication functions but the application has to create certain structures in its own process space. Classes defined by my library simplify application programming, especially if a message receiving thread is used. To create a communication thread, a function is defined to do the application-specific message processing. A small portion of this is boilerplate code required for communication. The rxThreadProc function is an example. The application creates new RxClient and ComThreadApi objects. It then invokes ComThreadApi::start, passing the message processing function, a communication driver reference, and an Event that the application can set at any time to terminate the thread. The start method sets up a mailbox for asynchronous communication between the thread and the rest of the application and starts the application function as a thread. The library then handles all communication. The application only needs to call the RxClient getMsg method to get its messages.