David McCracken
Programming (updated 2018.09.07)
I have programmed in many languages and systems. Metrics can be ascertained for all of these but most programmers won’t know what they mean. This is especially true with my style of programming. In all languages I try to realize polymorphism by moving anything that can be declarative into tables. Frequently the resulting program is radically non-redundant, more flexible and much smaller than typical programs. Without a functionally equivalent typical program for comparison, it is difficult to understand what the metrics mean. For example, is my table-driven single statement (Abbott CD3200-Algorithm Transformation) trivial compared to the 2000-statement, 16-level deep sequence of nested functions that it replaces? Nevertheless, metrics can tell something about the experience that informs and motivates my current methods.
I don’t routinely run statistics on my code but when others have done so, my code is highly rated, primarily because I comment more than most programmers. More revealing than the quantity is the high quality, as indicated by universal praise from other programmers who have to read my code. My code also gets good grades for its unusually low cyclomatic complexity, the result of my separation of procedural and declarative components. Any other statistics don’t necessarily imply quality. Class size is meaningless but the number of methods and the number of statements per method can be meaningful because the superfluous setter-getter paradigm produces many methods with a small number of statements. Class count and amount of work done by class methods as opposed to non-class functions strongly depend on the application.
I have selected for analysis only relatively large C/C++ programs that I have written entirely myself. This leaves out some significant work, such as Hitachi 747 and Abbott CD3200, where, as software lead, it was my responsibility not only to originate nearly everything but also to pass it on to others as soon as possible. For the purpose of this analysis I have excluded all source files, such as headers and libraries, that I didn’t write, even if they are essential to the program. The shortest summary of the metrics of these programs by project is:
Project | Files | Lines | Statements | Functions | Classes |
Abbott Instrument Dev Sys (C, C++) | 285 | 100,908 | 38,632 | 1,442 | 51 |
IDT USB dev/demo (C++, C) | 67 | 24,184 | 9,935 | 350 | 11 |
Elo WinCE driver and apps (C++) | 104 | 22,587 | 9,530 | 334 | 90 |
Dataman (Win MFC C++) | 59 | 20,958 | 8,387 | 267 | 34 |
Elo XP Utilities (Win32 C++) | 26 | 8,218 | 3,912 | 147 | 10 |
Total | 541 | 176,855 | 70,396 | 2,540 | 196 |
I have used the SourceMonitor program from campwoodsw to analyze my source code. I have configured it to not include blanks in line count. I have selected a subset of its metrics and abbreviated the names to fit into small tables. They are:
Files | source (c, cpp, h) file count |
Lines | total number of non-blank lines (in all source files) |
Stmnts | total number of statements |
%Br | percentage of statements that cause a break in the execution path |
%Cmt | ostensibly the percentage of commented statements but block comments are included |
Cls | total number of classes defined in the source files |
M/C | average number of methods per class |
S/M | average number of statements per class method |
Max Comp | highest complexity (only number of execution paths) of any one method or function. |
Avg Comp | average complexity |
Funcs | total number of functions |
For suites of cooperating programs, metric totals may have some value. This is especially true of my WinCE driver group for Elo. These programs are all C++ and comprise a unified package. The serial and USB drivers are built as distinct programs but are statically linked to the agnostic driver base, which provides 75% of each driver’s total code. To report totals, Files, Lines, Stmnts, and Funcs are simply summed; for %Br, %Cmt, M/C, S/M, and Avg Comp I compute an average of each program’s metric weighted by its statement count.
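As a sketch of that weighting (the struct and field names here are only illustrative, not SourceMonitor's), the totals for the ratio metrics are computed like this:

    #include <vector>
    #include <cstddef>

    // Illustrative only: weight each program's ratio metric (e.g. %Cmt) by its
    // statement count, then divide by the total statement count.
    struct ProgMetrics { double stmnts; double pctCmt; };

    double weightedPctCmt( const std::vector<ProgMetrics> &programs )
    {
        double sum = 0, totalStmnts = 0;
        for( std::size_t i = 0; i < programs.size(); ++i ) {
            sum         += programs[i].pctCmt * programs[i].stmnts;
            totalStmnts += programs[i].stmnts;
        }
        return totalStmnts != 0 ? sum / totalStmnts : 0;
    }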
Abbott Instrument Development System Page
Files | Lines | Stmnts | %Br | %Cmt | Cls | M/C | S/M | Max comp | Max depth | Avg depth | Avg comp | Funcs |
165 | 60,606 | 21,282 | 23.2 | 57.5 | 51 | 4.47 | 4.8 | 287 | 9 | 1.76 | 5.78 | 808 |
Files | Lines | Stmnts | %Br | %Cmt | Funcs | Avg S/F/C | Max comp | Max depth | Avg depth | Avg comp |
11 | 4,763 | 1,423 | 19.6 | 54.7 | 68 | 25.5 | 48 | 6 | 1.42 | 5.38 |
Files | Lines | Stmnts | %Br | %Cmt | Funcs | Avg S/F/C | Max comp | Max depth | Avg depth | Avg comp |
65 | 20,464 | 8,741 | 25.5 | 50.9 | 315 | 31.5 | 72 | 9+ | 1.72 | 7.09 |
Files | Lines | Stmnts | %Br | %Cmt | Funcs | Avg S/F/C | Max comp | Max depth | Avg depth | Avg comp |
44 | 15,075 | 7,186 | 23.7 | 41.9 | 251 | 32.4 | 95 | 9+ | 1.72 | 7.44 |
IDT USB Development/Demo Kit Page
This comprises two programs, a C++ Win32 GUI IDE and firmware written in C and
some ASM for an ARM-based NXP LPC1342 controller. The two programs cooperate to
perform advanced DSP touch input analysis and are essentially one program.
Files | Lines | Stmnts | %Br | %Cmt | Cls | M/C | S/M | Max comp | Max depth | Avg depth | Avg comp | Funcs |
38 | 15,779 | 6,648 | 26.2 | 47.0 | 11 | 7.30 | 3.6 | 106 | 9 | 1.94 | 6.49 | 211 |
Files | Lines | Stmnts | %Br | %Cmt | Funcs | Avg S/F/C | Max comp | Max depth | Avg depth | Avg comp |
29 | 15,779 | 3,287 | 18.4 | 50.0 | 139 | 16.1 | 61 | 9 | 1.45 | 4.07 |
Elo WinCE Unified Driver Page
DRIVER GROUP. EloDev is the base class library, statically linked into the
EloUsb and EloSer driver programs. nvCal is a single file included separately
in the two drivers. EloBeep is an optional driver library component. EloCmn and
EloApi are shared by drivers and applications.
Project | Files | Lines | Stmnts | %Br | %Cmt | Cls | M/C | S/M | Max comp | Avg comp | Funcs |
EloDev | 22 | 5,549 | 2,173 | 23.6 | 53.8 | 9 | 15.88 | 8.7 | 48 | 4.01 | 37 |
EloUsb | 15 | 2,443 | 869 | 16.3 | 55.7 | 8 | 6.86 | 2.3 | 18 | 2.63 | 41 |
EloSer | 18 | 2,247 | 943 | 17.0 | 49.6 | 8 | 6.29 | 5.5 | 30 | 3.37 | 26 |
nvCal | 1 | 361 | 168 | 13.1 | 50.1 | 1 | 3.00 | 1.0 | 13 | 3.70 | 7 |
EloBeep | 2 | 236 | 85 | 31.8 | 55.1 | 0 | 0 | 0 | 9 | 4.00 | 7 |
EloCmn | 15 | 1,834 | 829 | 1.2 | 39.7 | 54 | 4.10 | 1.5 | 5 | 1.11 | 22 |
EloApi | 2 | 950 | 307 | 18.2 | 47.3 | 1 | 1.31 | 5.8 | 13 | 2.54 | 16 |
Totals | 75 | 13,620 | 5,374 | 17.3 | 50.73 | 81 | 5.85 | 3.1 | 48 | 3.13 | 156 |
ELO WINCE DRIVER SUPPORT APPLICATIONS. EloVa provides screen calibration. EloTalk is essentially an IDE, providing many utilities for OEM developers in one program. EloCpl is a Windows control panel. EloConScript is a script player combining a scripting language that I designed with Elo’s standard touchscreen configuration language. I have computed totals for these because they are part of the cooperative suite but they are fully independent, unlike the driver group.
Project | Files | Lines | Stmnts | %Br | %Cmt | Cls | M/C | S/M | Max comp | Avg comp | Funcs |
EloVa | 8 | 2,312 | 1,039 | 25.2 | 42.0 | 0 | 2.00 | 4.8 | 41 | 6.83 | 36 |
EloTalk | 15 | 6,136 | 2,906 | 27.6 | 42.3 | 9 | 2.14 | 3.4 | 56 | 5.35 | 132 |
EloCpl | 5 | 360 | 138 | 34.1 | 56.4 | 0 | 0 | 0 | 17 | 5.71 | 7 |
EloConScript | 1 | 159 | 73 | 19.2 | 49.1 | 0 | 0 | 0 | 10 | 6.67 | 3 |
Totals | 29 | 8,967 | 4,156 | 27 | 43 | 9 | 2.1 | 1.6 | 56 | 5.8 | 178 |
Dataman Flash Screencast
I released this as freeware. It is a cytometric big data analytics tool. It
reads FCS (Flow Cytometry Standard) data files and displays user-selected
combinations as histograms or scatter-plots. I originally wrote this for Win32
API but ported to MFC to add floating dockable toolbars. All code provided by
MFC is excluded from these metrics. The relatively high M/C and S/M values are
motivated by the application, not by MFC. The code changed very little when I
ported to MFC.
Files | Lines | Stmnts | %Br | %Cmt | Cls | M/C | S/M | Max comp | Max depth | Avg depth | Avg comp | Funcs |
59 | 20,958 | 8,387 | 21.5 | 47.5 | 34 | 11.96 | 9.5 | 107 | 9 | 1.62 | 4.32 | 267 |
ELO Configuration Programs Page
EloUpdater and EloStudio are both C++ Win32 API programs that can be invoked in
non-GUI mode for interactive command line or scripted operation.
Files | Lines | Stmnts | %Br | %Cmt | Cls | M/C | S/M | Max comp | Max depth | Avg depth | Avg comp | Funcs |
14 | 3,694 | 1,985 | 24.1 | 28.9 | 6 | 2.20 | 2.5 | 64 | 8 | 1.81 | 5.97 | 88 |
Files | Lines | Stmnts | %Br | %Cmt | Cls | M/C | S/M | Max comp | Max depth | Avg depth | Avg comp | Funcs |
12 | 4,524 | 1,927 | 15.5 | 36.6 | 4 | 1.50 | 2.0 | 69 | 9+ | 1.29 | 5.47 | 59 |
The most important determinant of productivity is the amount of time spent waiting vs. thinking, developing, and testing ideas. Minimizing repetitive tasks is the most effective way to improve this ratio. For a programmer, this means minimizing build time and scripting anything and everything.
When I took over Elo’s WinCE program I inherited a one-hour build time. The productive code development and test time averaged only three minutes. Even though I was under extreme pressure to solve a customer problem and the code changes appeared simple, I refused to work on the code for three days while I figured out how to reduce the build to 15 minutes. This afforded me 30 coding cycles per day instead of only eight. By the end of my fourth day, I had achieved only 30 cycles vs. the 32 that I would have achieved without my up front effort. But on the fifth day I had achieved 60 cycles vs. 40. When I later reduced the build time to two minutes, the three-minute coding effort dominated the cycle, effecting a quantum change in the development process to essentially continuous problem solving.
There is no consensus on the general definition of scripting. Everyone would agree that bash and bat are scripting languages for Linux and Windows and any “program” written in them is a script. Most would agree that Perl and Python are scripting languages but they are not only more portable but also more abstract and generic. Being interpreted instead of compiled doesn’t make a program a script; Java byte codes are interpreted but the same program can also be compiled.
I define scripts as programs that automate computer user operations, typically procedures that the user would otherwise do interactively. I mix languages in my scripting. For example, when I started as the software lead on Abbott’s CD3200 I initially wrote instructions for programmers to follow to properly use the PVCS version control system to avoid stepping on each other’s toes. But then I realized that, as long as they were directly interacting with PVCS they would not change their own procedures. However, by scripting everything they needed I could make their job easier even while enforcing my new rules. This could not be done without generic string processing, which bat can’t do. We were using Polymake to build our programs and it offered much of the capability of Perl (but in its own language) so I used it for my scripts. For missing capabilities, I wrote bat scripts and OS-specific compiled C programs. The programmer would invoke a bat script, which would invoke a Polymake script, which might invoke other bat scripts and my compiled utilities.
Windows bat and Linux bash have odd syntax due to their evolution from more primitive forms (in the case of bash, from three different languages). However, I use them when I can because they are likely to be present in a consistent form in any computer. This is always true for bat. It is true for bash in Linux distributions but, notably, not the Android kernel, which has only the very primitive ash shell.
Windows bat is much less capable than other languages in current use. However,
it is not as limited as commonly believed. Its lack of general string
processing is significant but it can parse strings in the limited but important
context of files and directories. For example, in my Elo WinCE release
scripting system, the statement
for /F "tokens=1,2" %%p in (..\Specific\projList) do (...
parses the project names from the name/version tuples listed in the projList file, e.g.
eloUsb 1.25
eloSer 1.22
eloTalk 1.10
eloVa 2.40
The release scripting system comprises a couple of dozen bat files totaling more than two thousand lines of code. It automates the entire process from checking out all sources from version control, building multiple programs, libraries, and drivers for four different CPUs in three versions of the OS, and packaging everything into a distributable file. It could be argued that such complexity should not be handled with bat but my partitioning into independent domains significantly reduced the complexity and, coincidentally, created opportunities to share domain-specific scripts with customers, for which bat is a good choice.
I write scripts whenever it will save me time in the long run. I spent a month developing the Elo WinCE release scripting system but I replaced a two-week interactive task with a two-hour walkaway. I often write small scripts that just make my routine work easier. For example, in both Windows and Linux I often work at a command line. Linux does everything it can to prevent root from working in a GUI. I find it distracting to have to type long path names to change the directory, especially to return to one recently visited. Windows and Linux both have the simple change directory command cd. This is easy to use but provides no assistance. They also have pushd and popd, which can simplify retracing the directory change path, but it is hard to remember to use pushd instead of cd and they have no forward retrace, which I want almost as often as backward. I would also like to be able to select a directory at random from the record. An additional problem in Windows is that the command window title doesn't show the current directory, which is very annoying when several command windows are minimized.
To solve the directory path problem in Linux, I wrote a bash script that performs all of the functions of cd, pushd, and popd plus my improvements. The central feature is a circular path list, which enables forward as well as backward traversal. The startup bashrc (e.g. Fedora /etc/bashrc, Ubuntu /etc/bash.bashrc) invokes my cdlist script, which only defines functions, one of which is called cd. Subsequently invoking cd executes this function. It accepts a few arguments that would be illegal for the intrinsic cd. For all other command lines, it silently pushes the current directory onto its own list and then uses builtin cd to invoke the intrinsic version. The arguments -b and -f step backward and forward through the circular path list. The argument -s displays the paths in a numbered list for single-key random selection.
if _%1 == _? (
    echo D replaces CD, PUSHD, POPD
    echo D makes directory title
    echo D / is popd
    echo D dir is pushd dir
    set /p dummy=Press Enter to close
    goto done
)
if not _%1 == _ (
    if not _%1 == _. (
        if _%1 == _/ (popd) else (pushd %*)
    )
)
title %CD%
:done
My bash solution does everything I wanted in Linux, but Windows bat scripting is too primitive for the circular path list. However, a very simple script addresses the window title and usage complexity problems. In Windows it is not possible to override cd, but I named the script d.bat (as in directory) so it is easy to remember, and it replaces cd, pushd, and popd. With no arguments, d sets the window title. The argument "/" invokes popd; any other argument passes the command line to pushd. In all cases, the window title is assigned the destination path.
The purpose of object-oriented programming is to make programs that are easier to maintain. Object-oriented languages help by providing standard syntax for many OO concepts but they are not required for nor do they automatically produce object-oriented programs. I wrote my first OO program, an in-circuit emulator IDE, in 6502 assembly language. As I describe in Hybrid Tool For Universal Microprocessor Development published in Computer Design, I implemented polymorphic functions using target CPU-agnostic code with indirectly addressed (i.e. “this”) CPU-specific data tables (i.e. objects). In contrast, the common practice of “modernizing” an old and clearly not OO program by redefining every global as a class with trivial setter and getter methods does not miraculously transform the program.
Classes are at the core of OO programming but most programmers don’t really design classes; they derive application-specific classes from existing ones often defined by a framework, such as STL, Android, or iOS. This makes code reuse simple, reliable, and predictable but it does not yield the game changing breakthroughs that are possible with object-oriented analysis. For example (BM-Hitachi 747-Error Reporting) I solved Hitachi’s immediate “buzz off” problem by recognizing that it was symptomatic of an endemic problem that required a comprehensive OO solution. My generic alarm class enabled divorcing control domains from presentation. Not only did this solve the thorny buzz off problem but also many persistent user complaints, some of which I didn’t even know about until users started asking me how I knew that they had been pestering Hitachi for years to solve them. I wrote that program in C.
Object-oriented design is not necessarily limited to single programs. Abbott’s instrument control languages were at once both promiscuous and rigid because they treated each specific device as an independent fully-realized entity. I redesigned the entire system around the concept that every device was an instance of an abstract class. When I also redefined major operations as table-driven procedures, these too became classes. These changes allowed the scripting language, compiler, and execution units’ firmware to only have to know how to deal with a relatively small number of classes in order to handle an infinite variety of objects without redesign. See Abbott Instrument Development System>Scripting System. Despite the obvious object orientation of my new design, I wrote only the user interface program in C++. I wrote the script compiler in C with Bison-BNF grammar parser and FLEX-RE scanner. I wrote the instrument configuration compiler (with my own ad hoc parser) in C and execution unit firmware in C and 68K ASM.
Object orientation can be useful in declarative program design. In particular, inheritance can simplify defining variations without restricting more complete definitions. This is especially evident in my IDT development/demo configuration language, which allows unrestricted combinations of characteristics to be named and used directly or as base classes for other combinations. Recursive multiple inheritance with unrestricted override of individual characteristics enables named combinations to be reused even when they are not entirely appropriate, reducing the proliferation of nearly identical objects.
My Elo WinCE release package polymorphic scripting system supports a very large combinatorial space by defining mutually virtual domains. Without this object-oriented design, each release would have to be fully defined either by an enormously complex script or by a less complex script in combination with a BLOB database manager. Both of these would be far more difficult to implement and maintain and would not provide any support for automation within any single domain. My polymorphic design creates domain-specific scripts that are useful independently of the release process.
Some elements of C++ are good in all contexts. For example, the member access syntax of inheritance is always better than that of aggregation. Other elements are good when used properly and can be very bad otherwise. Most of these are hiding mechanisms, which a good programmer may use to ensure the integrity of a class but a bad programmer will use to hide their mess. I have seen many programs written by other people and have encountered relatively few of the kinds of problems, such as uninitialized objects and misuse of data by ignorant functions, that Stroustrup claims to be endemic. The worst and most widespread problem is cut-and-paste coding at all levels, from repetition within single statements to nearly identical complete functions. At least when this can’t be hidden there is some hope of seeing and correcting it but if hidden by access and virtual functions, it will never get fixed.
I define access functions to member data only if simple access is not possible or alternative implementations are reasonable, for example a bit- or byte-array for boolean collections or an aliased structure-array (providing efficient array and safer structure access).
Virtual functions are the poorest mechanism for polymorphism and I use them only when I can’t define an equivalent algorithm or table-driven computation. For Abbott’s CD3200 I transformed a 2000-statement, 16-level deep sequence of nested functions into one statement with a small table. The object-oriented approach would have transformed this into a 3000-statement class with virtual functions. In many of the situations where virtual functions might be useful I need object- rather than class-specific functions. In my Hitachi 747 user interface I define each screen with an array of structures, each defining an element, including element-specific as well as class-specific functions to call in response to user action on that element.
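The following sketch shows the general shape of that technique; the structure fields and handler names are hypothetical, not the actual 747 code.

    /* Hypothetical sketch of a table-driven screen definition. Each element
       carries a class-specific handler and, optionally, an element-specific
       one, so behavior is bound per object rather than per class. */
    typedef struct ScreenElement {
        int         x, y;                                     /* position */
        const char *label;
        void      (*classAction)( struct ScreenElement * );   /* shared by all elements of its kind */
        void      (*elementAction)( struct ScreenElement * ); /* unique to this one element */
    } ScreenElement;

    static void editNumber( ScreenElement * ) {}   /* class-specific behavior */
    static void startRun( ScreenElement * )   {}   /* element-specific behavior */

    static ScreenElement mainScreen[] = {
        { 2, 4, "Sample count", editNumber, 0        },
        { 2, 6, "RUN",          editNumber, startRun },
    };

    /* Dispatcher: called on user action; invokes whichever handlers are defined. */
    static void onUserAction( ScreenElement *el )
    {
        if( el->classAction )   el->classAction( el );
        if( el->elementAction ) el->elementAction( el );
    }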
Templates have often disappointed me. Even when I have a situation that calls for generics, I often find that the template mechanism is too constrained. For example, in my Elo WinCE touchscreen driver API, functions that make sense to the application are created from structure-based primitives. It is impossible to predict all of the application-specific functions that OEM developers might want but they fall into a limited number of generic patterns. Templatizing these patterns would simplify the effort of creating useful functions. However, templates can’t make a generic constructor or anything involving a variable number of parameters. I used macros, which not only provided the required flexibility but also were simpler and easier for the developers to understand than templates.
I define constructors and destructors only when they serve some purpose beyond the red herring notion of rampant uninitialized data use. Stroustrup recommends defining constructors even for classes with a single intrinsic data memory to ensure initialization. Passing such an object to a function requires its construction. Instead of simply pushing a literal onto the stack, the program has to allocate heap memory, assign a value, and then read this value back to push it onto the stack. This incurs a horrible run-time penalty but it doesn’t interfere with good program design. This is not always the case. A type with an explicitly declared constructor cannot be used in a statically initialized table. This prevents separating the procedural and declarative aspects of a program, which is a far more effective means of achieving polymorphism than anything suggested by C++. For example, at Hitachi a programmer who did not know any assembly language was able to diagnose an error in one of my assembly language drivers by spotting an inconsistency in the table pattern. See BM/Hitachi 747: Communication and Coupling and my “Dr. Dobb’s Journal” article Software Partitioning for Multitasking Communication.
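A minimal illustration of that restriction, with hypothetical types and under the pre-C++11 rules that applied to this work:

    // A plain aggregate can populate a static, compile-time table; declare an
    // explicit constructor and (without C++11 constexpr) the same table must
    // instead be built procedurally at run time.
    struct Command {                    // aggregate: no user-declared constructor
        const char *name;
        int         code;
        void      (*handler)( int );
    };

    static void doStart( int ) {}       // illustrative handlers
    static void doStop( int )  {}

    static const Command cmdTable[] = { // purely declarative: data, not procedure
        { "start", 1, doStart },
        { "stop",  2, doStop  },
    };

    // struct Command { Command( const char *n, int c, void (*h)(int) ); ... };
    // With that constructor declared, the aggregate initializer above is no
    // longer legal; the declarative table turns back into procedural code.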
Many programmers use exceptions excessively, even within a single function, apparently to avoid using goto. Exceptions are complex and incur a significant run-time penalty. Extra information must be put on the stack. Only classes can be thrown. These classes must be defined. Other programmers must examine these definitions. When a class is thrown, it must first be constructed. When the catcher goes out of scope the thrown object must be destroyed. There is a reason for this complexity. Exceptions are not intended as routine control flow mechanisms but as a means of reporting errors from domains that don’t know the error handling context, for example from a library function. This is how I use them.
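A minimal sketch of that usage; the class and function names are illustrative, not taken from my libraries:

    #include <string>

    // Thrown by library code that cannot know how the application wants to
    // handle an unrecoverable communication failure.
    class LinkFailure {
    public:
        int         code;
        std::string what;
        LinkFailure( int c, const std::string &w ) : code( c ), what( w ) {}
    };

    int readReply( int channel )          // library function
    {
        int status = -1;                  // pretend the hardware read failed
        if( status < 0 )
            throw LinkFailure( status, "reply timeout" );
        return status;
    }

    void applicationTask()
    {
        try {
            readReply( 0 );
        }
        catch( const LinkFailure &e ) {   // only the application knows whether
            /* retry, report, or shut down, guided by e.code and e.what */
        }
    }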
I have designed hundreds of general and application-specific classes. Most of these are too large to show as examples and snippets would be meaningless. However, some short examples address issues that I consider important, such as whether wrapping a global in a trivial class could ever be justified (it can), when are explicit constructors and destructors of real value, and appropriate use of setters and getters.
class CriticalSection : public CRITICAL_SECTION {
public:
    CriticalSection( void )  { InitializeCriticalSection( this ); }
    ~CriticalSection( void ) { DeleteCriticalSection( this ); }
    void enter( void )       { EnterCriticalSection( this ); }
    void leave( void )       { LeaveCriticalSection( this ); }
};

class Semaphore {
public:
    HANDLE hSem;
    Semaphore( long initialCnt = 0, long maxCnt = 32000, LPCTSTR name = 0 )
        { hSem = CreateSemaphore( 0, initialCnt, maxCnt, name ); }
    ~Semaphore() { CloseHandle( hSem ); }
    operator++() { ReleaseSemaphore( hSem, 1, 0 ); }
};

enum {
    REGKEY_NOCREATE = 9876,
    REGKEY_NONVOL   = REG_OPTION_NON_VOLATILE,
    REGKEY_VOLATILE = REG_OPTION_VOLATILE
};

class RegKey {
public:
    HKEY key;
    RegKey( LPCWSTR name, DWORD create = REGKEY_NOCREATE );
    ~RegKey() { RegCloseKey( key ); }
private:
    bool open( LPCWSTR name, DWORD create = REGKEY_NOCREATE )
    {
        if( RegOpenKeyEx( HKEY_LOCAL_MACHINE,        // HKEY    hKey
                          name,                      // LPCWSTR lpSubKey
                          0,                         // DWORD   ulOptions
                          KEY_ALL_ACCESS,            // REGSAM  samDesired
                          &key                       // PHKEY   phkResult
            ) == ERROR_SUCCESS
            || create != REGKEY_NOCREATE
               && RegCreateKeyEx( HKEY_LOCAL_MACHINE, // HKEY    hKey
                                  name,               // LPCWSTR lpSubKey
                                  ...
               ) == ERROR_SUCCESS )
            return true;
        else {
            key = 0;
            return false;
        }
    }
};
I define classes for simple objects that require run-time initialization and destruction even if they need no other methods. These examples are specific to Win32. They may seem trivial but they reduce repetitive coding and prevent the common problem of failing to initialize or destroy the objects. The RegKey class requires the application to define a constructor. Typically this provides a default key name, for example the driver's registry path, but accepts an overriding argument. If a specialized key is appropriate, the constructor simply calls open, providing only that key name. Otherwise, these classes are very easy to use. A simple declaration creates and initializes the object and the object is automatically destroyed when it goes out of scope, for example:
RegKey::RegKey( LPCWSTR name )
{
    open( name == 0 ? KEYNAME_TOUCH_DRIVER : name );
}

CriticalSection csApi;
Semaphore       semChan;
RegKey          rk( "MyKeyName" );
class CalStore {
public:
    USHORT mCalStore;   // Registry e.g. "CalStore" = dword:33
                        // low nibble = src, high = dest.
                        // 1 = controller, 2 = Registry, 3 = both.
    bool wantCalReadNv( void )   { return ( mCalStore & CALSTORE_READNV   ) != 0; }
    bool wantCalReadReg( void )  { return ( mCalStore & CALSTORE_READREG  ) != 0; }
    bool wantCalWriteNv( void )  { return ( mCalStore & CALSTORE_WRITENV  ) != 0; }
    bool wantCalWriteReg( void ) { return ( mCalStore & CALSTORE_WRITEREG ) != 0; }
    bool wantCalRegFlush( void ) { return ( mCalStore & CALSTORE_FLUSHREG ) != 0; }
};
I define setters and getters for member data for which there are legitimate alternative realizations. This includes nearly all cases of bit-mapped values or potential booleans. Merging these into traditional C bit fields serves no purpose if the motivation is just to save memory, but being able to perform group operations, such as clearing or testing, with a single intrinsic instruction can significantly improve performance while reducing code size. My Elo WinCE touchscreen driver API provides a function to commit calibration changes to non-volatile memory, which can be the registry, the controller, or both. A single intrinsic value is bit-mapped for this purpose. It is wrapped in class CalStore, which has only setter and getter methods. Superficially it resembles a pseudo-objectified global but the purpose is very different and legitimate.
template< class T > void TqSort( T *list, int cnt )
{
    T   pivot;
    int left;
    int right;

    pivot = list[ cnt / 2 ];
    list[ cnt / 2 ] = list[ 0 ];
    left = 0;
    right = cnt - 1;
    while( 1 ) {
        for( ; list[ right ] >= pivot ; --right ) {
            if( left >= right )
                goto done;
        }
        list[ left++ ] = list[ right ];
        for( ; list[ left ] <= pivot ; ++left ) {
            if( left >= right )
                goto done;
        }
        list[ right-- ] = list[ left ];
    }
done:
    list[ left ] = pivot;
    if( left > 1 )
        TqSort( list, left );
    if( left < cnt - 2 )
        TqSort( list + left + 1, cnt - left - 1 );
}
I do occasionally define templates, for example my template version of quicksort. The standard quicksort library function calls a comparison function. This enables it to handle any type but at considerable run-time expense, especially on RISC and other deep-pipeline CPUs. If the type being sorted is intrinsic, a type-specific version is considerably faster. Most implementations of quicksort seem to be copied from K&R. That version recurses one more level than necessary and performs useless swaps (particularly in the loop when index = last). My template version doesn't have these errors. In fact, it doesn't do any swapping. I simultaneously sort the lower and upper sublists, freeing each destination for copying without swapping. This alone increases the speed by 30%. Overall, the template versions are more than twice as fast as the standard. See the template quicksorts on my Algorithms page.
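Usage is an ordinary function call; sorting an intrinsic type instantiates a version whose comparisons compile inline:

    // The compiler generates a type-specific TqSort<int>; the comparisons are
    // ordinary inline integer compares, not calls through a function pointer.
    int readings[] = { 42, 7, 19, 3, 88, 3, 56 };
    TqSort( readings, (int)( sizeof readings / sizeof readings[ 0 ] ) );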
Elo WinCE: Unified Driver: Object-Oriented API
Most of my classes do real work and are not simply cosmetic shells.
This is a counter-example. For my Elo touchscreen control API for WinCE or XP,
I created a small set of primitive real functions based on structure rather
than application. This was far more capable than the architecture it replaced
and ten times smaller but very arcane for users (OEM programmers). It is
essentially an API assembly language. To make it easier to use I designed shell
classes to translate application-oriented commands into the appropriate
primitives. I also created generic classes to simplify creating new shell
classes. These function similarly to templates but do some things that
templates can’t do.
Elo WinCE Touchscreen Driver Class illustrates two interesting features. One is the use of multiple inheritance to inject independent classes into a class stack. These classes are in separate domains and should not be exposed to each other. While it is possible to achieve this using an elaborate scheme of protected members, multiple inheritance is simpler. In this particular case, it also enables some of the classes to be used independently in another context. The other notable feature is the use of pure virtual functions to, in effect, interleave specialization. The class stack is specialized only at the last derived class but a few generic operations require a specialized instance of certain functions. Without virtual functions a significant portion of the generic class would have to be prematurely specialized.
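A hypothetical reduction of those two features (these class names are not from the driver): two independent domain classes are injected by multiple inheritance, and a pure virtual function lets the generic base perform a complete operation while deferring one step to the final specialization.

    // RegAccess and PacketLog are independent domains injected into the stack
    // by multiple inheritance; sendPacket is specialized only at the last
    // derived class, yet the generic calibrate() can use it.
    class RegAccess {
    public:
        int readSetting( int id ) { return id; }      // placeholder
    };

    class PacketLog {
    public:
        void log( const char * ) {}                   // placeholder
    };

    class TouchBase : public RegAccess, public PacketLog {
    public:
        void calibrate()                  // generic operation...
        {
            log( "calibrate" );
            sendPacket( 0x31 );           // ...with one specialized step
        }
        virtual ~TouchBase() {}
    protected:
        virtual void sendPacket( int cmd ) = 0;   // supplied only by the last
    };                                            // derived class

    class UsbTouch : public TouchBase {           // the final specialization
    protected:
        virtual void sendPacket( int ) { /* USB-specific transfer */ }
    };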
Communication DLL header and Communication DLL code comprise the communication interface library for my Abbott communication driver. The driver comprises two parts, a kernel level (SYS) driver and an application level DLL. I create a ring buffer in non-pageable (kernel) memory that is also mapped into the application’s data space, enabling a very efficient zero-copy remote DMA but without normal protection. The DLL is needed not only because communication management is too complex for ordinary applications but also because it would be dangerous to allow them to directly manipulate the unprotected memory. The API is presented to applications by several classes, which automate and control all operations. To open a non-standard device communication channel, Windows applications normally are required to know device details and to communicate through the IoCtl interface. This library replaces all of that. The driver supports multiple simultaneous applications over one communication link and multiple types of links. An application can suggest its preferred means of communicating but if another application has already opened a different link, that one will be used instead. Constructors and destructors are essential. Applications don’t even know whether they are the first to open communication or the last to close. The code also shows a legitimate use of exceptions. Communication methods called by an application throw an exception on unrecoverable communication failure. The ThrowAbort class enumerates and describes the failure.
Multitasking always involves some degree of resource sharing. This is most often data but it may also be a facility, such as I/O hardware or display system. Sharing always involves, at some level, temporary exclusive access by one task. Tasks must cooperate in this because the hardware that can guarantee exclusive access is very limited. The memory management units of advanced CPUs can be configured to control access based on privilege level but this is used only to prevent application access to kernel resources. There is no other general means of controlling access to a memory range. However, most CPUs have certain instructions that can provide brief exclusive access to one location. Cooperating tasks can agree to use this as the access key for anything. The only other exclusion mechanism exists in the specific, but common, case where the tasks can block each other either by disabling interrupts (spin lock if multi-core) or freezing the thread dispatcher.
Some programmers believe that a critical section somehow guards a region of code. It doesn’t guard anything. It is just an in-process mutex. A mutex has larger scope and can be shared by multiple processes and the kernel. We don’t know what a mutex actually is; only what it does. It can be read and written in an indivisible (atomic) action. CISC single-core CPUs traditionally have had uninterruptible RMW (read-modify-write) instructions, which can be used to implement a mutex. CISC multi-core CPUs typically require two instructions, a lock followed by the RMW instruction. Some older CPUs have quite sophisticated RMW instructions. In one atomic operation, Motorola 68020’s CAS instruction compares a memory location to a register and, if they differ, writes a new value into the location. Most have atomic increment, decrement, clear, and exchange. Some newer CISC and most RISC CPUs, the PowerPC being a notable exception, have few or no atomic RMW instructions. For example, Renesas RX200/600 (CISC) has only one, XCHG. XCHG reads a location in memory and unconditionally writes a new value. This can easily implement a mutex but nothing more sophisticated. ARM (RISC) SWP is the same thing but the thumb instruction set, which ARM now promotes, doesn’t support this instruction. For mutexes, ARM now suggests simulating atomic operations using the exclusive load/store instructions LDREX/STREX or bit operations in the bit-band alias region (Arm Semaphores). Both of these are multi-step non-atomic operations, which fail some important test cases.
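An unconditional exchange is nevertheless enough to build a mutex. A minimal sketch, using std::atomic::exchange as a portable stand-in for XCHG/SWP:

    #include <atomic>

    // Sketch only: the exchange atomically writes 1 and returns the previous
    // value. Whoever reads back 0 owns the lock; everyone else spins.
    class SpinMutex {
        std::atomic<int> locked;
    public:
        SpinMutex() : locked( 0 ) {}
        void acquire() { while( locked.exchange( 1 ) != 0 ) { /* spin or yield */ } }
        void release() { locked.store( 0 ); }
    };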
A mutex doesn’t do any real work but is required for multiple processes to reliably share a resource as equals. A mutex may not be needed for asymmetric access, such as a producer-consumer buffer, arguably the most common example of resource sharing. Exclusive access in this case is needed to ensure data coherency. If the producer overwrites an existing message while the consumer is still reading it, the consumer may see portions of both messages as one. If the buffer holds only one message, a single non-atomic flag can control access. When the producer has a new message and sees the flag clear, it inserts the message and sets the flag. When the consumer sees the flag set, it processes the message and then clears the flag. A single-message buffer performs poorly in real situations but two of them, i.e. ping-pong, may be adequate. However, unless all messages are the same length, a ring buffer performs much better because it lets the producer insert as many messages as the buffer can hold minus the ones that the consumer hasn’t yet processed.
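A sketch of that single-flag buffer follows. It uses an atomic flag with release/acquire ordering so the example is also safe on multi-core hardware; on the single-core arrangements described here a plain flag is enough.

    #include <atomic>
    #include <cstring>

    struct OneMsgBuffer {
        std::atomic<int> full;          // 0 = empty, 1 = holds a message
        char             msg[ 256 ];
        OneMsgBuffer() : full( 0 ) {}
    };

    bool produce( OneMsgBuffer &b, const char *text, size_t len )   // producer side
    {
        if( b.full.load( std::memory_order_acquire ) )
            return false;                                 // consumer still owns the buffer
        std::memcpy( b.msg, text, len );
        b.full.store( 1, std::memory_order_release );     // hand it to the consumer
        return true;
    }

    bool consume( OneMsgBuffer &b, char *out, size_t len )          // consumer side
    {
        if( !b.full.load( std::memory_order_acquire ) )
            return false;
        std::memcpy( out, b.msg, len );
        b.full.store( 0, std::memory_order_release );     // hand the buffer back
        return true;
    }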
Ring buffer access is controlled not by a boolean but by two pointers (or indices): the producer’s insertion point and the consumer’s extraction point. Basically, the producer does not insert messages past the extraction point and the consumer does not try to read messages beyond the insertion point. Implementation details vary and corner cases, particularly wrap-around, can be tricky. However, pointer manipulation is similar to the single-buffer access flag. The insertion point is producer-write and consumer-read, while the extraction point is consumer-write and producer-read. Real atomic operations are not needed (assuming the pointers are atomically written).
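A minimal single-producer/single-consumer sketch along those lines, using indices rather than pointers; atomics with release/acquire ordering stand in for "atomically written":

    #include <atomic>
    #include <cstddef>

    template< size_t N >
    class Ring {
        char                buf[ N ];
        std::atomic<size_t> in;    // written by producer, read by consumer
        std::atomic<size_t> out;   // written by consumer, read by producer
    public:
        Ring() : in( 0 ), out( 0 ) {}

        bool put( char c )                          // producer
        {
            size_t i    = in.load( std::memory_order_relaxed );
            size_t next = ( i + 1 ) % N;
            if( next == out.load( std::memory_order_acquire ) )
                return false;                       // full: would overrun consumer
            buf[ i ] = c;
            in.store( next, std::memory_order_release );
            return true;
        }
        bool get( char &c )                         // consumer
        {
            size_t o = out.load( std::memory_order_relaxed );
            if( o == in.load( std::memory_order_acquire ) )
                return false;                       // empty: nothing to read
            c = buf[ o ];
            out.store( ( o + 1 ) % N, std::memory_order_release );
            return true;
        }
    };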
A ring buffer with multiple consumers or producers requires more complex access management. My Abbott Instrument Development communication system supports remote DMA (RDMA) communication between multiple applications and a clinical instrument. One ring buffer is used for all applications to receive messages. My driver maps the kernel memory ring buffer into each application’s process space. The (kernel driver) producer inserts messages contiguously, as in an ordinary ring buffer, but different processes’ messages may be interleaved. The producer effects content-based routing by tagging each message with consumer-specific information and making a linked list of each consumer’s messages. Consumers iterate over their own linked list, ignoring other messages. If the consumer is sleeping, the driver signals it to wake up.
Producer-consumer coordination is too complicated for applications to implement individually. A shared DLL handles the application side. To get its next message, a consumer calls the RxClient class method getMsg (in the DLL). The non-atomic extraction/insertion pointer technique is not feasible here. Instead, a message counter, incremented by the producer and decremented by the consumer, provides the information that the two need to ensure data coherency. The RMW increment and decrement operations must be atomic. This is similar to a semaphore, but semaphores are not explicitly decremented and are more expensive than a simple thread-safe counter. The producer’s instruction is automatically safe because an application consumer can never interrupt the kernel driver. In contrast, the driver can interrupt an application. If the decrement is atomic there will be no problem. This cannot be guaranteed in C++ but it can be in ASM. My AtomicDec16At macro does it in this case. This macro is one of several that I have defined using RMW instructions that are uninterruptible and, therefore, atomic for single core. The macros include a lock prefix in case of multi-core execution. The getMsg function also uses AtomicOr16At and AtomicAnd16At to set and clear status flags, which, among other things, tell the producer whether the consumer is sleeping. These are not part of data coherency control. Code Excerpts.
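The following sketch shows only the counter idea, with std::atomic standing in for my ASM macros; the structure and function names are illustrative, not the driver's.

    #include <atomic>

    // The producer increments the count after inserting and linking a message;
    // the consumer decrements it after extracting one. A nonzero count tells
    // the consumer there is work and tells the producer how much of the ring
    // is still owned by consumers.
    struct ClientQueue {
        std::atomic<short> msgCount;   // pending messages for this client
        std::atomic<short> flags;      // e.g. "consumer sleeping" status bits
        ClientQueue() : msgCount( 0 ), flags( 0 ) {}
    };

    void producerInsert( ClientQueue &q )    // kernel side, after linking the message
    {
        q.msgCount.fetch_add( 1 );           // atomic increment
        if( q.flags.load() & 0x1 )           // consumer sleeping?
            { /* signal it to wake */ }
    }

    bool consumerTake( ClientQueue &q )      // application side (getMsg path)
    {
        if( q.msgCount.load() == 0 )
            return false;                    // nothing pending; caller may sleep
        /* ...process the next message in this client's linked list... */
        q.msgCount.fetch_sub( 1 );           // atomic decrement
        return true;
    }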
With no data copying, no ring transitions, and no mutex/semaphore overhead, this driver system is two to ten times faster (and more efficient) than any standard mechanism in Windows or Unix/Linux, especially when heavily loaded. However, it is complicated and breaches the kernel-application boundary. In this case, performance is very important but, if it were the only motivation, it would have to be essential to justify this approach. However, there are other compelling reasons. For example, the system has to support multiple link types. At least two of these are proprietary to Abbott and require custom kernel drivers. In fact, another product group has used my driver for this reason alone. Absent compelling reasons, it is better to use standard mechanisms, like fileIO, pipes, fifos, memory-mapped files, or sockets, all provided natively or by libraries in Linux and Windows.
A mutex is a simple brute force means of granting exclusive access (assuming cooperating processes). Using it with a large resource for an extended time can degrade performance. More precise mechanisms may deliver much better performance by protecting only the minimum necessary. Ring buffer insertion/extraction controls, for example, take into account that the producer can safely write into any portion of the buffer that doesn’t contain messages not yet processed by the consumer. With one producer and one consumer, non-atomic insertion and extraction pointers suffice. In a more complex situation, a semaphore or thread-safe counter can be useful. As with a mutex, multiple processes can safely share these objects but their integer content conveys more information than the mutex’s boolean. If this information can be made useful, as in my Abbott communication driver, it may afford much better performance than brute force full resource control using a mutex.
Most new CPUs don’t even offer an atomic increment, much less the complex (and very useful) CAS instruction of the 68020. However, arbitrarily complex access mechanisms can be made thread-safe by guarding with a mutex. Essentially, guarded access to a large resource is derived from a thread-safe complex access mechanism, which is derived from a mutex. A monitor is literally such a class stack.
In one of my Elo touchscreen XP support applications a producer thread passes fixed-size touch event messages to a consumer thread. The two threads have the same priority and the consumer will generally keep pace with the producer. Some buffering is needed but not the complexity of a ring or even ping-pong. A thread-safe counter would suffice for guarding data coherency but there are additional requirements. When there are no messages, the consumer thread needs to sleep and be subsequently awakened when a message is inserted. A semaphore could do all of this. But it could not meet the last requirement, that the producer sleep when the buffer is full and subsequently be awakened when the consumer removes a message. I designed a monitor-like class to provide this functionality. BufferCounter class code
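A sketch of that behavior (not the actual BufferCounter class): a bounded counter that blocks the consumer when empty and the producer when full, waking each when the other changes the count.

    #include <mutex>
    #include <condition_variable>

    class BoundedCounter {
        std::mutex              m;
        std::condition_variable changed;
        int                     count;
        const int               limit;
    public:
        explicit BoundedCounter( int max ) : count( 0 ), limit( max ) {}

        void producerAdd()                   // called after inserting a message
        {
            std::unique_lock<std::mutex> lk( m );
            changed.wait( lk, [this]{ return count < limit; } );  // sleep if full
            ++count;
            changed.notify_all();            // wake a sleeping consumer
        }
        void consumerRemove()                // called before processing a message
        {
            std::unique_lock<std::mutex> lk( m );
            changed.wait( lk, [this]{ return count > 0; } );      // sleep if empty
            --count;
            changed.notify_all();            // wake a sleeping producer
        }
    };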
Guarding a continuous block of memory, such as the unprocessed messages in a buffer, doesn’t require the full power of a monitor. A monitor can guard the coherency of a set of discontinuous resources. I needed this capability in the Hitachi 747. Result data and instrument status information nearly continuously streamed into the controller via GPIB. In my controller program DMA moved this data into main memory, leaving most of the CPU bandwidth for other tasks. The analyzer ran on a rigid schedule and would not allow its GPIB output to be paused. Consequently, the controller CPU could never be given exclusive access to the buffer. Further, the data contained several discontinuous coherency sets. For example, the instrument status block contained the count of test result blocks located in another area. If the count from one communication cycle were used to guide reading the results from a previous cycle with a different count (usually the case) the data could easily be random. To avoid this, I implemented a monitor-like access mechanism. To read any coherency sensitive group, the application calls a function, passing a list of source (GPIB buffer) and destination (elsewhere in main memory) addresses. This function monitors the DMA DAR (destination address register) to move the data as a coherent set, either following behind the DAR or temporarily configuring DMA to prematurely but briefly pause before overrunning portions of the set not yet transferred. I implemented this in X86 assembly language. See BM/Hitachi 747: Communication and Coupling and my “Dr. Dobb’s Journal” article Software Partitioning for Multitasking Communication.
A story, perhaps true, tells that when Microsoft first introduced threads in Windows, a version of Office was written with more than 100 simultaneous threads. The story doesn’t tell what they expected to accomplish but presumably they thought the program would magically run faster. That is not why threads were invented and not what they will do unless the CPU has many cores, the OS supports symmetric multiprocessing (SMP) of threads, and the threads have useful work to do simultaneously. In many cases, DMA, DSP instructions or coprocessors, or vectorized instructions yield much greater speed improvements.
The primary purpose of threads is to make a program more responsive or robust. For example, if the user interface never directly calls a blocking process then the user can always intervene to abort a stall. If the program has several independent lightweight jobs to do, threads provide an easy way to make these apparently simultaneous. Threads are not suited to hard real-time processing alone but can be useful in combination with interrupts; an ISR performs only very timely work and defers the rest to a support thread.
I first used threads in my Hitachi 747 program. Hitachi had not asked for a multi-tasking program. They had experimented with Unix and knew that its application task-switching overhead would overwhelm a computer used for controlling a machine, and early versions of Windows appeared to be similar. I wanted to use DOS anyway. I intended to deliver high performance with device drivers, for which Unix and Windows at the time required rebooting for every code change. As I explain in my Dr. Dobb’s Journal article, I invented lightweight threads for DOS. For fast but flexible tasking I paired the non-real-time threads with real-time ISRs written in assembly language. In most programs that pair ISRs with threads, strategic decisions are reactive, carried out by the thread, which is activated by the ISR. I implemented anticipatory strategy by designing table-driven ISRs. By changing table entries, a thread can tell its ISR partner what it should do if it is invoked under various circumstances.
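A hypothetical sketch of the idea, not the 747 code: the thread writes its strategy into a table ahead of time, and the ISR only dispatches through it.

    // The support thread decides in advance what the ISR should do for each
    // event and records that policy in a table; the ISR merely looks it up.
    enum { EV_RX_BYTE, EV_TX_EMPTY, EV_LINE_ERROR, EV_COUNT };

    typedef void (*IsrAction)( int event );

    static void actionQueueByte( int )  { /* buffer the byte */ }
    static void actionIgnore( int )     { /* do nothing this time */ }
    static void actionAbortBlock( int ) { /* cancel the current transfer */ }

    // Written by the thread, read by the ISR.
    static volatile IsrAction isrPlan[ EV_COUNT ] = {
        actionQueueByte,     // EV_RX_BYTE
        actionIgnore,        // EV_TX_EMPTY
        actionAbortBlock,    // EV_LINE_ERROR
    };

    void commIsr( int event )            // real-time side: no decisions, just dispatch
    {
        isrPlan[ event ]( event );
    }

    void commThread()                    // non-real-time side: anticipatory strategy
    {
        // e.g. once a transfer completes, tell the ISR to ignore TX interrupts
        isrPlan[ EV_TX_EMPTY ] = actionIgnore;
    }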
Most of my programs interact with external systems beyond their direct control, potentially creating blocking conditions. For these, I always create at least two threads with one guaranteed to never block and always respond to a request to abort. Although this is usually the user interface thread, it isn’t always. For example, in my IDT controller, which has no user interface, it is the USB communication thread. In this case, a very cheap non-preemptive round-robin task loop “schedules” the threads and can’t guarantee that the USB thread will be periodically invoked. However, USB input from the host triggers an ISR, which checks the health of the USB thread and clears blocks if necessary.
In my Abbott Instrument Development communication system, when an application invokes RxClient::getMsg it can request blocking or immediate return if there are no messages for it. A blocking call should be made from a thread. The block may end on a new message, communication failure or user request (from the unblocked UI thread) to abort. The library contains all of the communication functions but the application has to create certain structures in its own process space. Classes defined by my library simplify application programming, especially if a message receiving thread is used. To create a communication thread, a function is defined to do the application-specific message processing. A small portion of this is boilerplate code required for communication. The rxThreadProc function is an example. The application creates new RxClient and ComThreadApi objects. It then invokes ComThreadApi::start, passing the message processing function, a communication driver reference, and an Event that the application can set at any time to terminate the thread. The start method sets up a mailbox for asynchronous communication between the thread and the rest of the application and starts the application function as a thread. The library then handles all communication. The application only needs to call the RxClient getMsg method to get its messages.