CDX SYSTEM AND HARDWARE DESIGN
Synopsis: concatenated cdxsys and related reports.
REPORTS
CDNEXT-APU/FPM STEPPER MOTOR CONTROLLER
9/19/99
[CD3200-Class Instrument I/O]
Synopsis: design description of proposed stepper motor control system hardware and software.
MOTOR.DOC (Word 97 @ k:\cdx\doc\analyzer)
David McCracken
REQUIREMENTS
ARCHITECTURE
The scan chain is implemented as a buffered array of 56 bits embedded in the FPGA. Buffering allows the scan chain to be written at any time without disturbing the pattern currently being serially shifted out. The input registers of the scan chain are mapped into the microprocessor's memory space.
Each motor is controlled by four bits of the scan chain. Two bits select power and two select a motor winding phase pattern (a two-phase motor has four phase patterns). When the power selected for a motor changes, the microprocessor writes the power selection directly into the scan chain in the FPGA. The FPGA continually writes the phase pattern bits into the scan chain as it interprets the microcoded program prepared for it by the microprocessor. Under normal operation, the microprocessor never directly writes to the phase pattern bits in the scan chain.
The scan chain shift clock operates at 2MHz. Thus, all 56 bits are shifted out to the motor drivers in 28 usec. The scan chain requires an additional 4 usec. for housekeeping, resulting in 32 usec. for one complete scan image update to take place. After the image is serially scanned out, all bits are simultaneously broadside loaded into the actual outputs.
While one scan chain image is being shifted out, the FPGA creates the next image. Each image represents the scan chain state for one 32 usec. period. The microprocessor finishes preparing a microcode program comprising 256 periods (8.192 msec. total) before the FPGA is allowed to execute it. To enable the FPGA to continually update the scan chain, a pair of ping-pong memory buffers is used. While the FPGA executes the program in one buffer, the microprocessor writes the program for the next 8 msec. into the other.
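The timing figures above follow from simple arithmetic, sketched here in Python as a cross-check. The constant names are illustrative only; the values are the ones quoted in the text.

```python
# Scan chain timing, using only figures quoted in the text.
SCAN_BITS = 56                 # scan chain length in bits
SHIFT_CLOCK_HZ = 2_000_000     # 2 MHz shift clock
HOUSEKEEPING_US = 4            # extra time per scan image
PERIODS_PER_PROGRAM = 256      # periods in one microcode program

shift_us = SCAN_BITS * 1_000_000 // SHIFT_CLOCK_HZ   # time to shift all bits
period_us = shift_us + HOUSEKEEPING_US                # one scan image update
program_us = period_us * PERIODS_PER_PROGRAM          # one microcode program

print(shift_us, period_us, program_us)
```

The three values printed are the shift time, the scan period, and the program length, all in microseconds.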
The microcode program is very low level, comprising time-stamped motor control patterns which effect stepping. Thus, the microprocessor must be continually engaged in controlling the motors. The "motor control" portion of the FPGA serves only as a time shifter and to synchronize to the scan chain. Only the power bits are of a "set and forget" nature.
DESIGN
HARDWARE
MOTOR DRIVERS
Each motor driver comprises a relatively sophisticated circuit, such as two (one for each winding) Ericsson (or Rifa, SGS-Thomson, and others) PBL3717, which translates the two power bits into actual power levels and the two phase bits into the four phase patterns.
The power bits are called I0 and I1. They select the power level as follows:
The phase control bits are called PH0 and PH1. Each controls the direction of current flow in one of the motor's windings. The output drivers are full bridges, whose motor terminals are arbitrarily called A and B. When the phase bit is 0, positive current flows from B to A; when 1, from A to B. Motors are connected to the drive terminals such that the controls operate as follows (for full-step, two phase on motor drive):
The drivers are connected to the scan chain in nibbles (groups of four) as follows:
Each motor's control nibble is located in the scan chain at the motor number times four. For example, the control bits for motor 9 are: 36 = PH0, 37 = PH1, 38 = I0, 39 = I1.
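The nibble layout can be expressed as a small lookup, sketched below. The function name and the assumption that motors are numbered 0 through 13 (14 motors filling the 56-bit chain) are illustrative; the field order within the nibble is taken from the motor 9 example.

```python
# Field order within each motor's nibble, from the motor 9 example in the text.
FIELD_OFFSETS = {"PH0": 0, "PH1": 1, "I0": 2, "I1": 3}
MOTOR_COUNT = 14   # assumption: motors numbered 0..13 fill the 56-bit chain


def motor_bits(motor: int) -> dict:
    """Return the scan chain bit address of each control bit for one motor."""
    if not 0 <= motor < MOTOR_COUNT:
        raise ValueError("motor number out of range")
    base = motor * 4
    return {name: base + offset for name, offset in FIELD_OFFSETS.items()}


print(motor_bits(9))
```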
MICROPROCESSOR-FPGA INTERFACE
All 24 address bits and 16 data bits of the microprocessor are connected to the FPGA. The FPGA is selected by the microprocessor's CS3, which may be mapped to any address but which has been mapped to 0x00500000 through 0x005FFFFF (1 Mbyte).
Some of the flip-flops in the FPGA are assembled into control and data registers that the microprocessor can directly read and write. The motor scan chain input register is one of these. The FPGA's motor control logic also writes into the motor scan chain, suggesting that the input register would have to be dual-port. However, only the microprocessor writes to the motor power control bits and only the FPGA motor logic writes directly to the phase control bits in normal operation. The power control bits clearly do not have to be dual-port. However, under test situations, being able to directly write to the phase control bits without FPGA intervention could be useful.
The microcode program prepared by the microprocessor for the FPGA motor control logic to interpret requires at least 2 Kbytes of RAM (preferably 4 Kbytes). This is too much memory to be implemented by FPGA internal resources. External memory must be used. Two architectures are feasible: DMA access to main memory and a dedicated RAM that the microprocessor accesses through the FPGA. The DMA approach uses considerably fewer FPGA resources, as the DMA request/grant facility of the microprocessor provides more than half of the multiplexing circuitry. The dedicated RAM not only consumes at least 22 FPGA pins (8 data, 11 address, RD, WR, CS) but also the internal resources to support all of the multiplexed access means. The DMA approach steals time from the microprocessor, which the dedicated RAM approach does not do. However, the overall effect of this should be nearly inconsequential, due to the microprocessor's instruction pipelining and the fact that there are no other bus masters. From a software viewpoint, the two approaches are essentially identical, with only minor addressing differences.
FIRMWARE
As already explained, two microcode programs exist simultaneously, one being written by the microprocessor and one being executed by the FPGA. The program describes 256 32 usec. periods in one 8.192 msec. period. It is divided into two parts, a fixed 256-byte event count array and a variable event array. Each element of the event count array tells how many step events occur in the corresponding time slice. All of the events that occur in one time slice will occur simultaneously one (possibly two) scan chain cycle (32 usecs) after the event count is interpreted. After processing the 256 event count elements of one program, the interpreter immediately continues at the beginning of the other program.
An event count value of 0 indicates that there is no step activity in the 32 usec. time slice. The interpreter performs a read-modify-write on each event count byte, latching its current value and replacing it with 0. The purpose of this is to clear the program memory for reuse. The microprocessor could do this but it can be done more efficiently by the FPGA. Thus, the microprocessor only needs to write active event count elements, i.e. time slices in which one or more motors experience a step (phase change). The read-modify-write cycle should be reduced to just read when the current value is already 0 if this can reduce DMA hold time. However, it may actually be faster to always write the 0 without considering the current value.
When the event count is 0, it is not necessary to clock out the scan chain, unless the microprocessor has written to any of the power control bits. Not clocking the scan chain reduces power and electrical noise, but the logic to determine whether to do it may be excessive for the rather minor benefit. The motor drive system itself is unaffected by redundant updating.
When the event count is not 0, the microcode program interpreter (motor control state machine) reads the event or events from the event array. Each element (byte) of the event array tells the logic value of one of the phase control bits. Bits 0 through 5 select a scan chain bit address, which will always be one of the phase control bits, e.g. 0, 1, 4, 5, 8, 9, etc. Bit 6 is unused. Bit 7 tells the value for the selected bit. Phase control patterns require only single-bit changes to effect a step.
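The event byte layout described above can be sketched as a pair of helper functions. The function names are hypothetical; the bit assignments (bits 0 through 5 address, bit 6 unused, bit 7 value) and the rule that only phase bits are addressed come from the text.

```python
SCAN_BITS = 56   # scan chain length from the architecture description


def encode_event(bit_address: int, value: int) -> int:
    """Pack one step event: bits 0-5 = scan chain bit address, bit 7 = value."""
    if not 0 <= bit_address < SCAN_BITS:
        raise ValueError("bit address outside the scan chain")
    if bit_address % 4 > 1:
        # Offsets 2 and 3 of each nibble are the power bits I0 and I1.
        raise ValueError("events may only address phase bits")
    return (0x80 if value else 0x00) | bit_address


def decode_event(event: int) -> tuple:
    """Unpack an event byte into (bit_address, value); bit 6 is ignored."""
    return event & 0x3F, (event >> 7) & 1


print(hex(encode_event(36, 1)), decode_event(0xA4))
```

For example, setting PH0 of motor 9 (scan chain bit 36) to 1 encodes as 0xA4.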
Like the event count array, the event array is ordered by time, but it does not exhibit the same strict association of time slice to array element. The interpreter maintains two pointers, one that iterates through the event count array at a constant 31.25 kHz (1/32usec.) rate and one that steps through the event array only as directed by the event counts. For example, assume that the beginning of the event count array is:
| 0 | 0 | 1 | 0 | 3 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 1 | 0 |
Call the event count pointer (implemented in the FPGA state machine) ecp and the event pointer ep. The event array is located immediately after the event count array so this means that ep equals ecp plus 256. Assume that the event count array begins at 0x100 and the event array at 0x200. Further assume that both ecp and ep point to the next element and are incremented after use (post-increment). At each 32 usec. period the motor control state machine operates as follows:
The main job of the state machine, done once in each 32 usec. period, is straightforward:
In many periods, the FPGA needs to access the microcode memory only one time, to read a 0 (although the read-modify-write could be considered a 1.5 access). If one motor steps in a period, two accesses are needed, one to the event count and one to the event. In the worst case, all 14 motors experience a step in the same period, for which the FPGA must access the memory 15 times. Even this worst case should be easily achieved, especially given the static RAM used in the FPM unit. If, for some reason, the FPGA is unable to interpret all of the indicated events before the next scan chain update, it should interrupt the microprocessor, presenting an error code in a status register that the microprocessor can read.
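The per-period behavior can be modeled in software as follows. This is only a sketch of the interpreter's bookkeeping, not the FPGA design: the function name, the memory size, and the event byte values are hypothetical, while the event count values, the 0x100 and 0x200 base addresses, the post-increment pointers, and the clearing of each event count after it is read come from the text. The model applies events to the scan image immediately and ignores the one-cycle (possibly two-cycle) output delay.

```python
EVENT_COUNT_BASE = 0x100   # addresses assumed in the text
EVENT_BASE = 0x200
COUNTS = [0, 0, 1, 0, 3, 0, 0, 0, 0, 2, 0, 0, 0, 0, 4, 0, 0, 0, 1, 0]
# Hypothetical event bytes: bit 7 = value, bits 0-5 = phase bit address.
EVENTS = [0x80, 0x84, 0x88, 0x8C, 0x01, 0x05, 0x90, 0x94, 0x98, 0x9C, 0x81]


def run_period(mem, ecp, ep, image):
    """Model one 32 usec. period; return (ecp, ep, memory accesses)."""
    count = mem[ecp]     # read the event count ...
    mem[ecp] = 0         # ... and clear it so the buffer can be reused
    ecp += 1             # post-increment
    accesses = 1
    for _ in range(count):
        event = mem[ep]
        ep += 1
        accesses += 1
        image[event & 0x3F] = event >> 7   # set the addressed phase bit
    return ecp, ep, accesses


mem = bytearray(0x400)
mem[EVENT_COUNT_BASE:EVENT_COUNT_BASE + len(COUNTS)] = bytes(COUNTS)
mem[EVENT_BASE:EVENT_BASE + len(EVENTS)] = bytes(EVENTS)
image = [0] * 56
ecp, ep = EVENT_COUNT_BASE, EVENT_BASE
accesses = []
for _ in COUNTS:
    ecp, ep, n = run_period(mem, ecp, ep, image)
    accesses.append(n)
print(hex(ecp), hex(ep), accesses)
```

In this model ecp advances by one in every period, while ep advances only by the number of events consumed, which is the loose coupling between the two arrays described above.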
The two microcode programs will both be located on page boundaries. When ecp reaches 0xxxFF, the current program is exhausted and the next one should be started. If the next one is the later one in memory, ecp could simply keep incrementing. However, if it is the lower one, then one or more of ecp's bits higher than 7 must be changed. In all cases, ep must be changed to point to the other program's event array. At the program change, the FPGA should interrupt the microprocessor and present an appropriate (i.e. different from any error codes) value in the status register. This will alert the microprocessor to the availability of the memory occupied by the exhausted program. It could be helpful if the status code indicated which memory buffer, lower or upper, was being made available, but the microprocessor can also keep track of this itself.
Although it would be possible to map the two microcode program buffers into fixed memory locations in main memory, it would be better to map them to addresses defined by software. Since the event count array is a fixed 256 bytes and the event memory immediately follows this, selecting an event count array address automatically defines the event address. If the buffers were a fixed size, such as 2K (2048 total with 256 bytes of event count and 1792 bytes of event) then selecting the lower buffer's address automatically defines the upper buffer's location. Thus, it is possible for one 24-bit address to define all four addresses. However, it may not be desirable to do this. For one thing, it would be better to allow the size of the event buffer to be controlled by software. The easiest way to do this would be to have separate addresses. The motor control state machine does not have to guard against reading past the end of the event array, as the software will do this. Therefore, the FPGA doesn't need to be informed of the size of event memory. However, the size of the low buffer event array determines the address of the upper buffer. If the size varies, then software needs to tell the FPGA the address to use for the upper buffer. Another reason for possibly having the microprocessor write a full address for each of the four arrays into the FPGA is that this may simplify the logic of the state machine for buffer swapping. Basically, at this point the ecp and ep pointers are loaded with the event count and event array addresses for one or the other buffer. The reload values are taken either directly from "reload" registers written by the microprocessor at boot time or from the one base address (register written by the microprocessor) plus hard-wired offsets. The former not only affords greater flexibility but also may be simpler. 
To summarize, the FPGA must provide at least one 24-bit register into which the microprocessor writes the starting address of the lower microprogram buffer. Preferably, two such registers will be provided. The microprocessor will write the address of the lower buffer into one and the address of the upper buffer into the other. The FPGA motor controller state machine will toggle ecp between these two addresses and ep between these addresses plus 256. If it simplifies firmware, the FPGA may provide another two registers into which the microprocessor will write the two ep addresses. The microprocessor writes these registers only when it initializes the system. The buffer addresses do not change during operation.
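The two-register reload scheme can be sketched as a small software model. The class and method names are hypothetical, and the example base addresses are arbitrary page-aligned values; the toggling of ecp between the two base addresses and of ep between those addresses plus 256 follows the summary above.

```python
class ProgramPointers:
    """Model of ecp/ep reloading from two base address registers."""

    def __init__(self, lower_base, upper_base):
        # Both buffers are assumed to start on 256-byte page boundaries.
        assert lower_base % 256 == 0 and upper_base % 256 == 0
        self.bases = (lower_base, upper_base)   # written once at initialization
        self.active = 0                         # 0 = lower buffer, 1 = upper
        self.reload()

    def reload(self):
        self.ecp = self.bases[self.active]
        self.ep = self.ecp + 256                # event array follows the counts

    def advance_ecp(self):
        """Step ecp once per period; return True when the buffers swap."""
        if self.ecp & 0xFF == 0xFF:             # last event count just consumed
            self.active ^= 1
            self.reload()
            return True                         # FPGA would interrupt here
        self.ecp += 1
        return False


pointers = ProgramPointers(0x1000, 0x1800)
swaps = sum(pointers.advance_ecp() for _ in range(256))
print(swaps, hex(pointers.ecp), hex(pointers.ep))
```

Providing separate registers for the two ep addresses would replace the fixed `+ 256` in `reload` with a second lookup.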
SOFTWARE
Software is the program executed by the microprocessor. Its job is to translate motor move commands and ramp definitions into the step event microcode. As long as a motor is active, the program must continually perform this translation, writing a new microcode program (one program for all active motors) every 8 msec. As previously explained, the FPGA motor controller only serves to time-synchronize the microprogram.
At each interrupt from the FPGA, the program reads the FPGA status register to determine whether the interrupt has been issued in response to a motor state machine error (or unrelated event) or because the current microprogram has been exhausted and its memory buffer can now be reused. If the latter then the microprocessor performs the core motor control task.
CORE TASK
The core task of the program is to determine the position (in time and, therefore, in the event count array) of every step of each motor in the 8 msec. period encompassed by the microprogram under construction. The simplest way to do this would be to iterate through the 256 elements of the event count array, examining each motor's condition and requirements to determine whether it should step at that 32 usec. time slice. If so, then that event count would be incremented and the appropriate event written to the current event address. The program would handle the event count and event pointers similarly to the program interpreter, except that the data at the pointers would be written instead of being read. In one time slice, the event count and the event pointer are both incremented for each motor that steps at that time.
The simple approach is inefficient because it requires the next step time of each motor to be repeatedly computed as the algorithm visits each time slice and it requires that every time slice be visited. Alternatively, after computing the next step time of all motors, the program could immediately jump to the event count element of the earliest step, skipping all intervening do-nothing time slices (these have already been reset by the FPGA's read-modify-write). This approach requires a more complicated program but can significantly improve the execution time for average cases. For the worst case of every motor stepping at every time slice, the greater complexity yields somewhat slower execution. It should be noted that this worst case would require an event array of 3,584 bytes (256 * 14), which is twice as much as the suggested 1792 bytes (allocating 2K to each complete microprogram).
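One way to realize the skip-ahead approach is with a priority queue keyed on each motor's next step time. The sketch below is an assumption about how this could be done, not the author's implementation: the function name, the dictionary-based motor state, the constant step duration per motor, and the `event_for` callback that supplies each event byte are all hypothetical.

```python
import heapq

SLICES = 256   # 32 usec. time slices in one microprogram


def build_program(next_step, duration, event_for):
    """Build one microprogram by jumping from step to step.

    next_step: {motor: slice index of its next step, relative to program start}
    duration:  {motor: step duration in slices}
    event_for: callable returning the event byte for a motor's next step
    Returns (counts, events, carry), where carry gives each motor's next
    step time relative to the start of the following microprogram.
    """
    counts = bytearray(SLICES)    # starts cleared, as the FPGA leaves it
    events = bytearray()
    pending = [(t, m) for m, t in next_step.items()]
    heapq.heapify(pending)
    while pending and pending[0][0] < SLICES:
        t, motor = heapq.heappop(pending)       # earliest step of any motor
        counts[t] += 1
        events.append(event_for(motor))         # events stay in time order
        heapq.heappush(pending, (t + duration[motor], motor))
    carry = {motor: t - SLICES for t, motor in pending}
    return counts, events, carry


counts, events, carry = build_program(
    {0: 2, 1: 4}, {0: 125, 1: 100}, lambda motor: 0x80 | (motor * 4))
print(sum(counts), list(events), carry)
```

Only time slices that contain a step are ever touched, and the `carry` result provides the residual counts needed to continue each motor in the next microprogram.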
STEPS
The fundamental measure of a step is its duration in terms of the number of 32 usec. periods. For example, for a motor to operate at 250 steps per second, each step would have a duration of 125 counts. The average 8.192 msec. microprogram would contain slightly more than two steps for this motor. The number of residual counts for the last step of one microprogram is carried over into the computation of the first step for that motor in the next microprogram, thereby providing continuous stepping for a virtually unlimited amount of time even though each microprogram encompasses only 8 msec.
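The step duration arithmetic and the carry-over of residual counts can be checked with a short sketch. The function names are illustrative; the 32 usec. period, the 256-slice program, and the 250 steps per second example come from the text.

```python
PERIOD_US = 32   # one time slice
SLICES = 256     # slices per microprogram


def step_duration(steps_per_second):
    """Step duration in 32 usec. counts, rounded to the nearest count."""
    return round(1_000_000 / (steps_per_second * PERIOD_US))


def step_slices(duration, programs, first=0):
    """List the slice index of every step in each successive microprogram."""
    result = []
    t = first
    for _ in range(programs):
        slots = []
        while t < SLICES:
            slots.append(t)
            t += duration
        t -= SLICES            # residual counts carried into the next program
        result.append(slots)
    return result


print(step_duration(250), step_slices(125, 3))
```

With a duration of 125 counts the first microprogram holds three steps and the next two hold two each, consistent with an average slightly above two.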
Steppers are not normally operated at just a constant rate but with a ramp up to that rate (which is called the slew rate if it is the maximum speed that the motor can attain in the mechanical system) and a ramp down to the final position. An up ramp is simply a list of decreasing step durations while a down ramp is a list of increasing step durations. All motors are controlled by ramp tables that define at least the constant speed step duration and, optionally, up and down ramps. Each up/down ramp entry tells the duration of one step, while the single constant speed entry tells the duration of every step in the constant speed segment of the trajectory. At each step, the position record of the motor (after it makes that step) is incremented for CW rotation and decremented for CCW rotation.
RAMP TRUNCATION
If a motor move command specifies a number of steps that is less than the total number of steps in the up and down ramps in the ramp table selected for the motor, the command interpreter will create a temporary ramp table that symmetrically removes the top (shortest steps) of both the up and down ramps so that the total number of steps equals the requested move. This is assigned to the motor for the duration of its move and then discarded.
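Ramp truncation can be sketched as follows. The function name and the list representation of the ramps are hypothetical, and the handling of an odd excess or of ramps with unequal lengths is an assumption, since the text only specifies that the removal is symmetric.

```python
def truncate_ramps(up, down, steps):
    """Trim the fastest entries of both ramps so their total equals steps.

    up:   decreasing step durations (fastest steps at the end)
    down: increasing step durations (fastest steps at the start)
    """
    if steps < 0:
        raise ValueError("steps must not be negative")
    excess = len(up) + len(down) - steps
    if excess <= 0:
        return list(up), list(down)       # move is long enough, no truncation
    cut_down = min(len(down), excess // 2)
    cut_up = excess - cut_down            # up ramp takes the odd step, if any
    if cut_up > len(up):                  # up ramp too short to absorb its share
        cut_up = len(up)
        cut_down = excess - cut_up
    return list(up[:len(up) - cut_up]), list(down[cut_down:])


up = [200, 160, 130, 110, 100]
down = [100, 110, 130, 160, 200]
print(truncate_ramps(up, down, 6))
```

The temporary table built this way has no constant speed segment; the motor decelerates as soon as it finishes the shortened up ramp.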
CDNEXT: CD3200-CLASS INSTRUMENT I/O
9/20/99
[Stepper Motor Controller] [Excerpts From Ad46]
CDX32IO.DOC (Word 97 @ k:\cdx\doc\analyzer)
Synopsis: Proposed partitioning of CD3200 I/O into a CD3200-class instrument based on APU and FPM units.
CONCEPT
The APU replaces CPUDCM, SPM, and MAM boards. Generally, any I/O related to these boards or to flow scripts will be placed on the APU I/O scan chain. The FPM replaces MPM, TWR, LDR, and FCM boards. I/O related to these boards or to TWR and LDR macros will be placed either on the FPM I/O scan chain or will be controlled by specialized hardware/firmware on the FPM board. All motors controlled by the MPM, TWR, and LDR will be placed on the FPM's motor scan chain.
The APU will execute the old flow scripts, converted to the new script language, while the FPM will execute converted TWR and LDR macros. FPM script command interpreters will natively access I/O and steppers on behalf of local scripts and provide remote access to these resources on behalf of the APU.
The fluid checks script will be moved from the APU to the FPM to enable the strobed fluid sensors to be accessed by a local script. In the event of fluid error, the FPM fluid check script will send a fault message to the APU in the same way that the TWR and LDR units send fault messages to the CPUDCM.
Remote I/O will be apportioned to individual boards to first minimize local wiring and secondarily to try to avoid having to connect individual boards to both the APU and the FPM I/O scan chains.
SPECIFIC DEVICES
"PB" = Peripheral Bus
SENSORS
FPM SENSORS
| Name | 0/1 Sense | CD3200 Location | New Location |
| LOADER GROUP | Loader SHM | | |
| MixVert | Vertical/Tilted | J10 pin 3 | |
| LiftUp | Down/Up | J10 pin 6 | |
| MixTop | J10 pin 9 | | |
| LdEmpty | NotEmpty/Empty | J9 pin 3 | |
| UnldFull | NotFull/Full | J9 pin 6 | |
| UnldWarn | NotWarn/Warn | J9 pin 9 | |
| TubePos3 | NoTube/TUBE | J11 pin 3 | |
| TubePos4 | NoTube/TUBE | J11 pin 8 | |
| TOWER GROUP | Tower SHM | | |
| ProbeHome | NotHome/index | J10 pin 3 | |
| LowerTubeSensor | Unblock/Block | J10 pin 6 | |
| UpperTubeSensor | Unblock/Block | J10 pin 9 | |
| SampCover | Closed/Opened | J9 pin 3 | |
| DoorClosed | Closed/Opened | J9 pin 6 | |
| MAIN GROUP | | | |
| CsDoor | Open/Closed | | |
| RbcDilPress | Over/Ok | | |
| Va1WetSens | Wet/NotWet | | |
| Va2WetSens | Wet/NotWet | | |
| STROBED GROUP | | | |
| SensWbcLyse | NotEmpty/Empty | | |
| SensHgbLyse | NotEmpty/Empty | | |
| SensDilSheath | NotEmpty/Empty | | |
| SensWasteLevel | Full/NotFull | | |
| SensDilLevel | Full/NotFull | | |
| SensSheathLevel | Full/NotFull | | |
The fluid strobe may be assigned to a scan chain bit or addressed uniquely.
APU SENSORS
| Name | 0/1 Sense | CD3200 Location | New Location |
| PaddleSwitch | Pressed/NotPressed | | |
| SvInput | NoFluid/Fluid | Ultrasonic (old Sensor1) | |
| SvOutput | Blood/Air | Optical (old Sensor2) | |
| Shear Valve position | [0] = SvSensFail (failed sensors) | PB sensors 1,0 | |
| [1] = SvCcw (full CCW position) | | | |
| [2] = SvCw (full CW position) | | | |
| [3] = SvUnknown | | | |
ACTUATORS
FPM ACTUATORS
| Name | Description or 0/1 states | CD3200 Location | New Location |
| TOWER GROUP | TWR SHM | | |
| TubeBlockLatch | | | |
| DoorRelease | | | |
| LOADER GROUP | LDR SHM | | |
| RackAdvance | | | |
| LoadArm | | | |
| TubeLifter | | | |
| TubeGripper | | | |
| UnloadArm | | | |
APU ACTUATORS
| Name | Description or 0/1 states | CD3200 Location | New Location |
| solenoid power control | SDM1:0-7 | | |
| ShvDilFlush | (NO) Shear Valve Diluent Flush | SDM1:SOL11 | |
| SmplAspirationVac | (NO) Sample Aspiration Vacuum | SDM1:SOL12 | |
| ShvDrain | (NO) Shear Valve Drain | SDM1:SOL13 | |
| OpenProbeDrain | (NO) Open Sample Probe Drain | SDM1:SOL14 | |
| OpenProbeDilFlush | (NC) Open Sample Probe Diluent Flush | SDM1:SOL15 | |
| WC4Vac | (NO) Waste Chamber 4 Vacuum Supply | SDM1:SOL17 | |
| WC4Press | (NO) Waste Chamber 4 Pressure Supply | SDM1:SOL18 | |
| solenoid power control | SDM2:16-23 | | |
| RbcMixDrain | (NO) RBC Mixing Chamber Drain | SDM2:SOL21 | |
| SmplSyrSheath | (NO) Sample Delivery Syringe Sheath | SDM2:SOL22 | |
| MainWbcLyse | (NC) MainWbc Lyse Supply | SDM2:SOL23 | |
| HgbLyseSyrOut | (NO) HGB Lyse Syringe Output | SDM2:SOL27 | |
| WbcLyseSyrOut | (NO) Wbc Lyse Syringe Output | SDM2:SOL25 | |
| HgbDil | (NO) Diluent Syringe HGB Diluent | SDM2:SOL26 | |
| RbcDil | (NO) Diluent Syringe RBC Diluent | SDM2:SOL27 | |
| MainHgbLyse | (NC) Main HGB Lyse Supply | SDM2:SOL28 | |
| solenoid power control | SDM3:32-39 | | |
| WC2Vac | (NO) Waste Chamber 2 Vacuum Supply | SDM3:SOL31 | |
| WC2Press | (NO) Waste Chamber 2 Pressure Supply | SDM3:SOL32 | |
| RbcDilSyr | (NO) RBC Diluent Syringe Supply | SDM3:SOL33 | |
| CloseProbeDilFlush | (NC) CS Probe Diluent Flush | SDM3:SOL34 | |
| CloseVentTrapDrain | (NO) CS Vent Trap Drain | SDM3:SOL36 | |
| CloseProbeDrain | (NO) CS Probe Drain | SDM3:SOL37 | |
| CloseVentTrapVent | (NO) CS Vent Trap Vent | SDM3:SOL38 | |
| solenoid power control | SDM4:48-55 | | |
| NocStaging | (NO) NOC Sample Staging | SDM4:SOL41 | |
| solenoid power control | SDM5:64-71 | | |
| HgbFlowCellVent | (NO) HGB Flow Cell Vent | SDM5:SOL51 | |
| StagingPumpIn | (NO) Staging Pump Inlet | SDM5:SOL52 | |
| WbcCupVent | (NO) WBC Cup Vent | SDM5:SOL53 | |
| RbcPltStaging | (NO) RBCPLT Sample Staging | SDM5:SOL54 | |
| WbcStaging | (NO) WBC Sample Staging | SDM5:SOL55 | |
| OpFlowCellOut | (NO) Optical Flow Cell Output | SDM5:SOL56 | |
| OpFlowCellDrain | (NO) Optical Flow Cell Drain | SDM5:SOL57 | |
| solenoid power control | SDM6:80-87 | | |
| SheathResVac | (NO) Sheath Reservoir Vacuum | SDM6:SOL61 | |
| DilResVacuum | (NO) Diluent Reservoir Vacuum | SDM6:SOL62 | |
| DilResPress | (NO) Diluent Reservoir Pressure | SDM6:SOL63 | |
| SheathResPress | (NO) Sheath Reservoir Pressure | SDM6:SOL64 | |
| SheathResOut | (NC) Sheath Reservoir Output | SDM6:SOL65 | |
| DilResOut | (NC) Diluent Reservoir Output | SDM6:SOL66 | |
| WC1Press | (NO) Waste Chamber 1 Pressure | SDM6:SOL67 | |
| WC1Vent | (NO) Waste Chamber 1 Vent | SDM6:SOL68 | |
| solenoid power control | SDM7:96-103 | | |
| MainPresVent | (NO) | SDM7:SOL71 | |
| MainVacVent | (NO) | SDM7:SOL72 | |
| solenoid power control | SDM9:128-135 | | |
| OpFlowCellSheath | (NO) Optical Flow Cell Sheath Flow | SDM9:SOL91 | |
| WbcCupDrain | (NO) WBC Cup Drain | SDM9:SOL92 | |
| HgbFlowCellDrain | (NO) HGB Flow Cell Drain | SDM9:SOL93 | |
| HgbFlowCellDilFlush | (NO) HGB Flow Cell Diluent Flush | SDM9:SOL94 | |
| RbcMixDilFlush | (NO) RBC Mixing Chamber Diluent Flush | SDM9:SOL95 | |
| WbcMixDilFlush | (NO) WBC Mixing Chamber Diluent Flush | SDM9:SOL96 | |
| WC3Vac | (NO) Waste Chamber 3 Vacuum | SDM9:SOL97 | |
| WC3Press | (NO) Waste Chamber 3 Pressure | SDM9:SOL98 | |
| Green | Off/On No power down | Status/Alert PCB | |
| Yellow | Off/On No power down | Status/Alert PCB | |
| Red | Off/On No power down | Status/Alert PCB | |
| Beep | Off/On No power down | Status/Alert PCB | |
| Beep volume | DAC 1 | I/O Scan (6 bits) | |
| Pressure | Off/On No power down | | |
| Vacuum | Off/On No power down | | |
| Yvalve | Bidirection Motor | | |
| Off/On | PB 62 | | |
| CCW/CW | PB 63 | | |
| ShearValve | TriMotor | | |
| 0/1 = Off/On | BP 177 | | |
| 0/1 = CCW/CW | BP 176 | | |
| See Shear Valve position feedback sensors at BP 0 and 1 (byte 0 bits 0,1) | | | |
MULTISTATE DEVICES
There are three multistate devices and all of these are instances of the same type of device. These are pressure/vacuum regulators, which can be Opened, Closed, or Servoed. Their names are SecPress2, SecPress3, and SecVac2. These will be implemented by dedicated hardware/firmware on the FPM board. Preferably, the control signals will be mapped into the FPM's I/O scan chain but unique addressing is acceptable.
CHANNEL DEVICES
Located directly on FPM:
Dac CHANNELS = 1-32 RANGE = 0-4095
Adc CHANNELS = 0-99 RANGE = 0-4095
AdcBipolar CHANNELS = 0-99 RANGE = 0-4095
IoBus CHANNELS = 0-255 RANGE = 0-255
Gain, prescale, and threshold are addressed as unique devices on the APU.
STEPPER MOTORS
| Name | CD3200 ID | CD3200 Location | CDX32 Location |
| MAIN GROUP | | | |
| SampInjMtr | A | | |
| HgbLyseMtr | B | | |
| WbcLyseMtr | C | | |
| RbcDilMtr | D | | |
| OpenProbeMtr | E | | |
| F | F | | |
| FillLineMtr | G | | |
| H | H | | |
| I | I | | |
| J | J | | |
| K | K | | |
| PeriPump | L | | |
| TOWER GROUP | | | |
| Probe | | | |
| Spinner | DC motor in CD3200 | | |
| LOADER GROUP | | | |
| Mixer | | | |
UNDETERMINED
DATALATCH = PB 72-79 (byte 30)
PERI_VACACC_WET = PB 88-95 (byte 32)
PERI_VPMMUX = PB 104-111 (byte 34)
PERI_VPM_CNTRL = PB 112-119 (byte 35)
UNIQUE DEVICE REQUIREMENTS
STROBED FLUID SENSORS
The strobed fluid sensors are capacitively coupled and require both a precharge and charge time. They are derived from the CD3000 FCM circuit, in which a signal triggers a one-shot that generates the 10 usec. charge time, during which each sensor charges a hold capacitor through a coupling capacitor. The hold voltages are compared to fixed thresholds. The comparator outputs are latched on the falling edge of the one-shot output, which also starts the precharge to prepare for the next measurement.
To reduce electrolysis of the sensors, they should be strobed as little as possible. The best way to do this is to not automatically read them but only in response to a read request from software. In the CD3200, the analyzer program decides when to strobe and subsequently read the sensors. In the APU system, the checks script being continuously executed by the FPM CPU will do it.
Software would be simplified if the strobe control bit would reset itself after being written. Then the script would only have to write once to the bit and later read the latched data. If this is inconvenient for hardware then the script can set the control and clear it, but the control will have to be edge-sensitive, because the time between executing the first and second writes is indeterminate.
It would be convenient for software if the strobe and readback data were mapped to the scan chain. However, it is important that the data presented to the scan chain be latched, because there may be a substantial delay between the time that the strobe is written and the data is read. The raw output of the comparators changes too quickly after the end of the capture strobe to have any validity when the data is finally read.
HARDWARE-ASSISTED BUBBLE/EDGE DETECTION
DESIGN MEETING AD16
HARDWARE ASSISTANCE FOR OBSERVE
To eliminate the CPU stalling effect of OBSERVE, we would like to replace all or part of the function with hardware. If this hardware can be made general-purpose, then we might also want to use it to improve the resolution of WATCH. Jack stated that OBSERVE and WATCH do not occur simultaneously but a review of the flow sequences shows that this is not correct. If WATCH is to take advantage of the same hardware assist as OBSERVE, the hardware must either be duplicated or designed in such a way that both command functions can use the same raw data. It is not clear why the program doesn’t already allow the WATCH and OBSERVE processes to share a common raw data source.
Dave R suggested that some kind of filtering of the raw data be used in any function to avoid false interpretation. John mentioned that there have been reports of instruments not detecting bubbles that are large enough to seriously degrade results.
Both David McCracken and Jack proposed hardware assistance based on the fact that the scan chain updates the I/O status every 132 usec. Jack suggested a circuit (programmed into the FPGA) with an XOR gate fed by the current value of the sensor and its previous value in order to detect any change (output = 1). David suggested a capture function, which would be essentially a one bit logic analyzer. To capture a trace of the activity on one input bit, the scan chain address of that bit would be passed to the capture hardware, which would copy the bit from that address in the scan chain sequentially into a bit array at the end of each scan (or some multiple of scans). A software function could subsequently examine the trace to determine the condition of the sensor using DSP (Digital Signal Processing) techniques to provide general and/or application-specific filtering.
Dave R said that the FPGA could provide one or two 1K bit arrays for the trace. Jack had explained that the OBSERVE can operate for as long as seven seconds, which would cause the trace array to overflow if the sensor were sampled at 132 usec. Dave said that this could be avoided by allowing the sample rate to be set to a multiple of the scan rate. David suggested that the high sampling rate could be used if the OBSERVE function were to periodically copy the bit array and reset the trace counter (or allow it to wrap). Jack suggested that the OBSERVE function process each captured period, determine the sensor condition for that period and then discard the raw trace.
At the 132 usec sampling rate, 1K (1024) bits would be collected in 135 msec, which is considerably longer than the worst case flow sequence instruction dispatch delay. Therefore, the hardware doesn’t need to provide selectable sampling rates. It would be helpful for software to be able to reset (or write) and read the trace counter as the simplest means of determining the beginning and end of each trace record.
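The trace buffer figures can be cross-checked with a few lines of arithmetic. The variable names are illustrative, and the divider calculation is only a what-if for the rejected selectable-rate option; the 132 usec scan period, the 1K-bit array, and the seven second OBSERVE duration come from the meeting notes.

```python
SCAN_US = 132         # I/O scan chain update period
TRACE_BITS = 1024     # one 1K-bit trace array
OBSERVE_MAX_US = 7_000_000   # longest OBSERVE reported

fill_ms = SCAN_US * TRACE_BITS / 1000          # time to fill the array
samples_needed = OBSERVE_MAX_US // SCAN_US     # samples in a full OBSERVE
# Smallest scan rate multiple that would fit a full OBSERVE in one array.
min_divider = -(-samples_needed // TRACE_BITS)  # ceiling division

print(fill_ms, samples_needed, min_divider)
```

Filling the array takes about 135 msec, so software that reads and resets the trace counter more often than that never loses samples at the full rate.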
DESIGN MEETING AD17
LOW LEVEL BLOOD SENSOR MONITORING
Two general means exist for the low level monitoring: hardware assisted and pure software. We have considered two specific methods within each of these. In the hardware assisted means, we can either trace a sensor's state by recording its condition in a bit array (David McCracken's suggestion) or we can set or clear a single flag to indicate that the sensor changed state within some time period (Jack's XOR suggestion). In the pure software means, the CPU can be almost completely dedicated to the task (as OBSERVE does now) or sampling can occur at the normal flow sequence dispatch rate (approximately what WATCH tries to do now).
We cannot detect many small bubbles (smaller than 20 uL) entrained in the sample. Since results have been reasonable, we have to assume that this is an acceptable deficiency. To get the most detailed information that the ultrasonic sensor can produce, we would want to sample it every .2 sec. We have no information on the optical sensor's bubble response and will assume that sampling it at this same rate is sufficient.
From Jack's observations, sampling at the dispatch rate will only show 33% of the bubble events detected by the sensor. This may actually be acceptable and the improved flow sequence task dispatcher should significantly improve the sample rate. However, the dedicated CPU would see nearly all of the events. The existing system seems to be able to tolerate dedicating the CPU during sampling, but it is unlikely that the new FPM will be able to do this.
Combining the WATCH and OBSERVE functions makes it possible to have an effective dedicated CPU monitor, but the dispatched monitor affords greater flexibility by allowing other flow scripts to execute more quickly (at the expense of the monitor rate). It is not necessary to choose between designing a dedicated and a dispatched monitor because they are, in fact, nearly identical; the only difference is how long sampling continues without returning to the dispatcher. This duration can be a command parameter, with 0 indicating just one sample. It would not necessarily coincide with the blood edge detector filter time (currently fixed at 40 msec), which should also be a command parameter.
The FPM needs hardware assistance to do the WatchBlood function. The intent of the trace method is to allow a DSP-like filtering of the bubble events, assuming that a single event does not necessarily indicate a sampling failure. However, given that we already are undersampling the bubbles, we probably don't need to be so discriminating. Also, we now know that we need to sample two sensors, which would consume a considerable amount of FPGA resources. Therefore, we will use the change detector hardware. If any change occurs on a monitored sensor during the time that it is supposed to see only blood, the WatchBlood function will consider the sample suspect. Given that this may be too severe, we will have to change the fact that the data station now considers short sample and bubble to be identical problems.
CONCLUSIONS
7. The FPM (and APU?) hardware will provide two identical bubble/edge event detectors. The interface to this circuit comprises a change flag (1 bit RO), an I/O address to be monitored (1 byte WO), and a control means. There are two commands: monitor given address and stop monitoring. The change flag is automatically cleared by the monitor command. The CPU decides when to start and stop monitoring.
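The behavior of one detector can be modeled in software as a rough sketch. The class, method, and attribute names are hypothetical, and the once-per-call sampling is an assumption; only the two commands, the sticky change flag, and the XOR between successive samples come from the discussion above.

```python
class ChangeDetector:
    """Software model of one bubble/edge event detector (hypothetical names)."""

    def __init__(self):
        self.address = None   # I/O address being monitored
        self.active = False
        self.change = False   # change flag, read by the CPU
        self._last = None     # previous sample, None until the first sample

    def monitor(self, address):
        """'Monitor given address' command; clears the change flag."""
        self.address = address
        self.active = True
        self.change = False
        self._last = None

    def stop(self):
        """'Stop monitoring' command; the flag is left for the CPU to read."""
        self.active = False

    def sample(self, read_io):
        """Called once per sample period; read_io(address) returns 0 or 1."""
        if not self.active:
            return
        bit = read_io(self.address) & 1
        if self._last is not None and (bit ^ self._last):
            self.change = True   # sticky until the next monitor command
        self._last = bit


# Example: one dropout while the sensor should see only blood.
levels = iter([1, 1, 1, 0, 1])
det = ChangeDetector()
det.monitor(0x20)                      # 0x20 is an arbitrary example address
for _ in range(5):
    det.sample(lambda addr: next(levels))
det.stop()
print(det.change)   # True
```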
IMPLEMENTATION REPORT 18
WATCH CALLBACK
In the CDM, if a blocking watch is active, the bubble detector can piggyback onto the continuous sampling time to get very high resolution at practically no additional cost (which is good considering that stable state detection consumes nearly 100% of foreground bandwidth). This is not true in the APU. The hardware assist only needs to record any single transition to conclude that the signal is not stable. But for bubble detection it needs to record the number of 0 samples and the number of 1 samples. These are two different functions. In AD16 – Observe and Watch Commands – Hardware Assist for Observe, I suggested a hardware trace function to reveal the pattern of bubbles while Jack advocated a change detector (based on XOR between each sample and the next). The single change function serves the stable state filter but not the bubble detector. However, the trace consumes significant hardware and software (to analyze the trace) to generate a picture that is more detailed than necessary. It would be much better for the hardware to count 1s and 0s on the two monitored sensors. If this is possible then it is all that is needed for both bubble and stable state detection.
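A small sketch shows why the two counts are sufficient for both purposes. The function names and the minority-fraction measure are assumptions made for illustration; the report only states that the hardware should count 1s and 0s.

```python
def count_levels(samples):
    """Return (zeros, ones) for a sequence of 0/1 sensor samples."""
    samples = list(samples)
    ones = sum(1 for s in samples if s)
    return len(samples) - ones, ones


def is_stable(zeros, ones):
    """Stable state: every sample in the period had the same level."""
    return zeros == 0 or ones == 0


def minority_fraction(zeros, ones):
    """Fraction of samples at the less common level (a rough bubble measure)."""
    total = zeros + ones
    return min(zeros, ones) / total if total else 0.0


zeros, ones = count_levels([1, 1, 0, 1])
print(zeros, ones)                      # 1 3
print(is_stable(zeros, ones))           # False
print(minority_fraction(zeros, ones))   # 0.25
```

If either count is zero, no transition can have occurred, so the counts give the same answer as the single change flag while also indicating how much of the period the sensor spent in each state.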
HARDWARE-BASED DECIMATION
IMPLEMENTATION REPORT 27
GATHER IMPLEMENTATION REVIEW
SINGLE-PASS AND HIGH THROUGHPUT ANALYZERS
Dave R has proposed decimating in the ADC-DMA subsystem. Combined with the APU3’s collecting of only selected channels, this would entirely eliminate the software parsing function, leaving dmaParse only the job of transferring data from the DMA buffer to the communication system. This could also reduce coincidence if the hardware can convert the decimation determining channel first and not convert the others if the cell is to be discarded. Dave even suggested having hardware insert the raw data directly into communication buffers, but this would seem to be considerably more complicated than a data comparator. Nevertheless, it may be needed in order to achieve the desired performance level.
IMPLEMENTATION REPORT 31
HARDWARE-BASED DECIMATION
OVERVIEW
The gather process collects list data by transferring ADC output data to a memory buffer, called the dma buffer, via DMA and subsequently parsing this data to create message packets that are sent to the data station. Parsing is done by a periodic software function. Parsing optionally includes decimation, a process by which the amount of list data of a predominant fraction of a sample is reduced without reducing other (usually one) fractions. For example, the amount of RBC list data might be reduced five-fold, without affecting the PLT list data count, by discarding the data associated with four of every five apparent RBC cells. Decimated cells are identified by the value of the data in one channel, which is normally a measure of the light refracted by the cell at a particular angle.
Decimating in the parse function is fairly expensive. First, it means that time is wasted converting (by ADC) and storing (by DMA) data that will ultimately be discarded, thereby increasing coincidence and potentially reducing the maximum gather rate. It also means that the dma buffer memory is used inefficiently to store data that will be discarded. Finally, it imposes a heavy processing burden on the parsing function, which may not be able to keep pace with raw data collection.
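The decimation performed by the parse function can be sketched as follows. This is not the dmaParse code; the function name, the tuple representation of a cell, and the inclusive window test are assumptions made for illustration.

```python
def decimate(cells, channel, lo, hi, factor):
    """Keep every cell outside the window and one of every `factor` inside it.

    cells   -- sequence of tuples, one value per converted channel
    channel -- index of the channel used for the decimation decision
    lo, hi  -- window limits (inclusive, by assumption)
    """
    kept = []
    seen_in_window = 0
    for cell in cells:
        if lo <= cell[channel] <= hi:
            seen_in_window += 1
            if seen_in_window % factor != 0:
                continue          # discard this predominant-fraction cell
        kept.append(cell)
    return kept


# Ten RBC-like cells (channel 0 inside the window) and three PLT-like cells.
cells = [(100, i) for i in range(10)] + [(10, i) for i in range(3)]
kept = decimate(cells, channel=0, lo=50, hi=200, factor=5)
print(len(kept))   # 5
```

All ten RBC-like cells were converted and stored before eight of them were thrown away, which is the waste described above.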
REQUIREMENTS
Decimation requires the following information:
HARDWARE SOLUTION
It would be feasible to replace the decimation process with equivalent hardware located between the ADC output and the DMA controller. The control parameters listed in the preceding section could be implemented in registers programmed by the gather command interpreter, procGather. The two maximum count values can be loaded into registers that are counted down toward 0. However, the programmed decimation factor must be retained for reloading, so an additional 8-bit (possibly 6-bit) counter would be needed for decimation cycle counting.
The window comparator is trivial. Using its result, however, may entail some complexity. One potential source of complexity is the need to base the decimation decision on a channel that can be selected at run time. Converting this channel before any other affords the greatest efficiency but impinges on the channel selection mechanism. This approach also introduces the problem of where to store the data from this channel relative to the other channels. Typically, the channel used for the decimation decision is also selected for storage (although this should not be assumed). If, by coincidence, this is the first channel normally stored then the value could be stored in the dma buffer immediately after being used by the comparator. Then the remaining selected channels could be converted and stored. However, if it is not normally the first channel stored, it would be better to retain the value and store it in its normal position relative to the other channels. If this behavior is too difficult to implement efficiently in hardware, then the data station will have to be programmed to unscramble the channels, possibly by literally rearranging the list data but, more likely, by assuming a different channel ordering for some types of cells. If the out-of-order storage approach is taken, it should be used for the undecimated as well as the decimated fraction so that the two fractions can be manipulated together without having to normalize one of them.
With software-based decimation the data parsing function counts both fractions and terminates the gather if both reach their specified maximum count. When not decimating, the parse function relies on the DMA TC ISR to stop collecting and to mark the end of the raw data for the current gather (the ISR calls setDmaEnd to do this). Much of the benefit of hardware-based decimation would be lost if the parse function were required to count the two fractions. The program would be most consistent if the DMA TC ISR could take responsibility for ending the collection phase in all cases. The ISR executes only when the DMA reaches a predetermined count (BTC counts down to 0). Without decimation, this is simply the maximum specified count, possibly with intermediate steps to avoid collisions. With decimation in software, the DMA is programmed to count more than both fraction maxima combined in order to guarantee a sufficient supply of raw data for both fractions. Hardware-based decimation affords a means to stop collecting immediately when both fractions reach their maximum counts. Ideally, when hardware detects that both fractions have reached their maximum counts (actually counted down to 0) it would generate an interrupt that would provide an opportunity for an ISR to terminate the collection phase. For greatest program efficiency, this IRQ would not be the same as the DMA TC, whose handler must sort out the several possible causes for being invoked. However, it would also be feasible to share the DMA IRQ, with the decimation hardware providing an end count flag that the ISR could read to determine whether this was the source of the interrupt.
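The counting and end-of-collection behavior discussed above can be modeled as a sketch. The class and attribute names are hypothetical and the inclusive window test is an assumption; the model only illustrates the reloadable decimation cycle counter, the two down-counters, and an end count flag that an ISR could read.

```python
class Decimator:
    """Behavioral model only; register names are hypothetical."""

    def __init__(self, lo, hi, factor, max_decimated, max_other):
        self.lo, self.hi = lo, hi
        self.factor = factor          # retained for reloading
        self.cycle = factor           # decimation cycle down-counter
        self.remaining = [max_decimated, max_other]   # count down toward 0
        self.end_count = False        # flag an ISR could read

    def cell(self, value):
        """Present the decision-channel value; return True to store the cell."""
        if self.end_count:
            return False
        in_window = self.lo <= value <= self.hi
        if in_window:
            self.cycle -= 1
            if self.cycle:            # not yet the factor-th cell
                return False
            self.cycle = self.factor  # reload from the retained factor
        idx = 0 if in_window else 1
        if self.remaining[idx] == 0:
            return False              # this fraction is already full
        self.remaining[idx] -= 1
        if self.remaining == [0, 0]:
            self.end_count = True     # would raise the end-count interrupt
        return True


dec = Decimator(lo=50, hi=200, factor=5, max_decimated=2, max_other=3)
stored = sum(dec.cell(v) for v in [100] * 10 + [10] * 3)
print(stored, dec.end_count)   # 5 True
```

Once the flag is set, the model refuses further cells, which corresponds to stopping collection immediately rather than waiting for an oversized DMA count to expire.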
The ideal decimation behavior can be described as follows:
Sept. 30, 1999
k:\cdx\doc\analyzer\ad46.doc
SYSTEM DESIGN REVIEW MEETING 46
HARDWARE PARTITIONING
The interim system architecture erroneously assumed that the FPM would be responsible for nearly all control functions, but to allow CD3200 flow scripts to be reused, the APU should control everything that had been accessed through the Peripheral Bus. These items should be located on the APU's I/O scan chain.
The interim APU-FPM design partitioned the system into the following boards:
Most of the boards do not fully consume the available spaces (in a CD3200 box). The available spaces are:
VPM
When we examined the interim system, several changes were obvious. One is that the vacuum/pressure circuitry could and should be reduced to the minimum needed. It will be controlled through the APU scan chain and, therefore, doesn't belong on the FPM board. The transducers complicate board swapping, so, for maintenance concerns, the circuitry should be minimized. The vacuum/pressure circuit should be located on the Pneumatic Unit but no other circuitry should be there. Therefore, the new design will have a VPM board, which will only control the vacuum and pressure. It will be located on the Pneumatic Unit and will be controlled through the APU I/O scan chain.
Responsibility for the strobed fluid sensors as well as vacuum/pressure control was taken away from the FPM and given back to the APU (via the APU scan chain). Thus, the name FPM (Flow Process Module) no longer has any meaning. The unit now strictly replaces the MPM (Motor Process Module) and SHM (Sample Handler Modules-- Tower and Loader). Therefore, it has been renamed MSM.
MSM
The core of the MSM unit comprises the 68340, FPGA, APU interface (connector and drivers), motor scan chain interface (now augmented with an ADC), and I/O scan chain interface (if tower and loader remain separate boards). This consumes approximately 36 sq. in. In the interim system, the MSM (FPM) controls the tower and loader boards through scan chains. Therefore, it could be located nearly anywhere in the box. The new smaller MSM, however, could be located closer to the tower and loader that it controls. We physically examined the system to determine whether it would be feasible and desirable to move the MSM to the plate where the Left Panel Board is located and to connect directly to the tower and/or loader I/O devices, thereby eliminating the tower and loader boards. This could improve reliability by reducing components and connections, reduce exposure of circuitry to fluids, and improve our image with customers.
TOWER
The tower board clearly should be replaced. Locating the MSM at the bottom of the left panel (inside) plate with its tower connectors on the lower left and threading the tower wires through a hole near the bottom of the left front panel allows the wires to be only slightly longer than they are with a tower board. This MSM position is also the best location for the board to connect to the loader.
Disconnecting the tower I/O from the MSM located at the bottom of the inside left panel would not be physically difficult. There are several inches of clearance between this location and the Pneumatic Unit, which can slide out if this isn't enough. It would still be advisable to use connectors with ejector tabs to facilitate disconnect.
LOADER
Whether to replace the loader board with direct I/O from the MSM is not as clear as for the tower. The interim loader board has 30 input lines and 13 outputs. The 43 wires of a direct I/O approach compare favorably to the 40 wires used in the scan chain design (two 16-pin scan chains plus the 8-pin power cable). However, some of the inputs may be sensitive to EMI and the stepper driver is an EMI source. Another consideration is future flexibility. If direct I/O is used, every change, such as adding a sensor, potentially affects the wire count. If we choose direct I/O, thinking that it fits into a DB37 connector, for example, then the decision can be rendered obsolete by a relatively minor functional change.
The interim loader board's cabling is inadequate for maintenance. Three cables have to be disconnected to release the loader and none of them will be very accessible. The board will have little vertical clearance, yet its power plug must be pulled up with considerable force to disconnect. The single DB37 cable used in the CD4000 represents a much better approach even if it does increase the cabling cost. The interim board approach to cabling is not inherent in the scan chain interface. In fact, better cabling favors the scan chain approach, which isolates the wire count from future I/O count changes.
The two analog motor feedback wires increase the wire count of the scan chain interface but not of the direct I/O. But the interim design uses more wires in the scan chain connector than is actually necessary. It has two scan chain connectors that are nearly identical. Units can't be daisy-chained to a scan chain, because the data wires require unique input and output points. There are two logical data lines, downstreaming from the master (MSM) and upstreaming from the slaves. Each of these requires two wires for differential signaling. These four wires have to be different in the connectors, but all of the others are identical. Further, the loader board in the new architecture would be the only unit on the MSM's external I/O scan chain. The last unit in a scan chain doesn't need to provide a downstream connector. Therefore, the I/O scan chain doesn't even need the four data loop wires.
The situation for the loader's motor scan chain is different. The MSM master is located physically between the loader and the (new-- see stepper review) right panel motor driver board. One of the two slave boards must provide the data loop. While reducing the wire count in the MSM-Loader interface is desirable, it could only be done by physically looping through the right panel motor driver board and coming back through the MSM and down to the loader board. This would require either two normal scan chain cables or one special looping cable. The looping cable would reduce the total wire count but at the expense of system flexibility, as only standard cabling supports "mix and match" boards. Special cabling should only be used in a point-to-point link that we are sure is not going to change, such as between the MSM and loader.
If the loader board provided the motor loop, the logical interface signals would be:
The total wire count would be 24. A DB37 could support these while providing 13 power pins, which should be sufficient. A right-angle DB37 on the loader board would solve the disconnect problem, particularly if the connector itself can be bolted to the loader frame.
STEPPERS
The MSM will have three stepper interfaces for driving the two steppers located in the tower-- probe and spinner (this has been changed from a DC motor to a stepper)-- and the peristaltic pump, located on the left-hand side of the left panel. If direct loader I/O replaces the loader board then a fourth interface is needed. These are located on the motor scan chain but with the single-ended electrical interface that can be locally used instead of differential.
The four syringe motors and one wash block motor are all located on the right front panel. A stepper driver board will be located on the (inside) right panel under the wash block motor and, therefore, next to the right-most syringe. This board will contain five stepper driver interfaces. If the board is too large for this space, it may move to the right several inches, where there is a large open area that was occupied by an SDM in the CD3200. All stepper interfaces are removed from the Right Panel Module. The shear valve driver interface doesn't similarly migrate because it is not a stepper, but is controlled through the APU I/O scan chain.
The size of the syringe/wash block stepper driver board is determined by the following elements:
LEFT PANEL
The peristaltic pump driver will move from the interim Left Panel Board to the MSM. The interim Left Panel Board provides a scan chain connector to the Tower Board. This will move to a mechanically similar I/O connector on the MSM. The MSM area is determined by the following subsections:
The Left Panel Board is still needed to provide at least solenoid drivers. The interim design also includes fluid sensor circuitry (including for the strobed sensors). The board area is determined by the following subsections:
Since the Left Panel Board and MSM are located adjacent to each other, it would be possible to simply combine them into one. However, with such a large number of connectors, servicing would be easier if the two were separate. If separate, another 3 sq. in. should be allowed for separation. The total area consumed by the two boards would be 122 sq. in. The left panel plate provides 168 sq. in. The MSM will be located on the bottom of the plate and the Left Panel Board above it.
STROBED FLUID SENSORS
Two of the six strobed fluid sensors are located on the front left panel. The remaining four are probes inserted into reagent and waste containers outside of the instrument. Their wires enter the instrument through the back panel (of a CD3200) but they could enter at any convenient point. Since only the two sensors on the front panel have a fixed location, it is reasonable to locate the sensor circuitry near to them and route the remainder to this point through the most convenient path. The Left Panel Board is the obvious choice for this. The sensor wires entering through the back panel can traverse to the front if necessary, but it would be better to provide an access point closer to the Left Panel Board if possible.
FLUID IN-LINE SENSORS
The pre-shearValve and post-shearValve fluid sensors attach to two small boards precariously dangling from a flagpole on the right front panel behind the Y-valve. The optical sensor board is our own and its circuitry should be merged into the Right Front Panel. The ultrasonic sensor board is mated to its sensor and is tuned by the manufacturer. This board consumes less than 2 sq. in. and the Right Front Panel board has virtually unlimited space, so it would be reasonable to provide an option to mount the sensor board on the panel board. Remote mounting should not be precluded, because the sensor cable has a fixed length that may not support every possible position of the sensor unless the board is mounted on the front panel.
STATUS BOARD
The interim Status Board has two combined motor and I/O scan chain connectors for attaching to the FPM. These must at least be replaced by connectors for the APU I/O scan chain, which doesn't include a motor interface. The wiring can be much more aggressively reduced by using a point-to-point loop-back cable to the Right Panel Board. This requires just one 14-pin cable.
RIGHT PANEL
The interim design has two combined motor and I/O scan chain connectors for attaching to the FPM. The new design has two similar connectors for attaching to the APU I/O scan chain. Neither the MSM's I/O nor its motor scan chain attaches to the new board. All of the stepper driver circuits are moved to the new right panel motor driver board.
A point-to-point loop-back cable connector will be added for connecting the Status Board to the APU I/O scan chain.
CABLING
The interim design uses consistent cabling, which would support mixing I/O boards, i.e. any board that attaches to a particular scan chain can be located anywhere on the chain. This essentially doubles the amount of cabling and connectors. The supposed flexibility that this should buy is really an illusion. Each I/O board is designed according to where it is located. Any movement would be within a narrow range that would not upset the bus topology. The bussing flexibility is inconsistent with the reality of these boards. Therefore, cabling should be reduced wherever feasible by using special point-to-point arrangements as specifically described above. The two cases that stand out are the Right Panel Board to Status Board and the MSM to Loader board.
The standard I/O control cables are as follows:
I/O control cabling usage is as follows:
HARDWARE ROADMAP
APU
FPM
No longer exists. Its vacuum/pressure control moves to the new VPM. The remainder moves to the new MSM.
MSM (replaces FPM)
VPM (new)
LOADER BOARD AND CABLE
LEFT PANEL BOARD
RIGHT PANEL BOARD
STATUS BOARD
RIGHT PANEL MOTOR DRIVER BOARD (new)
9/28/99
Synopsis: some hardware changes needed on the APU and MSM (nee FPM) boards to better support software.
APU-MSM CLOCK
PROBLEM
The APU-MSM interface does not include a clock, limiting the maximum baud rate to the maximum that can be embedded in the data stream.
SOLUTION
Replace the SYNC signal from the APU with the 500KHz RXCLK, which is transmitted by the data station. Disconnect APU CPU PA3 from 75174 line driver input and replace it with RXCLK. The APU will be programmed to use SCLK for transmit and receive for both channels A and B.
APU-MSM INTERFACE PINOUT
PROBLEM
The APU-MSM interface is electrically and functionally nearly identical to the Data Station - APU interface but the two pinouts are different. This prevents the Data Station from communicating directly with the MSM, which would be useful during development.
SOLUTION
Arrange the pins of the APU-MSM connector to match the HSSL connector, with the APU playing the role of Data Station and MSM the role of the APU. Put ATN on the ACLOCK (TXCLK from APU to Data Station) pins 3 and 4. A special cable is needed for the Data Station to talk directly to the MSM. In this cable, pins 3 and 4 are disconnected and the wires connected to pins 9 and 10, i.e. the Data Station's own transmit clock is looped back as its receive clock in the cable itself. The MSM's ATN signal is not used.
MSM MASTER COMMUNICATION CLOCK
PROBLEM
The MSM (FPM) CPU does not connect to an external baud rate clock so it can't use (hardware) synchronous clocking.
SOLUTION
Cut the CPU's SCLK (pin 88) connection to ground and connect it instead to the old SYNC input (after translation by 75175) which has been replaced by the global communication clock on the APU board.
MSM COMMUNICATION HANDSHAKING
PROBLEM
The MSM (old FPM) has no input handshaking, i.e. nothing to prevent the APU from overrunning the UART input buffer. The CPU output that could have been used for input flow control, RTSA (pin 80), was instead used for output impedance control for a multi-drop communication system, which we don't have.
SOLUTION
Disconnect RTSA from the output drivers' (75174) enable. Hard-wire the enable to always on. Disconnect TXRDY (CPU pin 83) from the interface CTSB driver, replacing it with RTSA. Configure RTSA for input flow control.
For any multi-drop system developed in the future using this same CPU, transmitter impedance can be controlled by simple I/O. At the final interrupt of a message, the ISR can turn off the transmitter.
CDNEXT: APU SYSTEM DESIGN REPORT 2
10/4/99
DISTRIBUTION OF FPM SUBSYSTEMS
VACUUM ACCUMULATOR WET SENSOR CIRCUITS
The vacuum accumulators are located in the vacuum/pressure module. Consequently, their signal conditioning circuitry should move from the FPM to the new VPM board. These sensors are strobed but, unlike the strobed fluid sensors, they are operated continuously. The circuit shown in the FPM schematic is very similar to the circuit on the old VPM, but it is not identical, leading to some questions.
VACUUM/PRESSURE CONTROL
DAC
The vacuum/pressure control system will move from the FPM to the new VPM. Several improvements can be made to the circuit now on the FPM.
The circuit on the FPM simply provides buffered control signals to some remote actuator system, comprising solenoids and pumps. The solenoid control logic signals should be replaced by real solenoid drivers. There should be three of these, two for the pressure step downs and one for vacuum. Since the two pumps operate on 117VAC, we probably will want to continue outputting logic controls for them.
The DAC and comparator feedback control system is inefficient and inflexible. Each of the five servo control sections comprises an analog signal that is compared to one DAC channel output, generating a logic signal, which is sent to the FPGA. The same analog signal is fed to an ADC, whose output is sent to the FPGA. The FPGA writes the reference voltage to the DAC. The FPGA outputs the pump and solenoid control signals. The FPGA doesn't need the DAC and comparator to tell it when the measured signal is below the reference. It can do this itself by comparing the ADC output to the reference value.
Eliminating the DAC and comparator affords many advantages, such as:
CONTROLLER
The vacuum/pressure controller combines relatively high level commands with the digitized sensor data to determine whether to turn the pumps and solenoids on. On and off commands are simple. For each of the three solenoids, the scan chain provides two bits that tell whether the solenoid is open, closed, or servoed. The scan chain provides one bit for each of the two pumps, which are either off or servoed.
Each of the five controlled sections (three solenoids and two pumps) needs at least one 10-bit reference level. For programmable hysteresis, two 10-bit references or one 10-bit level and one (e.g.) 4-bit hysteresis value would be needed. The total of 50, 70, or 100 bits would consume too much of the scan chain to be used without encoding. The most obvious encoding scheme is to demultiplex one set of 10 bits to registers selected by three or four address bits.
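The bit budget can be worked through in a few lines. The constants are taken from the text; the function names are illustrative, and the multiplexed form assumes one shared 10-bit data field plus the address bits.

```python
SECTIONS = 5      # three solenoids and two pumps
LEVEL_BITS = 10   # one reference level
HYST_BITS = 4     # the example hysteresis width from the text


def direct_bits(levels, hyst_bits=0):
    """Scan chain bits if every section has its own dedicated fields."""
    return SECTIONS * (levels * LEVEL_BITS + hyst_bits)


def multiplexed_bits(addr_bits):
    """Scan chain bits if one 10-bit field is demultiplexed by address."""
    return LEVEL_BITS + addr_bits


print(direct_bits(1))              # 50  (one level per section)
print(direct_bits(1, HYST_BITS))   # 70  (level plus 4-bit hysteresis)
print(direct_bits(2))              # 100 (two levels per section)
print(multiplexed_bits(3))         # 13
print(multiplexed_bits(4))         # 14
```

Three address bits select up to eight registers and four select up to sixteen, so four are needed if each of the five sections is to have two separately addressed 10-bit levels.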
A simple controller could rotate continuously through the five channels, comparing each ADC output to the corresponding level (if servoing has been selected for that channel) and turning the output on if appropriate. This bang-bang control could excessively toggle the electromechanical devices. The simplest way to avoid this is by using upper and lower thresholds, turning on the device when the signal drops below the lower threshold and off when it climbs above the upper. By making these levels programmable through the scan chain, any system can be properly tuned by scripts. It would even be possible to adapt the hysteresis to changing conditions. For example, the vacuum may be maintained loosely (large hysteresis) most of the time to reduce solenoid wear but more tightly (smaller hysteresis) during the cell counting period to reduce vacuum oscillations during this critical period.
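The two-threshold scheme can be sketched as below. All names and the example threshold values are hypothetical; the sketch only illustrates turning on below the lower threshold, turning off above the upper threshold, and holding the previous state in between.

```python
def bang_bang(on, signal, lower, upper):
    """Return the new output state for one channel."""
    if signal < lower:
        return True
    if signal > upper:
        return False
    return on   # inside the band: hold the previous state


def scan_channels(states, adc, thresholds, servo):
    """One pass over all channels; only servoed channels are updated."""
    return [
        bang_bang(on, adc[i], *thresholds[i]) if servo[i] else on
        for i, on in enumerate(states)
    ]


def toggles(trace, lower, upper):
    """Count output changes over a trace, starting with the device off."""
    on, count = False, 0
    for signal in trace:
        new = bang_bang(on, signal, lower, upper)
        count += new != on
        on = new
    return count


trace = [94, 99, 101, 99, 101, 106]
print(toggles(trace, 100, 100))   # 4  (single threshold)
print(toggles(trace, 95, 105))    # 2  (with hysteresis)
```

Widening the band trades regulation tightness for fewer actuator cycles, which is the adjustment a script could make between idle and cell counting periods.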
Bang-bang control, even with hysteresis, represents a primitive control means, in which overshoot can be controlled only by excessive cycling. A more sophisticated means, such as Proportional-Integral-Derivative, is more commonly used for servo mechanisms. A PID control system could not be feasibly implemented in an FPGA. A more reasonable approach would be to use a DSP. Any standard DSP can provide the computational power for this application. The other major concerns would be the time required to write the program and the circuit cost.
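For comparison, a textbook discrete PID step is sketched below. This is a generic form, not a design for the vacuum/pressure controller; the class name, gains, and sample interval are arbitrary assumptions, and how the output would be mapped onto on/off solenoids and pumps (for example, as a duty cycle) is not addressed here.

```python
class Pid:
    """Generic discrete PID controller (illustrative only)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measured):
        """Return the control output for one sample interval."""
        error = setpoint - measured
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return (self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)


pid = Pid(kp=1.0, ki=0.5, kd=0.25, dt=1.0)
print(pid.step(10, 6))   # 7.0
print(pid.step(10, 8))   # 4.5
```

Each step needs several multiplications and retained state per channel, which suits a DSP's multiply-accumulate hardware better than FPGA logic.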
Analog Devices' ADSP2105 itself costs only $3, but the full circuit comprises more than this one device. Like many DSPs, this device has on-chip data and program RAM and can boot itself from an 8-bit external EPROM (like our RAM-based FPGA) so its support circuit cost is quite low. It also has a built-in clocked serial port that can interface to the serial scan chain with little or no additional logic. Since the 2105's serial port isn't flow through (input looping back to output) as is normal for our scan chain, it would have to operate in snoop mode. That is, it could capture the scan chain signals as they pass by. The only reasonable way to map the device onto the scan chain would be to locate it at the logical end. Otherwise, it would inadvertently share addresses with some real device in the chain. It could actually be located physically anywhere at or before its logical address. As with the flow-through scan registers, it can't see bits physically located earlier in the chain (except on the next cycle, which could be confusing but may be usable anyway).
An ADSP2105 located at the physical beginning of the scan chain could capture the entire 256 bits by programming the serial interface for auto-buffering, using a DAG (on-chip Data Address Generator) to automatically transfer each group of 16 bits to a memory buffer. However, this isn't needed for the vacuum/pressure controller (it may prove useful for other situations where this chip might be valuable) which is located on the physical end of the APU's I/O scan chain and will only need 22 output bits. This is larger than the 2105's input holding register so auto-buffering might still be needed. The 2105's serial transmit can be used to send status information to the APU. There would be no reason to send raw ADC data, as the vacuum/pressure control system is entirely self-contained and, with a DSP, smart enough to identify and report errors.
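The word counts implied here can be sketched as follows. The 16-bit grouping, the 256-bit chain, and the 22 output bits come from the text; the function names, MSB-first bit order, and left-justification of a short final word are assumptions for illustration and are not taken from the ADSP2105 documentation.

```python
WORD_BITS = 16   # group size transferred per auto-buffer cycle


def words_needed(bits, word_bits=WORD_BITS):
    """Number of words required to hold `bits` serial bits."""
    return -(-bits // word_bits)   # ceiling division


def pack_words(bit_stream, word_bits=WORD_BITS):
    """Group serial bits into words (first bit received = MSB, by assumption)."""
    words = []
    for i in range(0, len(bit_stream), word_bits):
        chunk = bit_stream[i:i + word_bits]
        value = 0
        for b in chunk:
            value = (value << 1) | (b & 1)
        value <<= word_bits - len(chunk)   # left-justify a short final word
        words.append(value)
    return words


print(words_needed(256))   # 16
print(words_needed(22))    # 2
print([hex(w) for w in pack_words([1] * 22)])   # ['0xffff', '0xfc00']
```

Because 22 bits span two 16-bit words, a single holding register is not enough, which is why auto-buffering might still be needed for the controller.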
The ADSP2105's serial port is designed for interfacing with serial peripherals, like CODECs and ADCs. However, it has only one serial port. Other devices in the ADSPxx series have two serial ports but also many other features that we don't need. It may be cheaper to use a parallel ADC (if we can find one-- the TI family that we have been using offers multiplexed ADCs only with serial interfaces) with the 2105 than to use one of the more expensive parts just to be able to use a serial ADC.
The DSP program can be relatively simple. A bang-bang controller with hysteresis wouldn't be more difficult to implement than the same functionality in VHDL for the FPGA. However, all of the tools and development methods would have to be procured and set up. Further, only one of us (as far as I know) has any experience programming the ADSP2105. The two solutions would have essentially identical system interfaces and would cost approximately the same. The basic tradeoff is, for an up-front cost (more in startup time than equipment, as the DSP tools are cheap) the DSP affords greater flexibility in control algorithms (and self-testing) which could result in more dramatic savings due to improved reliability.
Whether implemented by a DSP or FPGA, the bang-bang + hysteresis architectural and behavioral description is as follows:
Other than pressure and vacuum control, the new VPM board will contain only the two vacuum accumulator wet sensors, which will be inputs to the APU scan chain.
MOTOR CONTROL, TOWER, AND LOADER
APU-MSM COMMUNICATION CABLE PINOUT
The intelligence of the FPM, comprising the CPU and the bulk of the FPGA, was primarily responsible for motor control plus tower and loader script execution (tower and loader SHM units in the CD3200). This has been moved to the new MSM (essentially the FPM with other functions removed and renamed MSM).
The APU communicated with the FPM using the old SHM serial link, which is half-duplex, 19.2KBaud, asynchronous RS485. For APU-MSM this will be replaced by full-duplex, 500KBaud, separate data and clock lines, RS422, which is the same as the "HSSL" link between the data station and APU.
The APU and old FPM circuit (now MSM) both must be changed for the new link. In addition to circuit changes within each board, the connector pinout is changed to match the HSSL cable to allow the data station to link directly with the MSM for program development (software will also provide a link through the APU). The cable pinout is:
The APU's selectable barcode interface should be removed. This can only be enabled at the expense of MSM communication and the APU cannot control an analyzer without the MSM.
To support a fast Baud rate in the serial interfaces (data station to APU and APU to MSM) a separate clock is transmitted in addition to the data. The HSSL link was originally developed around the 68681 DUART (and the nearly identical 2681, which is still used on the data station's HSSL card). The DUART can associate a distinct clock for each data line. The HSSL interface contains separate clocks for the data in both directions.
The 68340 CPU supports clocked data but provides only one clock input, SCLK for transmit and receive data on both channels A and B. When we replaced the 68681 with the 68340, to provide the outgoing data clock (to the data station) we simply turned around the clock transmitted by the data station, i.e. the CPU SCLK. This clock and the CPU's transmit data experience equivalent delays through the buffers but the data still lags the clock by the delay time from SCLK to data out in the CPU. This delay does not exist when both the data and the clock are output by a 68681. The delay is calculated from the data sheet for a 17MHz 68340 operating at 15.97MHz (see MC68340 User's Manual, Electrical Specifications p. 11-22).
The worst case delay is: 1.5 tcyc + tcs + tvld, where tcyc = 63 ns, tcs = 5 ns, and tvld = .5*tcyc = 31 ns. The result is 130 ns.
The input data setup time relative to SCLK is: .5*tcyc + tcs + tchrx, where tchrx = 15ns. The result is 51 ns.
Since the data period at 500KBaud is 500 ns, the lagging data reduces the worst case data-clock margin from 449 ns. to 319 ns, which is unquestionably safe.
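The arithmetic behind these figures can be checked with a short script. This is only a sketch: the constant names are made up for illustration, the values are the ones quoted in the text, and fractional nanoseconds are truncated as the text appears to do.

```python
# All times in nanoseconds, taken from the figures quoted in the text.
T_CYC = 63            # CPU clock period
T_CS = 5              # tcs
T_VLD = T_CYC // 2    # tvld = .5 * tcyc, truncated to 31
T_CHRX = 15           # tchrx
DATA_PERIOD = 500     # data period used in the text

# Worst case lag of transmit data behind SCLK: 1.5 tcyc + tcs + tvld
data_lag = int(1.5 * T_CYC + T_CS + T_VLD)

# Input data setup time relative to SCLK: .5 tcyc + tcs + tchrx
setup = int(0.5 * T_CYC + T_CS + T_CHRX)

margin_without_lag = DATA_PERIOD - setup
margin_with_lag = margin_without_lag - data_lag

print(data_lag, setup, margin_without_lag, margin_with_lag)
```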
The situation is more complicated in the APU-MSM interface. Unlike the data station, with its multi-clock 2681, the APU won't accept a separate clock for the data transmitted by the MSM. Therefore, this data will lag the clock used to sample it significantly more than in the APU-Data Station case. To reduce the lag, the differential clock from the data station to the APU is routed directly from the HSSL connector to the MSM connector (see item 5 above). This still leaves the following (max) delays:
Given the 51 ns. data setup time, the worst case margin is 64 ns. It would be possible to improve this by delaying the data on the APU by one full clock period minus the data lag, thus moving the data into the next clock period but forward in time for that period. This is possible because the link, although clocked, is still asynchronous, with the receiver extracting the start time from the data. If the data were moved too far forward in the next cycle, i.e. not delayed enough, an apparent clock lag would be created. Thus, we have to consider the minimum delays. Assuming that minima are 50% of maxima, the minimum data lag is 192 ns. The CPU-UART data hold time is the same as the 51 ns. setup time. Therefore, the ideal data time would split the min-max lag (192 ns.) in the middle of the bit time, producing a symmetrical pair (clock lag relative to minimum data delay and clock lead relative to maximum data delay) of worst case margins of 154 ns. This could be achieved by a 308 ns. (500 - 192) delay in the data at the APU. This is obviously much better than the 64 ns. margin, but since the 64 ns. figure also results from a worst case analysis, it is acceptable. Therefore, unless we encounter unreliable communication for unexpected reasons, the undelayed data design will be used. If it is convenient, we may want to include an unstuffed delay line on the APU data from the MSM as a contingency option.
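The delayed-data numbers can be reproduced with the sketch below. Two things in it are assumptions rather than stated values: the maximum lag is back-computed from the 64 ns. margin, and the min-max spread is taken equal to the minimum lag, which follows from assuming minima are 50% of maxima. The variable names are illustrative.

```python
# All times in nanoseconds.
DATA_PERIOD = 500
SETUP = 51
UNDELAYED_MARGIN = 64

# Assumption: maximum lag inferred from the stated worst case margin.
max_lag = DATA_PERIOD - SETUP - UNDELAYED_MARGIN

# Minima assumed to be 50% of maxima (truncated to whole ns).
min_lag = max_lag // 2

# With min = max / 2, the min-to-max spread equals the minimum lag.
spread = min_lag

# Centering the spread in the bit time leaves equal margins on both sides.
symmetric_margin = (DATA_PERIOD - spread) // 2

# Delay that moves the minimum-lag data into the next clock period.
added_delay = DATA_PERIOD - min_lag

print(max_lag, min_lag, symmetric_margin, added_delay)
```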
The MSM has no other uses for the CPU's two DMA controllers so they may as well be used for its communication with the APU. For consistency with the APU, DMA 2 will be used for transmit data (from MSM). CPU TxRdyA pin 83 will be connected to CPU DREQ2 pin 96. CPU RxRdyA pin 82 will be connected to CPU DREQ1 pin 93.
STEPPER MOTORS
DISTRIBUTION OF DRIVERS
System Design Review Meeting 46 Minutes (AD46.DOC) provides an overview of stepper driver requirements. Five drivers are required on the right panel motor driver board, three on the MSM, and one on the loader board. The motor controller can support as many as fourteen motors and providing spare drivers does not impact the existing motors. Each motor interface consumes four scan chain outputs and four inputs. Our standard scan chain shift registers (HC594 for output and HC597 for input) are octal parts. Coincidentally, each of the three boards is required to support an odd number of motors so adding one spare to each more effectively uses the scan chain address space while still using standard logic.
The right panel motor driver will provide six stepper interfaces and the MSM four. If the loader provides two, all three boards will have one spare, but there will still be two empty slots. We have considered replacing one or more pneumatic loader actuators with steppers. If the loader board can be expanded to support two more spare stepper drivers, it could save us a future board turn. Since two interfaces represent complete octal shift registers, they can be easily jumped over (serial in connected directly to serial out, bypassing the shift register) in order to recover the scan chain address space for a driver located elsewhere in the system.
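The slot accounting above can be sketched as follows. The dictionary keys and variable names are illustrative; the counts are those given in the text (fourteen motor slots, four bits per motor, octal registers).

```python
MAX_MOTORS = 14                            # motor controller capacity
BITS_PER_MOTOR = 4                         # scan chain outputs (and inputs) per motor
MOTORS_PER_OCTAL = 8 // BITS_PER_MOTOR     # two motors per octal register

required = {"right_panel": 5, "msm": 3, "loader": 1}

# Round each board up to a whole number of octal registers.
provided = {
    board: -(-n // MOTORS_PER_OCTAL) * MOTORS_PER_OCTAL
    for board, n in required.items()
}
spares = {board: provided[board] - required[board] for board in required}
empty_slots = MAX_MOTORS - sum(provided.values())

print(provided, spares, empty_slots)
```

The two empty slots amount to exactly one more octal register, which is why expanding the loader by two interfaces fills the chain.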
MOTOR SCAN CHAIN INPUTS
As explained in AD46, the stepper interfaces will include four inputs for flags. One HC597 octal shift register will support two motors. For a circuit example, see Status Panel or Left Panel. The '597 is double-buffered. On the rising edge of NPCS0 (scan chain parallel load) the input capture register is loaded by asynchronous RCK. During the period in which NPCS0 is high, at each positive transition of SCK0, the captured input is loaded into the input scan chain because the active low synchronous load signal SRLOAD is driven by inverted NPCS0.
At least two of the scan chain inputs are taken to standard three-pin flag jacks. The other two pins provide +5 and digital ground to an opto-interrupter. The flag pin is connected to the '597 input and to a pullup resistor.
CDNEXT: APU SYSTEM DESIGN REPORT 3
10/8/99
MSM MOTOR CONTROLLER BUS MASTERING
Andy O has suggested the possibility of the motor controller reading step event data during the four microsecond dead time of the scan chain, thereby avoiding having to create a shadow motor scan chain. As Andy explained, in any 32 usec. scan period, the maximum number of events (bytes) to be read would be 14 because no motor can have more than one step in this period.
The memory bus is not available for the full 4 usec, because the FPGA has to arbitrate with the CPU for ownership. Motorola states that the 68340 has lower bus arbitration priority than external devices (User's Manual 3.6 Bus Arbitration p. 3-40). The User's Manual also states that DMA uses the bus arbitration protocol to request bus mastership (6.5 Bus Arbitration p. 6-18). This suggests that the FPGA would have higher priority than DMA, but it isn't always clear what is considered an external device. For example, although both the DMAC and UART are internal to the 68340, the UART RDY signals must be externally wired to the DMAREQ inputs.
In any case, the worst case bus lockout due to an instruction occurs when the CPU has just begun a long read/modify/write, such as ADD.L #12345678,someAddress. Assuming that the CPU uses a three cycle memory access, the lockout duration is 12 cycles for the CPU's instruction (from MC68340 User's Manual page 5-97, 98 Instruction Timing Tables) + 4 cycles for the extra cycle in each of two reads and writes (Motorola's example uses 2-cycle memory access). The total is 16 cycles, which, at 15.9744 MHz, is 1usec (16 * 62.6 nsec).
If DMA has equal or higher priority than the FPGA and if both RX and TX DMAREQs assert before the bus request then each of the two DMA transactions would add 7 cycles to the lockout period. DMA could require as much as 9 cycles for two-cycle DMAREQ recognition and a subsequent dual-address transaction (even though one of the addresses is that of the internal UART, the CPU will still keep the bus locked). However, both DMA requests' recognition times are hidden in the CPU instruction execution time, so the real worst case for one DMA is 7 cycles. This would increase the worst case bus lockout period to 1.88 usec.
If the motor controller can read 14 bytes from shared memory in less than 2.12 usec. then we don't need to further investigate DMA priority. If it can't meet this timing requirement, but it can perform the necessary reads in 3 usec. (the 4 usec. dead time less the 1 usec. instruction lockout), then we need to determine whether DMA has lower priority.
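The lockout and read-budget arithmetic can be laid out in a short script. The cycle counts, clock rate, dead time, and event count are the ones quoted above; the variable names and the per-byte figure are only illustrative.

```python
CPU_HZ = 15.9744e6
CYCLE_US = 1e6 / CPU_HZ          # about 0.0626 usec per CPU cycle
DEAD_TIME_US = 4.0               # scan chain dead time
MAX_EVENTS = 14                  # at most one step byte per motor per period

instr_cycles = 12 + 4            # long read/modify/write with 3-cycle memory access
dma_cycles = 2 * 7               # RX and TX DMA transactions, 7 cycles each

lockout_instr_us = instr_cycles * CYCLE_US
lockout_with_dma_us = (instr_cycles + dma_cycles) * CYCLE_US

budget_with_dma_us = DEAD_TIME_US - lockout_with_dma_us
budget_without_dma_us = DEAD_TIME_US - lockout_instr_us
per_byte_ns = budget_with_dma_us * 1000 / MAX_EVENTS

print(round(lockout_instr_us, 2), round(lockout_with_dma_us, 2))
print(round(budget_with_dma_us, 2), round(budget_without_dma_us, 2))
print(round(per_byte_ns))
```

In the worst case the FPGA would have roughly 150 nsec. per byte, assuming the reads are spread evenly over the remaining window.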
The furthest downstream board in the chain is the end of the chain. The serial output of its MOSI tail register is connected to the serial input of its MISO head register. This connects the MOSI and MISO chains into one loop. The Master can verify the integrity of both MOSI and MISO chains by shifting twice the normal chain length without toggling NPCS.
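The loop behavior can be modeled in software. The sketch below is a simplification under stated assumptions: each chain is treated as a plain shift register, all registers start cleared, NPCS is never toggled (so no parallel load occurs), and the function name and fault model are hypothetical. A bit written by the Master reappears at its input only after traversing the full loop of twice the chain length.

```python
from collections import deque


def shift_loop(pattern, chain_len, stuck_at=None):
    """Shift `pattern` around the MOSI+MISO loop with NPCS held steady.

    Returns the bits the Master receives after the loop has been traversed.
    `stuck_at` optionally forces one loop position to 0 (a simple fault model).
    """
    loop = deque([0] * (2 * chain_len))   # index 0 = MOSI head, last = MISO tail
    received = []
    for clk in range(2 * chain_len + len(pattern)):
        bit_in = pattern[clk] if clk < len(pattern) else 0
        received.append(loop[-1])         # Master samples MISO tail before the shift
        loop.pop()
        loop.appendleft(bit_in)           # every register shifts one position
        if stuck_at is not None:
            loop[stuck_at] = 0            # faulty stage loses its data
    return received[2 * chain_len:]


pattern = [1, 0, 1, 1, 0, 0, 1, 0]
print(shift_loop(pattern, chain_len=8) == pattern)               # intact loop
print(shift_loop(pattern, chain_len=8, stuck_at=5) == pattern)   # broken loop
```

An intact loop returns the pattern unchanged, while a stuck stage anywhere in either chain corrupts it.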
A board in a scan chain may have one of three connector/bus configurations. These are:
A variant of the end-of-chain and daisy-chain boards contains a downstream loopback connector for attaching a loopback board. A downstream loopback connector contains the same pins as the upstream loopback but the signal directions are reversed. No board may contain only a downstream loopback connector. Normally, downstream loopback connectors are found only on daisy-chain boards. The downstream loopback should be located in the board's chain as if it were a downstream board. The loopback MOSI and MISO wires (two each) are routed directly to the downstream connector's MOSI and MISO pins. The normal MOSI and MISO wires from the downstream loopback connector are buffered (output to MOSI and input from MISO). Downstream communication requires the attachment of a loopback board or a flow through adapter in which the four normal MOSI and MISO wires are jumpered to the corresponding loopback wires. A daisy-chain board with a downstream loopback connector requires the same number of receivers (5) and transmitters (2) as a daisy-chain board without the loopback.
The scan chain registers are connected end-to-end as one long shift register. All of the registers must shift nearly simultaneously, so SCK has to be distributed in parallel to the boards. Thus, a daisy-chain board acts as an upstream-to-downstream bridge for SCK rather than as a repeater (where the translated signal would be rebuffered). NPCS should be treated the same way in order to minimize the skew between it and SCK. Reset should be treated similarly in order to avoid using an otherwise unnecessary output buffer.
Data signals are connected serially but they will not accumulate lag relative to SCK because, on each clock, data shifts only one position, i.e. there is no ripple effect. In fact, a small data output lag would be desirable to reduce the potential for a fast output being prematurely captured by a slow input device. Each differential data input (upstream MOSI and downstream MISO) is a transmission line endpoint with a characteristic impedance of 130 Ohm. In addition to the parallel terminating resistor, a parallel terminating capacitor can be used to improve the input data hold margin. A reasonable value would be 1000 pF-- the time constant of 65 Ohm and 1000 pF is 65 nsec (130 is divided by two because the capacitor sees the line and resistor in parallel). A MISO loopback should be similarly terminated.
The 75174 outputs can drive up to 30 receivers, but only the endpoints (at the driver and at the last receiver) should be terminated (only with 130 Ohm resistors-- the capacitor is for data lines). The other receivers should be unterminated but located very close to the differential signals to avoid reflections in the tap stubs. An end-chain board, i.e. one with only one normal connector, obviously should terminate SCK, NPCS, and Reset. However, while loopback and daisy-chain boards are designed to operate in the middle of the chain, they may also be used at the end of a chain, especially during system development. Consequently, they must support configurable termination. A terminating resistor is located near the receiver for each of these three signals. One leg (pad) of each resistor is connected directly to the wire. The other is connected via a removable jumper. If space is tight, the jumper may be a small wire between two vias.
SHIFT REGISTER COMPONENTS
The interim hardware uses 74HC594 for scan chain output and 74HC597 for input. The '594 is an extremely rare part that will probably soon be extinct. The parts we have now are made by National Semiconductor. I checked the web sites of National (back to Fairchild again), Intersil, Motorola (standard logic now "ON Semiconductor"), and Toshiba. None of them list a '594 in any technology. We cannot use this part.
The '597 appears to be more readily available, as I found it in TI, Toshiba, and Motorola data books. TI and Toshiba data books also list an HC595, which appears identical to the '594 shown in the schematics. Since I can't find any data on the '594, I can't say for sure that the two are identical, but the '595 looks like it would work in our scan chain (although, without an explicit timing definition of our own scan chain, the compatibility analysis must be considered incomplete).
For scan chain output, there are no standard alternatives to the '595, as it is the only part with double buffering. If the '595 is not commonly available then we will have to synthesize the equivalent functionality in FPGAs. Even if it is available, there are two reasons why, in some situations, we would want to use FPGA logic instead. One is that, unless all outputs are actually used, the octal parts waste scan chain bits (addresses). For inputs this is not important, as relatively few of the 256 bits are actually used. For outputs, this is more important, because most of the 256 outputs are used. The other situation in which synthesis may be preferable is where the scan register function can be folded into other functions. For example, the Status Panel contains a '594 and a '597, each only half-utilized. Additionally, it has an LM555 whose frequency is set by a digital potentiometer and an output amplifier whose volume is set by the other channel in the dual digital pot. The mixed signal buzzer circuit could be completely replaced by an FPGA, with frequency set by counting a clock and volume by PWM. The 4-bit output and input registers could be folded into this FPGA, replacing everything but buffers and drivers with one relatively small FPGA.
SCAN CHAIN OPERATION
Since nearly any behavior that we want can be implemented in an FPGA, scan chain operation should be compatible with '597 and '595 components, whose behavior can't be changed.
The '597 scan chain input register operates as follows:
The '595 scan chain output register operates as follows:
The input and output registers have two important features in common.
SCAN CHAIN CONTROLLER
The document CD3200R FPM Hardware Design Description describes scan chain operation in general. The dimensionless state diagram in the document intimates that falling NPCS and rising SCK are coincident. Since data is serially shifted on the rising edge of SCK (the document doesn't state this, but it is what the circuits do) this could produce a race condition and should be avoided if possible. Considering the potential for skew between NPCS and SCK as they pass through multiple buffers through the chain, NPCS should not change near the positive edge of SCK. One way to avoid this would be to have NPCS change at negative transitions of SCK, thereby providing 250 nsec. for data settling time and skew margin. Another solution would be to stop SCK prior to any changes in NPCS, which could provide any margin deemed appropriate.
Settling time could be fairly important if an input register other than the '597 is used. The '597 nearly completely eliminates metastability, because the asynchronous external data captured on the rising edge of NPCS has 4 usec. in which to settle before being latched into the shift register by NPCS becoming low. Although a metastable state could theoretically persist forever, a standard rule-of-thumb is that it will not persist for longer than three times the normal propagation time. In this case, that would be 75 nsec. Longer delays than this only marginally improve an already acceptable failure rate. Simpler (and more common) parts like the 74HC165/166 could be reliable in this application if given at least 75 nsec. for captured data to settle before serial shifting. The '166's parallel load is synchronous to the shift clock, which is not perfect for this situation, because the scan chain clock would repeatedly load the asynchronous data while NPCS was high so that only one clock period would be provided for metastability resolution. The '165's load is asynchronous, but it actually provides less settling time because the parallel load is level sensitive, essentially wasting all of the NPCS high time that could have been used for settling had the data been captured on the rising edge of NPCS. The '165's settling time is further diminished by the worst case skew between NPCS and SCK, whereas the '166 would provide one full clock period of 500 nsec. for the parallel loaded data to settle before the first serial shift. Although the '597 would be theoretically better, this is sufficient margin for the '166 to perform as well as the '597.
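The settling-time comparison reduces to a few lines of arithmetic. In this sketch the 25 nsec. propagation time is inferred from the 75 nsec. figure (three times the propagation time) rather than stated directly, the names are illustrative, and the '165 is left out because its settling time is not quantified here.

```python
SCK_HZ = 2_000_000          # scan chain shift clock
NPCS_HIGH_NS = 4000         # NPCS high (dead) time, 4 usec.
T_PD_NS = 25                # assumed propagation time (75 nsec. / 3)

sck_period_ns = 1e9 / SCK_HZ
half_period_ns = sck_period_ns / 2        # margin if NPCS changes on falling SCK
required_settle_ns = 3 * T_PD_NS          # rule-of-thumb metastability window

# Settling time available before the first serial shift, per part.
available_ns = {"597": NPCS_HIGH_NS, "166": sck_period_ns}
adequate = {part: t >= required_settle_ns for part, t in available_ns.items()}

print(sck_period_ns, half_period_ns, required_settle_ns, adequate)
```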
If an FPGA is used for a scan chain input register, fewer resources would be consumed by emulating the single-buffered '166 instead of the double-buffered '597. Therefore, whether we continue to use actual '597s or switch to '166s for standard parts, accommodating the '166 is likely to increase the efficiency of our I/O implementations. This requires that SCK continue to operate while NPCS is high. As already mentioned, NPCS would change only on negative SCK edges and the maximum tolerable skew between SCK and NPCS would be 250 nsec.
As for the output side of the scan chain, the '595 is the only acceptable standard part and a synthesized replacement needs to act similarly. However, the synthesized part could improve on the '595 by making the reset output more reliable and by reducing external inverters by resetting to 1 or 0 as appropriate. If we know enough about a subsystem's final configuration to put inverters in the circuit then we know enough to program an output inversion in the PLD/FPGA.
CONCLUSIONS
STEPPER MOTOR I/O
GENERAL
The interim boards contain a stepper driver circuit that will be changed for the new version. The changes will be:
At least two of the scan chain inputs will be taken to standard three-pin flag jacks. The other two pins of each jack provide +5 and digital ground to an opto-interrupter. The flag pin is connected to a scan chain input register bit and to a 3.3K pullup resistor. If the board has room, all four inputs will be similarly connected. A standard IC ('597 or '166) will be used for the scan chain input register unless a programmable device already used for other reasons has spare pins.
The following is an arbitrary assignment of wires to the standard (not loopback) motor control bus (scan chain input and output plus winding feedback wires):
The loop back motor control bus has 16 wires instead of the standard 14. Wires 1 through 12 are identical to the standard bus. The remaining wires are used as follows:
RIGHT PANEL MOTOR DRIVER BOARD REQUIREMENTS
This board provides the interface to the four syringe motors, one wash block motor and one spare motor. It is connected to the motor scan chain as a loopback board in order to reserve the scan chain end position (with its reduced wire count) for the Loader Board.
Each motor driver section, including connector, consumes 1.6 sq. in. using a non-optimal layout on the interim Right Panel board. If this same layout were used, the six driver sections would consume 9.6 sq. in. The power and motor scan chain connectors and input power capacitor consume 2.5 sq. in. Three 74x166, two 75175, one 75174, and related discretes consume 4 sq. in. Together, these groups consume 16 sq. in. of the 29 sq. in. board area, leaving a reasonable amount of space for a PLD/FPGA, printed wiring, mounting standoffs, and border.
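The area budget can be tallied as follows, using the figures above (all in square inches). The variable names are illustrative.

```python
DRIVER_AREA = 1.6              # one motor driver section, including connector
N_DRIVERS = 6
CONNECTORS_AND_CAP = 2.5       # power and scan chain connectors, input capacitor
LOGIC_AND_DISCRETES = 4.0      # 74x166s, 75175s, 75174, related discretes
BOARD_AREA = 29.0

drivers_total = DRIVER_AREA * N_DRIVERS
used = drivers_total + CONNECTORS_AND_CAP + LOGIC_AND_DISCRETES
remaining = BOARD_AREA - used

print(round(drivers_total, 1), round(used, 1), round(remaining, 1))
print(f"{used / BOARD_AREA:.0%} of the board committed")
```

A little over half of the board is committed, which is the basis for calling the remaining space reasonable.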
In our meeting of 10/5/99, we concluded that this board would require very little design effort, because the interim stepper driver design could be cut and pasted into the circuit. This would be nearly true if we would accept the poor boot/reset output control afforded by the 74HC595. However, the superior PLD/FPGA approach would not be a difficult design-- it just wouldn't be the near-zero design effort that we predicted. The synthetic improved 74HC595 would be used in other places in the system, so this is a fairly efficient use of design resources.
CDNEXT APU SYSTEM DESIGN REPORT 2 (CDXSYS2.DOC) APU-MSM Communication Cable Pinout describes how the data station's HSSL clock is routed directly to the clock sent down to the MSM. Not mentioned is what to do about terminating this new multi-drop line. In the real instrument, MSM will be the end of the HSSL line and should have the termination resistor. APU should be a non-terminating drop. The APU's printed wiring should locate the APU's clock receiver (75175) as close as possible to the clock lines traversing the APU board from the HSSL connector to the MSM connector (i.e. minimal "stub length").
Normally, the APU would not have a terminating resistor but, during development, it may connect to the data station without an MSM. For these occasions, it should have a terminating resistor with one leg jumpered. The jumper is removed when an MSM is connected.
10/11/99
SCAN CHAIN (ADDITIONAL DETAILS)
INTEGRITY TEST, END-CHAIN BOARD
As mentioned in cdxsys3, the end-chain (furthest downstream from the master) board connects the serial output of its MOSI tail register to the serial input of its MISO head register, creating one loop from the MOSI and MISO chains. Each chain is half the length of the loop. In normal operation, the loop is not seen, because the MISO chain is reloaded by NPCS after shifting only a chain length.
To test the integrity of both the MOSI and MISO chains, the controller shifts the full loop without toggling NPCS. The test requires shifting three times the length of one chain. The test operates as follows:
The foregoing test procedure assumes that the output and input scan chains are of equal length. In reality, the system has fewer actual inputs than outputs. The scan chain controllers should implement equal length input and output registers internally in order to support all possible external arrangements. If the external input chain is shorter than the output then, in step 4, the length of the input chain is shifted, thereby filling the input registers with data from the output shift registers but not all of it. The full output length shift at step 6 fills the controller's input registers with all of the output shift register data. The scan chain controller doesn't know the external chain lengths, but the CPU does. Therefore, in loopback test mode, the CPU will write into the controller the length of each shift. From the controller's point of view, test mode operation is as follows:
Just as with the end-chain board requirement for terminating the scan chain's parallel control signals, SCK, NPCS, and Reset (see cdxsys3: SCAN CHAIN: SCAN CHAIN STREAM CONNECTIONS) only a true end-chain board (one with only one standard connector) should have a hard-wired MOSI-MISO loopback. Daisy-chain and loopback type boards require selectable end-chain looping, which is provided by a removable jumper that connects the serial output of the MOSI tail register to the serial input of the MISO head register.
STARTUP
As explained in cdxsys3: SCAN CHAIN: SHIFT REGISTER COMPONENTS, the 74HC595, which is the only acceptable standard IC for scan chain output, does not reset its parallel outputs unless the scan chain is operating. Some devices, such as solenoids (in most circumstances) can tolerate at least a few seconds in the incorrect startup state when the instrument is powered up. For others, such as stepper motors, the uncontrolled startup state period must be minimized.
One way to reduce the uncontrolled startup period is to synthesize an improved output register, as cdxsys3 suggests for stepper drivers. However, the standard component is more cost-effective, both in terms of basic component cost and in cost of extra handling, support, and paperwork. Therefore, in less demanding situations, the 74HC595 is preferred.
The number of situations in which the 74HC595 can be used increases as the uncontrolled startup period is reduced. The current scan chain controller exacerbates the startup problem in two ways. One is that the FPGA that implements the controller is programmed by the CPU and, thus, may not begin functioning at all for at least several seconds after power is applied. The other is that the controller portion of the FPGA does not begin functioning until the CPU enables it. Both of these should be changed.
We can still keep open the option of programming the FPGA via the CPU, as it may prove helpful during development (currently, it is a hindrance in several ways) but the primary means should be self-programming via serial flash memory. This represents a change in our concept that programming by the CPU would be the production configuration.
To further reduce the uncontrolled period, at power up, the scan chain controller should automatically assert scan chain Reset and then toggle NPCS at least once. This could be the beginning of normal scan chain operation for all signals except Reset. It would be acceptable to require the CPU to turn off the Reset signal, saving the FPGA the need to time its activation period. Thus, the scan chain controller doesn't need a startup state machine. The scan chain Reset signal starts up activated and normal scan chain operation begins immediately.
STEPPER DRIVER REGISTERS
In cdxsys3: Stepper Motor I/O: Right Panel Motor Driver Requirements, using synthesized output registers instead of '595s was strongly recommended mainly because of the uncontrolled startup period. If scan chain controller startup improvements reduce the uncontrolled period to under 1/2 second then '595s would be acceptable.
There are four scan chains in the system: two APU I/O chains (the spare may not be fully implemented in the FPGA); one MSM I/O chain; and one motor chain controlled from the MSM. The I/O and motor scan chains are essentially identical. The bus wire names and functions are described in cdxsys3: SCAN CHAIN. The motor control bus has two additional wires for stepper winding feedback. Its pinout is described in cdxsys3: STEPPER MOTOR I/O: GENERAL. The standard I/O scan chain bus pinout is as follows:
The loop back I/O bus has 14 wires instead of the standard 10. Wires 1 through 10 are identical to the standard bus. The remaining wires are used as follows:
MSM
SUBSYSTEM REQUIREMENTS
As described in AD46: HARDWARE PARTITIONING: MSM, the MSM board will combine the following subsystems:
TOWER
CONTROL INTERFACE
The tower subsystem derives from the interim design's tower board. The scan chain interfaces are replaced by direct I/O, eliminating the transmission line buffers (tower board U1, U2, and U3) but not necessarily the scan chain registers. The MSM's FPGA could provide direct I/O control lines, but using the I/O scan chain, which is needed for the loader anyway, would consume fewer FPGA pins.
SENSORS
The four flag sensors, GS2 Home, GS2 Height, GS1 Home, and Door Open, are buffered/inverted by HCT240 U8. Buffering serves no purpose and should be eliminated. All of these inputs except Door Open are motor flags and would, therefore, normally be input to the motor scan chain. However, since the MSM is responsible for both motor and tower control, both motor flags and tower I/O are in the MSM's immediate domain. Putting all tower sensors together may lead to more efficient script execution as well as better wiring locality on the MSM PCB. Therefore, all of these flag sensors will be read into the MSM's I/O scan chain through one register (HC597 or '166) along with the DC motor over- and under-current detectors, as shown in the tower schematic. This leaves two spare inputs, which should be connected to +5V through 10K resistors.
The schematic shows the serial output of the output register U7 connected to the serial input of the input register (U6). This hard-wired MOSI-MISO loopback is incorrect. The loader will be the end of the MSM I/O scan chain. This connection should either be deleted entirely or replaced by a removable jumper to allow the tower section to function as the end-of-chain during development.
ACTUATORS
The tower has three actuator devices, GS2 Release, Door Release, and bidirectional DC Spinner Motor. A stepper interface is also being provided for the spinner function, but the DC interface is needed as a backup in case the stepper is not as useful as we hope.
The interim tower design affords no power-down for either GS2 Release or Door Release. GS2 Release is activated to drop the tower and is never on for more than a moment. However, Door Release operates interactively with the user, whose behavior is unpredictable. Therefore, it must remain on for an unpredictable time and should have power-down capability. The ULN2003 (U10) driver has one spare output that can be used for this. The circuit should be identical to the solenoid drivers shown on the Left and Right Panel schematics except that the pulldown uses three instead of two sections of the ULN2003. To maintain a consistent (compared to other power-down actuators) software interface, the power and control bits need to be placed consistently on the output register ('595). One way to achieve this is to use the following pin assignments:
The L293 used to drive the Spinner Motor requires true and complement of the direction signal. An HCT240 is used in the interim schematic. This should be replaced by an HC04. The bus driving capability of the '240 isn't needed and HCT is the wrong technology. The L293's logic input levels are standard CMOS, Vh = 2.3V min and Vl = 1.5V max, which is also compatible with the serial output register HC595. The rest of the Spinner driver circuit shown in the tower schematic can be used without change.
STEPPERS
The MSM will provide the drivers for the tower lift motor and spinner (again, we won't know whether to use this or the DC motor until we have had a chance to experiment in a real system). It will also provide the drivers for the peristaltic pump and one spare motor.
If the fast scan chain startup reset described above is implemented and results in a sub-half-second (from power on) reset, then two '595s could be used to control the four motors, saving 16 FPGA pins (the 32+ internal registers that would be consumed probably are not as significant as the pins). If external registers were used then two inverters would be needed for each motor's I0/I1 (these pins of the pair of '3717s used for each motor are connected in parallel, so only two control lines are needed). As explained in cdxsys3: STEPPER MOTOR I/O: GENERAL, the '595 resets outputs to 0s, which would select high power in the '3717. Note that the negativity toward '595s for motor control expressed in that document did not take into account the possibility of a faster reset.
For the sake of control software, we should probably choose one motor control means, using either '595s or the synthesized improved '595, because the power controls are reversed between the two approaches. If the synthesized version is used, the main MSM FPGA would not have to be burdened with this I/O, as an efficient interface to an external PLD/FPGA already exists in the form of the scan chain. A common PLD/FPGA could be developed to support the six motors on the Right Panel Motor Driver, the four motors on the MSM, and the four on the Loader.
To reduce the wires in the Loader connector, the Loader will be the end of the motor scan chain. The Right Panel Motor Driver will provide an upstream motor loopback connector for which the MSM provides a matching downstream connector (the pinouts are identical-- upstream vs. downstream only defines signal direction). It makes little difference whether the MSM-local motors are upstream or downstream of the Right Panel Motor Driver unless the local motor interface is implemented in the motor controller FPGA. Although this will probably not be done, nothing is lost by allowing for it. Therefore, the motor chain will begin at the locals; traverse to the Right Panel Motor Driver Board, where it will loop back to the MSM; and finally traverse to the Loader.
COMMUNICATION INTERFACES
Regardless of whether scan chain outputs for local motors are provided by the FPGA or external registers, line drivers and receivers are not used for on-board communication. Also, they are not used for tower functions, which are direct I/O in the new design. Thus, the only communication interfaces are the following:
Unlike the I/O boards, MSM is the scan chain master and needs to buffer SCK, NPCS, and Reset in both the I/O and motor scan chains. These signals are buffered only once in the entire system. They are distributed in parallel and don't need to follow the link. For example, motor control data goes out to the RPMD and only the return data can go to the Loader interface, but SCK can branch after the buffer, with one branch going to the RPMD connector and the other to the Loader connector.
MOSI data being transmitted to the RPMD and MISO data coming in from it are both buffered. MOSI and MISO loopback data from the RPMD interface are taken directly to the Loader interface without additional buffering. Essentially, MSM is just a passive connector for the data to travel between RPMD and the Loader.
The MSM will consume eight transmitters (2 * 75174) for SCK, NPCS, Reset, and MOSI for the I/O and motor scan chains. It will consume two receivers (1/2 * 75175) for MISO in the two chains. The APU interface requires three transmitters (3/4 * 75174) and four receivers (1 * 75175). The 11 transmitters consume three 75174s with one spare section. The six receivers consume two 75175s with two spare channels.
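The package count above can be checked with a short calculation. This is only an arithmetic sketch; it assumes four sections per 75174 and per 75175 package (as the "2 * 75174" for eight transmitters implies), and the function name is hypothetical.

```python
import math

SECTIONS_PER_PACKAGE = 4  # assumed: 75174 and 75175 are quad devices


def packages_needed(channels, per_package=SECTIONS_PER_PACKAGE):
    """Return (packages, spare sections) for the given number of channels."""
    packages = math.ceil(channels / per_package)
    spare = packages * per_package - channels
    return packages, spare


# SCK, NPCS, Reset, and MOSI for each of the two chains, plus three for the APU.
tx_channels = 4 * 2 + 3
# MISO for each of the two chains, plus four for the APU.
rx_channels = 1 * 2 + 4

print("75174:", packages_needed(tx_channels))
print("75175:", packages_needed(rx_channels))
```

The single spare transmitter section is the one proposed for the "wink" LED.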
The spare 75174 channel can be used to drive the "wink" LED. Previously, a spare LM339 from the vacuum accumulator wet circuitry was used, but this circuitry has moved to the new VPM. Either output of the 75174 may be used-- both can sink 60mA.
The only mixed signal requirement of the MSM is the motor winding test ADC. The TLV1549, an inexpensive ($1.90) single channel 10-bit ADC, could be used in this application, but its serial interface requires either an FPGA- or CPU-synthesized state machine for control. The raw motor feedback can't be input to it because the signal is differential and floating in the middle of 24V (when it isn't 0). If all of the motors were disconnected (and a driver turned on) the common mode input voltage would be 24V. Consequently, an amplifier is required to convert the signal to a 0-referenced single-ended input. Considering the software and hardware requirements to use the TLV1549, the original single-slope integrating converter from the CD3500 appears fairly equal. However, it too requires more than the MPM circuit indicates. Our system provides +/- 15V that could be used for the circuit except that it isn't clean enough. So, in addition to the many discrete components in the MPM circuit, we would have to add circuitry to clean up the power supplies. LDO regulators would be ideal for this except that negative LDOs are not available. A reasonable alternative would be 3-terminal adjustable regulators set to 11V.
Speed is irrelevant so there is no need to waste FPGA resources. A CPU program can access the serial ADC or control the integrating converter. The serial ADC has a three-wire interface (CS and CLK input and D out). The integrating converter requires two wires, the control bit and the comparator output. At least three pins of CPU Port A are available, which could take the FPGA entirely out of the design.
For the integrating converter, CPU TGATE1 or TGATE2 must be used for the input (from the comparator) in order to achieve accurate timing. This is required because the MSM CPU, unlike the MPM CPU, is always multitasking and can't be dedicated to conversion timing. The old integrating converter circuit uses two LF356 op amps, which would be replaced by one LF412C dual op amp. The 74HC00 used for the control line buffer, or some other HC component (e.g. HC04), would still be needed, because both the CPU and FPGA have Vout high of only 2.4V compared to the HC's 4.9V.
The only additional component needed with the serial ADC is the input amplifier. Given the possibility of 24V common-mode voltage, a current amplifier would be simpler than an op amp. However, the differential signal source comprises two taps off an equivalent 1M-154K-1M resistive divider. The Thevenin equivalents are Vth = 12.8V and 11.14V and Rth = 536K. Therefore, the signal inputs to a current amplifier, such as LM3900, would be 239 nA and 208 nA. But the input bias current (inverting input only) of the LM3900 varies from 30 nA to 200 nA. This much variation would reduce the initial accuracy to one bit. The alternative is to use a dual supply op amp with the signals dropped down below the supply with resistors. But this suffers the same component bloat that the old circuit experiences. Since the old circuit is cheaper and has already been tested (I am assuming this) we will use it, replacing the single op amp with a dual, the HC00 with an HC04 (because it may already be available), and the PIA interface with direct connections to the CPU.
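The Thevenin figures quoted above can be reproduced from the divider values. The sketch below assumes the 1M-154K-1M divider sits across the full 24V with an ideal source; the function and variable names are hypothetical. It covers only the equivalent voltages and resistance, not the amplifier input currents.

```python
SUPPLY_V = 24.0
R_TOP = 1.0e6      # ohms, supply to upper tap
R_MID = 154.0e3    # ohms, between the two taps
R_BOTTOM = 1.0e6   # ohms, lower tap to ground


def thevenin(r_above, r_below, supply=SUPPLY_V):
    """Thevenin voltage and resistance seen at a tap of a two-part divider."""
    v_th = supply * r_below / (r_above + r_below)
    r_th = r_above * r_below / (r_above + r_below)
    return v_th, r_th


upper_v, upper_r = thevenin(R_TOP, R_MID + R_BOTTOM)
lower_v, lower_r = thevenin(R_TOP + R_MID, R_BOTTOM)

print(f"upper tap: {upper_v:.2f} V, {upper_r / 1e3:.0f}K")
print(f"lower tap: {lower_v:.2f} V, {lower_r / 1e3:.0f}K")
print(f"differential: {upper_v - lower_v:.2f} V")
```

Under these assumptions both taps see the same source resistance, about 536K, and the upper tap comes out near 12.86V, which the text quotes as 12.8V.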
The split analog power requirements are 6mA for LM311 and 7mA for LF412C. Any small 3-term regulator can supply 13mA. National's LH7001 positive/negative adjustable regulator provides both voltages in one package. Further, its low 1.2V VDO makes it close to an LDO. With this low VDO, it could probably provide +/- 12V from our system's +/- 15V supplies, especially considering our low current requirement (the LH7001 can deliver 100mA on both outputs).
FPGA
The FPGA needs to provide an I/O scan chain for tower and loader functions. However, 256 bits is much larger than needed. As described above, the tower functions, including two spare inputs and three spare outputs, consume eight input and eight output bits. The loader currently also consumes eight input and eight output bits. A 64-bit scan chain (64 outputs and 64 inputs) would afford ample room for expansion.
The flash and RAM are accessed through a GAL22V10 in the FPM. The FPGA should continue to be uninvolved in flash access as this could prevent BDM access in the event of FPGA problems. However, the FPGA will have to assume responsibility for RAM access, because the CPU holds its chip select outputs high (inactive) rather than tri-stating them when it grants the bus to the motor controller.
The FPGA provides access to the 7-segment LED but not to the "wink" LED. The wink allows the CPU to announce that it is alive even when the FPGA is not functioning. Therefore, it is controlled by one of the CPU's Port A pins.
SPARE SCAN CHAIN I/O
As described above, the current tower I/O requirements leave two spare inputs and 3 spare outputs. It would cost little in components and board space to add another eight inputs and eight outputs to the MSM I/O scan chain. The spares could be used for additional tower functions or random functions in the body of the instrument but not for the loader. Since scan chain shift registers provide only logic inputs and outputs, in most uses they need additional circuitry. It's difficult to predict what might be needed, but solenoids are likely. Solenoids consume two bits, one for on/off and one for power. However, being connected to solenoid drivers doesn't prevent an output from being used for some other purpose. Therefore, it is reasonable to provide some solenoid drivers but to provide a means to tap into their control bits directly for a non-solenoid use.
We have two similar solenoid driver circuits, normal power, as on the Left Panel Module, and high power, as used for the tower's GS2 Release and Door Release. The high power circuit requires four sections of a ULN2003 and the normal circuit three. Since each ULN2003 contains seven sections, one IC could support one high power and one normal solenoid.
The remaining outputs of the tower's output scan chain register should be used before the spare, as more compact register arrays can improve software efficiency. However, the power and on/off bits should be placed consistently with other drivers. Therefore, the Q5 output from the tower's output register should not be used as a solenoid control bit but left as uncommitted logic. The spare output bits of the tower scan chain output register would be used as follows:
The output bits of the spare shift register would be used as follows:
All of the spare bits should be brought out to the area of the board reserved for tower connectors. Each solenoid driver should have a normal 2-pin solenoid jack. The three spares from the committed tower control output register can be brought to a 5-pin jack that includes one digital ground and one +5V pin. The eight outputs from the spare register can be brought out to a 10-pin connector that includes one digital ground and one +5V pin. Note that all but three of the logic bits alternately serve solenoid control and, if used in this capacity, may not be used for logic.
Each of the 10 spare inputs should be pulled up to 5V through a 10K resistor. Two sets of five of these can be brought out to 7-pin jacks including one digital ground and one +5V pin.
The total component consumption proposed for spare I/O is:
MECHANICAL REQUIREMENTS
CDNEXT: APU SYSTEM DESIGN REPORT 5
10/18/99
[Report4]LOOP STRUCTURE
Cdxsys3 and 4 describe the new scan chain design without addressing how current boards are affected by changes from the interim design. The following review describes the interim design, explains the rationale for the new design, and tells how existing boards are affected.
The interim system design mixes output and input registers in the scan chain. Each board in the chain has two connectors, an "input" from the upstream board (or master) and an "output" to the downstream board. The output connector on the last board in the chain loops back to the master. The distinction between MOSI and MISO data (in the bit stream) exists only at the master. On the slave boards, intermingled output and input bytes flow in one direction down to the last board, from where they return to the master.
The interim scan chain topology exhibits the following problems:
As described in detail in cdxsys3, the new design merges the downstream and upstream data paths into a single cable. The total amount of wiring is generally reduced because only the four data wires of the return path are replicated in the unified cable, compared to the full bus data return cable in the interim design. The unified cable affords additional opportunity for reducing the number of wires in certain cases, specifically loopback and end-of-chain. The end-of-chain wire reduction is especially valuable when strategically located. For example, making the Loader Board the end of both the MSM I/O and motor scan chains eliminates four wires compared to the loopback and 14 compared to the interim design.
The unified cable design is more flexible than the separate return cable, because it can support both separate and mixed input and output, whereas the separate cable topology requires mixed input and output. Wherever possible, separate input and output is preferred, because of its better address/bandwidth utilization, as already explained. However, in a few cases, separation is not feasible. For example, the new VSP controller is an Analog Devices ADSP-2104 with one of its two serial ports serving as a scan chain shift register. The port has no flow-through capability so this is emulated using the port's receive and transmit functions. Consequently, one full-duplex port is entirely consumed in emulating one-way data flow in the scan chain. It would be feasible to use one port for MOSI (output data) and the other for MISO (input data), but the other port will be used for reading a serial ADC. If the '2104 is placed at the end of the MOSI scan chain so that the MOSI-MISO loopback occurs exactly at the border between the input and output (simulated) scan chain registers, input and output data will be separate. Otherwise, they will be mixed in this case. The new topology has no problem with this-- it just allows devices to use the more efficient separated I/O if they choose to do so.
The interim design appears to afford a slight advantage in testing the integrity of the scan chain, because it feeds back all of the output data to the controller for verification. As described in cdxsys4, the new design requires putting the scan chain controller into a special loopback mode in which it doesn't assert the broadside load strobe NPCS or else none of the output data comes back to the controller for verification. This is actually true only if all 256 bits of the MISO data path are used for input. The sparser the MISO path, the more MOSI data is returned to the controller. For example, if 31 of the 32 MISO bytes are used for input, then the last input byte will be the last byte of output. That the output data begins being returned at the end of the chain and works toward the head as more inputs open up is particularly advantageous because only the last byte is really needed for verifying the entire chain (unless it contains all 0s or all 1s) because it has to traverse the entire chain. It might be reasonable to make the last byte of the MOSI scan chain a register connected to no outputs in order to be able to safely load arbitrary patterns for testing the chain.
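The 31-of-32 example can be illustrated with a byte-level model of the loop. This is a simplified sketch, not the controller's implementation: it assumes ideal shift registers handled one byte per shift, a full 32-byte MOSI chain, at least one input register, and a master that shifts exactly one 32-byte image per scan. All names and data values are hypothetical.

```python
IMAGE_BYTES = 32  # 256-bit scan image, handled here one byte at a time


def scan(mosi_regs, latched_inputs, outgoing):
    """Shift one full image through the loop and return (received, mosi_regs).

    mosi_regs:      output register contents, head (nearest the master) first
    latched_inputs: input register contents captured on NPCS, head first
    outgoing:       bytes the master shifts out, in transmission order
    """
    mosi = list(mosi_regs)
    miso = list(latched_inputs)
    received = []
    for byte_out in outgoing:
        received.append(miso.pop(0))   # head of the MISO chain reaches the master
        miso.append(mosi[-1])          # loopback: MOSI tail feeds the MISO tail
        mosi = [byte_out] + mosi[:-1]  # MOSI chain moves one register downstream
    return received, mosi


previous_output = [0xA0 + i for i in range(IMAGE_BYTES)]  # left by the prior scan
new_output = [0x11] * IMAGE_BYTES

# Full MISO path: 32 input registers, so only real inputs come back.
full_inputs = list(range(IMAGE_BYTES))
received_full, _ = scan(previous_output, full_inputs, new_output)

# Sparse MISO path: 31 input registers, so one output byte is returned.
sparse_inputs = list(range(IMAGE_BYTES - 1))
received_sparse, _ = scan(previous_output, sparse_inputs, new_output)

print(received_full == full_inputs)
print(hex(received_sparse[-1]), hex(previous_output[-1]))
```

In this model the byte returned in the last image position is whatever the MOSI tail register held from the previous scan, so it has traversed the entire output chain before coming back.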
Assuming that the MISO chain is filled with real inputs and, therefore, in the new design the special loopback mode must be invoked to test the output data, the fact that the interim design would afford continuous, automatic feedback in the same situation is no real benefit. Input data can be tested against output only in cases of actual output. Otherwise, the input data reflects the external signals captured on NPCS. Whatever has been written to the output image is ignored. But specific scan chain addressing is very fluid; adding, deleting, or changing one board affects all downstream addresses. It would be an unreasonable complication of the scan chain controller to either embed or download an I/O map in order to perform continuous integrity testing. The CPU knows the mapping, through the analyzer configuration file (analyz.ini) but it can't afford to perform continuous checking. Consequently, the interim design could only do the integrity test in a special mode, just like the new design.
APU CHANGE
The interim APU board contains a scan chain loopback jack, J9, which provides a return path for the data, i.e. MISO. It also contains the terminating resistors for SCK, NPCS, and Reset. As explained above, the new system merges the output and input paths into one cable. One board must provide the MOSI-MISO loopback and the terminating resistors, but it is not the APU. Therefore, J9 is eliminated. Its data return function (the two MISO wires) is moved to the unified scan chain connector J7.
I/O BOARDS
In the interim system, APU scan chain data traverses the following path:
In the new system, as already mentioned, the APU will have only the one unified bus connector. The data path will be APU to Right Panel Module to Status (loop back), to LPM, to VPM. There is no Tower Board. The APU will have one downstream connector. The RPM will have three scan chain connectors: upstream to the APU, loopback to the Status Board, and downstream to LPM. The LPM will have two connectors: upstream to RPM and downstream to VPM. The VPM will have one upstream connection to LPM.
None of the units, except optionally the VPM, will mix MOSI and MISO data. For example, MOSI data coming from upstream into the Status Board will go to U3 '595; from U3 to U6 DS1267; and from U6 back to the (loopback) cable, skipping U2. MISO data coming from downstream (from LPM via RPM) into the Status Board will go to U2 '597; and from U2 back to the (loopback) cable.
The RPM and LPM boards will contain many of the same components and sub-circuits as in the interim design but in different proportions. However, both of the new boards will contain input and output registers. As with the Status Board, the two will not mix. Upstream MOSI data flows down through only output registers while downstream MISO data flows up only through input registers.
On boards that may optionally serve as the end of a scan chain, CDXSYS4: SCAN CHAIN END BOARD: INTEGRITY TEST suggests including a removable jumper to connect the serial output of the MOSI tail register to the serial input of the MISO head register. CDXSYS3: SCAN CHAIN: SCAN CHAIN STREAM CONNECTIONS suggests the same treatment for the parallel control signals SCK, NPCS, and Reset. Loopback boards, such as Status, will not have the option, as they cannot serve as end boards, and end boards, such as Loader and VPM, will have these connections hard-wired. Only boards that can operate either in the middle or at the end of a scan chain, i.e. Left and Right Panel Modules, need the option. Further, the end board option will only be invoked during development, if at all.
A terminator/loopback plug affords a cleaner solution than jumpers on the board. The signals can be brought to a jack into which a jumper adapter plugs, simultaneously terminating all three control signals and connecting the MOSI tail to the MISO head. The adapter comprises three resistors and one loopback wire. This does not have to be a production assembly because it is only used during development. The most convenient arrangement would be an 8-pin (or 10-pin with one missing-pin key and one pin unused) female box type connector on the board and a hand-wired male terminator plug.
SCAN CHAIN REGISTER PULLUP/DOWN RESISTORS
In most of the interim circuits, single-ended data on the outputs of line receivers 75175 is pulled up to 5V through 10K resistors. For example, on the Status Board, R12 pulls up MOSI input from U4 pin 3 and R11 pulls up NPHRST (scan chain reset) input from U4 pin 13. The only purpose that these resistors serve now is to avoid floating inputs to CMOS parts in the event of catastrophic failure of the line receivers. This is not a very likely failure scenario in normal use but may occur during development and servicing.
Pulling up these signals is the opposite of the correct failure response. Reset presumably makes things as safe as possible. Forcing reset in case of failure of reset control is clearly better than forcing unreset. Regarding data, since reset clears output registers, presumably the circuits are designed to be safest in the 0 state if possible. Consequently, the safest replacement for valid data is 0. Therefore, all of the single-ended control signal pullups of NPHRST and MOSI or MISO data should be changed to pulldowns. The 10K value is still adequate, as none of these components are TTL.
CDNEXT: APU SYSTEM DESIGN REPORT 6
10/19/99
[Report5]MSM WITHOUT LOADER
SCAN CHAIN TERMINATION
The MSM controls two scan chains: the motor chain (MPI) for the entire instrument and an I/O chain for its own communication with the Loader board. The Loader has been chosen as the end board for both of these chains in order to reduce the wires in its connection to the main body of the instrument. As the end-of-chain, the Loader is expected to terminate both the motor scan chain and the I/O scan chain. To terminate a scan chain means to provide the MOSI-MISO loopback and terminating resistors for the parallel control signals SCK, NPCS, and Reset (see cdxsys4: Scan Chain Additional Details: End-Chain Board). Each of these scan chains has its own SCK, NPCS, and MOSI-MISO loopback, but they share a common Reset (this eliminates two wires from the Loader cable).
Andy O has reminded us that a CS model instrument or one in a tracked cluster won't have a Loader. Therefore, the MSM will require one or two optional termination plugs similar to the one described in cdxsys5: Scan Chain Topology: End Board. If one plug is used then it must contain five terminating resistors and two loopback wires. If two plugs are used, one to terminate the I/O scan chain and one the motor chain then a standard terminator adapter could be used for both of these and the optional APU daisy-chain board terminator. In this case, one of the MSM scan chains' terminator jack Reset pin pairs would not be connected in order to avoid double-terminating the shared Reset.
MISSING MOTORS
If the Loader is not present then the actual motor scan chain will be shorter than expected by the scan chain controller. This brings up the general question of what happens when any scan chain is not fully populated. First, it should be clear that the outputs (MOSI data) are not affected by any downstream events. Missing or damaged shift registers have an effect only at their own level and downstream.
What happens to the inputs (MISO data) is not obvious. If fewer than 14 motor driver registers were present, the MOSI-MISO loopback would occur upstream of where it would be expected, possibly causing some of the input data to be pushed off the head end of its image in the controller. However, regardless of where the loopback occurs, if the chain shifts 56 bits and the input image contains 56 bits, then the head of the input chain will always end up at the head of the image. The early loopback can't change this. What it does do is to fill some of the input image with output data from the previous scan cycle. Since each motor adds four output and four input bits to the full MOSI-MISO loop, each missing motor (registers) allows eight output bits, i.e. the control signals for two motors, to be passed into the input image.
The same effect occurs on I/O scan chains. Cdxsys5: Scan Chain Topology: Loop Structure suggests taking advantage of this effect by making the last byte of the APU I/O MOSI scan chain a register connected to no outputs in order to be able to safely load arbitrary patterns for testing the chain. This costs practically nothing for the APU scan chain, whose last MOSI element is in the VPM DSP's on-chip RAM. However, it is not feasible for the motor scan chain, which can't afford to give up a motor slot. It may also not be feasible (due to cost and complexity) for the MSM I/O scan chain, because it would have to be implemented in both the Loader and optionally in the MSM for Loader-less systems.
VPM.DOC (Word 97 @ k:\cdx\doc\analyzer)
Last modified 10/28/99
DESIGN REQUIREMENTS
IMPLEMENTATION REQUIREMENTS
STRUCTURE AND BEHAVIOR
SCAN CHAIN INTERFACE
While NPCS is low, the scan chain shifts and no other activity related to the raw shift registers takes place. When NPCS goes high, there is a four usec. period during which the input command (MOSI) is interpreted, possibly immediately changing the output image, which is subsequently copied to the output shift register (MISO). When NPCS goes low, the new output and the previous input begin serially shifting again. The output image register is needed for other processes to post status during the shift period. The input has no corresponding image register. The command is fully interpreted during NPCS high and any value contained in the command is copied directly to the site where it is used.
ADC INTERPRETER
The ADC interpreter continuously reads the ADC, cycling through seven of the eight channels. The ADC is shared by servo control and the unrelated wet sensor detector, suggesting that it might simplify the architecture to separate the continuous ADC reading from any interpreters. This would be done by having an ADC controller continuously read and post the seven 10-bit values for independent asynchronous processes to interpret. This uses more storage memory but possibly allows simpler state machines.
For each of the five servo channels do the following:
The vacuum accumulator wet detection process can be based on a continuously driven or strobed sensor. Sensor circuits for the two approaches are very similar. The difference is mainly the use of a diode in the continuously driven circuit. The diode allows charge to accumulate continuously on the storage capacitor even while the sensor branch repeatedly charges and discharges. Except for the diode, the two approaches use the same number of components.
The continuously driven circuit can be excited at any convenient frequency and can be read at any time. The strobed circuit requires some coordination between the strobe assertion time and the subsequent ADC reading; and the strobe must also be deasserted to restore the sensor branch to a baseline state. For both circuits, the sensor is wet if the voltage on the storage capacitor falls below (or doesn't reach, in the strobed circuit) some circuit-dependent value. This value is relatively easy to predict for a continuously driven circuit, such as the one on the CD32/3500 VPM. The 5V. oscillator output charges the storage capacitor through 33K and a diode with a 1M bleeder to ground. If the oscillator's output were a square wave and the sensor were dry, the sense voltage would be 4.2 V. The reference circuit's oscillator does not produce a square wave. Excitation provided by an FPGA would be square but probably not as high as 5V. It is probably reasonable to use a fixed threshold for comparison, but the threshold could be set by command.
For each of the two wet sensor detectors, if the voltage is below the threshold, set the corresponding wet sensor flag in the output image; otherwise clear the flag.
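The comparison above amounts to a set-or-clear operation on two bits of the output image. The sketch below models it in software only to show the logic; the flag bit positions and the threshold value are hypothetical, since the text does not assign them.

```python
WET_FLAG_BITS = (0x01, 0x02)  # hypothetical positions of the two wet sensor flags
DEFAULT_THRESHOLD = 512       # hypothetical 10-bit ADC count; could be set by command


def update_wet_flags(output_image, readings, threshold=DEFAULT_THRESHOLD):
    """Set each wet flag if its reading is below the threshold, else clear it."""
    for flag, reading in zip(WET_FLAG_BITS, readings):
        if reading < threshold:
            output_image |= flag    # storage capacitor voltage low: sensor is wet
        else:
            output_image &= ~flag   # voltage at or above threshold: sensor is dry
    return output_image


# First sensor wet, second dry; unrelated status bits (0x80 here) are preserved.
print(hex(update_wet_flags(0x80, (100, 900))))
```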
TIMED EVENT PROCESS
There are two types of timed events, solenoid power down and vacuum/pressure recovery time. Although three of the solenoids are related to the recovery time at the application level and unrelated to the other two solenoids, it will be more convenient to ignore these relationships and treat all five solenoid power-downs as a single group unrelated to the recovery timers.
For implementation in a CPU, the ten timed processes (five solenoids and five VP recovery timers) would be implemented as posted times to compare to a global counter/timer. For implementation in an FPGA, it is more efficient to use one counter for each process. Ten counters are required. All counters are started and restarted by processes other than the timer. Therefore, timer processes only need to respond to actual timeouts.
Since each timer is dedicated to a particular device, the responding process does not need to be instructed on how to respond by the initiator of the timer. For each of the five solenoids, if the timer has reached the end time, assign 0 to the solenoid's power bit. It is not necessary to reset the timer. This can be a simple combinatorial assignment-- e.g. if timer = 0 then output bit <= 0. For each of the five VP recovery timers, if the timer has reached the end time, assign 1 to the solenoid's power bit. It is also necessary to disable the recovery error timer process in order to allow the Master to clear the recovery error flag. When a recovery timer is disabled, the error flag will not be set, regardless of the timer value. Thus, the VP recovery timer process is as follows:
11/9/99
[Report6]For simple I/O, software can generally treat a scan chain much like a random access parallel bus. However, for sequential encoded communication, as between the APU and VPM, software needs to know when something that it has written to scan chain registers has actually been received before it can write the next value. While it is possible to use time to control writing, this complicates software and reduces performance. A better solution is for scan chain controllers to indicate when they have completed a scan. This can be done by interrupting the CPU.
Both the APU and MSM CPUs appear to have external IRQ levels 5 and 6 free. IRQ level 7 could be used even though the level is non-maskable, because the program will use port configuration to disable individual interrupt sources instead of disabling the level. However, it may be better to reserve IRQ7 for a future use where its non-maskable level is needed.
The most reliable means of asserting the IRQ is to hold it active until the CPU accesses a particular address or writes to a particular bit in the scan chain controller, which it will do only in the ISR. Normally, the CPU will disable this interrupt, enabling it only for sequential writing to a single location. Therefore, the IRQ will usually be asserted. The interrupt request can't be cleared on a routine operation, such as writing to any scan chain address, because software may continue to access random locations even while simultaneously executing a series of sequential writes.
The FPGA's job is simple: assert (low) the interrupt when NPCS goes high; deassert it when the CPU accesses the clear mechanism (R/W specific address or write specific bit). This is required only for I/O scan chains, not for motor scan chains.
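The set-on-NPCS, clear-on-access behavior can be modeled as a small latch. This is a behavioral sketch only, not FPGA source; the class name and the clear address are hypothetical, and the clear-by-address variant is chosen arbitrarily over the clear-by-bit variant.

```python
CLEAR_ADDRESS = 0x7F  # hypothetical address reserved for clearing the request


class ScanDoneIrq:
    """Behavioral model of the scan-complete interrupt request latch."""

    def __init__(self):
        self.asserted = False  # True models the IRQ line being driven low
        self._npcs = 0

    def npcs(self, level):
        """Apply a new NPCS level; a rising edge latches the request."""
        if level and not self._npcs:
            self.asserted = True
        self._npcs = level

    def cpu_access(self, address):
        """Only an access to the dedicated address clears the request."""
        if address == CLEAR_ADDRESS:
            self.asserted = False


irq = ScanDoneIrq()
irq.npcs(1)                    # scan completes
irq.cpu_access(0x10)           # routine scan chain access: request stays pending
print(irq.asserted)
irq.cpu_access(CLEAR_ADDRESS)  # ISR acknowledges
print(irq.asserted)
```

Because routine accesses leave the latch untouched, software can keep reading and writing random scan chain locations while a sequential write series is in progress.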
APU, RPM, STATUS PANEL, AND CD4000
CD4000 REQUIREMENTS
The interim design attempts to meet the hardware control needs of a CD3200-class instrument with some spare capability for experimenting. Recently, John V, J. P. Y, and I have been reviewing the hardware control requirements of the CD4000 to try to anticipate its needs as well. As John has explained, the goal is to make an initial instrument that supports a wide variety of configurations, even though this will incur some cost penalty, and to remove portions to reduce the cost for any specific price-performance point. The major cost item is, of course, the laser. The next most costly items are hardware units, such as syringes and motors, followed by the number of channels on the APU, and finally by I/O control circuits. Given the relatively low cost impact of I/O control circuits, it is reasonable to provide extra capability where we think it may prove useful. However, as much as possible, we want to modularize sub-circuits to facilitate depopulation, not only to reduce cost, but to avoid wasting scan chain resources that may be needed elsewhere.
The CD4000's basic control requirements are:
SOLENOID CONTROLS
The interim RPM contains 24 solenoid drivers, the LPM 32. The CD3200 has 15 solenoids on the right panel and 25 on the left. The CD4000 has 82 non-loader solenoids with all but three located on the left panel. To support the CD4000, 26 solenoid controls need to be added. As many of these as possible should be added to the LPM. The LPM is somewhat space constrained and may not be able to absorb all of these. Any remaining ones can be added to the RPM.
The solenoid driver circuits on both the interim RPM and LPM schematics contain an error that needs to be corrected. Some of the ULN2003 drivers mismatch inputs to outputs where one section is not used. Pin 4 input controls pin 13 output and pin 5 controls pin 12. In some cases pins 4 and 12 are connected instead of 4 and 13 or 5 and 12. In other cases, the mismatch is that pins 5 and 13 are connected.
SENSORS
The CD4000 has 33 non-loader sensors. Four of these are fluid sensors, which are treated separately. Ten are motor home flags, which are part of the motor control bus. There are 19 general-purpose sensors located throughout the body of the instrument. These can be provided by adding two HC597s to the RPM and one to the LPM. One half of one of these registers on the RPM will be used for the in-line fluid sensors (see discussion of RPM below). Each of the remaining 12 RPM and 8 LPM inputs should be connected to a 3-pin input device jack with the following pinout:
PERISTALTIC PUMP MOTOR CONTROLLER
The MSM's 14 motor limit is hard-wired into its timing and control mechanisms and cannot be increased. John and I considered the possibility of including on the RPM a fixed-rate controller for the three right panel peristaltic pumps, but J. P. pointed out that these pumps control flow cell fluid rates, which must be selectable from scripts. Even with selectable rates, these pumps demand less control than other steppers. J. P. says that the CD4000 may use ramps for these pumps, but this is not necessary. The pumps have no absolute position to maintain, so any position slip has no long term effect and doesn't change the slew rate. The worst possible effect of ramp-less driving is startup cogging, where a step fails to achieve more than 50% travel and the next step actually goes backward. However, a peristaltic pump presents a nearly constant load with little momentum, so any slew rate that allows cogging at startup risks cogging at other times as well and shouldn't be used. Therefore, a ramp-less controller should suffice for these motors.
The RPM is ideally located for accessing the delivery pump motors. However, the only communication means from either of our centers of intelligence (APU and MSM) to the RPM is the APU I/O scan chain, which is not designed for intelligent conversation. Nevertheless, we are putting a programmable VPM on the scan chain by encoding some scan chain (MOSI) bits to assign multiple uses to bits, and this could be done for a simple motor controller as well. The main problem would be scan chain address consumption. This couldn't be done with fewer than 16 bits (3 for on/off, 2 for register selection, and 11 for speed). To determine whether this approach could be supported, we need to consider how the scan chain would be used in a CD4000 box. The MOSI bit requirements are:
The total MOSI bit consumption is 212, leaving 44 available. Clearly, this leaves enough to implement the dumb stepper controller on the RPM. However, even a ramp-less controller would require a fairly substantial PLD. Assuming a 16-bit step counter and reload register for each of three motors, plus a 2-bit gray-code counter to generate step patterns, plus 24 bits for scan chain registers, 126 bits are needed. The smallest PLD that could be used would be an EPM7128.
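The bit counts in this paragraph can be reproduced with a short calculation. The sketch below rests on two assumptions that are not stated outright in the text: that the 2-bit gray-code counter is needed once per motor (this is what makes the total come to 126), and that the MOSI chain is 256 bits long (inferred from 212 used plus 44 available).

```python
MOTORS = 3
COUNTER_BITS = 16         # step counter, per motor
RELOAD_BITS = 16          # reload register, per motor
GRAY_BITS = 2             # assumption: one 2-bit gray-code counter per motor
SCAN_REGISTER_BITS = 24   # scan chain registers inside the PLD


def pld_register_bits():
    """Total register bits for the ramp-less three-motor controller."""
    per_motor = COUNTER_BITS + RELOAD_BITS + GRAY_BITS
    return MOTORS * per_motor + SCAN_REGISTER_BITS


def mosi_bits_free(chain_bits=256, used_bits=212):
    """MOSI bits left over; chain_bits=256 is inferred, not quoted."""
    return chain_bits - used_bits


print(pld_register_bits(), mosi_bits_free())
```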
An alternative approach would be a motor coprocessor that communicates with the APU's CPU through the PC104 interface. The coprocessor board would include on-board motor drivers plus power and motor cabling. This would be too big to implement as a mezzanine board. There is room on the APU's mounting plate for such a board, but the APU's PC104 connector must be properly positioned to minimize cable length (see next topic).
Considering the current uncertainty regarding fluid delivery precision and reliability when using a peristaltic pump, John and I have decided not to implement any means of driving the pumps at this time. The two alternatives will be reconsidered in the future. For now, the most important action is to be sure that the APU configuration will support a non-mezzanine PC104 coprocessor.
The current APU configuration doesn't allow the PC104 to serve as an effective coprocessor interface. The connector is located at the top of the board and the board is mounted at the top of the panel. PC104 is intended for stacking and is not properly terminated for any but the shortest bus lengths. Therefore, only a mezzanine board could be used now. But the most likely coprocessor function would be to extend I/O capabilities, which requires additional cabling and usually a fairly large board, for which a mezzanine is not suitable. Either the APU should move down on the mounting plate or its PC104 connector should be moved to the bottom edge of the board.
No matter what is done to the Status Panel or RPM, the cabling requirements of the Status Panel will not change. In contrast, there may be instrument configurations in which the RPM is completely removed, with the APU's I/O scan chain going directly to the LPM. Therefore, putting the downstream loopback connector to the Status Board on the APU would afford greater flexibility than putting it on the RPM.
If there is room on the left edge (circuit side) of the APU then the Status Panel loopback connector should be located here. The standard 10-pin scan chain connector to the RPM should be located at the bottom of the left edge; the 14-pin loopback connector can be located anywhere above this. Only one set of drivers and receivers should be located on the APU. These should be located upstream from the Status Panel connector. The return signals from the Status Panel are routed (on the APU) directly to the RPM connector without additional buffering. See AD46- Hardware Partitioning- Cabling or cdxsys.doc for scan chain cable definitions.
SCAN CHAIN REGISTER BYPASS
It is likely that some units on the APU I/O scan chain will, in some instrument configurations, not be used, possibly consuming chain addresses that are needed elsewhere. To simplify configuration, jumpers will be provided to connect units into or remove them from the scan chain. Only serial data needs to be bypassed, as all other signals are distributed in parallel. Thus, the common terminal of a 3-pin jack is the data input to the next unit downstream on the MOSI leg or upstream on the MISO leg. The two option terminals connect to the data input to the (de)selected unit and to the data output from the unit.
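The effect of a bypass jumper on chain addressing can be illustrated with a small model. This is only a sketch: the class, the unit names, and the choice of 8-bit units are illustrative assumptions. It shows that removing a unit shortens the serial stream and moves every downstream unit to a lower bit position.

```python
class Unit:
    """One scan chain unit with a bypass jumper (hypothetical model)."""

    def __init__(self, name, bits=8, in_chain=True):
        self.name = name
        self.bits = bits
        self.in_chain = in_chain  # False means the jumper bypasses the unit


def chain_length(units):
    """Number of serial bits contributed by the connected units."""
    return sum(u.bits for u in units if u.in_chain)


def bit_offsets(units):
    """Map each connected unit to its starting bit position in the stream."""
    offsets, position = {}, 0
    for u in units:
        if u.in_chain:
            offsets[u.name] = position
            position += u.bits
    return offsets


# Hypothetical leg with the middle solenoid bank jumpered out.
leg = [Unit("sol_bank0"), Unit("sol_bank1", in_chain=False), Unit("sensors")]
print(chain_length(leg), bit_offsets(leg))
```

Because offsets shift when a unit is bypassed, whatever describes the instrument configuration to software would need to reflect the jumper settings.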
What constitutes a "unit" depends on function. Most units are 8-bit registers, such as the '595 and '597. Thus, each solenoid scan chain unit comprises a bank of four solenoids (4 bits on/off and 4 bits power). The VPM and Status Board both contain larger scan chain units, but all components of both of these modules are currently required. Therefore, neither of these modules will have any bypass jumpers.
IN-LINE FLUID SENSORS
AD46: Hardware Partitioning: Fluid In-Line Sensors suggests that the CD3200's two in-line sensors, one optical and the other ultrasonic, be moved to the RPM. The ultrasonic sensor circuit is an OEM module, and it was suggested that this be piggybacked onto the RPM. Further review of the target analyzer indicates that we are not clear about certain I/O requirements, including the in-line sensors. Given that the ultrasonic sensor is already a self-contained circuit that needs only power and a single scan chain input, it makes more sense to just provide a jack for it. The RPM should contain two of these, in case we decide to use ultrasonic for both sensors. This is the same as the CD3200's FCM (9631320) J11, a 3-pin jack with the following pin assignments:
As mentioned in AD46, the optical sensor circuit is our own module 9631160 (drawing number). This is a very simple circuit, and it may still make sense to include this on the RPM. Two of these should be included, as we may decide to use optical for both sensors. Given the surplus of MISO bits and the uncertainty around the RPM I/O requirements, it would be best to make the optical and ultrasonic inputs independent. Thus, the RPM will simultaneously support two ultrasonic and two optical in-line sensors. These four MISO inputs can be attached to one nibble of the two HC597s provided on the RPM for general-purpose input (see Sensors topic above).
The original circuit contains a jumper-selectable option that we will not use and the alternate components can be deleted. This frees U1C, which, along with the spare U1D, can form the second circuit. Thus, only one LM339 is needed for two sensor circuits. The jumper, W1, is not needed for the old selection purpose but will be retained for another purpose. Although having the sensor circuit on the RPM is more efficient and affords a better mechanical arrangement than the separate PCB, we may find that the longer opto-interrupter leads pick up too much noise. Also, the manufacturing and service advantage of a common part may outweigh the advantages. Therefore, it would be good to provide the old interface as an option. W1 will provide the means to connect the MISO input to either the new signal (TP2, W1B-C) or to a 3-pin jack that is compatible with the old interface. The pinout of this jack is:
In the old circuit, the opto-interrupter leads are soldered to the PCB. This will be replaced by a 4-pin jack. The PCB space provided for DS2 and Q1 is eliminated. The old circuit's output jack J1 is eliminated. The output (W1-C) is jumper-selected to the MISO input. To use the circuit on the RPM, the four opto-interrupter leads are connected to the 4-pin jack. The old adjustable sensor circuit (9631160) is used by connecting its J1 to the 3-pin jack on the RPM.
SHEAR VALVE AND Y-VALVE CONTROLLER
INTERIM DESIGN CHANGES
The interim RPM version includes a shear valve and Y-valve motor controller/driver interface. An Altera EPM7032 provides the logic for both motors and an L298 dual H bridge provides the drive. The circuitry is mostly correct but the logic in the 7032 is incorrect and will not work. Also, to support additional configuration options, we have decided to modify the requirements as follows:
These requirements can be met using the L298 for the two shear valves and an L293D for the two Y-valves. The L293D is similar to the L298 but delivers lower power in a smaller IC and with integral suppression diodes. An L293 may be substituted for the L293D but would require eight external suppression diodes. The EPM7032 can support the increased requirements after correcting some of the interim design's inefficiencies.
MOTOR ASSEMBLIES
The shear valve and y-valve are similar in that they both comprise a bidirectional DC motor with some form of CW and CCW End Of Travel indication. However, the shear valve cannot rotate continuously, while the y-valve can, and their position indicator mechanisms are quite different.
The shear valve position is indicated by three sensors, EOT (End Of Travel), CWL (CW overtorque), and CCWL (Counter Clockwise Overtorque). Shear valve movement is mechanically blocked at both CW and CCW extremes. The motor pulls the valve via a spring, which allows the motor to continue moving even after the valve stops. When the motor pulls a certain amount ahead of the valve, the appropriate overtorque signal becomes true. If this occurs during travel (i.e. when EOT is false) it is ignored. If it occurs when EOT is true, the motor is turned off (failure to do this is one of the errors in the interim design).
The shear valve sensor board circuit is arranged to produce the following outputs:
EOT, CWL, and CCWL are all inverted on the RPM so that they are logically /EOT, CWL, and CCWL when input to the motor control PLD.
The shear valve motor drive is arranged so that normal polarity (positive current flow into the + terminal) drives the valve in the CW direction. Also, the Abbott standard cabling convention is followed, with the motor's + terminal attached to pin 1 of a 2-pin jack. Jack pin 1 is, by convention, the left-most viewed from above with the latching tongue facing down:
The y-valve assembly contains two position feedback sensors. These are simple opto-interrupters. A 1-K pullup to +5 is required on the RPM, but the inverters in the interim design serve no purpose and should be eliminated. The y-valve motor has a single-slot wheel that allows a phototransistor to pull the signal low when the slot is in the interrupter. Thus, CW and CCW are indicated by their respective signals low. Logically, they are /CW and /CCW. As with the shear valve, the y-valve controller must stop when the target position is reached. The interim design (equations for the 7032) fails to do this.
The y-valve assembly follows the same motor connection conventions as the shear valve. Motor + is connected to pin 1 of a 2-pin jack. Normal polarity drives the motor CW. The CW and CCW sensors are 90 degrees apart. Either one could be considered the CW or CCW but one way would require 270 degrees full travel and the other only 90 degrees. Obviously, we've chosen the latter. The position called sense 1 in the original DC Motor Driver (drawing 9631940) is CW, while sense 2 is CCW. These are called Y_CWL and Y_CCWL in the interim schematic.
DRIVERS
The L298 is a dual 2 Amp H bridge while the L293D is a dual 0.6 Amp H bridge. In both ICs, each bridge is controlled by three inputs, a general enable/disable and two half-bridge polarity selectors. A motor can be turned off either by the general disable or by assigning the same value (1 or 0) to both polarity selectors. These methods are not electrically identical but the differences are evident only when dynamic braking and/or PWM are used. We don't use either. Therefore, we only need to control the two half-bridges for each motor.
The interim design contains the following errors that will be corrected:
NEW CIRCUIT
The shear valves will be called SV0 and SV1, the y-valves YV0 and YV1. For consistency, the + of each motor will be connected to the lower-numbered half bridge, i.e.:
The other driver pins are as follows:
When Sxx_CW is high and Sxx_CCW is low, the motor rotates CW. When Sxx_CW is low and Sxx_CCW is high, the motor rotates CCW. Both high or both low can be used for motor off. For consistency, only both low will be used. These eight pins are all connected to the motor control PLD and to an 8-pin SIP header. The header pins simplify rewiring for alternate scan chain use when a particular motor interface is deselected, in which case the corresponding two PLD outputs directly reflect MOSI bits; the MOSI direction control bit, which is called xCW (1 selects CW, 0 CCW) will be output as xCW and the on/off bit will be output as xCCW.
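The drive convention above can be summarized as a small truth function. This is a sketch under the document's stated conventions (motor + on the lower-numbered half bridge, normal polarity drives CW); the function and argument names are hypothetical.

```python
def bridge_state(cw_pin, ccw_pin):
    """Classify one motor from its two half-bridge inputs.

    cw_pin drives the half bridge wired to motor +,
    ccw_pin drives the half bridge wired to motor -.
    """
    if cw_pin == ccw_pin:
        return "off"          # both low (the chosen convention) or both high
    return "CW" if cw_pin else "CCW"


def off_inputs():
    """The only off state the new design intends to use: both inputs low."""
    return (0, 0)


for pins in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(pins, bridge_state(*pins))
```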
Each shear valve controller PLD section will have four inputs:
Each y-valve controller PLD section will have three inputs:
The four SVx_CWL and SVx_CCWL sensor inputs (i.e. before inversion to avoid having to disable the inverters) are connected to an 8-pin SIP pin header, the other four pins of which are connected to the four SYx_CWH and SYx_CCWH signals. This provides a convenient means of connecting alternate inputs when a motor controller is disabled, in which case, the corresponding two inputs are gated directly to the MISO register. Note that disabling one motor controller frees two scan chain outputs and two inputs.
To interface to an old shear valve with integral controller, the motor controller section is disabled. Direct scan chain outputs provide the two control signals, ON and CW/CCW; the two position state inputs, CWACK and CCWACK, are routed directly to the MISO scan chain. To simplify attaching the shear valve, the RPM will include two 7-pin SIP headers, one for each of the two shear valve interfaces. These headers afford only a mechanical convenience, bringing two alternate outputs and two inputs (from the same motor controller section) to a single connector that is compatible with the old Shear Valve Controller's J2 (which is also FCM J8). The pinout is:
If a shear valve controller is disabled, whether the inputs are used for CWACK/CCWACK or some other purpose, they will be inverted. The only reason for going through the inverters is to avoid complicating the circuit with some means of multiplexing the alternate inputs with the inverter outputs. Software will not be affected by this inversion, because it will be hidden in the analyzer configuration file.
The interim circuit routes motor position signals through the 7032 to a separate HC597 for MISO input. The new design will capture them on the PLD's own internal 8-bit MISO shift register. The y-valve position sensors will be connected directly to the scan chain. For the shear valves, the internal signals, CWACK and CCWACK, generated from raw inputs (and state memory) will be connected to the MISO scan chain.
In the interim circuit, the 7032 has two scan chain data connections, reflecting the interim design's single leg chain architecture. For the new scan chain topology, the PLD requires four data connections, as follows:
PLD RESOURCE REQUIREMENTS
The PLD I/O pin totals are 23 inputs and 10 outputs, but the list doesn't include a convenient means of disabling motor control sections. It is possible to reprogram the PLD but it may be more convenient to provide an enable control pin for each one. This would require four more inputs. The EPM7032 has 32 macrocells and is, therefore, limited to 32 general I/O pins. It has four inputs that can be used as general or specific function inputs, but this still brings up the total to 36 vs. the 37 pins that would be needed for full selection capability. Being able to reconfigure via input pins is not necessarily easier than reprogramming the PLD. It isn't worth going to the next larger PLD, which is twice the size and complexity of the 7032, just to get one more configuration pin. Therefore, the YV0 controller will be permanently enabled (except by reprogramming the PLD). The other three motor controllers will have individual enable pins. The three inputs will be:
The scan chain will comprise one 8-bit MOSI input register and one 8-bit MISO output register embedded in the 7032, consuming 16 of the 32 available macrocells. Eight macrocells will be used for the four motor output controls (two per motor). Four macrocells will be used to generate the shear valves' latched CW and CCW states. Eight state flags will be driven onto the scan chain MISO register by NPCS, but none of these will need additional macrocells. Thus, only 28 macrocells will be consumed. The 7032's global clock pin 43 will be used for SCK. The other three special-purpose inputs, OE1 pin 44, GCLR pin 1, and OE2 pin 2, will be used as general-purpose inputs.
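The pin and macrocell budgets in this topic can be checked with a few lines of arithmetic. The sketch below uses only the counts quoted in the text; the dictionary keys and function names are illustrative.

```python
MACROCELLS = 32        # EPM7032 macrocells, each with one general I/O pin
DEDICATED_INPUTS = 4   # global clock, OE1, GCLR, OE2

macrocell_use = {
    "MOSI shift register": 8,
    "MISO shift register": 8,
    "motor outputs (two per motor)": 8,
    "latched shear valve states": 4,
}


def macrocells_used():
    return sum(macrocell_use.values())


def pins_needed(enable_pins, inputs=23, outputs=10):
    """Signal pins required for a given number of controller enable pins."""
    return inputs + outputs + enable_pins


def pins_available():
    return MACROCELLS + DEDICATED_INPUTS


print(macrocells_used(), pins_needed(4), pins_needed(3), pins_available())
```

With four enable pins the design needs one pin more than the device offers; with three it fits exactly, which matches the decision to leave YV0 permanently enabled.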
The controller logic contains two kinds of product terms, external and buried. External terms comprise motor controls and scan chain data (MOSI and MISO outputs). Buried terms comprise the scan chain shift registers and the four latched shear valve home position states, SV0_CWACK, SV0_CCWACK, SV1_CWACK, and SV1_CCWACK. These states and the four raw y-valve position inputs, YV0_CWL, YV0_CCWL, YV1_CWL, and YV1_CCWL, are latched into the 8-bit MISO register when NPCS is high. Latching may be asynchronous or synchronized on the global clock, which is driven by the scan chain's SCK. These eight signals are also used in motor control product terms. The shear valve state terms have the following equations:
SVx_CWACK = SVx_EOT & SVx_CWL & !SVx_CCWACK | SVx_EOT & SVx_CWACK & !SVx_CCWACK & !reset
SVx_CCWACK = SVx_EOT & SVx_CCWL & !SVx_CWACK | SVx_EOT & SVx_CCWACK & !SVx_CWACK & !reset
Conceptually, these equations say that if EOT and the related overtorque are both true then the shear valve has reached the CW or CCW home position. This is latched by allowing the signal to hold itself true. One caveat that applies to both the initial home event and the latched state is that they can't be true if the other home position is true. This prevents the startup EOT coming out of CW toward CCW from latching CCWACK and the EOT from CCW toward CW from latching CWACK. Remember, EOT is not position-specific. Reset should clear both the MOSI and MISO shift registers if possible.
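The latching behavior can be traced with a small software model of the two state equations. This is a sketch only: it assumes both terms are evaluated together from the previous state (as a clocked implementation would do), and the function name and the event sequence are hypothetical.

```python
def next_ack(eot, cwl, ccwl, cwack, ccwack, reset=False):
    """Evaluate both shear valve state equations once; return (cwack, ccwack)."""
    new_cw = (eot and cwl and not ccwack) or \
             (eot and cwack and not ccwack and not reset)
    new_ccw = (eot and ccwl and not cwack) or \
              (eot and ccwack and not cwack and not reset)
    return bool(new_cw), bool(new_ccw)


# Hypothetical move from CW home to CCW home: (EOT, CWL, CCWL) per step.
events = [
    (True, True, False),    # at CW stop with CW overtorque: CWACK latches
    (True, False, True),    # leaving CW, EOT still true: CCWACK must not latch
    (False, False, False),  # in travel: both states clear
    (True, False, True),    # at CCW stop with CCW overtorque: CCWACK latches
]
state = (False, False)
trace = []
for eot, cwl, ccwl in events:
    state = next_ack(eot, cwl, ccwl, *state)
    trace.append(state)
print(trace)
```

The second step is the case the mutual-exclusion terms exist for: EOT and the CCW overtorque are both true as the motor starts away from the CW stop, yet CCWACK stays false because CWACK is still held.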
The 8-bit MOSI shift register outputs are internal motor control signals or simple outputs (for disabled motor controllers). As motor control signals, these are:
The remaining shear valve controller equations are:
SVx_CW = SVx_EN & SVx_On & SVx_CW & !SVx_OC & !SVx_UC & !SVx_CWACK & !reset |
!SVx_EN & SVx_CW & !reset
SVx_CCW = SVx_EN & SVx_On & !SVx_CW & !SVx_OC & !SVx_UC & !SVx_CCWACK & !reset |
!SVx_EN & SVx_On & !reset
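The two shear valve output equations can be exercised in software to confirm the intended behavior in both the enabled and the disabled (pass-through) cases. The sketch below transcribes the equations directly. To avoid the name clash between the MOSI direction bit and the output pin, the inputs are given lower-case argument names, which are hypothetical. OC and UC are treated purely as inhibit inputs, as the equations use them.

```python
def shear_valve_outputs(en, on, cw, oc, uc, cwack, ccwack, reset=False):
    """Return (cw_out, ccw_out) for one shear valve controller section."""
    inhibit = oc or uc
    cw_out = (en and on and cw and not inhibit and not cwack and not reset) or \
             (not en and cw and not reset)
    ccw_out = (en and on and not cw and not inhibit and not ccwack and not reset) or \
              (not en and on and not reset)
    return bool(cw_out), bool(ccw_out)


# Enabled, commanded CW, not yet home: drive CW.
print(shear_valve_outputs(True, True, True, False, False, False, False))
# Enabled, commanded CW, CW home reached: motor off.
print(shear_valve_outputs(True, True, True, False, False, True, False))
# Disabled: outputs mirror the MOSI bits (direction bit on CW, on/off bit on CCW).
print(shear_valve_outputs(False, True, False, False, False, False, False))
```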
The y-valve controller equations are:
SY0_CW = SY0_On & SY0_CW & !SY0_OC & !SY0_UC & !SY0_CWH & !reset
SY0_CCW = SY0_On & !SY0_CW & !SY0_OC & !SY0_UC & !SY0_CCWH & !reset
SY1_CW = SY1_EN & SY1_On & SY1_CW & !SY1_OC & !SY1_UC & !SY1_CWH & !reset |
!SY1_EN & SY1_CW & !reset
SY1_CCW = SY1_EN & SY1_On & !SY1_CW & !SY1_OC & !SY1_UC & !SY1_CCWH & !reset |
!SY1_EN & SY1_On & !reset
The eight MISO scan chain registers are loaded during NPCS high. The load data are:
SVx_EN & SVx_CWACK | !SVx_EN & SVx_CWL
SVx_EN & SVx_CCWACK | !SVx_EN & SVx_CCWL
YVx_CWH
YVx_CCWH
CONNECTORS
The interim RPM contains two SPI/MPI connectors. These are replaced in the new design by the following:
All of the I/O connectors should be located on the top and left (component side view) edges of the board to facilitate harnessing. The in-line sensor jacks should be located at the extreme upper left. The others may be located for the convenience of the circuit. The power jack should be located on the right edge of the board below the APU or Status Panel scan chain connector.
RPM RECAP
LEFT PANEL MODULE
The interim LPM design strobes the fluid sensors on inverted NPCS, i.e. on every 132 usec. cycle of the scan chain. This is excessive and would substantially aggravate the existing sensor electrolysis problem. The strobe frequency needs to be reduced to the minimum, which is approximately four times per second. Also, the interim circuit incorrectly mimics the original circuit and may not function properly. The original 3500/3200 circuit (FCM drawing 9631320) uses a one-shot to generate a signal to charge the storage capacitors and then capture the comparators' outputs. The 20K and .001uF timing elements used with the LS123 produce a 10 usec. ramp period compared to the 4 usec. NPCS high period. Despite this substantial timing change, the sense (comparator) circuits in the interim version are identical to the original.
It would simplify software somewhat if the circuit were self-perpetuating and enabled by program control, but this would require some means of timing the 250 msec. sample period, which is too long for one-shots. A '555 timer is better suited to this interval but can't directly interface with the CMOS scan chain registers. A counter, such as HC4040 (12-stage ripple) could be used to divide down the NPCS. Alternatively, we could return to the old CD32/3500 approach, which is to let the analyzer program periodically strobe the circuit and then test the input. In either case, an all CMOS 10 usec. timer and latch mechanism is required. This will comprise an HC4528 one-shot and an HC273. The one-shot timing elements are .001 uF and 10K. The rest of the circuit is identical to the interim design, which is to say the original CD3500 FCM circuit.
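For the counter option, the divider tap can be estimated from the 132 usec. NPCS period cited above. The sketch below is a rough calculation, not a circuit recommendation. It assumes the counter is clocked directly by NPCS and numbers the HC4040 outputs 1 through 12, with stage 1 dividing by two; the function names are hypothetical.

```python
NPCS_PERIOD_US = 132.0   # scan chain cycle cited in the text
TARGET_MS = 250.0        # approximately four samples per second


def divided_period_ms(stage):
    """Period at ripple counter stage `stage` (stage 1 divides by 2)."""
    return NPCS_PERIOD_US * (2 ** stage) / 1000.0


def closest_stage(target_ms=TARGET_MS, stages=12):
    """Stage whose output period is nearest the target sample period."""
    return min(range(1, stages + 1),
               key=lambda s: abs(divided_period_ms(s) - target_ms))


best = closest_stage()
print(best, round(divided_period_ms(best), 1))
```

Under these assumptions the eleventh stage gives a period of about 270 msec., close to the four-per-second target.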
MISCELLANEOUS CHANGES
CDXSYS 8, 9, and 10 were originally published as AD45, 46, and 47, i.e. as part of the on-going software design meeting record. They have been moved into the system design record.
CDNEXT ANALYZER SYSTEM DESIGN REPORT 8
Sept. 22, 1999
MEETING MINUTES Originally published as AD45.DOC
PARTICIPANTS
David McCracken (chair), John V, Jack W, Lucy L, Dave R.
AGENDA
DISCUSSIONS
DUAL PORT MEMORY
Jack presented the case for using dual port RAM for APU-FPM communication. The obvious advantage of dual port memory vs. serial communication is speed. Jack cited several examples, such as the possibility of a remote fork requiring as long as 70 msec. in the serial protocol. Jack's document (dpram.vsd) points out that this is the worst case, in which communication fails two times before finally succeeding. In another example, just to read a remote sensor could require as long as 14 msec. Both of these operations would execute in a few microseconds with dual port memory for communication.
The only disadvantage of the dual port RAM approach is that 24 wires are needed to connect the APU and FPM together. If the two boards were close together, this would not pose a significant problem. Dave R added that the simple wiring scheme could not be extended to boards that are far apart because of electrical noise, so the number of wires needed to bridge some distance might be two or three times that cited for APU and FPM boards in close proximity. David suggested that the advantages mentioned by Jack could be realized without the disadvantages by using Quick Ring, which is a very high speed (e.g. 1 GHz) serial bus designed to appear as dual port memory to the CPUs. The disadvantage of this approach is the high component cost.
Jack suggested that, in addition to its raw speed advantage, dual port memory could simplify load balancing. If the FPM, for example, were to periodically copy its sensor status image into shared memory then either the APU or the FPM could act on sensor conditions, incurring no time or CPU bandwidth penalty.
APU-FPM SYSTEM ORGANIZATION
When we started the CDNext-3200 design, we envisioned the following organization (this is copied directly from AD1.DOC)
NEW SYSTEM
1. APU (Analyzer Processing Module) combines CPU-DCM, SPM, and MAM.
2. FPM (Flow Processing Module) combines FCM, MPM, VPM, and SHM (tower and loader).
3. APU and FPM both use M68340.
4. I/O scan chains on both APU and FPM; stepper control scan chain on FPM.
5. Communication links:
5.1. Data station communicates with APU through HSSL (no significant change).
5.2. APU-FPM communication is similar to current CPU-SHM serial.
5.3. CPU-MPM serial interface replaced by APU-FPM communication.
5.4. Peripheral Bus replaced by APU-FPM communication to FPM I/O scan chain and APU’s own scan chain.
5.5. CPU-MPM serial communication replaced by APU-FPM serial link.
During development, we have experimented with and considered alternative organizations, such as Jack's dual port memory proposal. In AD42, Dave R reported working around the fact that APU-FPM communication had not been implemented (or even fully defined). From AD42:
"Our system design has not yet solidified around the APU-FPM communication. Dave R developed a work-around this gap by wiring the APU to control the FPM's scan chains. He also reconfigured the FPM's CPU to provide some supporting intelligence. This system is basically ready to run flow scripts on the APU. The standard flow scripts will need to be modified so that the APU directly executes tower and loader macros."
The original plan was and remains appealing because it represents a natural evolution of the CD3200, with updated design and technology. We know that this basic architecture works and should work better with the improvements that we have made. A big plus is that the old flow scripts and macros need only translation and minor adjustments to be functional in a new APU-FPM system. Even if we were to devise a superior organization, we don't have a large enough team to implement one. By concentrating on technical improvements rather than sweeping architectural change, we can maximize our effectiveness.
SYSTEM DESIGN
The APU and FPM boards largely fulfill their requirements in their current hardware form. They contain the intelligence (CPUs that can execute scripts and communicate) and remote control (scan chain) capabilities to replace, respectively, the CPUDCM and TWR, LDR, and MPM (in addition to the other subsumed units).
The APU and FPM boards do not contain significant I/O resources and need to operate with remote I/O and motor driver boards. The natural remote I/O division from the point of view of scripts is for all local CPUDCM I/O to attach to the APU scan chain; all local TWR and LDR I/O to attach to the FPM I/O scan chain; and all motors to attach to the FPM's motor scan chain. This division should generally reflect physical locality as well, because the TWR and LDR units are physically distinct and the primary I/O responsibility of the APU is to manipulate valves to effect fluid flow related to sample measurement.
The left and right panel and tower boards of the most current APU-FPM prototype do not reflect the expected division. There are several reasons for this. One is that the locality of the tower and loader boards has been diminished by our desire to pull them into the body of the instrument to remove them from fluid zones (John says that our competitors have cited the exposure of these boards as evidence that the CD3200 is an unfinished prototype). Another reason is that we want to avoid having any board attach to both the APU and FPM scan chains, as this indicates a poor division of I/O. To avoid this, a temporary decision was made to simply dump all I/O responsibility onto either the APU or the FPM (or the FPM physical scan chain under direct APU control). This decision needs to be revisited. There are several competing requirements, including:
All of these requirements have to balance against each other. For example, having both the APU and FPM scan chains on one large board would reduce the number of scan chain connections compared to several smaller boards, each connected only to one scan chain or the other. Also, a larger board reduces the number of power connections but tends to increase local I/O wiring, because it can't be located as close to the electromechanical devices as several smaller boards.
The relatively small tower board represents approximately half of the FPM's I/O responsibility. Loader I/O represents most of the remainder except for special hardware to control the servoed pressure/vacuums and strobed fluid sensors located directly on the FPM. Considering only size and ignoring location, the APU scan chain could be connected to the right panel and half of the left panel while half of the left panel and the tower could be connected to the FPM scan chain. This would yield one very large board for all APU I/O and one moderate-sized FPM I/O board, minimizing connections but probably significantly increasing local I/O wiring. While this might be a viable solution, it more likely represents an extreme position. A more reasonable possibility might be two moderate-size APU I/O boards, one located on the left panel and the other on the right.
At the next meeting, the group will examine a CD3200 and determine a division of remote I/O based on the principles described here. To assist in this, John will procure an instrument (preferably with the optics bench removed). Also, the document "CDNEXT: CD3200-CLASS INSTRUMENT I/O" (k:\cdx\doc\analyzer\cdx32io.doc) has been prepared (primarily from the CDM's analyz.ini) as a kind of worksheet listing all of the I/O to make sure that we don't neglect any known devices.
APU-FPM COUPLING
Considering that the CD3200 obviously can tolerate the current level of CPUDCM-SHM coupling and that the only potential additional coupling for the APU-FPM is in the strobed sensors and servoed pressure/vacuum controls, it would be reasonable to predict that, even with no effort to reduce the coupling, the proposed APU-FPM system can meet its I/O requirements. However, reducing the coupling is desirable both to "buy" CPU bandwidth for other functions and to improve domain cohesion (and, therefore, maintainability).
It isn't difficult to find instances of what could become APU-FPM coupling in the existing CD3200 flow scripts (or their translated versions in k:\cdx\fsq). There are some obvious specific cases, such as the new checks script, which replaces embedded code in the CD3200 analyzer. This periodically reads the strobed fluid sensors. Clearly, this can be moved to the FPM, which can send a fault message just as it will for the old SHM faults. More generally, we can search for the following telltale signs:
In most cases only a few of these exist in any one flow script. In some of these cases, it may be possible to move the entire script to the FPM. Alternatively, the script may be divided into some portions that execute on the APU and others on the FPM. When the latter approach is being considered, it is important to examine the intermixing of resources from both domains. For example, the shm.f file contains some scripts (and portions of scripts) that access tower/loader sensors and global VARs that are shared with other scripts that communicate with the data station and, therefore, belong in the APU domain. Thus, scripts that would otherwise clearly separate into APU or FPM are glued together by resources (VARs) that can't cross the domain boundary (this restriction is likely to change in order to address this problem).
CDNEXT ANALYZER SYSTEM DESIGN REPORT 9
Sept. 30, 1999, 10:24AM.
MEETING MINUTES Originally published as AD46.DOC
PARTICIPANTS
David McCracken (chair), John V, Jack W, Lucy L, Dave R.
DISCUSSIONS
APU-MSM COMMUNICATION
Jack W questioned whether the APU-MSM communication, having evolved from the CPUDCM-SHM link, would be viable for the higher traffic volume. He pointed out that the SHM link not only has a low Baud rate but is also only half-duplex. Every message initiated by an SHM (such as macro done) requires the CPUDCM to respond to an ATN interrupt, send a request to the SHM, receive an ACK to the request, and then receive the reply as another independent message.
The APU-FPM physical interface, including Baud rate (19.2K), was similar to the SHM. The new APU-MSM physical interface is nearly identical to the HSSL, with a 500K Baud rate and hardware flow control. There still is an ATN line. In the meeting, David answered that the half-duplex requirement from the SHM was still in effect in the APU-MSM protocol. However, there is no need for this. The HSSL interface has provided reliable full-duplex operation and the only difference between this and the APU-MSM drivers is the restricted use of DMA simply due to the shortage of DMA channels.
To take advantage of the full-duplex capability, the MSM will automatically send the completion status of motor moves and scripts invoked by the APU. Sensor status must still be requested.
Currently, the APU-MSM protocol specifies the same CRC as in the APU-DataStation interface. This is a fairly time-consuming calculation and is unnecessary for the short distance and small messages (except during program download) that pass between the APU and MSM. This will be changed to a simple checksum.
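The report does not say which checksum will replace the CRC. As a minimal sketch, assuming an 8-bit additive (modulo 256) checksum carried as one trailing byte, framing and verification might look like the following. The function names and the frame layout are hypothetical.

```python
def checksum8(payload: bytes) -> int:
    """Return the 8-bit additive checksum (sum of all bytes modulo 256)."""
    return sum(payload) & 0xFF


def frame(payload: bytes) -> bytes:
    """Append the checksum byte to the payload (hypothetical frame layout)."""
    return payload + bytes([checksum8(payload)])


def verify(message: bytes) -> bool:
    """Check that the trailing byte matches the checksum of the preceding bytes."""
    if len(message) < 1:
        return False
    return checksum8(message[:-1]) == message[-1]
```

A sum of this kind costs one addition per byte, which is the motivation given for dropping the CRC. It detects any single corrupted byte but not reordered bytes.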
The MSM interface Baud rate is 26 times faster than the SHM. Assuming that one-third of the traffic has been caused by the half-duplex restriction, the effective communication rate is 34 times faster than SHM communication. Tower and loader function communication does now have to share the link with motor communication but that is roughly balanced by the 50% reduction in tower and loader traffic due to converting interactive functions to full MSM scripts.
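The speed figures can be checked with a few lines of arithmetic. The sketch below uses the two Baud rates given above. The report does not state how the one-third assumption was applied, so two readings are shown; the quoted figure of 34 is close to the first.

```python
SHM_BAUD = 19_200     # SHM link Baud rate
MSM_BAUD = 500_000    # APU-MSM link Baud rate

# Raw speed ratio between the two links (about 26).
raw_ratio = MSM_BAUD / SHM_BAUD

# Reading A (assumption): half-duplex overhead adds one-third on top of
# the useful traffic, so removing it scales the rate by 4/3.
effective_a = raw_ratio * (1 + 1 / 3)

# Reading B (assumption): one-third of all traffic is overhead, so the
# useful share is 2/3 and removing the overhead scales the rate by 3/2.
effective_b = raw_ratio / (1 - 1 / 3)
```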
MOTOR SCAN CHAIN
John asked how the motors' winding test would be implemented in the new system. Dave R replied that it wasn't compatible with the scan chain. While it is true that the scan chain doesn't help with this function, it doesn't prevent the winding test from being implemented essentially as it is on the current MPM. We just need to add two wires to the motor control cable that now comprises the scan chain.
The two new wires act essentially as an analog bus, with each of the 28 windings (two per motor) attached to the bus through 1 Mohm resistors. The drivers are connected across each winding. If a winding is not open, its driver is nearly shorted. If a driver is turned off, the circuit leg presents a 2 Mohm load across the analog bus. If all but one driver is turned off, the load is 154 Kohm. If there is an open winding across the one driver that is turned on, the voltage across the bus is 1.72V (154K * 24V / 2.154M). If that winding is good, it shorts the driver and the differential bus voltage is nearly 0.
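The divider arithmetic can be reproduced with a short calculation. This is a sketch of the model implied by the quoted formula: a 24V source, the 2 Mohm leg of the open winding in series, and the remaining turned-off legs in parallel as the load. The leg count of 13 is inferred from the quoted 154 Kohm (2 Mohm / 13); it is not stated in the report.

```python
SUPPLY_V = 24.0            # motor supply voltage used in the quoted formula
LEG_OHMS = 2_000_000.0     # two 1 Mohm resistors per winding leg
N_OFF_LEGS = 13            # assumption: inferred from the 154 Kohm figure


def off_legs_load(n_off_legs: int) -> float:
    """Parallel resistance of n identical turned-off legs across the bus."""
    return LEG_OHMS / n_off_legs


def bus_voltage_open_winding(n_off_legs: int) -> float:
    """Bus voltage when the one driven winding is open (simple divider)."""
    load = off_legs_load(n_off_legs)
    return SUPPLY_V * load / (LEG_OHMS + load)
```

With identical legs the result reduces to 24V / (n + 1), so the detectable voltage shrinks as more legs share the bus. The unrounded value for 13 legs is about 1.71V; the 1.72V in the text comes from rounding the load to 154 Kohm first.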
The motor scan chain bus currently comprises both an I/O scan chain (SPI) and the motor scan chain (MPI). The SPI will not be needed in the new architecture. The new motor control bus will use two wires each for reset, clock, data out (from controller), data in (from motors) and parallel load. With the two analog wires, the total is 12 wires. Since this is not a common connector size, a 14-wire bus will be used. The motor scan chain runs continuously even if all motors are at rest. To avoid inducing noise into the analog lines, the analog lines should be placed on one side of the cable, interspersed with the two spare wires connected to the controller's (MSM) ground and not connected on any motor driver boards. Each motor interface circuit will connect the winding across the analog bus through 1 Mohm resistors. On the MSM board, the analog feedback bus will connect to an ADC. One of the functions of the MSM program will be to test the windings.
Dave R suggested using the otherwise unused potential motor scan chain inputs for optional home sensors. The scan chain has four outputs for each motor. After the scan chain serially shifts all 56 bits, the parallel load strobe transfers the bits to the outputs. A broadside load into the serial registers can occur at the same time (actually on the trailing edge of the strobe) just as is now done on the I/O scan chain. The next serial shift of the output bits simultaneously reels in the inputs. Each motor interface circuit will provide one or more (up to four) home flag connectors, each comprising 5V, DGND, and a connection to one scan chain input.
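The shift-and-load cycle described above can be illustrated with a small behavioral model. This is a sketch only: it assumes 56 input bits (four flags per motor) to match the 56 outputs, it treats the output transfer and the input capture as one event, and the class and method names are hypothetical.

```python
CHAIN_BITS = 56  # four bits for each of 14 motors


class ScanChain:
    """Behavioral model of an output chain paired with an input chain."""

    def __init__(self):
        self.out_shift = [0] * CHAIN_BITS   # serial output register
        self.outputs = [0] * CHAIN_BITS     # latched outputs to the drivers
        self.in_shift = [0] * CHAIN_BITS    # serial input register
        self.pins = [0] * CHAIN_BITS        # home flag inputs

    def shift(self, mosi_bits):
        """Clock the given bits out and return the bits clocked in."""
        miso_bits = []
        for bit in mosi_bits:
            # The last input stage leaves the chain toward the controller.
            miso_bits.append(self.in_shift.pop())
            self.in_shift.insert(0, 0)
            # The new output bit enters at the first stage.
            self.out_shift.pop()
            self.out_shift.insert(0, bit)
        return miso_bits

    def parallel_load(self):
        """Strobe: latch the shifted outputs and capture the input pins."""
        self.outputs = list(self.out_shift)
        self.in_shift = list(self.pins)
```

In this model the first bit sent ends up in the stage farthest from the controller, and the inputs captured by one strobe are returned during the following shift.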
ADDITIONAL ANALYSIS
HARDWARE PARTITIONING
The interim system architecture erroneously assumed that the FPM would be responsible for nearly all control functions, but to allow CD3200 flow scripts to be reused, the APU should control everything that had been accessed through the Peripheral Bus. These items should be located on the APU's I/O scan chain.
The interim APU-FPM partitioned the systems into the following boards:
Most of the boards do not fully consume the available spaces (in a CD3200 box). The available spaces are:
VPM
When we examined the interim system, several changes were obvious. One is that the vacuum/pressure circuitry could and should be reduced to the minimum needed. It will be controlled through the APU scan chain and, therefore, doesn't belong on the FPM board. The transducers complicate board swapping, so, for maintenance concerns, the circuitry should be minimized. The vacuum/pressure circuit should be located on the Pneumatic Unit but no other circuitry should be there. Therefore, the new design will have a VPM board, which will only control the vacuum and pressure. It will be located on the Pneumatic Unit and will be controlled through the APU I/O scan chain.
Responsibility for the strobed fluid sensors as well as vacuum/pressure control was taken away from the FPM and given back to the APU (via the APU scan chain). Thus, the name FPM (Flow Process Module) no longer has any meaning. The unit now strictly replaces the MPM (Motor Process Module) and SHM (Sample Handler Modules-- Tower and Loader). Therefore, it has been renamed MSM.
MSM
The core of the MSM unit comprises the 68340, FPGA, APU interface (connector and drivers), motor scan chain interface (now augmented with an ADC), and I/O scan chain interface (if tower and loader remain separate boards). This consumes approximately 36 sq. in. In the interim system, the MSM (FPM) controls the tower and loader boards through scan chains. Therefore, it could be located nearly anywhere in the box. The new smaller MSM, however, could be located closer to the tower and loader that it controls. We physically examined the system to determine whether it would be feasible and desirable to move the MSM to the plate where the Left Panel Board is located and to connect directly to the tower and/or loader I/O devices, thereby eliminating the tower and loader boards. This could improve reliability by reducing components and connections, reduce exposure of circuitry to fluids, and improve our image with customers.
TOWER
The tower board clearly should be replaced. Locating the MSM at the bottom of the left panel (inside) plate with its tower connectors on the lower left and threading the tower wires through a hole near the bottom of the left front panel allows the wires to be only slightly longer than they are with a tower board. This MSM position is also the best location for the board to connect to the loader.
Disconnecting the tower I/O from the MSM located at the bottom of the inside left panel would not be physically difficult. There are several inches of clearance between this location and the Pneumatic Unit, which can slide out if this isn't enough. It would still be advisable to use connectors with ejector tabs to facilitate disconnect.
LOADER
Whether to replace the loader board with direct I/O from the MSM is not as clear as for the tower. The interim loader board has 30 input lines and 13 outputs. The 43 wires of a direct I/O approach compare favorably to the 40 wires used in the scan chain design (two 16-pin scan chains plus the 8-pin power cable). However, some of the inputs may be sensitive to EMI and the stepper driver is an EMI source. Another consideration is future flexibility. If direct I/O is used, every change, such as adding a sensor, potentially affects the wire count. If we choose direct I/O, thinking that it fits into a DB37 connector, for example, then the decision can be rendered obsolete by a relatively minor functional change.
The interim loader board's cabling is inadequate for maintenance. Three cables have to be disconnected to release the loader and none of them will be very accessible. The board will have little vertical clearance, yet its power plug must be pulled up with considerable force to disconnect. The single DB37 cable used in the CD4000 represents a much better approach even if it does increase the cabling cost. The interim board approach to cabling is not inherent in the scan chain interface. In fact, better cabling favors the scan chain approach, which isolates the wire count from future I/O count changes.
The two analog motor feedback wires increase the wire count of the scan chain interface but not of the direct I/O. But the interim design uses more wires in the scan chain connector than is actually necessary. It has two scan chain connectors that are nearly identical. Units can't be daisy-chained to a scan chain, because the data wires require unique input and output points. There are two logical data lines, downstreaming from the master (MSM) and upstreaming from the slaves. Each of these requires two wires for differential signaling. These four wires have to be different in the connectors, but all of the others are identical. Further, the loader board in the new architecture would be the only unit on the MSM's external I/O scan chain. The last unit in a scan chain doesn't need to provide a downstream connector. Therefore, the I/O scan chain doesn't even need the four data loop wires.
The situation for the loader's motor scan chain is different. The MSM master is located physically between the loader and the (new-- see stepper review) right panel motor driver board. One of the two slave boards must provide the data loop. While reducing the wire count in the MSM-Loader interface is desirable, it could only be done by physically looping through the right panel motor driver board and coming back through the MSM and down to the loader board. This would require either two normal scan chain cables or one special looping cable. The looping cable would reduce the total wire count but at the expense of system flexibility, as only standard cabling supports "mix and match" boards. Special cabling should only be used in a point-to-point link that we are sure is not going to change, such as between the MSM and loader.
If the loader board provided the motor loop, the logical interface signals would be:
The total wire count would be 24. A DB37 could support these while providing 13 power pins, which should be sufficient. A right-angle DB37 on the loader board would solve the disconnect problem, particularly if the connector itself can be bolted to the loader frame.
STEPPERS
The MSM will have three stepper interfaces for driving the two steppers located in the tower-- probe and spinner (this has been changed from a DC motor to a stepper)-- and the peristaltic pump, located on the left-hand side of the left panel. If direct loader I/O replaces the loader board then a fourth interface is needed. These are located on the motor scan chain but with the single-ended electrical interface that can be locally used instead of differential.
The four syringe motors and one wash block motor are all located on the right front panel. A stepper driver board will be located on the (inside) right panel under the wash block motor and, therefore, next to the right-most syringe. This board will contain five stepper driver interfaces. If the board is too large for this space, it may move to the right several inches, where there is a large open area that was occupied by an SDM in the CD3200. All stepper interfaces are removed from the Right Panel Module. The shear valve driver interface doesn't similarly migrate because it is not a stepper, but is controlled through the APU I/O scan chain.
The size of the syringe/wash block stepper driver board is determined by the following elements:
LEFT PANEL
The peristaltic pump driver will move from the interim Left Panel Board to the MSM. The interim Left Panel Board provides a scan chain connector to the Tower Board. This will move to a mechanically similar I/O connector on the MSM. The MSM area is determined by the following subsections:
The Left Panel Board is still needed to provide at least solenoid drivers. The interim design also includes fluid sensor circuitry (including for the strobed sensors). The board area is determined by the following subsections:
Since the Left Panel Board and MSM are located adjacent to each other, it would be possible to simply combine them into one. However, with such a large number of connectors, servicing would be easier if the two were separate. If separate, another 3 sq. in. should be allowed for separation. The total area consumed by the two boards would be 122 sq. in. The left panel plate provides 168 sq. in. The MSM will be located on the bottom of the plate and the Left Panel Board above it.
Two of the six strobed fluid sensors are located on the front left panel. The remaining four are probes inserted into reagent and waste containers outside of the instrument. Their wires enter the instrument through the back panel (of a CD3200) but they could enter at any convenient point. Since only the two sensors on the front panel have a fixed location, it is reasonable to locate the sensor circuitry near to them and route the remainder to this point through the most convenient path. The Left Panel Board is the obvious choice for this. The sensor wires entering through the back panel can traverse to the front if necessary, but it would be better to provide an access point closer to the Left Panel Board if possible.
FLUID IN-LINE SENSORS
The pre-shearValve and post-shearValve fluid sensors attach to two small boards precariously dangling from a flagpole on the right front panel behind the Y-valve. The optical sensor board is our own and its circuitry should be merged into the Right Front Panel. The ultrasonic sensor board is mated to its sensor and is tuned by the manufacturer. This board consumes less than 2 sq. in. and the Right Front Panel board has virtually unlimited space, so it would be reasonable to provide an option to mount the sensor board on the panel board. Remote mounting should not be precluded, because the sensor cable has a fixed length that may not support every possible position of the sensor unless the board is mounted on the front panel.
STATUS BOARD
The interim Status Board has two combined motor and I/O scan chain connectors for attaching to the FPM. These must at least be replaced by connectors for the APU I/O scan chain, which doesn't include a motor interface. The wiring can be much more aggressively reduced by using a point-to-point loop-back cable to the Right Panel Board. This requires just one 14-pin cable.
RIGHT PANEL
The interim design has two combined motor and I/O scan chain connectors for attaching to the FPM. The new design has two similar connectors for attaching to the APU I/O scan chain. Neither the MSM's I/O nor its motor scan chain attaches to the new board. All of the stepper driver circuits are moved to the new right panel motor driver board.
A point-to-point loop-back cable connector will be added for connecting the Status Board to the APU I/O scan chain.
The interim design uses consistent cabling, which would support mixing I/O boards, i.e. any board that attaches to a particular scan chain can be located anywhere on the chain. This essentially doubles the amount of cabling and connectors. The supposed flexibility that this should buy is really an illusion. Each I/O board is designed according to where it is located. Any movement would be within a narrow range that would not upset the bus topology. The bussing flexibility is inconsistent with the reality of these boards. Therefore, cabling should be reduced wherever feasible by using special point-to-point arrangements as specifically described above. The two cases that stand out are the Right Panel Board to Status Board and the MSM to Loader board.
The standard I/O control cables are as follows:
I/O control cabling usage is as follows:
APU
FPM
No longer exists. Its vacuum/pressure control moves to the new VPM. The remainder moves to the new MSM.
MSM (replaces FPM)
VPM (new)
LOADER BOARD AND CABLE
LEFT PANEL BOARD
RIGHT PANEL BOARD
STATUS BOARD
RIGHT PANEL MOTOR DRIVER BOARD (new)
CDNEXT ANALYZER SYSTEM DESIGN REPORT 10
Nov. 11, 1999
MEETING MINUTES Originally published as AD47.DOC
PARTICIPANTS
John V, Robert D, Andy O, Jack W, JP Y, Dave R, David McCracken.
DISCUSSIONS
REPLACEMENT DRIVERS AND RECEIVERS
As the first item under the RPMD review, Andy pointed out some basic scan chain changes from the interim design.
The 75174 and 75175 differential drivers and receivers are obsolete. Andy and Coleman replaced them with MC26C32 (receiver) and MC26C31 (transmitter). These differ from the older parts in the following ways:
For scan chain usage, these differences only improve (reduce) the possible SCK-NPCS skew. It was not discussed in the meeting, but we will also need to replace the 75174 and 75175 used in the HSSL and IML (Inter-Module Link between APU and MSM). The IML timing analysis in cdxsys2- Distribution of FPM Subsystems- Communication Clocking will have to be revisited.
After the general meeting, Andy, Robert, and I discussed the possibility of a further upgrade to LVDS components. These are even faster and might support a significant speed increase. Our system, firmware, and hardware designs are based on the old, slow parts. Significant system design improvements could be made possible by a relatively small speed increase. For example, increasing SCK from 2 MHz to 2.5 MHz would enable the MSM (with FPGA and firmware changes) to support the 17 steppers required by the CD4000. It might be relatively easy for us to double the speed of all scan chains now in order to support future system enhancements. Andy is going to investigate LVDS.
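The claim about 17 steppers can be checked against the timing figures given in the stepper motor controller description (four bits per motor, 2 MHz shift clock, 4 usec. of housekeeping, 32 usec. period). The sketch below assumes that the housekeeping time and the scan period stay the same at the higher clock rate; the report does not say so.

```python
BITS_PER_MOTOR = 4        # two power bits plus two phase bits
HOUSEKEEPING_US = 4.0     # assumed unchanged at higher SCK
PERIOD_US = 32.0          # assumed unchanged scan image period


def scan_time_us(motors: int, sck_mhz: float) -> float:
    """Time to shift one scan image plus housekeeping, in microseconds."""
    return motors * BITS_PER_MOTOR / sck_mhz + HOUSEKEEPING_US


def fits_period(motors: int, sck_mhz: float) -> bool:
    """True if a full scan image update completes within one period."""
    return scan_time_us(motors, sck_mhz) <= PERIOD_US
```

Under these assumptions 14 motors at 2 MHz exactly fill the period, 17 motors at 2 MHz need 38 usec., and 17 motors at 2.5 MHz need 31.2 usec.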
MOTOR FLAGS
The version of the RPMD under review does not include motor flags, because the description of RPMD given in Cdxsys3.Doc- Stepper Motor I/O- Right Panel Motor Driver Board Requirements failed to mention them. The general motor scan chain description includes four flags per motor with at least two of them taken to 3-pin (+5, signal, DGND) jacks. This description should apply to all stepper motors. It can be found in cdxsys2- Stepper Motors- Motor Scan Chain Inputs, reprinted here for reference:
As explained in AD46, the stepper interfaces will include four inputs for flags. One HC597 octal shift register will support two motors. For a circuit example, see Status Panel or Left Panel. The '597 is double-buffered. On the rising edge of NPCS0 (scan chain parallel load) the input capture register is loaded by asynchronous RCK. During the period in which NPCS0 is high, at each positive transition of SCK0, the captured input is loaded into the input scan chain because the active low synchronous load signal SRLOAD is driven by inverted NPCS0.
At least two of the scan chain inputs are taken to standard three-pin flag jacks. The other two pins provide +5 and digital ground to an opto-interrupter. The flag pin is connected to the '597 input and to a pullup resistor.
A value was not given for the pullup resistor. It should be 1K. This is the value used in the y-valve (DC motor driver) circuit, which uses our most common opto-interrupter. For additional background discussions, see AD46- Discussions- Motor Scan Chain.
UNUSED SCAN CHAIN
Andy opened the discussion of RPMD motor flags as an example of a situation in which either MOSI or MISO data is unused on a given board. Obviously, if both were unused, there would be no reason for the board to be on the scan chain. All of the current and planned boards use both MOSI and MISO, but a board using only one of these is conceivable.
In the case of the RPMD, had MISO been unused, there would be no need for its inclusion in the cable at all. The general loopback cable described in AD46- Hardware Partitioning- Cabling assumes that both MOSI and MISO are used in the loopback board. If there is no doubt about which boards will be on both ends of a loopback cable, the two boards can agree to simply leave out all four wires of an unused data signal. If more than one type of loopback board might be used or if the loopback board extended the loop by daisy-chain to another board (before returning via the loopback cable) and both MOSI and MISO were needed in some configuration, then the loopback launch point would have to be a full (16-pin in the case of motor scan chain) loopback connector. Then the situation presented by Andy could exist on one of the loopback boards. Andy's question was whether there was any need for a receiver-transmitter pair for the unused signal or if a wire loopback would be sufficient. The wire loopback is certainly adequate.
MOSI SHIFT REGISTER
The interim design used HC594 for MOSI output registers. This is very rare and we have substituted HC595, which is nearly identical but has an Output Enable instead of Output Reset. As explained in cdxsys3- Scan Chain- Scan Chain Operation- '595 Architecture, the only way to reset a '595 is to assert its MR (Master Reset pin 10) and then strobe its latch clock (pin 12).
Andy explained that the RPMD design doesn't try to reset the '595 but, instead, uses its OE to establish an initial output value. Disabling the output would allow pullup or pulldown resistors to establish a value sooner than the motor scan chain controller could load itself and effect the described reset. The motor driver current select signals could be pulled high, eliminating inverters. This approach conceptually answers the concern regarding motor safety at startup expressed in cdxsys3 and the general suggestion that a synthesized (in PLD or FPGA) replacement "could improve on the '595 by making the reset output more reliable and by reducing external inverters by resetting to 1 or 0 as appropriate." Dave R pointed out that the concept sketched by hand into the RPMD schematic ties the OE to +5 via 10K, affording the scan chain no means to control the output. We all agreed that OE should be tied to NPHRST, the scan chain reset signal, and that this signal would have to change to active high (PHRST instead of NPHRST). Andy mentioned that this would be a good change also because the replacement differential transmitters and receivers default to high. Also, scan chain reset is tied to '595 MR, which is not appropriate. The hand sketch appears to have accidentally swapped the connections. MR should be tied high and OE to scan chain reset.
We discussed the fact that this approach would require resistors on '595 outputs. For the motor scan chain, some of these would be replacing inverters. However, in all other situations, these would be additional components, so we might want to consider alternatives for I/O scan chains. There is another problem that we didn't notice during the discussion. Giving the scan chain reset control of OE appears to defeat the original purpose of providing a controlled reset condition without depending on the scan chain. However, if reset comes up high, for example because the FPGA's outputs are tri-stated until it configures itself and this results in the driver assuming a high output, then the dependency is very little compared to the active reset method.
Andy described how the scan chain controller would initially hold PHRST asserted (now high) and not disassert it except under software control. Software would load the scan chain with safe values and then write to a control bit in the FPGA to disassert the reset. Dave R added supporting detail to this scenario, explaining that he had expected software to initially test the scan chain by loopback before loading application values and then releasing reset.
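The startup sequence described here (hold reset, test by loopback, load safe values, release reset) can be sketched in software terms. The controller class below is a stand-in for the FPGA, not its actual register interface; every name in it is hypothetical.

```python
class FakeScanController:
    """Stand-in for the FPGA scan chain controller (hypothetical interface)."""

    def __init__(self, chain_bits: int = 56):
        self.chain_bits = chain_bits
        self.reset_asserted = True          # PHRST is held high at power-up
        self.image = [0] * chain_bits

    def loopback(self, pattern):
        """A healthy chain returns the pattern unchanged after a full shift."""
        return list(pattern)

    def load(self, image):
        """Write the scan chain image that will drive the outputs."""
        self.image = list(image)

    def release_reset(self):
        """Software writes the control bit that releases PHRST."""
        self.reset_asserted = False


def start_scan_chain(ctrl, safe_image) -> bool:
    """Test the chain, load safe values, then release reset."""
    pattern = [i % 2 for i in range(ctrl.chain_bits)]
    if ctrl.loopback(pattern) != pattern:
        return False                        # leave reset asserted on failure
    ctrl.load(safe_image)
    ctrl.release_reset()
    return True
```

The point of the ordering is that the outputs remain in their resistor-defined state until software has both verified the chain and loaded safe values.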
The following analysis and proposal for I/O scan chains was not discussed in the meeting. Since there are many cases in which simply clearing the output register within perhaps 200 msec. affords adequate initialization, it is excessive to demand that all scan chain output registers reset in the manner proposed for the motor scan chain. However, there may be specific cases that require the essentially instantaneous controlled state made possible by that method. Two reset means can easily be provided in one system by the following design:
Item 4 in the above proposal deliberately contradicts the CD3200R Flow Panel Processor (FPM) Board Hardware Design Description (Dave R 9/4/98) description of the SPI control register. The EN (enable) bit described in that document serves no useful purpose and should be eliminated.
CONNECTORS
Andy relayed a question from Jill regarding the positioning of connectors, particularly motor jacks, on the RPMD. The RPMD requirements analysis in cdxsys3 says "looking at the component side, jacks for attaching to the four syringe motors are located on the right edge. The wash block motor jack is located as near as possible to the upper right corner of the board. The spare motor jack may be located anywhere on the board. Each motor jack is the standard 5-pin missing-center stepper connector (as used in CD3500/3200)." It also states that the board is 6.5" horizontally by 4.5" vertically and it calculates the total board space required as less than the allowable board size. However it doesn't take into account motor driver layout form factor.
The interim RPM replicates a motor driver template five times. Dave R explained that he allowed two square inches of copper for heat dissipation for each driver (one motor requires two drivers). Four drivers based on this pattern would require a board edge length of 7.5" compared to the 4.5" available. Locating the four connectors on the right edge is ideal but not required. Locating them along the top would be almost as good. All of the motors are located above the RPMD and the four in question are also located to the right of the board. We want to avoid any other locations for these connectors, which would require the cables to cross over the board. JP pointed out that service has always justifiably complained about such interference.
The interim LPM contains an example of an alternate form factor. The pump driver circuit aligns the two drivers length-wise, creating a 1.3" by 3" block with the connector located on the 1.3" side. Replicating this pattern four times would produce a block 5.2" by 3". While the 5.2" is still longer than the 4.5" right board edge, it fits well within the 7.5" horizontal limit. Replicating this pattern six times produces a 7.8" edge, which nearly fits all drivers on the top edge. Since much of the space is just a blank copper heat spreader, it would be a simple matter to slightly reduce the width and increase the length of the pattern. For example, 1.2" by 3.4" would give each driver 2 square inches while taking 7.2" of the top edge of the board and placing all of the connectors along the top.
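The form factor comparison above is simple arithmetic, sketched here so that other block dimensions can be tried. It assumes one block per motor with two drivers per block and the blocks placed side by side along one board edge; the function names are hypothetical.

```python
DRIVERS_PER_MOTOR = 2   # one motor requires two drivers


def edge_length_in(block_width_in: float, motors: int) -> float:
    """Board edge consumed by placing the motor blocks side by side."""
    return block_width_in * motors


def area_per_driver_sq_in(block_width_in: float, block_length_in: float) -> float:
    """Copper area available to each driver within one motor block."""
    return block_width_in * block_length_in / DRIVERS_PER_MOTOR
```

For example, six 1.3" blocks take 7.8" of edge, while six 1.2" by 3.4" blocks take 7.2" and still give each driver slightly more than 2 square inches.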
MSM
Andy explained that he chose to avoid scan chain NPCS to SCK skew by stopping SCK before asserting (high) NPCS (and presumably not starting SCK until falling NPCS has had a reasonable time to settle). This is one of two solutions discussed in cdxsys3- Scan Chain- Scan Chain Controller. The other solution is for the scan chain controller to toggle NPCS only on the falling (inactive) edge of SCK.
The approach that Andy chose affords essentially infinite skew margin, because the controller can wait for an arbitrarily long time after stopping SCK before asserting NPCS and after disasserting NPCS before restarting SCK. However, any devices on the scan chain that have a synchronous parallel load will not function in this architecture, because they need at least one SCK to occur after NPCS has been asserted.
Both the '597 and '595 have asynchronous parallel load and, therefore, don't need an SCK after NPCS asserts. The analysis in cdxsys3 presents only the 'HC166 as an example of a synchronous parallel loading device. It has become clear that the '597 and '595 are readily available and we won't need other simple registers. However, asynchronous loading may not be feasible in all synthesized scan chain register situations. For example, the 7032 PLD proposed for the RPM's shear and y-valve controller (see cdxsys7) has a single global clock, which is driven by SCK. It also provides a global clear that is independent of the clock, i.e. it is asynchronous, but not a global load. It appears that the only way to effect a parallel load is to use NPCS to steer the data toward the latch, which is strobed by the global clock. Andy is going to review this entire PLD design and may be able to suggest alternatives.
Andy identified an error in the skew analysis presented in cdxsys3, which examines the possibility of NPCS to SCK skew but explains that the situation with serial data is different, arguing that the data comes from an adjacent register rather than from the scan chain controller. This isn't true for the first output register in the scan chain. Dave R and I both expressed the opinion that the controller should change data on the falling edge of SCK. Dave said that he thought that this was the case in his original design. I explained that I had originally brought up the issue of skew because of Dave's Word document drawing that appeared to have NPCS changing on the rising edge of SCK. I did not consider data skew at that time.
Andy asked whether the "sense" register described in the CD3200R Flow Panel Processor (FPM) Board Hardware Design Description should be implemented on the MSM. It should not be. The sense facility assists bubble detection, part of the fluid control domain, which is the responsibility of the APU.
Although the sense facility is not part of the MSM, it is important to the APU and its requirements need to be clarified. Dave R mentioned that the reference document is fairly old and does not necessarily agree with more recent analysis. The two sense address registers are consistent with more recent requirements but the detection and control means are not. Instead of simply recording whether a change has occurred from one 132 usec. scan period to the next, the Boolean value of the monitored bit needs to be sampled and counted. The rationale for this requirement is described in implementation report 18 (report18.doc) as follows:
"[In the APU system, the] hardware assist only needs to record any single transition to conclude that the signal is not stable. But for bubble detection it needs to record the number of 0 samples and the number of 1 samples. These are two different functions. In AD16 – Observe and Watch Commands – Hardware Assist for Observe, I suggested a hardware trace function to reveal the pattern of bubbles while Jack advocated a change detector (based on XOR between each sample and the next). The single change function serves the stable state filter but not the bubble detector. However, the trace consumes significant hardware and software (to analyze the trace) to generate a picture that is more detailed than necessary. It would be much better for the hardware to count 1s and 0s on the two monitored sensors. If this is possible then it is all that is needed for both bubble and stable state detection."
The FPGA needs to provide two event counters for each of the two monitored sensors. Software needs to be able to both read and clear the counters.
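The counting behavior requested here can be modeled in a few lines. This is only a behavioral sketch in Python, not FPGA logic. The class name, the method names, and the unlimited counter width are assumptions made for illustration and do not come from any of the referenced design documents.

```python
class SenseCounters:
    """Behavioral model of the two event counters for one monitored sensor."""

    def __init__(self):
        self.zeros = 0
        self.ones = 0

    def sample(self, bit):
        # Called once per scan period with the Boolean value of the monitored bit.
        if bit:
            self.ones += 1
        else:
            self.zeros += 1

    def read(self):
        # Software reads both counts together.
        return self.zeros, self.ones

    def clear(self):
        # Software clears both counts before starting a new observation.
        self.zeros = 0
        self.ones = 0

    def is_stable(self):
        # Stable only if every sample since the last clear had the same value.
        return self.zeros == 0 or self.ones == 0
```

Under this model, the same pair of counts serves both purposes named in the quotation: the ratio of 1s to 0s characterizes bubbles, and a zero in either count indicates a stable state.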
MOTOR PAGE FLIP AND SCAN CHAIN INTERRUPT
Andy asked for suggestions on how the stepper controller should tell the CPU when it has finished executing one of the two step script pages. This is described in motor.doc- Design- Firmware as follows:
"The two microcode programs will both be located on page boundaries. When ecp reaches 0xxxFF, the current program is exhausted and the next one should be started. If the next one is the later one in memory, ecp could simply keep incrementing. However, if it is the lower one, then one or more of ecp's bits higher than 7 must be changed. In all cases, ep must be changed to point to the other program's event array. At the program change, the FPGA should interrupt the microprocessor and present an appropriate (i.e. different from any error codes) value in the status register. This will alert the microprocessor to the availability of the memory occupied by the exhausted program. It could be helpful if the status code would indicate which memory buffer, lower or upper were being made available, but the microprocessor can also keep track of this itself."
This brings up the related topic of scan chain interrupt, discussed in cdxsys7. This interrupt is not related to the motor scan chain and may only be relevant to the APU in practice. However, it should be included in the MSM's I/O scan chain definition for possible future application. As explained in cdxsys7- Scan Chain Interrupt, sequential access to one location on the scan chain may be necessary for communicating with intelligent agents, such as the VPM. To avoid overwrite, the CPU is synchronized to the scan period via an interrupt from the scan chain controller.
The page flip reference cited above implies that the FPGA has one interrupt connection to the CPU and that the CPU has to read a status register in the FPGA in order to determine the cause(s). However, cdxsys7- Scan Chain Interrupt implies that the scan chain interrupt must be unique because it is nearly always active but rarely of any interest to the CPU. These are not necessarily mutually exclusive definitions. The APU and MSM CPUs each have at least two unused external IRQs. It would be possible to dedicate one of these to the scan chain interrupt and the other to all other sources, which at this point seems to be only the motor page flip. It also would be feasible to combine the scan chain interrupt with other sources, but then the FPGA would have to provide an enable bit for this to allow software to turn it off. Dave R mentioned that he had planned to provide such a bit for all interrupt sources. It probably is a good idea to follow Dave's thinking on this, as it increases system configuration options. ISR performance is only slightly degraded by having just one IRQ, so the decision to use one or two seems to hinge more on FPGA complexity considerations.
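The single-IRQ option with per-source enable bits can be illustrated with a small model of the interrupt service logic. This is a sketch only. The bit assignments SCAN_DONE and PAGE_FLIP, the function and class names, and the assumption that the FPGA starts executing from the lower buffer are all hypothetical and are not taken from the FPGA design.

```python
# Hypothetical status/enable bit assignments, for illustration only.
SCAN_DONE = 0x01   # asserted after each I/O scan
PAGE_FLIP = 0x02   # asserted when the motor controller exhausts a script page


class MotorPages:
    """CPU-side record of which ping-pong buffer the FPGA is executing."""

    def __init__(self):
        self.executing = 0  # assumed: 0 = lower buffer, 1 = upper buffer

    def on_page_flip(self):
        # The exhausted buffer becomes available to the CPU for refilling.
        freed = self.executing
        self.executing ^= 1
        return freed


def service_interrupt(status, enable, pages):
    """Decode one interrupt; only enabled sources are acted upon."""
    active = status & enable
    handled = []
    if active & PAGE_FLIP:
        handled.append(("page_free", pages.on_page_flip()))
    if active & SCAN_DONE:
        handled.append(("scan_done", None))
    return handled
```

With the scan interrupt disabled, a status of SCAN_DONE alone produces no work for the ISR, which is the behavior the enable bit is meant to provide.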
RPM AND LPM
PARTITIONING
I (David McCracken) introduced my latest design report, cdxsys7, which attempts to define the Right Panel Module (RPM-- not RPMD) and Left Panel Module. Except for the Loader, these are the last of the modules that have not yet been defined sufficiently to begin implementation. John and I had decided to review CD4000 I/O requirements to try to prepare the first system to support the 4000's current mechanical requirements. These are outlined in cdxsys7- Apu, Rpm, Status Panel, and CD4000.
Cdxsys7 suggests that 26 solenoid controls be added to the 32 on the interim LPM. This was rejected by all participants. Dave R mentioned that he had received complaints about having as many as 26 on one board. JP said that solenoid failure was such a common problem on the CD4000 that each bank of eight reserves one for a spare. The new solenoids that we will be using will have detachable leads, which should go a long way toward reducing the difficulty of replacing solenoids, but Dave pointed out that driver failure would require replacing the entire board.
Andy suggested dividing the LPM into two identical modules, each with 32 solenoid controls. This simultaneously reduces the "big board" criticism while increasing instrument configuration flexibility. He also suggested that further division, e.g. into four identical boards more like the CD35/3200's CDMs, would yield diminishing returns, as each board requires two scan chain connectors, a power connector, and physical margins.
The only down side to Andy's suggestion is that we had planned to keep the strobed fluid sensor circuitry on the LPM for close proximity to two of the five sensors. The MSM is also well placed for accessing the two sensors on the left front panel, but fluid sense lies in the APU's control domain and should be on the APU's scan chain to minimize communication traffic. The fluid sensors as a group are not localized; the other three are located under the instrument, which is more accessible to the RPM. Therefore, we will move the strobed fluid sense circuitry to the RPM. The four wires from the two left front panel sensors will be routed to the RPM. This reduces the LPM to a simple block of 32 solenoid controls and 16 general inputs. It also puts all specialized I/O, except for vacuum/pressure control, on the RPM. Concentrating specialized I/O onto one board improves system flexibility by allowing the design of simpler boards to stabilize.
SOLENOIDS
During the discussion of solenoid distribution, it was revealed that there isn't one definition of solenoid driver that applies to all instruments. The CD35/3200 instruments use an all electric valve with two-stage driver. JP and Dave R described a much different situation in the CD4000, in which banks of small solenoids open pneumatic valves to actuate larger air-driven valves. This system is well known as a source of problems. The pressure/vacuum requirements are very high (JP added that the vacuum/pressure unit is a frequent source of problems), the air-driven valves often don't have enough mechanical force to pinch tubing fully closed, and the small solenoids have reliability problems. JP and Dave also described the CD1200 as having a different all-electric control means in which software apparently implements some sort of PWM-based power down, thereby eliminating power transistors, but at the expense of CPU bandwidth (according to Dave).
JP volunteered to investigate the CD1200 solenoid control means to determine its general utility. However, if Dave's estimation that it requires intensive CPU involvement is correct then it is not generally applicable. Likewise, the low-power single-stage drivers needed for the CD4000's solenoids are not applicable to all-electric valves. Only the two-stage drivers used in the CD35/3200 and interim CDNext systems afford solenoid control in all systems, albeit at some hardware expense. In the case of a CD4000-class instrument, replacing the pneumatic valves may be highly desirable. In any case, at this stage of development, flexibility is more important than relatively minor cost reductions. Therefore, for now we will continue to use the two-stage driver.
Dave R mentioned that the power transistors are vulnerable to shorting due to bending of their leads. Andy asked whether equivalent surface mount components were available. These should have no such problem. If we continue to use through-hole parts, they should be mounted on spacers designed to prevent this problem.
SCHEDULE
Andy outlined the anticipated schedule as:
John V asked the Dallas group to not do any work on the APU until the Santa Clara group has reviewed its current design status. However, since the MSM and APU share the basic scan chain design, any improvements (such as timing and component upgrading) or clarifications developed for the MSM should be folded into the APU design. The Santa Clara team will review the APU in the following areas:
CONCLUSIONS
CDNEXT ANALYZER SYSTEM DESIGN REPORT 11
Nov. 24, 1999
[Report10] [Report12]
DESIGN HISTORY FILE CHANGE
Our project plan specifies that reports cannot be edited once they have been published unless clearly identified as unfinished. I have had to bend this rule for the system/hardware design meeting reports. I had published them as documents AD45, AD46, and AD47 because all of our previous meetings were reported under this series; but all of the other meetings were for software design. I had been publishing system/hardware design and implementation reports under the CDXSYS series of documents and none of those were meeting minutes.
Organizing reports by whether they involve a meeting is inappropriate, especially since, although some of us are involved in both software and hardware, others work on strictly hardware or software. Therefore, I have moved the system/hardware meeting minutes into the cdxsys series, renaming AD45, AD46, and AD47 to cdxsys8, cdxsys9, and cdxsys10. I modified the original reports only to tell their new file name and their origin. All substantive text is unchanged.
To recap the file organization: all design files are stored under the project server directory K:\cdx\doc where K is mapped to \\eisscd32\cd3200. The system-hardware-software design files are located in k:\cdx\doc\analyzer. The files are:
COMMENTS ON CDXSYS10
Cdxsys10, originally published as ad47, reports on the system/hardware design meeting of 11/9/99 at ADD. After reviewing this and cdxsys7, a supporting document introduced at the meeting, we have had some follow-up discussions.
In cdxsys10- Motor Page Flip And Scan Chain Interrupt, I suggested that software needs an interrupt (that it can dynamically enable and disable) from the I/O scan chain controllers that asserts (if enabled) after each scan. This will be used to support sequential writing to one location for communicating with an intelligent unit on the scan chain. It might also be used to support randomly related devices, where one device should be written at any time prior to another, for example to set a voltage level before turning a device on. However, this situation is difficult for the Interrupt Service Routine, because there may be several of these relationships pending at any given time and it is difficult to tell generally which ones can be consolidated into a single scan.
For software, it would be simpler to be able to track the scan controller at a higher level than the interrupt. The most straightforward means of achieving this is for the scan chain controller to count scans in a register that the program can read. To be sure that a write is completed before another is effected, the program can perform the first write, read the scan count, and perform the second write only when the count changes.
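The write, read count, write sequence can be sketched against a stand-in for the scan chain controller. Everything named here (FakeScanController, write_in_order, the 10-bit counter width, the address values) is hypothetical and serves only to illustrate the polling idea. The sketch also follows the text literally in waiting for a single count change; whether one change is enough when the first write lands in the middle of a scan depends on controller details not settled here.

```python
class FakeScanController:
    """Stand-in for the scan chain controller, for illustration only."""

    def __init__(self, counter_bits=10):
        self.mask = (1 << counter_bits) - 1
        self.count = 0
        self.image = {}    # pending output image, keyed by scan chain address
        self.scanned = []  # record of each image actually shifted out

    def write(self, address, value):
        self.image[address] = value

    def read_scan_count(self):
        return self.count

    def scan(self):
        # One complete scan: the image is shifted out and the count advances.
        self.scanned.append(dict(self.image))
        self.count = (self.count + 1) & self.mask


def write_in_order(ctrl, first, second, wait):
    """Perform the second write only after the scan count has changed."""
    ctrl.write(*first)
    start = ctrl.read_scan_count()
    while ctrl.read_scan_count() == start:
        wait()  # real firmware would do other work here instead of spinning
    ctrl.write(*second)
```

For example, `write_in_order(ctrl, (3, 1), (7, 1), ctrl.scan)` sets a level at address 3, lets one scan complete, and only then turns on the device at address 7.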
If the scan chain rate were approximately doubled (the NPCS high period would not necessarily be reduced by doubling the scan rate) the 64-bit MSM I/O scan chain period would be 18 to 20 usecs and the 256-bit APU scan period 66 to 68 usecs. Currently, the APU worst-case main loop period is 45 msec, but the MSM is unlikely to experience any period longer than 2 msec. It is important that the scan counter not roll over during the APU's main loop period. This would dictate that the maximum count be at least 680. Ten bits would provide 1024.
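The counter width follows from dividing the longest main loop period by the shortest scan period. The sketch below repeats that arithmetic with the figures given above; the function names are illustrative only.

```python
def scans_per_loop(loop_period_us, scan_period_us):
    # Worst case: longest loop period, shortest scan period, rounded up.
    return -(-loop_period_us // scan_period_us)


def counter_bits(loop_period_us, scan_period_us):
    # Smallest width whose modulus (2**bits) exceeds the scans in one loop.
    return scans_per_loop(loop_period_us, scan_period_us).bit_length()


apu_scans = scans_per_loop(45_000, 66)   # 45 msec loop, 66 usec scan
apu_bits = counter_bits(45_000, 66)
msm_bits = counter_bits(2_000, 18)       # 2 msec loop, 18 usec scan
```

With these inputs the APU needs to count about 682 scans, which fits in ten bits, and the MSM needs about 112, which fits in seven. A common ten-bit counter therefore covers both controllers.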
The I/O scan chain interrupt is required. The scan chain counter is only a convenience, which may simplify software and flow scripts, and should, therefore, be included (in both the MSM and APU I/O scan chain controllers) only if it doesn't impinge on essential FPGA functions.
CDNEXT ANALYZER SYSTEM DESIGN REPORT 12
Nov. 30, 1999
[Report11] [Report13]
Mike Y (edited by David McCracken)
Synopsis: CDNEXT-APU analyzer APU3 board design review. MEETING MINUTES.
PARTICIPANTS
John V, Mike Y, Dave R, David McCracken.
DISCUSSIONS
Introduction
Because the APU3 board is an experimental platform, there are no formal design specifications to date. Many of the circuits were not well characterized before the board was designed, so formal performance specifications would not be beneficial. Initial testing of the APU3 shows the performance to be very good and the next step is to gather data through the board.
There have been many concepts for using a single general purpose "APU" like board for a number of machine platforms. The APU3 board was designed with 7 parameters of high speed linear data acquisition with 4 decade accuracy suitable for any of our optical data. An 8th. parameter is intended for bandwidth limited data acquisition such as an impedance channel. The APU board is intended to functionally replace the CPU/DCM, MAM and SPM boards in a CD3200. Relative to a CD4000, the APU board functionally replaces the CPU, DACM, OSPM1, OSPM2, ITSPM and ADCM boards. The APU3 is designed to have 3 Photodiode signals come in on 1 connector, 4 PMT signals on another connector and 1 impedance signal on a 3rd connector.
Both the CD3200 and CD4000 are desired platforms on which to gather data for the APU. However, the preamps for the CD3200 and CD4000 have very different architectures: the CD3200 has fixed gain preamps and the CD4000 preamps have a wide adjustable gain range. The CD3200 implements its adjustable gain of x1, x2, x4, and x8 on the MAM board. The purpose of this meeting is to determine the best configuration of the preamps, adjustable gain, and connectors on the APU board.
Discussions
The initial discussion concerned where to place the adjustable gain. It was agreed that the APU board would only be implemented in new factory built machines; therefore, no field compatibility needs to be considered. It was decided that adjustable gain will be implemented at the preamp board level.
Separate Photodiode preamps and schemes to consolidate them were discussed. The CD4000 uses a bullseye detector that has both forward scatter detectors very close to each other, whereas the CD3200 uses a beamsplitter and 2 separate detectors several inches apart. It was decided that, because the gains differ among all the detectors, there is no advantage to having a universal preamp to be used in each channel. It was decided that a consolidated 3-channel Photodiode preamp will be designed. The second CD3200 forward scatter photodiode will be wired to the 3-channel preamp. The extra channel will be de-populated if it is not used.
Regarding the PMT preamps, David McCracken and John V were initially in favor of keeping all the preamps universal and separate because of the added flexibility and ease of stocking spare parts for field service. Mike Y pointed out that there would be changes to the existing CD4000 PMT preamps such as interface voltages and readback voltages. The CD3200 preamp could not be used because it is a fixed gain design. It was decided that 3 PMT preamp circuits would be consolidated onto one board.
With the previous decisions, there is no need to revise the APU3 at this time.
David McCracken's group will be ready to acquire data in 6 weeks; it will take Dave R 2-3 weeks to get the data acquisition state machines running. Due to the tight schedule of Robert D's group, it was decided to use adapter boards to connect the APU3 to existing CD3200 and CD4000 preamps for initial data acquisition.
CONCLUSIONS
CDNEXT ANALYZER SYSTEM DESIGN REPORT 13
Dec. 9, 1999
[Report12] [Report14]
Synopsis: APU system hardware analysis. VPM ADC Control.
VPM ADC CONTROL
DISCUSSION (via Email)
James C (Coleman) R; 12/08/99 10:41 AM:
Since the APU scan chain description does not include any interface
between the APU and ADC on the VPM Board, I have some questions.
(1) The ADC needs to be programmed at power up for the conversion
rate (fast or slow). Should the APU do this, or should the FPGA
on the VPM do this. If the FPGA does it, what should it program
it for - fast or slow?
(2) The ADC can return data for self-test voltages (Vref). Should the
FPGA read this data, and if so what should it do with it?
(3) The ADC has a command for Software power down. Is this mode
planned to be used, and if so how is this to be controlled?
(4) The ADC can be operated in two distinct modes: normal sampling
mode (fixed sampling time) and extended sampling mode
(flexible sampling time). Is the extended sampling mode
planned to be used, and if so how is this to be controlled?
David McCracken; 12/9/99 10AM-PST.
Reviewing the data sheet for the TLV1548 and our VPM requirements, I have the following observations and suggestions:
THE DOCUMENTS AFTER THIS POINT ARE NOT YET INCLUDED IN THE INDEX
2/17/2000
[Report13] [Report15]
DISCUSSIONS
Coleman and Maha reported wading through the 80 plus pages of cdxsys to get up to speed on their new assignments. I (David) committed to adding an index to the document and hyperlinks for new reports. As the first step in hyperlinking, I created links at the beginning of cdxsys to each of the reports. Clicking on one of these takes you to the named report. This is not particularly useful for users but helps a lot when adding hyperlinks to topics, because these links afford faster navigation to and from the general reference areas.
Robert described hyperlinks as specially formatted text that, when clicked on, move the viewer to the selected topic. He mentioned underlining and font change as one kind of formatting. I'm using a slightly different format, underlining plus square brackets with no font change, because MS Word seems to like to change the font when the text is cut and pasted. If the font is the same as surrounding text, Word does this less often.
Especially valuable is being able to retrace the links back to your starting point by clicking the web link arrows. You can't do this unless Word's web toolbar is exposed. To do this, open Tools-Customize-Toolbars and put a checkmark in the Web checkbox.
System design report 11 topic Comments on cdxsys10 item 6 reports Andy's conclusion that David's request to double all scan chain rates (I/O and motor) from 2MHz to 4 MHz [Cdxsys.doc-ScanChain4MHz] can be easily achieved with the 26C series line drivers and receivers. This change will be implemented immediately because it potentially affects nearly every subsystem.
Stepper Motor Controller Proposal (motor.doc) requirements specify that 14 motors will be supported by the MSM [Cdxsys.doc-MotorRequirements]. David's request to double the motor scan chain rate did not include immediately increasing the number of supported motors, but that omission was only to avoid delaying the design schedule.
Coleman suggested that, as the schedule has already slipped due to personnel changes, the time required to increase the motor count would not be significant. System design report 7 topic Apu, Rpm, Status Panel, And Cd4000 enumerates the 17 motors used in the CD4000 [Cdxsys.doc-CD4000Motors]. It is not clear that all 17 would be used in an alternate version of a CD4000-class instrument, but it is possible that more advanced instruments will require even more motors.
With the 32 usec. step resolution and keeping at least 4 usec. for the NPCS high time (see report 3 topic Scan Chain Operation [Cdxsys.doc-NPCS]), the maximum number of motors that can be supported is 28. This is much more than we anticipate using in the near future. David arbitrarily suggested supporting 20.
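The limit of 28 follows from the number of bits that can be shifted in the time left after the NPCS high period, at four scan chain bits per motor. A short sketch of that arithmetic, with illustrative function and parameter names:

```python
def max_motors(step_period_us=32, npcs_high_us=4, shift_mhz=4, bits_per_motor=4):
    # Bits that can be shifted out in the part of the period not reserved for NPCS high.
    shift_time_us = step_period_us - npcs_high_us
    chain_bits = int(shift_time_us * shift_mhz)
    return chain_bits // bits_per_motor
```

At the new 4 MHz rate this gives 28 motors (112 bits in 28 usec). At the original 2 MHz rate the same calculation gives 14 motors, matching the original 56-bit chain.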
In addition to the faster scan chain shift rate, increasing the motor count requires increasing the FPGA's scan chain register length. If the number of simultaneously-driven motors were to increase, the size of event memory (not event count memory-- see motor.doc [Cdxsys.doc-MotorEvent]) would have to increase, as would the size of the event counter. In the original 14-motor design, only 14 events could occur in any single 32 usec period, requiring only a 4-bit event counter [Cdxsys.doc-MotorEventCounter]. If all motors were to step at their fastest rate, each of the two event arrays would consume 3,584 bytes (14 events in each of 256 periods). Increasing the number of simultaneously-driven motors to 20 would require a 5-bit event counter and 5,120 bytes in each of the two event arrays. The event array pointer would require 13 active bits (plus a static base address). Even at 14 motors, the event arrays are too large for implementing in the FPGA and are, instead, implemented by a portion of main memory. Thus, supporting 20 simultaneous motors only increases the event counter by one bit and the event array pointers by 1 or 2 bits. If all 20 motors stepped in one period, the FPGA would have to read 20 events. Andy's design specified reading the events (as many as 14) during the 4 usec NPCS high period. Increasing this to 20 could create problems, but a very simple solution is to increase the NPCS high time for motors.
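The sizes quoted in this paragraph can be reproduced from the motor count and the 256 periods per program, assuming one byte per event (which is what the 3,584 and 5,120 byte figures imply). The function names are illustrative.

```python
PERIODS_PER_PROGRAM = 256  # one microcode program covers 256 step periods


def event_array_bytes(motors):
    # Worst case: every motor has an event in every period, one byte per event.
    return motors * PERIODS_PER_PROGRAM


def event_counter_bits(motors):
    # The counter must be able to hold any value from 0 through `motors`.
    return motors.bit_length()


def event_pointer_bits(motors):
    # Active address bits needed to index every byte of one event array.
    return (event_array_bytes(motors) - 1).bit_length()
```

For 14 motors this gives 3,584 bytes, a 4-bit counter, and 12 pointer bits; for 20 motors it gives 5,120 bytes, a 5-bit counter, and 13 pointer bits.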
In the current systems, no more than seven motors are simultaneously active. However, given the modest FPGA resources needed to support 20, it seems unreasonable to enforce such a limit in the basic design. The CPU may have difficulty meeting the requirements of 20 motors, but, if so, this may be resolved by specifying a combinational limit, like a transistor's SOA, i.e. a maximum amount of step activity that may be apportioned as more motors at slower rates or fewer motors at faster rates.
The original scan chain design specifies 4 usec. NPCS high period with a 2 MHz. clock. Changing the clock to 4 MHz. brings up the question of what to do about the NPCS high period. The VPM proposal (vpm.doc) specifies several tasks that have to occur during NPCS high [Cdxsys.doc-VPMNPCS]. System design report 2 explains how a DSP could be used as the VPM controller [Cdxsys.doc-DspVpm]. In this approach, the DSP, interrupted by NPCS going high, performs these tasks in software. The suggested design requires at least 3 usec. Whether this VPM approach is implemented or not (Coleman was working on an FPGA version but has been rotated into the more important MSM job) it is reasonable to continue to provide sufficient NPCS high time for this sort of scan chain intelligence. Therefore, the 4 usec. NPCS high time will be retained even though the serial shift period changes from 128 usec. to 64 usec.
For the motor scan chain, a longer NPCS high time may be even more important than for the I/O scan chain in order to support reading as many as 20 events during one period. Therefore, the motor scan chain NPCS high time will be as long as possible, depending on the number of bits in the chain. An 80-bit chain supports 20 motors. This shifts in 20 usec, leaving 12 usec. for NPCS high. If a feedback register is implemented (see System Design Report 4 topic Missing Motors [Cdxsys.doc-MotorScanChainFeedback]), the resulting 88-bit chain shifts in 22 usec, leaving 10 usec. for NPCS high.
As explained in System Design Report 3 topic Msm Motor Controller Bus Mastering [Cdxsys.doc-BusMasterLockout], the worst case delay for the FPGA to gain control of the bus is 1.88 usec. Assuming that it reads the shared memory in a burst and that the cycle time is 80 nsec, 20 event bytes could be read in 1.6 usec. Even with the old 4 usec. NPCS high, the FPGA could read the events with all motors changing in one 32 usec. step period. Andy's plan to read the events during NPCS high is still valid. Andy's design includes implementing the two 256-byte event count arrays in the FPGA. If this was proposed out of concern for timing in the NPCS high period, it should be revisited. With a 12 usec. NPCS high period, the FPGA has more than enough time to read the event count and 20 events. Moving the 512 bytes from the FPGA to main RAM would steal a small amount of bus time from the CPU but this would not be significant and might significantly reduce the size and cost of the FPGA.
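The timing argument can be checked by adding the worst-case bus acquisition delay to the burst read time and comparing the total with the available NPCS high time. The sketch below uses the figures from the text (4 MHz shift clock, 32 usec. step period, 80 nsec cycle, 1.88 usec. bus grant delay); the function names are illustrative, and the one-cycle-per-byte burst is the same assumption made in the text.

```python
def npcs_high_us(chain_bits, shift_mhz=4, step_period_us=32):
    # Time left in the step period after the chain has been shifted out.
    return step_period_us - chain_bits / shift_mhz


def event_read_us(events, cycle_ns=80, bus_grant_us=1.88):
    # Worst-case bus acquisition plus one memory cycle per event byte.
    return bus_grant_us + events * cycle_ns / 1000


worst_case = event_read_us(20)       # about 3.48 usec
fits_old_npcs = worst_case < 4       # old 4 usec NPCS high
fits_new_npcs = worst_case < npcs_high_us(80)  # 80-bit chain, 12 usec NPCS high
```

The 20-event worst case comes to roughly 3.48 usec., which fits inside the old 4 usec. window with little margin and inside the 12 usec. (80-bit chain) or 10 usec. (88-bit chain) window with a great deal.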
Coleman reported that Andy had not yet designed the means by which the FPGA reads the event memory, which requires that it assume the role of bus master.
A major issue is how the FPGA gains control of the memory interface, which is normally controlled by the CPU. In particular, we were concerned about which CPU signals become high-impedance when the CPU grants the bus. The only ones that don't are the chip selects, CS0 through CS3. All address, data, RW, and AS go to high impedance. See MC68340 User's Manual section 3-6 Bus Arbitration (p. 3-40), Fig. 3-24 Bus Arbitration Timing (p. 3-42), and Table 2-5 Signal Summary (p. 2-14). Motorola documentation doesn't say what levels (high or low) the CS signals assume when the CPU grants the bus, but Motorola tech support told us (Service Request Number 1-98LP answered by Cindy.Barham@motorola.com 9/27/99) that they would be high (inactive). Consequently, the FPGA only needs to control the one that selects RAM.
The new MSM schematic includes no FPGA but the FPM and APU schematics indicate that all CPU data and nearly all address signals are connected to the FPGA. The standard chip select usage that we have followed is CS0 selects flash, CS1 RAM, and CS2 or CS3 selects I/O. It is useful during development to be able to access RAM and flash even if the FPGA is not programmed. This has been provided through a GAL in the previous designs; continuing to support this is desirable. However, with the addition of the serial EPROM configurator, this is not essential, as it was when the CPU had to program the FPGA itself.
The original motor controller proposal doesn't specify whether the FPGA should read shared memory as bytes or as words. Clearly, the latter would use the bus more efficiently, which would benefit the CPU, but it would be more difficult for the FPGA. If the additional complexity is not too severe, reading words would be preferable.
System Design Report 4 topic MSM - Tower suggests that the I/O from the interim tower board is to be merged into the MSM as direct I/O from the MSM board [Cdxsys.doc-MsmTower]. Coleman asked for clarification regarding the means of providing this. There are three possibilities. The I/O could be provided directly by FPGA pins; it could be provided by discrete parallel registers mapped into the CPU's I/O space; or it could be provided by local (on the MSM board) serial registers mapped into the MSM's I/O scan chain. An I/O scan chain must be implemented in order to support the loader, which remains a separate board, so this solution would entail only increasing the chain length.
The scan chain solution uses the fewest FPGA pins but the most internal resources (for the additional chain register length). The FPGA direct pin solution uses the most FPGA pins but relatively few internal resources. The parallel registers solution uses relatively few FPGA resources but consumes the most board space.
Coleman questioned the complexity of the LED driver inherited from the old FPM. Apparently the LM339 comparator was used instead of a simple transistor because a spare was available from the DC motor current fault detectors. This is still the case. However, the voltage divider used to provide the reference level is not needed. Any level between 1 and 3 volts would work and this is already available in the divider used for either of the motor current detectors. The reference should be shared.
Coleman also asked about the source of the "wink" signal. This is basically a visible watchdog, which the CPU toggles periodically to announce that it is alive. It can be mapped onto the I/O scan chain or any other I/O that the CPU can write to. An appropriate control would be one of the Port A pins, as the CPU can control this even if the FPGA is not functioning.
The MSM schematic sheet 3 shows a 12K base resistor connected to the TIP125. As pointed out by Maha for the LPM, this should be replaced by 2.7K. Also, as David pointed out for the LPM's solenoid drivers, the ULN2003 inputs should initialize to low values to avoid turning on (at high power) the solenoids.
SPINNER MOTOR
Report 4 explains that both a DC motor driver and a stepper are to be provided for the tube spinner [Cdxsys.doc-SpinnerMotor]. The new MSM circuit incorporates a DC spinner motor driver circuit from the interim design. This design was examined in report 7 topic Right Panel Module - Interim Design Errors, which concludes that the under-current detector is unusable in concept and should be discarded [Cdxsys.doc-MotorUnderCurrent].
The over-current detector is not implemented correctly. The reference voltage at pin 7 of U9 LM339 should be set to match the 2 volts developed at the 1 Ohm sense resistor (note that U11 L293E pins 4 and 7 are supposed to be tied together-- this isn't clear in the schematic) at the maximum 2A output. The circuit shown develops a 3.3V reference, i.e. 3.3A over-current threshold. The 2A maximum output current is specified for the L293E as non-repetitive t = 5msec, so the 3.3A threshold is clearly useless. SGS application notes suggest approximately 0.7% positive feedback around the comparator. This is probably not necessary, but we might want to consider it. Without a feedback resistor, the reference divider should be 6.8K to +5 and 4.7K to digital ground (this 2V reference can also provide the LED driver threshold). Note that MTR_OVR is active low, i.e. logic 0 signifies over-current.
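The suggested divider values can be checked with the standard voltage divider formula. The sketch below assumes an exact 5.0 V supply and ignores resistor tolerance and comparator input current; the function name is illustrative.

```python
def divider_volts(v_supply, r_top_ohms, r_bottom_ohms):
    # Unloaded resistive divider: output taken across the bottom resistor.
    return v_supply * r_bottom_ohms / (r_top_ohms + r_bottom_ohms)


SENSE_OHMS = 1.0  # sense resistor: 1 volt per amp

v_ref = divider_volts(5.0, 6800, 4700)   # 6.8K to +5, 4.7K to ground
trip_amps = v_ref / SENSE_OHMS
```

This gives a reference of about 2.04 V, i.e. an over-current threshold of about 2.04 A, compared with the 3.3 A threshold of the circuit as drawn.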
The motor control signaling is inefficient, using three lines when two would suffice. MTR_CW and NMTR_CW (the latter should be renamed MTR_CCW) can both be taken high or low to stop the motor. In the MOSI scan chain (or other direct output control means) the control should be expressed as ON and CW signals, from which the two driver inputs are derived by the following equations:
MTR_CW = ON & CW
MTR_CCW = ON & /CW
With positive logic, these equations make both signals low/0 if ON is false. MTR_CW and MTR_CCW should initialize to both low or both high to avoid running the motor at system startup.
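The two equations can be exercised over all four input combinations to confirm the stop behavior. This is a plain Boolean model of the equations above, with an illustrative function name; it says nothing about how the logic would be realized in the PLD or FPGA.

```python
def motor_lines(on, cw):
    """Return (MTR_CW, MTR_CCW) from the ON and CW control bits."""
    mtr_cw = on and cw          # MTR_CW  = ON & CW
    mtr_ccw = on and not cw     # MTR_CCW = ON & /CW
    return mtr_cw, mtr_ccw


# Full truth table, keyed by (ON, CW).
truth_table = {
    (on, cw): motor_lines(on, cw)
    for on in (False, True)
    for cw in (False, True)
}
```

Whenever ON is false both outputs are low regardless of CW, so the motor is stopped; with ON true exactly one output is high, selecting the direction.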
System Design Report 4 topic MSM - Mechanical Requirements suggests that two connectors be provided on the MSM for the motor and I/O scan chains to connect to the DB37 loader connector [Cdxsys.doc-MsmLoaderConnector]. Coleman questioned the rationale for using two instead of one larger connector. Robert added that manufacturing would normally prefer a single connector. No rationale was offered in report 4 but it may be preferable to use two connectors, a 14-pin and a 10-pin, that are used throughout the system, than one unique part. It would be appropriate for Robert to decide this.
MOTOR WINDING TEST
System Design Report 4 topic MSM - Motor Winding Test [Cdxsys.doc-MotorWindingTest] reviews the various options and suggests essentially duplicating the old MPM circuit. David asked that this analysis be reviewed by an analog engineer. Robert offered Ed Sewell's assistance.
As suggested in System Design Report 2 topic Motor Control, Tower, and Loader [Cdxsys.doc-MsmDma], DMA2 will be used for transmit data to the master (APU) and DMA1 for receive data from the master. CPU TxRdyA pin 83 will be connected to CPU DREQ2 pin 96. CPU RxRdyA pin 82 will be connected to CPU DREQ1 pin 93.
MASTER AND SLAVE COMMUNICATION INTERFACES
The MSM's slave communication connects to the RS232 barcode reader. It does not change from the interim design.
The MSM's master (to APU) communication connector will be identical to the APU's master (to Data Station) connector. The interface circuitry shown on sheet 3 must be changed to match the signals of the APU (see APU rev. 21 sheet 4). However, 26C drivers and receivers should be used (and, therefore, only terminator resistors are required). The interim APU's slave (to MSM) communication connector will be rewired to mimic the Data Station (as well as changes in the treatment of PRST and Clock).
System Design Report 5 topic Scan Chain Topology - End Board [Cdxsys.doc-EndBoard] describes optional SCK, PRST, and NPCS termination on scan chain boards that may optionally serve as end boards. Similar treatment will be accorded the high-speed serial link (HSSL or Inter-Module Link). The APU will require optional (instead of permanent) terminators for the master's clock and PRST signals. However, the MSM will use permanent terminators, because it must be the end of the IML, as its slave is not IML-compatible.
Sheet 3 of the MSM schematic also shows 10K pullups at the data inputs to the line driver. These should not be necessary.
Maha and Jack have been independently preparing updated versions of the RPM, particularly focussing on the Shear Valve and Y Valve controls. For Jack this has primarily been an exercise to gain familiarity with the Altera tools. He has also been able to show that the design proposed in System Design Report 7 topic Right Panel Module - Shear Valve And Y-Valve Controller [Cdxsys.doc-ShearValveController] consumes more internal resources than a 7032 has to offer. For her part, Maha has reviewed the same design, improved signal naming, and corrected errors in the design description. Simultaneously, she has been correcting the LPM, which shares a number of circuit elements with the RPM. Going forward with this, we want to be sure that Jack and Maha coordinate their efforts.
Maha is using Verilog for logic design, Jack AHDL, and Coleman VHDL. Jack says he would be happy using Verilog, but we may not have the tools for this in Santa Clara. Robert D may have some suggestions about how to coordinate, as he has experience managing eclectic environments. One that we can all agree on immediately is to use the improved signal naming (and corrections) suggested by Maha.
Jack presented his RPM block diagram, which emphasizes the scan chain. This format is particularly useful for interfacing to software, which needs to know the bit address of all I/O.
Jack's diagram shows the interim RPM, which doesn't separate the input and output legs of the scan chain. As explained in report 5 topic Scan Chain Topology [Cdxsys.doc-InterimScanChain], this inefficient topology will be replaced by one in which all output registers occur in the MOSI leg and all input registers in the MISO. The two legs connect at the end of the MOSI leg, which loops back through the MISO. U33, the one input register on the RPM, will be folded into the PLD. It is important not to duplicate the interim topology in the new design.
Jack's diagram suggests that the PLD is the end of the APU scan chain, with the internal output (MOSI) register feeding into the input (MISO) register, whose output goes to the APU. As shown in the Intermodule Communication Diagram [Modules.doc] the RPM will have a standard connection to the APU, followed by a loopback connection to the status board, followed by a standard connection to the LPM. This is also described in report 5 topic Scan Chain Topology - I/O Boards [Cdxsys.doc-IntermoduleCommunication]. See report 3 topic Scan Chain Stream Connections [Cdxsys.doc-ScanChainStream] for a description of the different kinds of connections. System Design Report 7 topic APU Connectors [Cdxsys.doc-ApuStatusConnector] discusses improving cabling flexibility by including the Status Board loopback connector on the APU instead of the RPM. However, changing the APU is not currently in our schedule and we have to provide some means of connecting the Status Board.
Jack's study indicates that a 44-pin 7064 would be able to implement the proposed Shear and Y-valve controller. However, Maha reminded us that we had decided (see report 10 topic Rpm And Lpm [Cdxsys.doc-RpmLpm]) to move the strobed fluid sensor controls (see report 7 topic Left Panel Module [Cdxsys.doc-StrobedFluidSensors]) from the LPM to the RPM, for which the 44-pin 7064 would not provide enough I/O.
After briefly reviewing the strobed fluid sensor requirements, Coleman and David suggested that a 68-pin 7064 would probably provide sufficient internal as well as external resources, based on the fact that the 7032 was only one macro cell short (if logic equations are used instead of state machines) for the Shear and Y Valve controllers. Maha and Jack did not necessarily agree with this assessment. Robert mentioned that the 5 volt 7064 is considerably more expensive than the 3.3 volt version. According to Maha, Andy was considering using an Actel equivalent of the 7064 that might be cheaper. David objected to using Actel because the anti-fuse technology renders devices untestable by the manufacturer (not to mention the fact that we are continually changing our designs). David proposed looking at Cypress components. This proposal was not taken seriously, but Cypress affords several advantages at least during development. Its VHDL and Verilog tools are practically free and most of its parts can be programmed in or out of circuit. We should also consider the possibility of using the 3.3 volt Altera parts in our 5 volt system. If signal levels are acceptable, the cheapest solution may be to use an LDO linear regulator to power just the PLD.
SHEAR VALVE AND Y-VALVE CONTROLLER
Maha presented her review of the design proposed in report 7 [Cdxsys.doc-ShearValvePld]. She found and corrected the following deficiencies:
One more error, apparently overlooked by the review, is that both the Shear Valve and Y-Valve equations contain an under-current term, SVx_UC and YVx_UC. The under-current detectors have been eliminated and so should these terms.
In the original proposal, Y-valve 0 was not selectable as simple I/O, unlike the other valve sections. This was only due to a PLD pin shortage. Since the next size up affords an additional 24 pins, this limit no longer exists. Y-valve 0 should have a YV0_EN mode select input. It and Y-valve 1 will now be identical and can be described by YVx equations similar to the Shear Valve equations.
The following is a revised version of the PLD Product Terms section of report 7.
The controller logic contains two kinds of product terms, external and buried. External terms comprise motor controls and scan chain data (MOSI and MISO outputs). Buried terms comprise the scan chain shift registers and the four latched shear valve home position states, SV0_CWACK, SV0_CCWACK, SV1_CWACK, and SV1_CCWACK. These states and the four raw y-valve position inputs, YV0_CWH, YV0_CCWH, YV1_CWH, and YV1_CCWH, are latched into the 8-bit MISO register when NPCS is high. Latching may be asynchronous or synchronized on the global clock, which is driven by the scan chain's SCK. These eight signals are also used in motor control product terms. The shear valve state terms have the following equations:
SVx_CWACK = SVx_EOT & SVx_CWL & !SVx_CCWACK | SVx_EOT & SVx_CWACK & !SVx_CCWACK & !PRST
SVx_CCWACK = SVx_EOT & SVx_CCWL & !SVx_CWACK | SVx_EOT & SVx_CCWACK & !SVx_CWACK & !PRST
The conceptual meaning of these equations is basically that if EOT and the related overtorque are both true then the shear valve has reached the CW or CCW home position. This is latched by allowing the signal to hold itself true. One caveat that applies to both the initial home event and the latched state is that they can't be true if the other home position is true. This prevents the startup EOT coming out of CW toward CCW from latching CCWACK and the EOT from CCW toward CW latching as the CWACK-- remember, EOT is not position-specific. PRST should clear both the MOSI and MISO shift registers if possible.
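The latching behavior described above can be checked with a small truth-level model. The following is only a sketch, not the PLD source: it transcribes the two state equations as written and assumes that both are evaluated from the previous state on each step (as registered terms would be). The function name next_ack is hypothetical.

```python
def next_ack(eot, cwl, ccwl, cwack, ccwack, prst=False):
    """Evaluate SVx_CWACK and SVx_CCWACK once from the previous state.

    Arguments are booleans named after the signals in the equations.
    Returns the new (cwack, ccwack) pair.
    """
    new_cwack = (eot and cwl and not ccwack) or \
                (eot and cwack and not ccwack and not prst)
    new_ccwack = (eot and ccwl and not cwack) or \
                 (eot and ccwack and not cwack and not prst)
    return bool(new_cwack), bool(new_ccwack)


if __name__ == "__main__":
    state = (False, False)
    state = next_ack(True, True, False, *state)   # EOT with SVx_CWL: CW home
    print("after CW home event:", state)
    state = next_ack(True, False, False, *state)  # SVx_CWL gone, EOT still true
    print("held by self-term:  ", state)
    state = next_ack(True, False, True, *state)   # SVx_CCWL while CWACK is set
    print("CCWL blocked:       ", state)
    state = next_ack(False, False, False, *state) # EOT released
    print("after EOT drops:    ", state)
```

The third step shows the mutual exclusion: while CWACK is latched, an EOT coinciding with SVx_CCWL cannot set CCWACK.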
The 8-bit MOSI shift register outputs are internal motor control signals or simple outputs (for disabled motor controllers). As motor control signals, these are:
The remaining shear valve controller equations are:
SVx_CW = SVx_EN & SVx_ON & SVx_CW_REQ & !SVx_OC & !SVx_CWACK & !PRST | ; shear valve control
!SVx_EN & SVx_CW_REQ & !PRST ; simple output
SVx_CCW = SVx_EN & SVx_ON & !SVx_CW_REQ & !SVx_OC & !SVx_CCWACK & !PRST | ; shear valve control
!SVx_EN & SVx_ON & !PRST ; simple output
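For review purposes, the two shear valve output equations can be exercised the same way. This sketch simply transcribes SVx_CW and SVx_CCW as given above; the function name shear_valve_outputs is hypothetical and the model ignores timing entirely.

```python
def shear_valve_outputs(en, on, cw_req, oc, cwack, ccwack, prst=False):
    """Return (SVx_CW, SVx_CCW) for one set of boolean inputs."""
    cw = (en and on and cw_req and not oc and not cwack and not prst) or \
         (not en and cw_req and not prst)          # simple output mode
    ccw = (en and on and not cw_req and not oc and not ccwack and not prst) or \
          (not en and on and not prst)             # simple output mode
    return bool(cw), bool(ccw)


if __name__ == "__main__":
    # Controller mode, CW requested, not yet home: CW drive is on.
    print(shear_valve_outputs(True, True, True, False, False, False))
    # Same request after CWACK has latched: drive is removed.
    print(shear_valve_outputs(True, True, True, False, True, False))
    # Simple I/O mode: the two outputs follow the CW_REQ and ON bits.
    print(shear_valve_outputs(False, True, False, False, False, False))
```

In simple output mode (SVx_EN false) the over-current and home terms have no effect, so the pins behave as two plain scan chain outputs gated only by PRST.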
The y-valve controller equations are:
YVx_CW = YVx_EN & YVx_ON & YVx_CW & !YVx_OC & !YVx_CWH & !PRST | ; y valve control
!YVx_EN & YVx_CW & !PRST ; simple output
YVx_CCW = YVx_EN & YVx_ON & !YVx_CW & !YVx_OC & !YVx_CCWH & !PRST | ; y valve control
!YVx_EN & YVx_ON & !PRST ; simple output
The eight MISO scan chain registers are loaded during NPCS high. The load data are:
SVx_EN & SVx_CWACK | !SVx_EN & SVx_CWL
SVx_EN & SVx_CCWACK | !SVx_EN & SVx_CCWL
YVx_CWH
YVx_CCWH
Maha and Jack will review this again for residual errors.
As described in System Design report 9 topic Hardware Partitioning, there are six strobed fluid sensors [Cdxsys.doc-FluidSensors]. The new design will provide 8. Report 7 topic Left Panel Module [Cdxsys.doc-StrobedFluidSensors] discusses two means, hardware or software, of providing the longer of the two periods. Jack and David agreed in the current meeting that software will provide the period. Hardware only needs to respond to a trigger from software (via the scan chain) and, otherwise, do nothing with the fluid sense circuits.
Jack and David described the theory of the sensing mechanism. To reiterate, the fluid sensors are capacitively coupled to the circuit. The coupling capacitor presents an open circuit when the sensor is not contacting fluid and a ground-referenced capacitance otherwise. The other end of the coupling capacitor is attached to one end of a charging capacitor, the other end of which is grounded. The common node is normally discharged to ground through a resistor. When the mechanism is triggered by a signal from the CPU a transistor is turned on to charge the common node. The charge rate depends on total capacitance, which depends on whether the coupling capacitor is floating or grounded. The node voltage is (continuously) compared to a reference. After 10 usec (with the components used in the original FCM circuit) if the capacitor is floating, the threshold will be crossed; but if the capacitor is grounded, the node voltage will not have reached the threshold. At this time, the circuit latches the output of the comparator.
To render this circuit in the PLD requires the following elements:
As discussed in the meeting documented by report 10 [Cdxsys.doc-RpmLpm], the LPM will comprise a relatively small, simple I/O facility that can be duplicated for larger systems. Maha presented her review and corrections of the constituents of the new LPM.
As mentioned for the MSM Door Release, the transistor's base resistor needs to change from 10K (12K in some instances) to 2.7K.
David pointed out that the ULN2003 logic inputs float when scan chain reset is active. If they float high, high power will be applied to all solenoids. To avoid this, the control input to each ON (pulldown) driver pair should be pulled low by a 10K resistor to digital ground. With the pulldowns turned off, it doesn't matter what value the high-voltage drivers assume. However, floating the ULN2003 inputs may not be good for the IC.
David questioned the choice of Schottky diodes for the high/low voltage blocking diodes. These originally were 1N4002 (even a 1N4000 would suffice). If the cheaper diodes are available in surface mount, we will use them instead of the Schottky diodes.
As with all of the other modules, David questioned the use of resistors on the line driver inputs and receiver outputs. These are actively driven, so passive pullups seem to serve no purpose. They may be an unreviewed vestige of the interim design. All interim design circuits are suspect. Unless someone comes up with a rationale for these resistors, they should be eliminated.
The RPMD schematic was briefly discussed. David had expressed concern that the motor drivers' I0 and I1 inputs might be floating at scan chain reset. Both I0 and I1 should be pulled high by 10K resistors (to +5 volt). David reviewed the schematic and concluded that the proper pullups were in place. However, upon closer inspection of Rev. 11, it appears that this is not the case. This should be clarified before fabrication.
The line driver input and receiver output resistors should be removed unless we conclude that they serve a purpose.
CDNEXT ANALYZER SYSTEM DESIGN REPORT 15
4/21/2000
[Report14] [Report16]
PARTICIPANTS
John V, Robert D, James C (Coleman) R, Jack W, Maha N, David McCracken.
VENUE
DISCUSSIONS
LPM AND RPM
2/21/2000 EMAIL BY MAHA
This is a follow up on the changes discussed during our last meeting:
LPM:
Coleman and I re-examined the function of the 10K pull up resistors on the MC26C32 outputs and concluded that they are indeed unnecessary. They have been eliminated from the design.
The Schottky diode has been replaced with a surface mount '1N4148W' diode. It's the simplest and cheapest SMT diode I found in the system.
Ed Sewall and I looked at the ULN2003 internal schematic. It shows that all input pins are internally pulled down with a ~10K resistance. Therefore we think that an additional external pull down is unnecessary. Please let me know if you agree.
RPM:
I might have not made myself clear concerning the FPGA we were thinking of using: Although I simulated using an Actel as a beginning experience, Andy had mentioned that Altera parts are preferred, as they are reprogrammable. However, he suggested to use ATMEL as an identical but cheaper alternative to Altera especially the 5 volts part - which is what we needed-
I will check into the cypress option, as well as the cost effectiveness of using a 3.3 volt part in a 5 volt system.
2. I noticed that Jack is using 594's for the rpm boards, but we are using 595's because we needed the extra gate for NPCS control.
REPLY FROM DAVID MCCRACKEN
1. I agree with your conclusion regarding the ULN2003 input.
2. Regarding the RPM FPGA. I'm sorry that I didn't represent your comments correctly in my report. I remember that you had suggested Atmel, which I think is a good choice. Atmel is generally inexpensive and high quality.
3. Regarding Jack's RPM schematic; he is just using the interim design for getting up to speed on the Altera tools and to develop a bubble detector, which he will soon present for review. He isn't proposing that any portion of the interim circuitry be retained. As far as the 594 is concerned, we don't want it used anywhere because it is unavailable.
DESIGN MEETING 4/11/2000 IN SANTA CLARA
Coleman reported increasing the shared (between CPU and FPGA) memory pointers in the MSM in order to accommodate the increase from 14 to 20 motors. See Report 14 topic MSM-- Event Memory [Cdxsys.doc-EventMemory]. Report 14 suggests that having the FPGA read the shared memory as words instead of bytes would benefit the CPU by using the bus more efficiently [Cdxsys.doc-EventMemoryAccess]. Coleman asked why this would be the case. David explained that this would free more bus time for the CPU, but this would only be true if the FPGA were using cycle-stealing. Coleman explained that burst mode is being used. This is actually much more efficient than cycle-stealing because it is not only inherently faster to skip all but one arbitration cycle but, additionally, the FPGA and SRAM can cycle significantly faster than a normal CPU bus cycle. David asked whether the burst duration could become long enough to create a problem for the CPU. Coleman pointed out that the maximum number of readings would be 21 (one event count reading plus 20 event readings) which, at a 40 nsec. cycle rate, leads to a maximum CPU bus lockout of only .84 usec. This is certainly not a problem for the CPU and reveals clearly that the decision to use burst-mode instead of cycle-steal is correct.
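The lockout figure quoted by Coleman follows from simple arithmetic, reproduced here as a sketch. The 40 nsec. cycle and the one-count-plus-one-reading-per-motor burst come from the discussion above; the function name burst_lockout_us is hypothetical.

```python
CYCLE_NS = 40  # FPGA/SRAM burst cycle time quoted in the meeting


def burst_lockout_us(n_motors, cycle_ns=CYCLE_NS):
    """Worst-case CPU bus lockout for one burst, in microseconds.

    One event-count reading plus one event reading per motor.
    """
    readings = 1 + n_motors
    return readings * cycle_ns / 1000.0


if __name__ == "__main__":
    for motors in (14, 20):
        print(motors, "motors:", burst_lockout_us(motors), "usec")
```

For 20 motors this gives 21 readings and 0.84 usec, matching the figure above.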
Coleman reported that the 4 MHz motor scan chain rate presents no problem for the FPGA. It should be pointed out that although this rate (or at least faster than 2 MHz) is required to support more than 14 motors, it is also desirable in the I/O as well. Increasing the scan rate reduces I/O control latency, which obviously can improve performance. More subtly, it can simplify script and/or analyzer programming by making events in two different cycles of the scan chain appear more nearly simultaneous to the relatively slow electro-mechanical system. Ideally, the scan chain would provide instantaneous response, like a parallel bus. We can't achieve this, but replacing all of the 75174/175 chips with 26C31/32 devices allows us to easily double the rate. We want to do this to all I/O scan chains (APU and MSM) as well as to the motor chain.
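The need for a faster clock beyond 14 motors can be seen from the shift time alone. The sketch below uses the four scan chain bits per motor and the 32 usec. period stated in the motor controller description; it assumes that the 4 usec. housekeeping allowance from the 2 MHz design still applies at 4 MHz, which has not been confirmed. The function names are hypothetical.

```python
BITS_PER_MOTOR = 4     # two power bits plus two phase bits per motor
PERIOD_US = 32.0       # motor scan chain period
HOUSEKEEPING_US = 4.0  # assumption: unchanged from the 2 MHz design


def shift_time_us(n_motors, clock_mhz, extra_bits=0):
    """Time to shift out the whole motor chain at the given clock rate."""
    bits = n_motors * BITS_PER_MOTOR + extra_bits
    return bits / clock_mhz


def fits_period(n_motors, clock_mhz, extra_bits=0):
    """True if shifting plus housekeeping fits in one scan period."""
    return shift_time_us(n_motors, clock_mhz, extra_bits) + HOUSEKEEPING_US <= PERIOD_US


if __name__ == "__main__":
    for motors, mhz in ((14, 2.0), (20, 2.0), (20, 4.0)):
        print(motors, "motors at", mhz, "MHz:",
              shift_time_us(motors, mhz), "usec shift,",
              "fits" if fits_period(motors, mhz) else "does not fit")
```

Under these assumptions 14 motors exactly fill the period at 2 MHz, 20 motors overrun it, and 20 motors at 4 MHz leave spare time even if an 8-bit feedback register is added (extra_bits=8).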
Coleman asked for confirmation that the 32 usec. period of the motor scan chain remains in effect for 20 motors and that it applies only to the motor scan chain-- I/O scan chains cycle as fast as possible. This is correct. Also, reiterating previous decisions:
FPGA PORT AND REGISTER ADDRESSES
Maha asked whether there were any rules or requirements for addressing FPGA internal resources that are visible to the CPU (APU or MSM). None are imposed for the benefit of software, which can easily adopt any assignment that suits the FPGA.
Coleman asked whether it was still necessary, given the addition of FPGA self-programming from a configurator (serial memory), for the MSM CPU to be able to access flash memory independently of the FPGA. This requires a separate PLD (22V10), which he would like to eliminate. David answered that independent CPU access to RAM can be convenient when using the BDM to debug hardware, because it may afford the only means for the CPU to execute a program. However, this is not feasible for the MSM, because the FPGA also accesses main RAM as a bus master and must arbitrate access.
There is only one reason for independent CPU access to flash memory-- to support the CPU in programming the FPGA. We discussed whether to continue to support this method in addition to the other two methods of programming, through the Bit Blaster and through the configurator, and decided not to burden the hardware design with supporting this third method. I (David) would like to reconsider this decision. I originally asked for the configurator to alleviate problems in the interim design's handling of the FPGA. The virgin system programming method was extremely complicated and could not be automated; our standard BDM debugger could not be used at all; the FPGA configuration image in flash could easily be corrupted; and the FPGA could easily be programmed in a way that caused it to be destroyed. We have corrected these problems by modifying the analyzer program, tools, and methods. The only remaining problem with CPU-based FPGA programming is that it wastes time during analyzer program development, because it requires an extra step every time we reboot under the debugger. This may sound petty, but the accumulated time waste is significant. On the other hand, this method affords greater convenience than the configurator when distributing an FPGA update because it can be done entirely through the communication system, without even opening the analyzer. The CPU-based method also saves the cost of the configurator, but, as implied by Coleman's question regarding flash access, at the expense of a PLD (22V10) which also requires programming.
The Bit Blaster method is required, because it is far preferable to the other two methods for testing an FPGA program under development (in fact CPU-based programming can't begin to function in a virgin APU until the Bit Blaster has established an initial configuration). Given that we now have a reliable and reasonably convenient means of programming the FPGA via the CPU, this might be the preferred method for production and maintenance, but we would still like to have the configurator during analyzer program development.
The PLD in the FPM circuit, from which the MSM is derived, provides both RAM and flash access. Only flash is actually needed. The FPGA doesn't have to be involved in any way with flash access, so the CPU can more-or-less directly access it. CS0 can be connected directly to flash CS. The CPU's R/W can probably connect directly to flash WR, with CS used for write cycle timing. However, flash OE and the DSACK0 feedback to the CPU must be generated by circuitry, which precludes eliminating the PLD entirely. Therefore, the question that we must answer is whether it is better to have the configurator or the PLD in production. Both need to be programmed and the PLD costs less (especially if the 22V10 is replaced by a 16V8). We want the configurator as an option in any case for the convenience of program developers but it doesn't have to be stuffed. As an option, it may not even require a socket, as it may be possible to put the device on a small adapter that plugs into the Bit Blaster connector.
If we decide to use CPU-based programming for production, the BIOS program would not change. It now checks the FPGA Done signal for high, indicating that programming has already been done, to avoid overwriting a new configuration set up through the Bit Blaster. As far as the program is concerned, there is no difference between the Bit Blaster and the configurator. However, reset is affected by this decision. This was the next topic that we discussed.
We discussed reset, assuming that CPU-based FPGA programming was no longer an option. Coleman suggested that the CPU be held in reset until the configurator is done programming the FPGA. Clearly, this is correct if the CPU depends on the FPGA for access to flash memory. I requested a reset button, adding yet another reset source. The reset sources are:
We decided the following:
Although we didn't consider it at the time, item 4 nearly solves the problem of how to handle reset for CPU-based vs. configurator-based programming of the FPGA. When the CPU programs the FPGA, its reset should be connected to the composite reset. Power-on should be added to the list of composite reset sources, so that the CPU will be reset by all sources except the FPGA self-programming phase. This doesn't interfere with configurator-based programming, because the FPGA was itself going to combine power-on with the composite reset. Doing this externally can only simplify the FPGA. A separate power-on FPGA input is still needed to trigger the self-programming sequence.
The apparent conflict between the CPU's port A pins and the configurator would not materialize, because the port pins reset to input and the CPU would not change them to output unless the FPGA Done signal indicated that programming was needed. A conflict would arise only if the configurator were plugged in (or stuffed) and CPU reset were jumped to the composite signal instead of to the FPGA's reset out. This would happen only due to user (engineer) error (when using the BDM to debug a bad board, configuration can be skipped).
Whether or not we decide to support CPU-based FPGA programming, the power-on signal should be added to the composite reset; omitting it was just an oversight. With this change, the reset circuit has no bearing on the CPU-based programming decision.
TOWER CONTROL
Coleman asked for clarification of the plans for tower control. The plan is to eliminate the interim design's tower board, moving its functionality to the MSM. The reasons for this are:
Coleman was unclear about the three different means of providing I/O mentioned in report 14 topic MSM Conclusions [Cdxsys.doc-TowerIo]. The different means are as follows:
Report 14 doesn't suggest a preference. I (David) prefer the first choice, because it preserves FPGA pins compared to option 3 and uses less wiring and smaller ICs than option 2.
Report 14 topic MSM- NPCS High states that the 20-motor scan chain length would be 88 bits if a feedback register is included [Cdxsys.doc-MotorScanChainLength]. That reference includes a link to Report 4 topic Missing Motors [Cdxsys.doc-MotorScanChainFeedback]. Coleman asked for additional explanation.
Another reference to this is in Report 5 topic Scan Chain Topology [Cdxsys.doc-ScanChainFeedbackRegister]. The feedback register would comprise some number of output bits, either in a dedicated shift register or a portion of a register used for application purposes. The scan chain controller or the CPU could write test patterns to the feedback register and read them back at some MISO position if the MISO chain is not completely full, which is likely to be the case. This would enable continuous testing of the scan chain's integrity under normal operating conditions. The MISO position in the FPGA at which the feedback register's image appears depends on the number of external MISO registers (which must be at least one less than the FPGA's MISO length).
One aspect of the feedback register that Jack has discovered is that its serial shift register cannot continue shifting during NPCS high, as does the 74HC595, if it occupies the end of the output leg, because this would cause the pattern to be lost. However, if it does not occupy the end of the MOSI leg and the next downstream register shifts during NPCS high then it too may do this. The only effect that this would have would be to shift the pattern in the FPGA's MISO memory. Since this position depends on factors outside of the FPGA's control anyway, the shift would be immaterial. In any case, the CPU would either have to directly set up and test the pattern or tell the FPGA where to expect the feedback image in the MISO array.
Coleman suggested that, for now, he simply add an 8-bit register to the FPGA's motor MOSI array, i.e. to make it 88 bits instead of 80. The MISO array should also contain 88 bits.
We did not discuss how to handle the feedback register. It has several unusual constraints. One, as already mentioned, is that the FPGA can't embed knowledge of its MISO image address. The FPGA also can't know the register's MOSI source address, because the MOSI chain length is not fixed. Depending on the number of motors in the external chain, the MISO reflection of any particular MOSI byte may appear in the same shift cycle (in which case an external register would not be needed), delayed by one cycle, or never (if the output byte ends up in an input register, it will be overwritten by input data). This would suggest that the CPU be responsible for managing the feedback test. However, this would not support continuous testing. Moreover, the CPU has a problem writing the pattern. It does not have or need direct access to the motor scan chain MOSI registers in the FPGA. The only means that it currently has for writing into the register would be to treat it as a motor, incrementing the event count and providing an event descriptor byte that sets or clears just one bit of the register at a time. This arduous process would complicate normal motor control.
Having the FPGA perform the integrity test automatically and continuously would be more reliable and much easier on software. This would not be terribly difficult but it would consume FPGA resources. The FPGA would contain two 4-bit registers, into which the CPU would write the MOSI and MISO byte indices corresponding to the feedback register. The FPGA would write a fixed pattern, such as 10111000, to the MOSI address. This would have to be done on every scan cycle after the FPGA clears the MOSI registers. At the end of each scan cycle, the FPGA would compare the byte located by the MISO index to the fixed pattern. If they are not identical, a status flag would be set for the CPU to read and the CPU would be interrupted. It doesn't matter whether the feedback image is delayed by one cycle or not since the pattern is constant. The CPU can discard the first interrupt caused by the integrity test in case it results from the first such delay. To avoid interrupting the CPU in a system that does not implement the feedback register, the FPGA should not perform the test (or at least not generate the interrupt) until the CPU has written to one of the 4-bit index registers. The CPU would write to the controlling register after writing to the other.
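The proposed automatic test can be modeled in a few lines to check the sequencing. This is a behavioral sketch only, not FPGA code. It assumes an 11-byte (88-bit) array, assumes that the MISO index is the controlling register whose write enables the test, and assumes that the pattern is written only once the test is enabled so that an unused feedback byte stays available. The class and method names are hypothetical.

```python
PATTERN = 0b10111000  # fixed test pattern suggested above


class FeedbackTester:
    """Behavioral model of the per-scan-cycle feedback integrity test."""

    def __init__(self, n_bytes=11):
        self.mosi = [0] * n_bytes
        self.mosi_index = 0
        self.miso_index = None  # None means never written: test disabled
        self.fault = False      # status flag for the CPU to read

    def write_mosi_index(self, index):
        self.mosi_index = index & 0xF  # 4-bit register

    def write_miso_index(self, index):
        self.miso_index = index & 0xF  # assumed controlling register

    def begin_cycle(self):
        """Clear the MOSI image, then place the pattern if enabled."""
        self.mosi = [0] * len(self.mosi)
        if self.miso_index is not None:
            self.mosi[self.mosi_index] = PATTERN

    def end_cycle(self, miso):
        """Compare the MISO image; return True if an interrupt is due."""
        if self.miso_index is None:
            return False
        mismatch = miso[self.miso_index] != PATTERN
        self.fault = self.fault or mismatch
        return mismatch


if __name__ == "__main__":
    fpga = FeedbackTester()
    fpga.write_mosi_index(10)  # CPU writes the non-controlling index first
    fpga.write_miso_index(3)   # then the controlling one, enabling the test
    fpga.begin_cycle()
    good = [0] * 11
    good[3] = PATTERN
    print("intact chain interrupts:", fpga.end_cycle(good))
    print("broken chain interrupts:", fpga.end_cycle([0] * 11))
```

Because the pattern is constant, the model does not need to track whether the image arrives in the same cycle or one cycle late, except for the first comparison, which the CPU can discard as noted above.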
Considering that this affords a continuous and nearly infallible test of the scan chain's integrity, it seems worth implementing if the FPGA can support it. It should also be extended to the I/O scan chains, changing only the size of the index registers according to the length of the MOSI and MISO arrays. Except for the FPGA resources, this only has the shortcoming that it steals a byte of useful I/O space. We can design around this and if the byte becomes desperately needed, the CPU only has to not write to the index registers to regain control of the lost byte.
Report 14 topic MSM Conclusions assigns Robert D the task of determining whether to use one 10-pin and one 14-pin or one larger connector for the Loader connector (on the MSM board). We decided to use the two connectors, because the system uses them in other places.
David asked to change the PC104 connector, explaining that half of the pins are unused, that we have no intention of connecting any real PC104 board to it, that synthesizing Intel CPU signals from the 68340 bus wastes FPGA resources, and that not having 68340 bus signals prevents efficient use of the connector for real potential uses, such as:
Robert suggested an alternative to the bus extension connector for logic analyzer access. All of the CPU signals are routed to a standard pattern on the circuit side of the board. A matching empty device carrier or socket can be soldered to this pattern for attaching a logic analyzer probe specific to the target (68340). It seems that the cost of providing at least the pad pattern is practically nil, as we would want some arrangement of pads for the bed-of-nails tester anyway. Jack and David should find out what kind of probe is available for the HP logic analyzer that Jack uses. Absent an alternative, we should match this.
We decided to connect the following signals to the 64-pin ex-PC104 connector in any convenient arrangement (if it's no trouble we might want to arrange in logical groups, such as data, address, and control):
The usage of all of the signals except those involved in bus mastering is obvious. If the circuit attached to the connector is a bus master, it will have to coordinate with the MSM, which is also a bus master. The circuit can attach directly to the CPU's BR (Bus Request) which is open-drain, but the MSM (which also has its own direct BR connection) must arbitrate the CPU's BG (Bus Grant) signal. If the motor controller has asserted its BR then the FPGA should not assert BG to the expansion connector. The connector circuit can attach directly to the CPU's BGACK, because this is open drain and because this circuit's BGACK would not have reason to assert unless the FPGA had already given it the BG.
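The grant arbitration rule stated above reduces to one logic term. The sketch below uses active-true booleans for readability, whereas the real 68340 signals are active low, and it assumes that the FPGA keeps the grant for the motor controller whenever the motor controller is requesting. The function name arbitrate_bg is hypothetical.

```python
def arbitrate_bg(cpu_bg, motor_br):
    """Split the CPU's bus grant between motor controller and connector.

    Returns (motor_bg, expansion_bg) as active-true booleans.
    """
    motor_bg = cpu_bg and motor_br          # motor controller has priority
    expansion_bg = cpu_bg and not motor_br  # pass the grant on otherwise
    return bool(motor_bg), bool(expansion_bg)


if __name__ == "__main__":
    for cpu_bg in (False, True):
        for motor_br in (False, True):
            print(cpu_bg, motor_br, "->", arbitrate_bg(cpu_bg, motor_br))
```

The two outputs are never true together, which is the property that lets the expansion circuit drive the open-drain BGACK safely.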
THROUGH-HOLE VS SURFACE CONNECTORS
Maha reported an objection from manufacturing to the surface-mount 3-pin connectors on the RPMD (Right Panel Motor Driver). These components require a secondary operation. The main reason for using surface-mount is to avoid secondary operations. The surface-mount parts are difficult to get and much less reliable for connectors than through-hole. They are especially unjustified for the RPMD, whose motor drivers must be through-hole.
We agreed to use through-hole parts in the future for all connectors.
STROBED FLUID SENSORS
The original strobed fluid sensor circuit from the CD3000 series (see Cdxsys14 topic Strobed Fluid Sensors [Cdxsys.doc-RpmFluidSensors]) is replicated with several different charging RC combinations, all with the same tau. Maha wanted to make all of these identical.
We can be sure that the variations were not accidental, but their purpose is undocumented. Likely possibilities are:
Although the most obvious characteristic of the charging circuit is its tau, which determines the sense voltage rise time, different charging RC combinations with equal taus are not equal circuits without including the coupling capacitor and the resistance of the fluid. Looking at the net circuit, C1 (charging capacitor) is charged through R1 (charging resistor) while C2 (coupling capacitor) is charged through R1 plus R2 (fluid resistance). When R2 is high (no fluid) the rise time to the threshold is less than when R2 is low. Ideally, the shortest and longest rise times (to the threshold) are centered around the capture time (Jack says this is 10 msec, but Maha and I think that it is 10 usec), because this provides the greatest error margin.
When R2 is infinite (no fluid) the tau that Maha has considered really is all that matters. When R2 is less (fluid present) the network tau determines the rise time. The goal of the circuit is for each instance to have the same charging RC tau and the same network tau as all of the others. The actual charging RC values can be varied to compensate for the coupling RC values. We can't just choose R2 (fluid resistance). It varies according to the electrical properties of the fluid and by the immersion level of the electrodes at the point that we want to identify as "fluid present". If this can be determined, component values can be chosen by solving the simultaneous equations for fluid present and fluid absent rise time. The "fluid present" tau can be simplified by lumping C2 and C1. C2's equivalent capacitance is determined by the relationship of R1 to R2 (because Q = CV). The network tau is approximately R1 * (C1 + C2 * R1 / (R1 + R2)).
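The approximation can be evaluated numerically to see how the two limiting cases behave. This sketch implements only the lumped formula given above; the component values in the example are made up for illustration and are not taken from the FCM circuit. The function name network_tau is hypothetical.

```python
import math


def network_tau(r1, c1, c2, r2):
    """Approximate network time constant: R1 * (C1 + C2 * R1 / (R1 + R2)).

    r1: charging resistor, c1: charging capacitor,
    c2: coupling capacitor, r2: fluid resistance (math.inf for no fluid).
    """
    return r1 * (c1 + c2 * r1 / (r1 + r2))


if __name__ == "__main__":
    R1, C1, C2 = 10e3, 1e-9, 1e-9  # illustrative values only
    print("no fluid (R2 infinite):", network_tau(R1, C1, C2, math.inf))
    print("R2 equal to R1:        ", network_tau(R1, C1, C2, R1))
    print("R2 = 0:                ", network_tau(R1, C1, C2, 0.0))
```

With R2 infinite the result collapses to R1 * C1, the charging tau; with R2 at zero the two capacitors simply add. Real values of R2 fall between these limits, which is why a measurement of the fluid resistance is needed.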
We can design the circuits to produce nearly any arbitrary margin but component values may become unreliable at extremes. 7.5 vs. 12.5 usec. would provide a 50% margin and should be reasonably attainable. However, we currently have no idea about R2, the fluid resistance (at the transition point). I (David) volunteered to find someone to take some measurements. It should be pointed out that this discussion has used tau for convenience but it can't be directly used for time calculations, because the threshold is 50% (2.5V out of 5V) not 63%. Actual times are calculated by .693 * RC, which is the 50% case of Vout = Vin * (1 - e^(-t/rc)).
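The two rise times discussed above can be sketched numerically. This is only an illustration of the approximate network tau formula and the 50% threshold calculation. The component values are hypothetical placeholders (R2 in particular has not been measured), and the function names are ours.

```python
import math

def network_tau(r1, c1, c2, r2=math.inf):
    """Approximate network tau: R1 * (C1 + C2 * R1 / (R1 + R2)).

    With R2 infinite (no fluid) this reduces to the charging tau R1 * C1.
    """
    if math.isinf(r2):
        return r1 * c1
    return r1 * (c1 + c2 * r1 / (r1 + r2))

def time_to_threshold(tau, fraction=0.5):
    """Solve Vout = Vin * (1 - e^(-t/tau)) for t at Vout = fraction * Vin."""
    return -tau * math.log(1.0 - fraction)

# Hypothetical values, chosen only to exercise the formulas.
R1 = 10e3    # charging resistor, ohms
C1 = 1e-9    # charging capacitor, farads
C2 = 1e-9    # coupling capacitor, farads
R2 = 10e3    # fluid resistance at the transition point, ohms (unknown in practice)

t_dry = time_to_threshold(network_tau(R1, C1, C2))       # no fluid
t_wet = time_to_threshold(network_tau(R1, C1, C2, R2))   # fluid present

print(f"no fluid:      {t_dry * 1e6:.2f} usec")
print(f"fluid present: {t_wet * 1e6:.2f} usec")
```

For a 50% threshold, time_to_threshold(tau) equals .693 * tau, which is the factor given above. Once R2 is measured, R1, C1, and C2 can be adjusted until the two times straddle the capture time with the desired margin.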
David mentioned that Maha's FPGA output driving the base of the PNP discharge transistor should be Open Drain to be sure that the transistor cuts off. Coleman clarified this by noting that the FPGA uses tri-state to simulate this effect. The low output is active while the high output is tri-state. David suggested removing the pullup resistor, but Coleman made the very good point that it is difficult to debug a circuit with floating nodes. We should make this a general rule: where it doesn't interfere with circuit operation, any node that can float (assuming this is itself safe, as in this example) will be weakly pulled up or down to facilitate debugging.
Maha asked how to connect the fluid sense inputs. CD3000 series instruments use several multi-signal connectors. Jack mentioned that some of the signals go through two connectors before going out to sensors. We decided that this arrangement is too specific to a particular instrument for our needs. Each sense input will connect to a separate 2-pin jack. The other pin is connected to analog ground.
SHEAR AND Y VALVE CONTROL
System design report 7 topic Shear Valve and Y Valve Controller contains a diagram of a new shear valve control circuit [Cdxsys.doc-ShearValveCircuit]. Maha asked about the meaning of "10+2" in the new shear valve jack. This "jack" is a conceptual representation (i.e. gobbledygook) of the 2 motor drive wires and the 10-wire ribbon cable that comes off the existing shear valve position sense board (this is J1 in the CD3000 series Shear Valve Motor Control circuit 0900). Considering that the motor drive wires must be heavier than the ribbon cable wires and that they will produce a lot of EMI, they should not share the ribbon cable connector.
Maha also wondered where the EOT was in the new shear valve connector. It is supposed to be there. The connector carries +5, CCWL, EOT, CWL, and Gnd. The remaining 5 pins are not connected.
Maha pointed out an error in the PLD Product Terms for shear and y valve control. The PLD Requirements correctly identify only overcurrent signals, SV0_OC, SV1_OC, YV0_OC, and YV1_OC, but the equations continue to include the undercurrent signals, SV0_UC, et al. The analysis in report 7 explains that undercurrent should be eliminated, because it is illogical for undercurrent to prevent turning on a motor.
4/18/00 EMAIL FROM COLEMAN
I reviewed the schematic for the interim Tower board and found the following:
There are two separate scan chains <0> and <1>. Scan chain <1> drives 1 stepper motor.
Scan chain <0> drives 2 solenoids (GS2 Release and Door Release), and 1 dc motor (spinner), and has inputs from GS2 Home, GS2 Height, GS1 Home, Door Open, MTR_OVR, and MTR_UNDER.
I know that the stepper motor on scan chain <1> is already included in the allocation of motors for the MSM motor scan chain (mpi), along with a second motor designated as an alternate for the dc spinner motor.
My question is: Are the 2 solenoids and dc spinner motor already included in the allocation of 64 bits for the MSM i/o scan chain (spi)? or, Do we need to add 8 more bits to the MSM i/o scan chain (spi)? I know that you suggested making the MSM i/o scan chain 128 bits instead of 64 bits, but do you really think that is necessary? Or could we get by with just adding 8 bits?
Another question I have is: Is there a new allocation for bits for the MSM motor scan chain (mpi) now that we have increased it for 20 motors instead of 14? I understood the old allocation of:
MSM - 2 for tower (probe & spinner)
1 for peristaltic pump
1 spare
RPMD - 4 syringe motors
1 wash block motor
1 spare
LOADER - 2 motors
SPARES - 2 motors
I guess my main question is, Where is the location of the shift registers for the spare motors?
Another question I have concerns the flags from the stepper motors to the input scan chain. Documents Cdxsys2.Doc and Cdxsys10.Doc both imply that each stepper motor interface has 4 flag inputs to the scan chain, but in actual implementation, each stepper motor interface has only 2 flag inputs to the scan chain. Both documents reference the Status Panel and Left Panel, which I have not seen yet.
Reply From David McCracken 4/19/00
[Not in reply : From cdxsys2.doc Motor Scan Chain Inputs:
At least two of the scan chain inputs are taken to standard three-pin flag jacks. The other two pins provide +5 and digital ground to an opto-interrupter. The flag pin is connected to the '597 input and to a pullup resistor.]
1. I chose the 64 bit figure for the MSM's I/O scan chain out of thin air. Reviewing the interim Loader board, I see that it has only 4 outputs and 8 inputs. Even with the additional tower requirements, 64 bits is more than sufficient.
2. Regarding spares in the motor scan chain, we don't need to fully populate the external chain. For both the I/O and motor scan chains, holes in the chain mean nothing to the CPU, which just reads and writes the FPGA locations that are defined in our configuration file. The unused positions only waste FPGA registers. It doesn't hurt anything for the FPGA's motor controller to write nonsense to unused slots, so it is not affected by how many motors any particular system actually uses.
3. Regarding the motor feedback inputs, I think we should associate four inputs with each motor, even though only two are actually required, because having two spares doesn't cost much and could be very handy during development. Cdxsys2 topic Motor Scan Chain Inputs suggests the treatment of the two required inputs. I think that it would be reasonable, as long as the board has space, to treat the two spares in the same way, i.e. each one connects to a pullup (e.g. 10K to +5) and one pin of a 3-pin jack, the other two pins of which connect to digital ground and +5.
4. A topic that we have not addressed related to unused scan chain resources is parameterization in the FPGA design. While a system will function perfectly well without utilizing all of the scan chain potential provided by the FPGA, the FPGA resources are precious. Since we are planning to support a variety of system configurations, being able to easily reduce the resources could allow selecting a smaller FPGA or combining functions into one. For example, the current architecture has the APU FPGA providing gather functions and the MSM FPGA providing motor functions. Both provide I/O. A smaller, single-board analyzer may be feasible, combining fewer motors, less I/O, and/or fewer gather channels, all handled by one CPU and FPGA. The analyzer CPU program already parameterizes I/O-related resources (the motor control program will also do this). To take advantage of a reduction in requirements, we only need to change one number and recompile the program.
CDNEXT ANALYZER SYSTEM DESIGN REPORT 16
5/18/2000
[Report15] [Report17]
PARTICIPANTS
John V, Jack W, David McCracken, and J. P. Y.
References:
DISCUSSIONS AND POST-MEETING ANALYSES
ECP HARDWARE AND SOFTWARE DESIGN
IEEE 1284 COMPATIBLE STATE MACHINE
Jack described the ECP state machine that he has been designing according to the IEEE 1284 specifications. The basic operation of the interface is as follows:
INTERFACE OPERATION
The basic operation of the interface described above leaves many unanswered questions, particularly regarding interface state transitions. These are difficult to answer effectively without taking into account the operation of software. First, we need to examine the basic packet transfer mechanisms. The analyzer and the PC's ECP driver are nearly identical functionally. In both systems, one DMA channel and one interrupt are associated with the ECP interface. To transmit, the DMA channel is programmed with the memory source address, the ECP port data destination address, and the message byte count. The DMA and interrupt controllers are programmed to cause an interrupt at the DMA TC (terminal count). Receiving a packet is more complicated. Hardware is configured to interrupt on the first input byte, at which time the ISR programs DMA to transfer the remainder of the packet (the first byte tells the length) from the ECP data port to memory. Again, an interrupt asserts at DMA TC.
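The receive sequence described above costs two interrupts per packet (first byte, then DMA terminal count), while a transmit costs one. The following is a minimal model of the receive side only. It assumes, without confirmation from the text, that the length byte counts the bytes that follow it. The function name is ours, and a plain loop stands in for the DMA channel.

```python
def receive_packet(port_bytes):
    """Model the two-interrupt packet receive.

    Assumption (not stated in the report): the first byte gives the number
    of bytes that follow it.
    Returns (payload, interrupt_count).
    """
    stream = iter(port_bytes)
    interrupts = 0

    # Interrupt 1: hardware interrupts on the first input byte.
    length = next(stream)
    interrupts += 1

    # The ISR would now program DMA to move the remainder of the packet
    # from the ECP data port to memory; this loop stands in for that transfer.
    payload = bytes(next(stream) for _ in range(length))

    # Interrupt 2: DMA terminal count.
    interrupts += 1
    return payload, interrupts

payload, count = receive_packet([3, 0x41, 0x42, 0x43])
print(payload, count)
```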
The 68340 responds rapidly and efficiently to interrupts. The PC currently is running Windows 95, for which the interrupt response is slow (latency as high as 15 to 20 usec) and consumes considerable CPU time. Thus, reducing the number of interrupts is more important on the PC side than on the analyzer.
Another fact to consider is that the volume of data in the reverse direction (analyzer to PC) is at least 100 times greater than in the forward direction. This suggests that Jack's original decision to make input to the PC the default would be more appropriate. However, this is feasible only if we are willing and able to modify the ECP interface standard. We certainly shouldn't be afraid to change the protocol because we control everything except the PC's hardware. To make the analyzer the default talker, the PC would need to ask the analyzer for permission to talk and it would be the analyzer's prerogative whether to grant the request.
LISTENER-TALKER ROLE REVERSAL ANALYSIS
Although it would be possible to use any of the interface signals in any way that we like, we will encounter the least resistance by staying close to the actual 1284 standard, which uses only the three signals, nInit/nReverseRequest, nFault/nPeriphRequest, and PError/nAckReverse, for direction control. The PC drives nInit while the peripheral (analyzer) drives nFault and PError. This can't be changed, but the signals may be interpreted differently if they are independent of port configuration in the PC's port chip. Most important, they are not tied to port direction control. Section 6.1.3.9 of the Intel 82091AA data sheet tells that clearing ECP's ECR bit 4 enables interrupt on the low-going nFault signal. Axelson's table 15-2 also indicates this but uses the ambiguous name nError for the signal. According to the Intel data sheet (Fig. 31, page 76) the PC can read the status of nFault and PError from bits 3 and 5 of the ECP PSTAT register at I/O address Base + 1. Axelson's table 15-1 (page 289) ambiguously refers to this as DSR without explanation, but presumably this status register is present in all ECP devices. Section 6.1.1.1 of the Intel data sheet also shows these signals in bits 3 and 5 of the PS2-compatible PSTAT located at Base + 1. There is no indication that these two inputs are tied to any function other than the possible interrupt associated with nFault.
Port direction is changed by selecting PS2 mode (in the ECR) and then changing PCON (register at Base + 2) bit 5 (0 for forward and 1 for reverse). See Intel data sheet page 67, figure 27. While in PS2 mode, output signal nInit is controlled via PCON bit 2. Figure 32 on page 78 of the Intel data sheet shows a nearly identical PCON register in ECP mode. This figure erroneously states that bit 5 controls direction in PS2-Compatible and ECP Modes, which Axelson's sample programs and Intel's own data sheet (ECR register description on page 86) state is not the case. The direction can only be changed in PS2-Compatible mode. Intel's register description states that the direction can also be changed in mode 000. Since this is the unidirectional ISA-Compatible mode, the data sheet makes no sense. Despite Intel's confusion regarding direction control, it is likely that the nInit signal is derived from PCON bit 2, which can be controlled in both PS2-Compatible and ECP modes. There is no indication that it is associated automatically with any particular hardware function.
One apparent roadblock to reversing the direction control means is that the standard affords the default listener (peripheral) two output semaphores, nPeriphRequest/nFault to request the right to talk and nAckReverse/PError to indicate its direction reversal. The nPeriphRequest function is desirable to avoid polling but the function of nAckReverse, to avoid having the PC and peripheral attempt to simultaneously drive the data bus, is essential. The standard takes advantage of the fact that there are two signals available to separate the two functions, affording additional flexibility, but they could be merged into one signal. Consider that nAckReverse is actually needed only at its deassertion, which is when it tells the PC that it can safely revert back to talker, and that nPeriphRequest is actually only needed for its assertion and can remain asserted throughout the reverse transfer, serving in the role of nAckReverse when it deasserts. The following is a neutral description of the two-semaphore direction change means:
For the PC as default listener, nFault would serve as reversalAck because it can cause an interrupt. Thus, the PC can request to talk and do other work without having to poll to determine when to begin talking. It should be noted that the interrupt could be triggered by the first byte of a message from the analyzer even if the PC has asserted its talk request. nInit has to serve as the reversalRequest, as it is the only output available from the PC for this purpose.
SOFTWARE FOR TESTING HARDWARE
Jack asked for some sort of program to run on the PC to enable him to physically test his chip. The Nohau 68340 BDM debugger affords sufficient control of the analyzer side.
It is relatively easy to control a PC's parallel port in a DOS VM (Virtual Machine). Old real-mode programs run in DOS boxes under Win95 and have no trouble controlling legacy devices like the parallel port. The downside is that every access is trapped by the operating system, resulting in poor performance, which is obviously not a problem in Jack's test situation.
The simplest way to control the port is by using DOS debug in a DOS box. Only two commands are really needed, IN and OUT. The syntax for these commands is:
i<ioaddr>
o<ioaddr> <data>
To find the I/O address of a PC's parallel port, open Start- Settings- Control Panel- System- Device Manager- Ports- Printer Port- Resources. We are using the port in ECP mode and depending on it to have already been configured by either the BIOS or Windows. The device manager should call it an ECP port and its resource list should include two I/O base addresses, typically 0x378 and 0x778; an IRQ, typically 7; and a DMA channel, typically 3 or 1.
For example, using debug commands we can set the port's direction to output (must be done in PS2 mode-- EPP is supposed to also work but it didn't when I tried it); switch back to ECP mode; and then send a form feed character to the printer. The commands are:
DEBUG
>o77A 35 ;Write 0x35 to the port's ECR register to select PS2 mode.
>o37A 6 ;Write 6 to the PS2 mode control register to signal direction change.
>o37A C ;Write 0xC to the control register to change the data drivers to output.
>o77A 75 ;Write 0x75 to the ECR register to return to ECP mode.
>o778 C ;Send form feed character (0xC) to the printer.
At this point any number of characters can be sent to the printer, by writing to 778.
This example assumes that the port's base addresses are 0x378 and 0x778. It uses the automatic strobe feature of ECP mode to simplify sending multiple characters to a printer. To provide test signals to Jack's circuit, it may also be useful to put the port into the original SPP mode (ECR bits 7,6,5 <= 000) which doesn't generate any signals automatically.
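For a port at a different base address, the debug command sequence above can be regenerated mechanically. The sketch below only builds the command text; it does not touch any hardware. The register offsets are inferred from the example (control at base + 2, ECP data at base + 0x400, ECR at base + 0x402), the low five ECR bits (0x15) are copied from the example values 0x35 and 0x75, and the function and constant names are ours.

```python
# Offsets inferred from the example addresses 0x37A, 0x778, and 0x77A.
CONTROL_OFFSET = 0x002   # control register (PCON)
FIFO_OFFSET = 0x400      # ECP data register
ECR_OFFSET = 0x402       # extended control register

# ECR bits 7,6,5 select the mode.
SPP_MODE = 0b000
PS2_MODE = 0b001
ECP_MODE = 0b011

ECR_LOW_BITS = 0x15      # low five bits as used in the example above

def ecr_value(mode):
    """Combine the mode field with the low bits used in the example."""
    return (mode << 5) | ECR_LOW_BITS

def ecp_output_commands(base=0x378, data=0x0C):
    """Return the DOS debug commands that set output direction and send one byte."""
    ecr = base + ECR_OFFSET
    ctrl = base + CONTROL_OFFSET
    fifo = base + FIFO_OFFSET
    return [
        f"o{ecr:X} {ecr_value(PS2_MODE):X}",   # select PS2 mode
        f"o{ctrl:X} 6",                        # signal direction change
        f"o{ctrl:X} C",                        # data drivers to output
        f"o{ecr:X} {ecr_value(ECP_MODE):X}",   # return to ECP mode
        f"o{fifo:X} {data:X}",                 # send one byte
    ]

for command in ecp_output_commands():
    print(command)
```

With the default base address this reproduces the five commands listed above.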
Testing can also be relatively easily automated by writing a simple real-mode program for compiling by the Watcom C compiler (Microsoft's compiler doesn't support real-mode program development). The Watcom debugger (also unlike Microsoft tools) supports reading and writing I/O ports.
J. P. mentioned that he had been helping Ed Sewall find information on how the stepper motor winding test was used. We discussed the possibility of implementing a similar test for the solenoids. The basic concept is to provide electrical feedback from each driven device (one motor winding or one solenoid coil). To avoid having one feedback wire for each device, each one is connected to a common wire through a large value resistor (1M in the existing motor winding test circuit) that doesn't pass enough current to interfere with normal operation of either the associated device or any of the others that are also connected to the common feedback wire. To test a device, all others are electrically floated by disconnecting both their pull up and pull down drivers. In this state, the voltage across the device under test can be tested through its high value feedback resistor. The device driver is turned on. If the device is bad (open circuit) the voltage across it will remain high. If it is good, its low resistance will significantly reduce the voltage.
For the test to be effective the feedback resistors must not interfere with normal operation, i.e. the trickle current must not activate devices, and the voltage across the device under test when it is driven must be appreciably different from when it is not driven. The 1M feedback resistors limit trickle current to .024 mA from 24V even if all but one device are turned on. This is not enough to activate a motor or a solenoid. The existing motor chopper driver circuit also meets the second requirement. For the off state, all transistors in the FWB (Full Wave Bridge) can be turned off, allowing no current to flow. When one winding is turned on, the current flows from 24V through a pull up transistor (in the bridge), the motor winding, a pull down transistor (in the bridge), and a current sense resistor to ground. The winding's DC resistance is several ohms while the sense resistor is .5 ohm (typically-- there are several possible configurations). Even if the transistors' resistances were 0, the winding and sense resistor would present a fairly predictable voltage divider. If the winding is open, the tester will see 24V across it when the driver is turned on. If the winding is good, the voltage will be less by the inverse ratio of the winding resistance to the combined resistance of the two bridge transistors and the current sense resistor. For example if each transistor exhibits a saturation voltage of 1.2V (from the PBL3717 data sheet), the winding resistance is 3 ohm, and the current sense resistor is 1 ohm, then the differential voltage will drop to 16.2V. Even if the transistors' saturation voltage were 0, the differential voltage would be 18V.
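The figures in the preceding paragraph can be checked with a short calculation. This is a sketch of the arithmetic only, using the example values given above. The function names are ours.

```python
def trickle_current_ma(supply_v, feedback_ohms):
    """Worst-case current through one feedback resistor, in mA."""
    return supply_v / feedback_ohms * 1000.0

def good_winding_voltage(supply_v, v_sat, r_winding, r_sense):
    """Voltage across a good winding when its driver is on.

    The two bridge transistors each drop v_sat; the remainder divides
    between the winding and the current sense resistor.
    """
    return (supply_v - 2.0 * v_sat) * r_winding / (r_winding + r_sense)

# Values taken from the example in the text.
print(trickle_current_ma(24.0, 1e6))                 # feedback resistor trickle
print(good_winding_voltage(24.0, 1.2, 3.0, 1.0))     # with 1.2V saturation
print(good_winding_voltage(24.0, 0.0, 3.0, 1.0))     # with ideal transistors
```

An open winding leaves the full 24V across the device, so the tester only has to distinguish 24V from roughly 16 to 18V.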
The solenoid driver circuit presents a much trickier situation for measuring the unit under test. For one thing, the device cannot be fully floated because the pullup comprises two drivers, one of which is an uncontrollable diode to 12V. With the high power pullup transistor and the pulldown driver turned off, the top end of the coil will go to 12V, because trickle and leakage current will forward bias the low power pullup diode. More significantly, the voltage across the energized coil depends on the ratio between the drivers' effective resistance (due to saturation voltage) and the coil resistance; there is no current sense resistor. Further exacerbating the difficulty of distinguishing between an open and a good coil, the typical solenoid DC resistance is 24 ohm, making even a good solenoid look much more like an open circuit than, for example, a 3 ohm motor winding.
Our solenoid pulldown device is composed of two sections of ULN2003. The pullup is either a TIP125 to 24V or a 1N4002 to 12V. If the high power circuit is used for testing, the excitation current will be 1A, if low power, .5A. The TIP125's Vce is 1.2V at 1A (from Motorola's data sheet). The diode's Vf is 1V at .5A (estimate based on 1.6V at 1A specification). The ULN2003 Vce at 350mA is 1.2V (typical). The TI data sheet doesn't list Vce for higher currents. Assuming that the two sections are nearly identical, they should split the .5 or 1A current evenly and each exhibits approximately 1.2V Vce.
Given the relatively high Vce of the TIP (due to its being a darlington) there is no penalty for using the high power for testing. The greater current would both create larger (more easily distinguished) voltage drops and increase the pulldown's Vce. Therefore, high power is better for testing than low power. When the pulldowns are turned off, the only current flow will be a small reverse voltage leakage through the 1N4002 from the 24V and nearly the entire 24V will be seen across the coil (this must be measured as part of the test). When the pulldowns are turned on, the voltage across the coil will drop by approximately 2.4V. Detecting this 10% drop should be feasible.
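The 10% figure follows directly from the two Vce estimates. A minimal sketch of that arithmetic, assuming (as the text does) that the TIP125 and each ULN2003 section drop about 1.2V at the test current; the function name is ours.

```python
def driven_coil_voltage(supply_v, vce_pullup, vce_pulldown):
    """Voltage across a good coil with high power pullup and pulldowns on."""
    return supply_v - vce_pullup - vce_pulldown

SUPPLY_V = 24.0
driven = driven_coil_voltage(SUPPLY_V, 1.2, 1.2)
drop = SUPPLY_V - driven
print(f"driven: {driven:.1f}V, drop: {drop:.1f}V ({drop / SUPPLY_V:.0%})")
```

The measurement margin therefore depends almost entirely on the drivers' saturation voltages, since there is no current sense resistor to enlarge the drop.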
Adding the solenoid coil test to the system would entail adding the two feedback wires to the standard scan chain cable. Since there are no 12-pin connectors, we would have to move up from 10-pin to 14-pin. This cable would be identical to the motor scan chain, as described in AD46 topic Cabling [Cdxsys.doc-MotorControlBus]. For the MSM, tower I/O drivers are located on the MSM board itself, so no cabling would be involved. We may not want to increase the wire count in the loader interface, so we might want to leave the loader's solenoids without feedback. However, we need to discuss this, because the loader contains some of the most difficult solenoids to test by inspection. The loader interface DB37 connector now contains 13 pins for loader power, as described in AD46 topic Cabling [Cdxsys.doc-LoaderInterface]. We could take two of these for solenoid feedback.
The decision whether to add solenoid coil testing should consider that adding wires to the scan chain increases connector pin count, which decreases reliability. We know that the small solenoids used in the CD4000 have reliability problems but we don't intend to use these in the future. For the large solenoids that we do intend to use, we need to consider failure history, not only in terms of raw number of failures but also the percentage due to open coils and bad connections, which are the only failures that the circuit can test.
J. P. mentioned that the CD1200 uses an efficient power-down mechanism that we might want to migrate to CDNext. He and Dave R introduced this at an earlier meeting as well but no one explained exactly how it operates. After the meeting J. P. and I (David) examined the circuit and determined how it operates.
The circuit, despite rumors, is not "software-driven" any more than the 3000 series instruments' solenoid drivers are. It is also not more power-efficient. It operates by providing two pulldown drivers to each solenoid. These operate in anti-phase, each at a 50% duty cycle. The high end of the solenoid is tied to 24V. If neither pulldown is enabled, the solenoid is off. If only one is enabled, the solenoid sees 50% (average) of 24V, i.e. 12V. If both are enabled, the solenoid sees the full 24V. This circuit may be more cost-efficient than the driver in the 3000 series because it doesn't require a separate 12V supply. The circuit itself may be cheaper as well. Both circuits (3000 series and CD1200) require two logic signals per device. The CD1200 circuit also requires the anti-phase generator, which is a simple square wave generator. But the 3000 series driver requires the TIP125 high power pullup plus one section of ULN2003 to drive its base. It also needs the low power diode pullup. The CD1200 pulldown is implemented as two ULN2003 sections in parallel. The CD1200's phase generator is inefficiently implemented in small GALs, but it is still probably cheaper than the 3000 series pullup devices. The phasing frequency can be very low, e.g. 60 Hz, to minimize EMI and transistor switching loss.
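The three drive levels can be illustrated by averaging over one phasing cycle. This is a simplified model that assumes ideal pulldowns with no saturation voltage; the slot count and names are arbitrary choices of ours.

```python
def average_coil_voltage(supply_v, enable_a, enable_b, slots=100):
    """Average coil voltage over one anti-phase cycle.

    Pulldown A conducts during the first half of the cycle and pulldown B
    during the second half, each only if its enable bit is set.
    """
    conducting = 0
    for slot in range(slots):
        phase_a = slot < slots // 2
        if (enable_a and phase_a) or (enable_b and not phase_a):
            conducting += 1
    return supply_v * conducting / slots

print(average_coil_voltage(24.0, False, False))   # off
print(average_coil_voltage(24.0, True, False))    # low power
print(average_coil_voltage(24.0, True, True))     # full power
```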
The CD1200 circuit turns off 8 control signals at a time by deasserting the LS374's Output Enable. It can't operate on a bank of fewer than 8 solenoids without wasting control signals, which for CDNext would mean scan chain bits. If this circuit were implemented on the scan chain, the HC595's OE bit could be controlled in the same manner as the LS374. Our intention to disable outputs at reset would have to be merged with the phasing signals. One disadvantage of the CD1200 circuit is that the circuit itself would dictate that the scan chain outputs be used only for solenoids. It would not be feasible to use them for other purposes since they would be floating half the time. Some of the disadvantages of the CD1200 circuit could be eliminated by using a PLD for the driver circuit. A pair of HC595's could be replaced by a single bank of 8 PLD outputs driven by two buried 8-bit registers driven in anti-phase. Except for the possibility of a very small glitch, a 100% duty cycle output would present an adequate control signal. Two bits would still be required for each output but at least the output could be used for something besides a solenoid. Also, it would not be necessary to control only banks of 8 solenoids because the PLD can be programmed to make any single output controlled by either one scan chain bit or by two alternately selected (anti-phase) bits. The PLD would be more expensive than two HC595's, especially when the cost of programming is considered.
In considering whether to design a scan chain version of the CD1200 solenoid driver circuit, we should also consider where to develop the anti-phase control signals. A square wave generator with gating and logic to combine the phase signals with scan chain reset is fairly cheap to implement locally on each board that contains solenoid drivers. However, if we were to implement the solenoid feedback test described above, it would leave two unused wires in the new standard I/O scan chain cable. These could be used to distribute phasing signals generated at a central point, reducing circuit redundancy while providing a convenient means to stop phasing at will, which would be helpful (maybe essential) in the feedback test.
CDNEXT ANALYZER SYSTEM DESIGN REPORT 17
5/25/2000
[Report16] [Report18]
PARTICIPANTS
Robert D, James (Coleman) R, Jack W, David McCracken, and J. P. Y.
References:
DISCUSSIONS
Referring to the suggestion in report 15 that MC68340 IRQs 3, 5, and 6 be routed to the auxiliary connector [Cdxsys.doc-AuxIrqs] Coleman explained that IRQ 5 and 6 are already being used for the I/O scan chain and motor controller interrupts. As the inclusion of IRQ 5 and 6 was only suggested as potentially useful, we agreed to neither implement the suggestion nor try to devise a substitute.
Coleman pointed out that the MSM schematic inherits from the APU and FPM schematic bubbles indicating that MC68340 IRQs 5, 6, and 7 are active high while IRQ3 is active low. He asked whether this was, in fact, the case, and whether they should be passively pulled appropriately high or low. Jack reviewed the CPU manual and concluded that all of the interrupts are active low but, because our software configures them as edge-triggered, they could be tied either high or low to avoid false triggering when not being actively driven. We concluded that they should be pulled up to reflect their true passive state. We could find no reason for some of the interrupts being shown without bubbles. Coleman will correct the component library.
Coleman asked whether IRQ7 should be used. David explained that IRQ7 is a level 7 interrupt, which is non-maskable. Jack added that the program uses a level 7 interrupt internally for the watchdog timer, but that can in effect be masked by not turning on the watchdog timer. They both agreed that there is no external mechanism that represents a sufficiently severe condition as to warrant being non-maskable. Coleman had found a reference to some kind of IRQ7 usage that might be related to memory. Memory parity or CRC error would be appropriate for NMI but we are not using memory error detection.
Coleman questioned the request in report 15 topic CPU Bus Connector [Cdxsys.doc-Done2] for DMA DONE2 on the auxiliary connector. This request was an error. We have no use for either DONE1 or DONE2.
Report 15 requests DACK2 as well as DREQ2 on the auxiliary connector. Coleman asked whether omitting DACK2 in the CPU's UART A interface to DMA channel 2 was an oversight. Jack explained that only DREQ needs to be connected externally. The CPU executes the entire bus transaction internally for on-chip peripherals.
Coleman pointed out that he had connected DREQ1 to RxRdyA per David's request in report 14 topic Msm Dma [Cdxsys.doc-RxRdyA]. As with the TxRdy link to DMA2, DACK1 is not connected.
No usage had been specified for the CPU's timer counter pins, TGATE1, TIN1, TOUT1, TGATE2, TIN2, and TOUT2. David requested that one of the counters (we arbitrarily chose T1) be used to generate an alternate source for the IMB (Inter-Module Bus between APU and MSM) serial clock. When the APU or MSM (which can communicate directly with the data station as well as through the APU) communicates with the data station using the HSSL interface, the data station HSSL card's transmit data clock (differential signal) is routed across the APU to the MSM, providing the IMB data clock. Unlike HSSL, the IMB uses just one clock for data travelling in either direction.
We are planning to replace the HSSL data station link with USB and/or parallel port, eliminating the IMB's clock source. Either the APU or the MSM needs to provide a replacement. This could be provided by any 500 kHz free-running clock, but using one of the CPU's timers would reduce component count. The APU already uses both of its timers. Neither of the MSM's timers was previously committed.
The timer can be programmed to operate as a nominal (Jack pointed out that we can't get exactly 500 kHz, but this is not a problem) 500 kHz 50% duty cycle oscillator. TOUT1 can drive the input of a 26C31 (conveniently, there is one spare). The differential output can drive the IMB clock, but this reverses the normal direction and can't be permanent. The 26C31 output can be tri-stated but only in pairs and the other section needs to be always enabled. Coleman also pointed out that the MSM contains a permanent terminating resistor for the master interface clock and the transmitter is not supposed to be terminated.
Coleman suggested that we not even attempt to hardwire the TOUT1-driven 26C31 transmit pair onto the IMB bus, but instead connect the IMB clock wires to either the 26C32 receiver or to the transmitter using two 3-pin jacks with the IMB signal pair on the center pins. We all agreed. A third 3-pin jack is needed to connect the CPU's SCLK to the receiver output when the master provides the clock or directly to TOUT1 when MSM provides the clock.
In our discussion regarding the motor and solenoid winding tests, we decided to use timer 2 for the ADC timer. TGATE2 is the only pin used for this.
David suggested that unused timer/counter pins be pulled up by resistors but also connected to holes, making them more readily accessible for prototyping experiments. Jack argued that the timer/counter pins can only be used for simple I/O when the associated timer facility is disabled. Therefore, the unused pins are not useful. TGATE1, TIN1, and TIN2 should simply be pulled up while TOUT2 is no connect (see winding test discussion).
Coleman showed how the CPU's port A pin usage was returned to the original design (from APU and FPM) in which bits 7, 6, 5, 1, and 0 are used in programming the FPGA. This was in response to David's request in report 15 [Cdxsys.doc-CpuProgFpga] that we retain the ability to program the FPGA from the CPU.
SYNC
Coleman asked about the purpose of the "sync" attached to PA2. This is a vestige of the interim APU-FPM communication interface and is no longer relevant. PA2 is, therefore, available. We decided to use it and PA3 for motor and solenoid winding test.
WINK
Coleman asked about the actual operation of "wink", attached to PA4. David explained that wink is toggled in the main program loop to visually indicate that the program is functioning. This is like a visual watchdog and is almost redundant because the main loop also toggles the 7-segment LED between two patterns to indicate the same thing. There is a small difference in that the CPU can access the wink LED even when the FPGA is not functional. In practice, this difference is almost inconsequential. If the unit (MSM) is not fully functional in the field, it is simply replaced. If it is not functional during development, we have to use instrumentation (chiefly the BDM debugger) to determine where it is failing, and the crude distinction provided by the wink vs. the 7-segment display is not particularly helpful.
We agreed to retain the wink but later realized that port A pins could be useful for the motor and solenoid winding tests. We only needed two pins and they were available without discarding wink, but if any other uses for port A present themselves, we should sacrifice wink.
The wink LED driver was not efficient, consuming an entire comparator package and several resistors. Two comparator sections were already unused and we freed a third when we eliminated the spinner motor's under-current detector. Any one of these can be used for the wink LED. Also, the resistors used in the voltage divider to generate the wink comparator's reference voltage were redundant, as the motor's over-current comparator's reference can be shared and is adequate for the wink.
To reiterate, we will keep the wink only as long as it consumes available resources that are not needed for other purposes. It should be sacrificed if we need a port A pin or another comparator.
DOUBLE-LENGTH SHIFT
Coleman requested a further explanation of scan chain test mechanisms. His FPGA design includes a double-length shift mode for testing in both the motor and I/O scan chain, as suggested in report 3 topic SCAN CHAIN STREAM CONNECTIONS [Cdxsys.doc-ScanChainDoubleShift]. He questioned how this could work correctly, given the variability of the external MOSI and MISO chain lengths. He also wanted a more detailed explanation of the "feedback register" first suggested in report 5 topic Scan Chain- Loop Structure [Cdxsys.doc-ScanChainFeedbackRegister].
CONTINUOUS FEEDBACK CONCEPT
David explained that the double-length shift mode for testing is not necessary if the much better register feedback test mechanism (see report 15 topic Scan Chain Feedback Register [Cdxsys.doc-ScanChainFeedbackRegisterTopic]) is implemented. The register feedback mechanism affords continuous 100% testing of the scan chain in normal operation. Jack added that continuous testing is almost imperative because the scan chain requires I/O to be updated constantly. This means that I/O errors can occur even when the CPU is not writing I/O. We will present the cheap continuous testing as an advantage of the scan chain I/O mechanism over less-dynamic bus-based architectures that may not appear to need continuous testing but which also afford no cheap means to do it.
The scan chain feedback register test is conceptually simple but the implementation is somewhat difficult to understand. Jack's and my (David's) understanding of the mechanism is helped by having seen with the BDM debugger the phenomenon that provides the basis for the test.
The scan chain comprises four legs: the internal (to FPGA) MOSI registers, the external MOSI registers, the external MISO registers, and the internal MISO registers. The end (output) of the external MOSI leg connects to the end (input) of the external MISO leg. If the external MISO leg is not as long as the internal MISO leg (which is normally the case) some number of bits in the internal MOSI leg will appear in the internal MISO leg. The relative lengths of the four legs determine the number of bits, their MOSI source address, their MISO destination address, and when they appear. If the combined length of the two external legs is less than the length of one internal leg (internal MOSI and MISO legs have the same length) then some of the first MOSI bits to shift out will shift through both external legs and end up in the internal MISO leg in a single scan cycle (delimited by NPCS). If the combined length of the external legs is greater than this, no internal MOSI bits will traverse the entire external chain in one cycle, but some of these bits stored in external MOSI shift registers will continue their traversal and end up in the internal MISO leg on the next cycle.
The longer the external MISO leg, the fewer MOSI bits that show up in the internal MISO leg. However, if even a relatively small number, comprising a distinct pattern (i.e. some mix of 0s and 1s) shows up in one or two cycles at the expected internal MISO location, it almost certainly guarantees that the entire external chain is functioning properly. Further, there is a good chance that this tests the external interfaces of the external registers as well as their internal shift registers, because most chip failures are not isolated even when they are caused by an external event at just one pin.
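The reflection phenomenon described above can be traced with a toy simulation. This is a minimal sketch under simplifying assumptions that are not in the report: the whole path is modeled as one flat shift chain, the external MISO leg is parallel-loaded with inputs at the start of every scan, output latches are ignored, and bit 0 is the first bit shifted. The function name, the labels, and the leg lengths used in the example are all hypothetical.

```python
from collections import deque


def trace_scan(n_internal, ext_mosi_len, ext_miso_len, cycles=2):
    """Return the internal MISO image after `cycles` scans.

    Each entry is a provenance label instead of a 0/1 value:
      ("in", i)      external input bit i, loaded at scan start
      ("mosi", c, i) internal MOSI bit i shifted out during cycle c
      None           power-up contents of the external MOSI leg
    """
    ext_mosi = deque([None] * ext_mosi_len)
    miso_image = []
    for cycle in range(cycles):
        # Parallel load of the external MISO leg; input 0 exits first.
        ext_miso = deque(("in", i) for i in reversed(range(ext_miso_len)))
        miso_image = []
        for i in range(n_internal):
            bit = ("mosi", cycle, i)
            # One shift clock: push into each leg, take what falls out.
            ext_mosi.appendleft(bit)
            bit = ext_mosi.pop()
            ext_miso.appendleft(bit)
            bit = ext_miso.pop()
            miso_image.append(bit)
    return miso_image


# Illustrative lengths only: 48-bit internal legs, 16 external MOSI
# stages and 8 external MISO stages (combined length 24 < 48).
image = trace_scan(48, 16, 8)
print(image[24:32])
```

In this model, with a combined external length of 24, MOSI bits 0 through 7 of the current cycle land at internal MISO positions 24 through 31. Calling `trace_scan(48, 48, 8)` shows the other case: every reflected bit in the MISO image comes from the previous cycle.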
We would like the FPGA to perform the feedback test in every scan chain cycle in order to detect transient errors as well as permanent failures, but it can't know any of the characteristics of the phenomenon that depend on external chain leg lengths. We can simplify the information that is needed in two ways. One is that the test bit pattern doesn't need to change, as it would for a memory test for example, because each 1 and 0 in the pattern has to traverse the same path, which guarantees that "stuck at" and "stuck to" faults will be detected by a single pattern. Using a fixed pattern not only simplifies the test in itself but also provides the means to ignore the cycle uncertainty. After the first two scan cycles, whether the test pattern requires one or two cycles to reach its internal MISO destination is irrelevant. The FPGA only needs to delay before performing the test and this can easily be built into the CPU-FPGA API, relieving the FPGA of responsibility.
No trick can eliminate the need to vary the source (internal MOSI) and destination (internal MISO) locations of the test pattern according to the length of the external chain legs. Some MOSI bits will never appear in the MISO registers and the ones that do will not appear in fixed MISO locations. The CPU will know this information because it can be derived from the analyzer configuration file (analyz.ini) which the script compiler uses to translate source scripts to system-specific commands. Because the pattern is constant and occupies a static MOSI location, the CPU can write the source directly, transparently to the FPGA, in the case of the I/O scan chain. However, this won't work for the motor scan chain, because the FPGA clears the internal MOSI chain at the end of every scan and the CPU couldn't rewrite the pattern every 32 microseconds even if it did have direct access (which it doesn't now have). The CPU could write the pattern into event memory but this would be inefficient for both the CPU and FPGA, which would have to read one byte for each bit in the pattern every 32 microseconds. Therefore, the FPGA should automatically write the pattern into the motor internal MOSI chain itself after clearing all of the bits. The CPU tells the FPGA where to write by writing the address into a register. Similarly, the CPU tells the FPGA where to read the MISO image by writing the address to another register. The MISO pointer is needed for both the I/O and motor scan chains, while the MOSI pointer is needed only for the motor.
The CPU should first prepare the outgoing test pattern by writing it to the MOSI leg of the I/O scan chain and by writing the MOSI address of the pattern to the source pointer register of the motor scan chain controller. Then the CPU will write the MISO address of the reflected pattern to the destination (where the test pattern will show up) pointer register of both the motor and I/O scan chain controllers. The act of writing to this register can turn on the FPGA's MISO pattern test, which registers a fault if the MISO pattern is incorrect. The CPU will delay long enough between preparing the MOSI pattern and writing the MISO address to ensure that two scan cycles have passed. Thus, the FPGA will be able to perform the test without regard to timing or sequencing. However, the error flag should stick until cleared by the CPU to be sure that temporary faults are detected.
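The enable-on-write and sticky-error behavior proposed above can be sketched as a small behavioral model. This is only an illustration: the class and method names are hypothetical, the MISO image is a plain list of bits, and the pointer value 24 is arbitrary. The pattern bits are the 10010011 byte chosen later in this report.

```python
class FeedbackChecker:
    """Toy model of the FPGA-side MISO pattern test (hypothetical API)."""

    def __init__(self, pattern):
        self.pattern = list(pattern)
        self.miso_pointer = None  # test stays off until the CPU writes this
        self.fault = False        # sticky: only clear_fault() resets it

    def write_miso_pointer(self, address):
        """CPU write to the destination pointer register; enables the test."""
        self.miso_pointer = address

    def end_of_scan(self, miso_image):
        """Compare the reflected pattern once per scan cycle."""
        if self.miso_pointer is None:
            return
        start = self.miso_pointer
        window = miso_image[start:start + len(self.pattern)]
        if window != self.pattern:
            self.fault = True

    def clear_fault(self):
        """CPU acknowledges the fault."""
        self.fault = False


PATTERN = [1, 0, 0, 1, 0, 0, 1, 1]           # 0x93, first bit listed first
good = [0] * 24 + PATTERN + [0] * 16         # arbitrary 48-bit MISO image
bad = list(good)
bad[26] ^= 1                                 # one transient bit error

checker = FeedbackChecker(PATTERN)
checker.end_of_scan(bad)                     # ignored: pointer not written yet
checker.write_miso_pointer(24)
for image in (good, bad, good):
    checker.end_of_scan(image)
print(checker.fault)
```

The fault raised by the single bad scan is still set after the following good scan, which is the reason for making the flag sticky.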
Given the significantly greater complexity of the motor feedback test (due to the MOSI pattern generation) and the fact that most motor failures can be detected by other means, we may choose to implement the integrity test only for the I/O scan chain.
If the combined length of the external MOSI and MISO legs is greater than the internal leg length, the integrity test requires a dedicated external register to hold the test pattern between scan cycles. Depending on the chain leg lengths, this would not necessarily have to appear at the end of the MOSI leg, but this location would support the longest external legs and should be used. The test register can't be used for real output because it will always be written with the test pattern. Also, it should not shift during NPCS high as does the HC595, our typical output register.
The register isn't needed if the combined external leg length is shorter than the internal leg length. If the external length is too long and an external register is not provided, the test can't be performed. This situation would not require changing the FPGA. Instead, the CPU would simply not write to the MISO pointer register and the FPGA would not perform the test.
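The three outcomes described above reduce to a simple decision that the CPU could make from the configured leg lengths. The sketch below is an assumption-laden illustration: the function name is hypothetical, and treating a combined external length exactly equal to the internal length as needing the register is our choice, since the report does not address that boundary.

```python
def feedback_test_plan(internal_len, ext_mosi_len, ext_miso_len,
                       has_test_register):
    """Decide how (or whether) to run the scan chain integrity test."""
    combined = ext_mosi_len + ext_miso_len
    if combined < internal_len:
        return "test, no external register needed"
    if has_test_register:
        return "test, using external register"
    # The FPGA is unchanged; the CPU just never enables the test.
    return "no test: do not write MISO pointer"


print(feedback_test_plan(48, 16, 8, has_test_register=False))
```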
CONTINUOUS FEEDBACK REALIZATION
Coleman suggested that it would be easier to understand the continuous feedback test if we would pin down specific cases. Although the test pattern could comprise as few as four bits, we agreed to use a full byte, arbitrarily chosen to be 10010011 (0x93). We also decided to reduce the length of the usable I/O scan chain to 48. The tower I/O consumes only 8 bits, leaving 40 for the loader, which is more than twice as much I/O as provided by the CD3200 loader controller (SHM).
For the MSM, we decided on the following:
If a loader is not used, the MSM I/O scan chain will not need an external feedback register for integrity testing because the combined external length will be less than the internal MOSI length. If a loader is used, its own I/O usage will determine whether it needs to provide a feedback register. The situation with the motor scan chain is more complicated. As with the I/O scan chain, the loader is the end of the chain but only if it is installed. Otherwise, like the I/O scan chain, the motor scan chain is terminated and looped back (external MOSI to MISO) on the MSM board. If we decide to implement the motor scan chain integrity test then the loader will include a feedback register. If the loader is used then all of the requirements for the motor scan chain test will be in place. However, if the loader is not used and more than nine motors are in the chain then the test can be performed only if some other unit in the motor scan chain provides the feedback register. Since our current system defines only 8 steppers other than whatever motors are in the loader, it is probably unreasonable to expect the MSM to provide an optional feedback register. It is more reasonable to expect this of some additional (as yet unspecified) motor driver board.
MOTOR AND SOLENOID WINDING TEST
MOTOR WINDING TEST
Coleman reported that he and Ed Sewall have been studying the stepper motor winding test described in report 4 topic Motor Winding Test [Cdxsys.doc-MotorWindingTest] and report 9 topic Motor Scan Chain [Cdxsys.doc-WindingTestBus]. Ed was particularly concerned about the statement "If a driver is turned off, the circuit leg presents a 2 Mohm load across the analog bus." in report 9, because he thinks that the PBL3717 motor driver chip affords no means to turn off all of the transistors in the bridge. We reviewed the internal schematic of the 3717 and confirmed that one of the upper transistors will always be on. Consequently, each "off" driver circuit leg appears as 24V connected to each wire of the feedback bus through a 1M resistor. Coleman volunteered himself and Ed to re-analyze the circuit based on this information.
Ed and Coleman were unclear about how the test is supposed to be conducted. David repeated his general interpretation, which is that all drivers except one are turned off (which we now know to mean connected to 24 V). The differential voltage across the bus is measured. Then one driver is turned on. If the winding is open (or the motor disconnected) current will flow through the resistor network, causing a measurable differential voltage. If the winding is intact, it will nearly short the driver, causing the differential voltage to be significantly less. In David's original erroneous analysis, the differential would have been 1.7V in the case of an open winding and nearly 0 otherwise. This difference is easily measured. Ed will determine how much of a difference we can expect given the real functionality of the 3717.
SOLENOID WINDING TEST
David presented excerpts from report 16, in which a solenoid winding test, similar to the motor test, is described [Cdxsys.doc-SolenoidTestCircuit]. The circuit analysis in that report is based on accurate (hopefully) data regarding the components and indicates that a 10% voltage difference could be expected between a good and a bad winding. However, it was based on the old solenoid driver circuit, which uses a pullup transistor to 24V for high power, a diode to 12V for low power and a pulldown driver. David also presented another excerpt from report 16, describing the solenoid power down circuit used in the CD1200. We all agreed that this is a better circuit and Coleman decided to use it for the MSM's on-board tower solenoid drivers.
The solenoid winding test described in report 16 calls for a two-wire bus as used for the motor winding test. Coleman pointed out that the high end voltage is not relevant with the elimination of the pullup devices. We agreed that only the low end wire is useful. This is particularly advantageous for the loader, because it means that the solenoid test would steal only one of its DB37 pins. For the I/O scan chains, adding even one wire requires increasing the 10-pin to a 14-pin connector because 12-pin connectors are not available. This is not a significant issue for the MSM, because its tower I/O circuitry is on-board. As reported in report 15 topic MSM-Loader Connector [Cdxsys.doc-MsmLoaderConnectors] we had decided to use two standard connectors, 10-pin and 14-pin, on the MSM for the loader interface. Adding a solenoid winding feedback wire would require the 10-pin connector to change to a 14-pin. The MSM-loader wiring harness is unique and this change is not significant.
David requested that Ed and Coleman review the solenoid winding test proposal, particularly in light of the driver circuit change.
WINDING TEST ADC
David had originally asked Ed to review just the single-slope ADC analog circuit from the old MPM board to determine whether it could function as suggested and to consider whether it could be replaced by a cheaper circuit (especially one that would not need a split power supply). There was some uncertainty about how the circuit was expected to function. David reiterated his understanding that the MPM CPU provides logic and timing control, while the circuit provides a means to discharge a capacitor and then charge it up to a set threshold at a rate determined by the voltage across the winding feedback bus. This appears to be a single-slope ADC, which is not very accurate compared to a dual-slope converter, but which should still be adequate if we take a reference reading.
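The value of the reference reading can be shown with an idealized model. The sketch assumes a linear ramp whose charging current is proportional to the input voltage, which is our simplification and not a description of the actual MPM circuit. The component values and function names are hypothetical.

```python
def ramp_time(v_in, c_farads, v_threshold, amps_per_volt):
    """Time for an ideal integrator to charge C from 0 V to the threshold.

    Charging current is assumed to be amps_per_volt * v_in, so
    t = C * Vth / (k * Vin).
    """
    return c_farads * v_threshold / (amps_per_volt * v_in)


def ratio_from_times(t_reference, t_measured):
    """Measured voltage as a fraction of the reference voltage."""
    return t_reference / t_measured


# Two boards with different (hypothetical) capacitor and threshold values.
for c, vth in ((100e-9, 2.5), (120e-9, 2.4)):
    t_ref = ramp_time(2.0, c, vth, 1e-5)    # reference reading
    t_meas = ramp_time(1.0, c, vth, 1e-5)   # reading under test
    print(round(ratio_from_times(t_ref, t_meas), 6))
```

Both boards report the same ratio because the capacitor and threshold values cancel, which is why a single-slope converter may be adequate when every measurement is paired with a reference reading.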
Coleman suggested that we use some of the CPU's native I/O pins to control the two circuits, one for the motor winding test and one for the solenoid test. David suggested that the unused timer/counter T2 be used. If TGATE2 were attached to the comparator output, the CPU would be able to either count the charge time programmatically with TGATE2 configured as a simple input or use the timer to measure the duration of the TGATE2 period. Coleman suggested using one port A pin (PA2 and PA3 are available) to discharge one shared or two individual slope capacitors and one pin to select between the two sources.
We did not define how much analog circuitry the motor and solenoid winding tests would share, i.e. whether the signals are multiplexed at the analog input, in the middle of the analog circuit, at the output, or not at all. Given the addition of the solenoid test, we may want to reconsider whether the homemade single-slope ADC is really the best solution. An off-the-shelf multi-channel input ADC may be easier for us than trying to multiplex the two signals in the homemade ADC and cheaper than simply duplicating the entire motor winding analog circuit. On the other hand, if we were to simply duplicate the analog circuit, it would not be difficult to provide the necessary control pins from the CPU. One approach would be to use TGATE2 for one comparator input and one port A pin for the other. The wink could be sacrificed so that two port A pins would be freed to provide separate discharge capacitor control signals. With this arrangement, TGATE2 could be configured as simple input so that both measurements are timed programmatically or it can be configured as the timer gate, in which case one measurement would be timed by program and the other by timer. Neither approach presents any software difficulties.
David's original analysis of alternatives to the old analog circuit rejects using a Norton current amplifier (like the LM3900) at the front end to reduce the high common-mode signal down to a more easily managed 0 to 5 or 15V. The reason was that the high input offset current variation would swamp out the measurement. We may want to reconsider this. If we take a reference reading of all drivers in the off state immediately before turning on the one under test, the component variation may be irrelevant, as it would affect both readings identically. The only problem would be if the good/bad measurement variation were so small that the required signal amplification times the input offset current could pin the amplifier's output regardless of the signal. Another consideration is that there are better current amplifiers than the LM3900, but some of these need a split supply. A high common-mode voltage tolerant diff-amp, like the INA117 or AD629, might also be used although they may be too expensive. The reason for bringing up this issue again is that it may be possible to redesign the analog circuit to make it more adaptable to multiplexing or cheaper to simply duplicate. Now that Ed and Coleman understand the basic premise of the winding test, they may be able to devise more efficient alternatives to the original proposal.
IMPROVED SOLENOID DRIVER CIRCUIT
David compared the CD1200 solenoid driver circuit to the one that the interim design took from the CD3000 series (see report 16 topic Pwm-Based Solenoid Power-Down [Cdxsys.doc-PwmSolenoidPower]).
The analysis in report 16 raises several questions regarding the scan chain connections, such as whether to transmit the phase signals on spare wires made available if the solenoid winding test is implemented or to generate them locally on scan chain I/O boards. For the MSM, these questions are easily resolved. In the case of tower I/O, all control wiring is local anyway. For the loader, the primacy of DB37 pins dictates that, if the loader does implement the new driver circuit, it must generate the phases locally. Also, the problem of the CD1200 circuit decreasing flexibility by dedicating outputs in groups of 16 bits to motors is not an issue for the MSM's tower solenoid controls, because they are implemented in the FPGA rather than by indivisible 8-bit registers.
Report 16 suggests a PWM frequency of 60Hz to minimize EMI but we all agreed that this would likely cause the solenoids to chatter in the low power condition. We decided to use 240Hz. This may also be too low, but it can easily be changed.
GAL FOR BYTE-WIDE FLASH
Coleman reported having to reinstate the PLD used for memory selection. It is needed only to tell the CPU (through NDSACK0) that the flash is byte-wide and only to support CPU programming of the FPGA (otherwise, this function would be implemented in the FPGA instead of the separate PLD). Coleman restored the original 22V10. He will review this and possibly use a smaller and cheaper device like a 16V8. In any case, we will use the pins not needed for the flash interface for other purposes.
Jack questioned why the reset needed any logical combination. Coleman explained that the circuit implements the requirements that we determined in report 15 topic System Reset [Cdxsys.doc-SystemReset] with changes to accommodate David's request to retain the ability of the CPU to program the FPGA.
HALT INDICATOR
Coleman asked about the purpose of having the CPU's HALT signal control the 7-segment LED's decimal point, which he copied from the interim FPM circuit. This serves only as a crude diagnostic indicator of the CPU's condition. David reported seeing the DP turn on during early FPM development and not knowing its significance. Knowing that it indicates that the CPU has halted, which can be caused by a bad instruction or double bus fault, might be useful. However, we now do nearly all development under a BDM debugger, which provides specific information about such faults. Nevertheless, we may as well keep the circuit, because it costs nothing.
TDI AND TDO
Coleman asked what to do with TDI and TDO. These, in addition to TCK and TMS, support IEEE 1149.1 JTAG testing of the CPU. Since the board has no other JTAG-compliant components, boundary scan testing would not be cost-effective. TDI and TMS are inputs but the CPU has internal pullup resistors for them. TCK should be pulled up to 5V.
UNUSED INPUTS AND OUTPUTS
David requested that, generally, any CPU (or any other component) I/O pin that is not used be connected to a hole to facilitate connecting a wire in case we find a use for it during development. Unused inputs should also be pulled to 5V by resistors.
CONCLUSIONS
CDNEXT ANALYZER SYSTEM DESIGN REPORT 18
6/4/2000
[Report17] [Report19]
DESIGN ANALYSIS
Report 16 topic Ecp Hardware and Software Design - Listener-Talker Role Reversal Analysis [Cdxsys.doc-ReverseEcp] describes a protocol derived from the standard ECP. The derived protocol reverses the roles of the PC and the peripheral to make the peripheral the default talker. Per report 16 task 4, Jack has reviewed the protocol and made the following observations:
There are two basic approaches to direction reversal. The programmatic approach has the analyzer and PC programs negotiate using Jack's interface device as a simple conduit. The automatic approach has the PC program negotiate directly with the interface device. The latter is clearly preferable if it can be reliable. This means that there cannot be any point, however brief, at which the device and the PC's parallel port both try to drive the interface. They may be simultaneously set for input, but this must be a temporary condition that is resolved by control signals (and CPU involvement when necessary).
Jack's suggestion (listed above as observation 2) seems to afford a reliable means of reversing direction. However, the simple protocol fails in one scenario. Imagine that Jack's device starts transmitting a packet by driving the first byte onto the interface and asserting the transmit strobe. The PC may assert the reverse request just as the device asserts the transmit strobe. The device can't stop the transaction in progress, but if its output is automatically tri-stated by the PC's reverse request there is a risk of the PC capturing an indeterminate data value because the ECP's data exchange is automatically done in hardware independently of the program-based direction reversal request. This problem can be prevented by Jack's device transitioning to a data transmit state prior to commencing a transaction. In this state, the device can simply ignore the reverse request. As Jack has observed, his device cannot know by itself when it has transmitted the last byte of a packet, suggesting that such a data transmit state could not persist for the duration of a packet without assistance from the analyzer CPU. The CPU might be involved anyway at the end of the packet, but there is a simple way to establish the transmit state per byte. The transition to transmit state can occur when Jack's device latches its transmit byte. The transition out of transmit state can occur after the PC's receive handshake. Thus, the reverse request can combinatorially tri-state the device's output at any time that the device is not in the transmit state, and the only possible glitch would occur in the data just as the device drives it onto the cable when it would be in transition anyway.
The above analysis shows that it would be relatively simple to provide safe direction reversal with a low-level protocol. However, this approach leaves the analyzer vulnerable to a higher-level conflict. Imagine that the analyzer has left itself in a ready to receive or neutral state, i.e. not blocking the PC's request to talk. A direction reverse request from the PC will be granted at this time even if the analyzer CPU is simultaneously preparing to transmit. If the analyzer's DMA then tries to write the first message byte to the interface device there can be a conflict with the input byte from the PC. It may be possible to resolve this conflict at a low level, but a simpler solution is to require the analyzer CPU to negotiate for the right to transmit, not with the PC but with Jack's device. The device can respond immediately, so this represents a simple control flow situation for the CPU. Once the device grants the right, it can ignore the PC's reversal request. The device cannot rescind this right because, as Jack says, it doesn't know which byte is the end of the message. The analyzer CPU has to explicitly give up its right just as it must explicitly request it. This approach prevents low level direction conflicts by wrapping them in a protocol that prevents higher level conflicts. The PC can (and should) hold its reversal request throughout the transmission from the analyzer. At the end of the message, the analyzer can examine the reversal request signal (through Jack's device) and decide whether to give up its right to transmit. Conflict can exist only if the analyzer and the PC simultaneously request to transmit. Preferably the right would be accorded the analyzer but this is a minor issue. What is important is that the device resolve the conflict without generating glitches in the interface signals. 
Using the analyzer's and the PC's unsynchronized talk request signals combinatorially would create an opportunity for a metastable state to produce a prolonged glitch. The analyzer's talk request is inherently synchronized to the interface device's state machine clock by virtue of the fact that it is created by latching the CPU's data bus (when the CPU writes to the control register). To synchronize the PC's talk request, the device only needs to also latch it. The latched signals will be synchronized to the same clock and can be used combinatorially.
The PC's ECP signal directions can't be changed, but their usage changes for our inverted protocol. Without making major (and unnecessary) changes to the standard ECP definition, the only signal available for the PC's talk request is nReverseReq (Centronics nInit). We could use either nAckReverse/pError or nFault/nPeriphReq for the talk grant signal. The former is more consistent with the ECP standard but the latter would allow the PC to be interrupted when the grant is asserted. The interrupt could prove useful when Jack's device doesn't immediately grant the request. Consistency with the ECP standard is tenuous and of no particular value. The standard meaning of nAckReverse is to grant the opposite direction from our reversed protocol. Any automatic use of the signal according to the standard interpretation would provide exactly the wrong functionality. Therefore, it makes the most sense to use nFault/nPeriphReq. The PC can treat this as a simple input without interrupt if it chooses.
IMPLEMENTATION
FIRMWARE
Since all byte transfer signals are the same as for standard ECP, we will refer to ECP names except for the two redefined signals. Our inverted protocol will use a two-wire mid-level direction control, defining the two signals as follows:
Reverse name | ECP name | Centronics name | Pin number
nPcTalkReq | nReverseReq | nInit |
nPcTalkGrant | nPeriphReq | nFault |
An equivalent two-signal interface exists between the analyzer CPU and the interface device, except that the request signal, PeriphTalkReq, comprises one bit of a control register and the grant signal, PeriphTalkGrant, comprises one bit of a status register. As part of the control register, PeriphTalkReq is inherently latched and synchronized to the device clock. The device also latches nPcTalkReq in order to provide a similarly synchronized signal. In the following description, "nPcTalkReq" refers to the synchronized signal unless stated otherwise.
Quiescent State
When the parallel interface is inactive, nPcTalkReq and PeriphTalkReq are deasserted. The interface device deasserts nPcTalkGrant and PeriphTalkGrant. The device's analyzer data port is input (tri-state) as always except when the analyzer is receiving a byte. The device's PC data port should also be tri-stated, because this is the safest condition. For example, if the analyzer were plugged into a live PC not currently running the correct program, the two might both try to drive the data signals.
There really is no "default" talker. In the quiescent state, both the PC and analyzer are prepared to listen, and both must ask the analyzer's parallel port device for the right to talk.
Analyzer Talker Without Conflict
When the analyzer wants to transmit a message, it asserts the PeriphTalkReq bit in the device control register. It reads the device status register in the next instruction, which occurs "immediately" in the context of program execution but after a substantial delay relative to the response time of the device. If not already in the PC talk state (in response to the assertion of nPcTalkReq) the device responds to PeriphTalkReq by transitioning to the peripheral talk state, asserting the PeriphTalkGrant bit in the status register, and making its PC data port output (it may already be output from the quiescent state). In this state, it ignores nPcTalkReq.
The analyzer responds to PeriphTalkGrant by setting up (which may be done beforehand) and starting a DMA transmit. The DMA and port device transmit the entire message without further analyzer CPU involvement. During each byte transfer, the port device indicates "transfer in progress" in the status register. This indication can comprise either two bits that reflect the current states of Ack/PeriphClk and AutoFd/HostAck or a single bit representing Ack/PeriphClk active (low) or AutoFd/HostAck active (high), i.e. if either of these interface signals is active, a transfer is in progress. The logical combination by hardware is easier for software (more efficient and less likely to be misunderstood by the programmer).
After transferring the last byte, DMA interrupts the CPU (the port device is not involved in this). At this point, the CPU reads the status register, repeatedly if necessary, until it indicates that there is no byte transfer in progress. Then the CPU clears PeriphTalkReq (by writing to the control register). If nPcTalkReq is asserted, the device immediately grants the PC's request. Otherwise, the interface returns to the quiescent state. The device does not automatically reverse the interface regardless of nPcTalkReq just because an ACK/NAK is expected from the PC. This is part of a high-level protocol, which is not implemented in firmware. However, the PC will assert nPcTalkReq either before the message finishes (which the port device ignores) or very soon afterward.
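The end-of-message sequence just described might be sketched as follows. This is a minimal Python model, not firmware. The function names, the two callables passed in, and the polling limit are assumptions made for illustration; only the signal polarities (Ack/PeriphClk active low, AutoFd/HostAck active high) come from the text.

```python
def transfer_in_progress(ack_periphclk: int, autofd_hostack: int) -> bool:
    """Single status bit as combined in hardware: a byte transfer is in
    progress if Ack/PeriphClk is active (low) or AutoFd/HostAck is active (high)."""
    return ack_periphclk == 0 or autofd_hostack == 1


def finish_transmit(read_signals, clear_periph_talk_req, max_polls=1000):
    """Run by the CPU after the DMA terminal-count interrupt.

    read_signals: hypothetical callable returning (ack_periphclk, autofd_hostack).
    clear_periph_talk_req: hypothetical callable that clears PeriphTalkReq.
    max_polls: assumed safety limit; the text only says "repeatedly if necessary".
    Returns the number of status reads needed before the port went idle.
    """
    for polls in range(1, max_polls + 1):
        if not transfer_in_progress(*read_signals()):
            clear_periph_talk_req()
            return polls
    raise TimeoutError("byte transfer still in progress")
```

Combining the two interface signals into one bit in hardware leaves the CPU with a single test per poll, which is the efficiency argument made above.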
PC Talker Without Conflict
When the PC wants to transmit a message, it asserts the nPcTalkReq signal. It reads the nPcTalkGrant in the next instruction. If not already in the peripheral talk state (due to PeriphTalkReq assertion) the device responds to nPcTalkReq by transitioning to the PC talk state, asserting the nPcTalkGrant signal while simultaneously making its PC data port input (it may already be input from the quiescent state). In this state, it ignores PeriphTalkReq.
The PC responds to nPcTalkGrant by setting up (which may be done beforehand) and starting a DMA transmit. After transferring the last byte, the PC's DMA interrupts its CPU, at which point, the CPU reads the Strobe/nWrite and Busy/nWait interface signals, repeatedly if necessary, until they are both inactive (Strobe/nWrite high and Busy/nWait low). Then the CPU deasserts the nPcTalkReq signal. If PeriphTalkReq is asserted, the device immediately grants the analyzer's request. Otherwise, the interface returns to the quiescent state.
Talker Request Conflict Resolution
The only situation in which a conflict can occur is when the analyzer and PC simultaneously assert PeriphTalkReq and nPcTalkReq. Metastability is avoided by using the nPcTalkReq as captured by the analyzer's port device clock. The port device resolves simultaneous requests by granting the analyzer's. Once the device transitions to either talk state, it cannot rescind this. The peripheral talk state needs no protection, because the analyzer will not deassert PeriphTalkReq until the message is complete and PeriphTalkReq has precedence over nPcTalkReq. The device must protect the PC talk state by not allowing the subsequent assertion of PeriphTalkReq to rescind it. There is actually no need for a distinct means of indicating PC talk state, as the nPcTalkGrant signal (fed back within the port device) coincides with this state. Talk state logic is inherently sequential, because PeriphTalkReq, nPcTalkReq, and nPcTalkGrant are all latched, thus avoiding indeterminate electrical states. The logic is:
nPcTalkGrant = nPcTalkGrant & nPcTalkReq | nPcTalkReq & !PeriphTalkReq.
Peripheral Talk State = PeriphTalkReq & !nPcTalkGrant.
PC Talk State is simply nPcTalkGrant.
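The talk-state equations can be exercised with a small model. This is a sketch in Python, assuming that each signal name stands for its asserted (true) state regardless of electrical polarity and that all inputs are sampled on the port device clock. The class and method names are hypothetical.

```python
class TalkArbiter:
    """Clocked model of the port device's talk-state logic."""

    def __init__(self):
        self.pc_grant = False  # latched nPcTalkGrant (True = asserted)

    def clock(self, pc_req: bool, periph_req: bool) -> str:
        """Advance one device clock and return the resulting talk state."""
        # nPcTalkGrant = nPcTalkGrant & nPcTalkReq | nPcTalkReq & !PeriphTalkReq
        self.pc_grant = (self.pc_grant and pc_req) or (pc_req and not periph_req)
        # Peripheral Talk State = PeriphTalkReq & !nPcTalkGrant
        if periph_req and not self.pc_grant:
            return "periph"
        # PC Talk State is simply nPcTalkGrant
        return "pc" if self.pc_grant else "quiescent"
```

Clocking the model from the quiescent state with both requests asserted yields the peripheral talk state. Once the PC holds the grant, a later PeriphTalkReq leaves the state at "pc" until nPcTalkReq is deasserted, which is the protection described above.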
Analyzer As Listener
As already stated, the analyzer is prepared to listen when the interface is quiescent. It does not need to be involved in the PC's negotiation for the right to talk. However, it (like the PC CPU) needs to respond to the first input byte, which it uses to set up DMA to receive the message. If the PC has requested to talk in order to send an ACK/NAK, the analyzer, which is presumably expecting this, also needs to respond in order to process the one control byte.
Polling would be an inefficient means for the analyzer to detect the first input, but the interface device can't tell the difference between the first and subsequent bytes without additional state memory. A full message byte counter would be overkill, but there really is no need for any state function in this case. Instead, the device only needs to provide a control bit to enable/disable interrupt. As the analyzer goes into listen mode, it sets this bit. Whenever this bit is set and an input byte (from PC) is latched into the device's analyzer data port (which is tri-stated until DMA or CPU reads it) the device will not effect a DMA exchange but will, instead, simply assert the IRQ and allow (if the CPU doesn't always have this right) the CPU to read the data. In servicing this interrupt, the CPU may clear the interrupt control bit to allow all subsequent bytes to transfer by DMA (whether the CPU actually does this depends on the purpose of the input byte).
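The interrupt-enable mechanism can be illustrated with a small Python model. The class, attribute, and handler names are hypothetical, and the handler shown simply treats the first byte as a message length, which is only one of the cases the text allows.

```python
class ListenPort:
    """Model of the device's receive path with an interrupt-enable control bit."""

    def __init__(self, isr):
        self.irq_enable = True   # set as the analyzer enters listen mode
        self.isr = isr           # hypothetical CPU handler: isr(port, byte)
        self.dma_buffer = []     # bytes moved by DMA without CPU involvement

    def byte_latched(self, byte: int) -> None:
        if self.irq_enable:
            self.isr(self, byte)          # device asserts IRQ; CPU reads the byte
        else:
            self.dma_buffer.append(byte)  # device requests a DMA transfer


def first_byte_isr(port: ListenPort, byte: int) -> None:
    """Example handler: record the length byte, then let DMA take the rest."""
    port.expected_length = byte
    port.irq_enable = False
```

The device holds no per-message state here. The only memory is the control bit, which the CPU itself sets and clears.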
Depending on device details, it will probably be safe for the analyzer to simultaneously set the interrupt enable bit while clearing the PeriphTalkReq when it finishes transmitting a message. However, it can also change these bits individually if needed to prevent the possibility of low-level hardware timing conflicts.
SOFTWARE
Implied Ack
Because communication is half-duplex, neither the PC nor the analyzer will send a message while receiving one. Even if the receive and transmit programs are independent, they have to coordinate their use of the parallel port. Before a pending message can be transmitted, the program must determine whether it first needs to send an ACK/NAK reply. Consequently, assuming that the programs are designed correctly, ACK can be replaced by a data message; that is, a data message is an implied ACK. If an error is detected in a received message, the receiver must send a NAK reply; but if there is no error then the receiver may send either an ACK or its own data message. The benefit of implied ACK is improved efficiency with no increase in program complexity.
General Program
The analyzer and PC parallel port driver programs are similar, differing mainly by minor signal differences. In the quiescent state, the port is configured for input. Either unit can become the talker by negotiating with the analyzer's port device (Jack's chip). The other unit will receive the first byte without CPU intervention; i.e. a quiescent listener is not involved in the other unit's negotiation to become the talker.
Upon receipt of the first byte, the quiescent listener's CPU is interrupted. If the program is expecting a reply and the byte is an ACK or a legal message length (between 3 and 0xF7), the ACK is recorded (allowing the last output message to be discarded); if the byte is a NAK, the NAK is recorded. If the program is not expecting a reply but the input byte value is greater than 0xF7, it is an error.
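One way to express the first-byte decision is sketched below in Python. The text fixes only the legal length range (3 to 0xF7) and the rule that a value above 0xF7 is an error when no reply is expected. The specific ACK and NAK codes (0xF8 and 0xF9 here), the enum, and the treatment of a byte below 3 or an unrecognized control byte as an error are assumptions for illustration.

```python
from enum import Enum

MIN_LEN, MAX_LEN = 3, 0xF7          # legal message length range, from the text
ACK_CODE, NAK_CODE = 0xF8, 0xF9     # hypothetical control codes (any value > 0xF7)


class FirstByte(Enum):
    ACK = "ack"                  # explicit ACK: last output message can be discarded
    NAK = "nak"                  # last output message must be retransmitted
    IMPLIED_ACK = "implied_ack"  # data message that doubles as an ACK
    MESSAGE = "message"          # data message when no reply was pending
    ERROR = "error"              # engage error recovery


def classify_first_byte(byte: int, expecting_reply: bool) -> FirstByte:
    """Classify the first byte received by a quiescent listener."""
    if MIN_LEN <= byte <= MAX_LEN:
        return FirstByte.IMPLIED_ACK if expecting_reply else FirstByte.MESSAGE
    if expecting_reply and byte == ACK_CODE:
        return FirstByte.ACK
    if expecting_reply and byte == NAK_CODE:
        return FirstByte.NAK
    return FirstByte.ERROR
```

A legal length received while a reply is pending is classified as an implied ACK, matching the Implied Ack topic; the caller would record the ACK and then receive the message.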
When the first byte of an input message is clearly erroneous (it could be legal but still wrong) the program engages an error recovery process in which it receives and discards subsequent input either for a specified time period or until there is an inter-character time gap exceeding a specified threshold. Either way, the point is to not trust the first byte but to just let the sender exhaust itself and then send a NAK reply.
If the first input byte is a valid ACK (an ACK when one is expected) the program engages the ACK process, which advances the transmit queue, but remains in the quiescent listener state, i.e. ready to receive with an interrupt on the first byte. If the byte is a valid NAK, the program disables the receive interrupt and engages a retransmission process, which is essentially a normal message transmit. If the byte represents a valid message length (valid value and expected) the program engages the ACK process and then enters the active listener state by:
At the DMA input terminal count interrupt, the receiver:
The program enters the transmit state when either the transmit queue has a message or a received message needs a reply. Both of these conditions may exist at the same time. Further, the transmit state may be engaged even when the unit is already an active listener by an independent process inserting a message into the transmit queue. The transmit process, therefore, has to intelligently decide what to do as follows:
At the DMA transmit terminal count interrupt, return to the quiescent listener state by:
EMAIL FROM MAHA 6/1/00
2. Report 7, SENSORS, states that you would like to include 3-pin jacks for
in line sensors, 12 on the RPM and 8 on the LPM.
a- The final LPM design you approved earlier did not include those.
Are they still needed ?
b- Are they needed on the RPM?
3. Report 7, SCAN CHAIN REGISTER BYPASS, You mention the need of a 3-pin
'bypass' jack between each unit, 'unit' meaning an 8 bit register. Counting
my PLD as 1 unit, we will need 9 jacks.
Items 2, 3, and 4 additional sensors, will require 25 3-pin jacks that are
additional, not including the other jacks that are crucial to the
functionality of the board, which is only 5x 12.
I am suggesting to reconsider the necessity of all those jacks, as the board
might not fit or become too crowded.
4. Report 7, IN_LINE FLUID SENSORS: It is my understanding that we will not
include the optical sensors in the RPM, and I am providing a 3-pin jack for
each of those two sensors. Please let me know of any further requirements
that need to be included on the RPM concerning those sensors (i.e. optical
circuit and the 4-pin jacks).
If drawing #9631160 is needed, please provide a copy because Jill could not
retrieve it here.
5. Report 7, Shear Valve circuit & Report 15, Shear and Y Valve Control:
a- It seems that we will have 2 10-pin jacks for the 2 shear valves, using
only 5 pins of each. Is it possible we could use 1 10-pin jack instead?
b- With regards to the old shear valve 7-pin jack pinout, pin 5 is assigned 2
different PLD outputs (CW/CCW). Also, pins 6 and 7 will require me to take
the CWACK and CCWACK out of my PLD logic and assign them as outputs.
I am not very familiar with the overall design, therefore I'm questioning the
efficiency of including an interface for the old shear valve. It will cost us
additional PLD I/O and 4 extra jacks that will consume a large board-area.
Please let me know what you think.
6. I still don't have the means of bringing the following inputs into the
board:
SVx_OC, SVx_EN, YVx_OC, YVx_EN
David's Reply 6/1/00
Item 2:
There are only four in-line fluid sensor jacks. Report 7 may be a bit confusing. There are 19 general-purpose inputs plus four in-line fluid sensors. The distinction means little to the implementation, however, as the interface hardware is the same for either type of input. The report mentions the difference only to explain the derivation of requirements. Report 7 suggests that these 23 inputs can be served by three 8-bit scan chain input registers, one on the LPM and two on the RPM. Subsequently, in a meeting documented by report 10, we decided to divide the LPM into two identical general-purpose boards with no specialized inputs. Quoting from report 10 topic LPM and RPM sub-topic Partitioning, " ... we will move the strobed fluid sense circuitry to the RPM. The four wires from the two left front panel sensors will be routed to the RPM. This reduces the LPM to a simple block of 32 solenoid controls and 16 general inputs. It also puts all specialized I/O, except for vacuum/pressure control, on the RPM."
The only reason for including 16 inputs on each LPM is to be able to support all general input requirements, including the in-line fluid sensors, from boards mounted on the front panel, thereby eliminating the need for sensor wiring to cross from the RPM to the front panel. Although 32 inputs is obviously overkill, one 8-bit register on each LPM would only provide 16 total compared to the 23 needed. This situation shows why I wanted the bypass capability that you refer to in item 3. We not only have an obvious extra register in the LPM pair, but we might have more. It may prove better to have some general-purpose inputs provided by the RPM to avoid having right front panel sensor wires crossing over to the LPM. The best wire harness topology can't be predicted without having a specific instrument design, but our goal at this point is to provide general-purpose capabilities.
The answer to question 2a is that the LPM requires 16 general-purpose inputs with jacks as described in report 7. The answer to 2b is that, ideally, the RPM would also provide 16 similar inputs, but 8 would be better than 0, and 0 would still be acceptable. If the RPM provides any general-purpose inputs, the bypass jacks would be even more desirable here than on the LPM because our current thinking is to support all of the inputs from the one or two (depending on the instrument) LPM boards.
Item 3:
You don't need a bypass on your PLD, because it will always be used. The bypass jacks are .1" by .3", so 24 of them consume .72 sq. in, which is 1.2% of your board. If this presents a problem, we can bypass in groups, for example two or three registers per jack. Another possibility is to provide a binary progression of bypassing; i.e. one jack bypasses half of the registers; another bypasses half of the remaining registers; etc. Either of these solutions is acceptable.
Item 4:
3-pin jacks with the following treatment (as described in report 7) is correct:
Item 5:
a) Replacing the two half-used 10-pin new shear valve jacks with one fully used jack to support two shear valves would complicate wiring if two shear valves were actually used. The single 10-pin cable is a standard item while the shared cable would be unique. However, this problem is less significant than not having enough board space. If the concern is the dollar cost of the connector, we should use two. If there is a real estate problem, we should use one.
b) This is a nomenclature misunderstanding. CW/CCW is a single signal used to control the old shear valve's direction. When CW/CCW is high and ON is high, the motor turns clockwise. When CW/CCW is low and ON is high, the motor turns counter-clockwise. These two signals are controlled entirely by program logic and need only simple scan chain outputs. CWACK and CCWACK are simple scan chain inputs, whose meaning is interpreted by software.
Item 6:
SVx_OC and YVx_OC are the shear valve and y-valve over-current detector signals generated by on-board circuits. SVx_EN and YVx_EN are configuration signals produced on board from two-pin jacks. The signal input to the PLD is pulled up to 5V by a 10K (e.g.) resistor but pulled down to 0V when a jumper is installed.
CDNEXT ANALYZER SYSTEM DESIGN REPORT 19
6/7/2000 - 11/2/00
[Report 18] [Report 20]
Fax from David McCracken to Ed Sewall 6/7/00
In my review of the existing MPM circuit, I wasn't sure about the best way to convert the differential signal riding on a high common mode voltage to a level that we could easily measure with either the CPU-timed single-slope or an off-the-shelf ADC. I suggested that a current amplifier could handle the CMV but would lose the signal in its input offset current variation. I assumed that any standard diff amp would require a split supply and reasoned that we may as well use the existing MPM circuit, whose major deficiency is needing a split supply. A circuit suggested by Linear Technology has subsequently come to my attention that seems to be ideal for this application. Two halves of an LT1884 amplifier form a voltage diff amp that measures +/- 2.5V Vdm riding on a 42V Vcm. The amplifier requires only a single-ended 5V supply. I imagine that other amplifiers could substitute for the LT1884. Will this work for us? I am sending a copy of the circuit to you at fax number 972-518-7633.
X AND Y-VALVE OVERCURRENT DETECTION
Fax from Maha to David 6/7/00
This is concerning the YVx_OC signals:
The Right Panel schematic shows the circuits for those signals originate
from the L298 SNS pin.
According to the design review, the Y-valve will have an L293D driver, which
does not allow a current sensing feedback from the motor.
Please advise how you want to accomplish that.
Also, according to the comparator circuit, the sampling resistor is a 1k
1/8W. Coleman and I think it's necessary to know the valves' motor load in
order to figure out the wattage needed on that resistor.
David's Reply
Your assessment of the L293D is correct. I suggested it as an alternative to the L298 because it is smaller and easier to mount on the board than the L298 and contains integral output diodes. I didn't notice that it does not have any convenient means to sense output current. I recommend using an L293E, which affords the size and mounting advantages of the L293D but not the integral diodes. However, it does provide sense/power pins similar to the L298 and it also provides a full 1A output compared to the 600mA limit of the L293D, which was also a potential concern.
Regarding the sense resistors, they are 1 Ohm, not 1K. I didn't look at this before, but you are certainly right. At the 4A maximum (total package but it could still be just one motor) output, the resistor would experience 16W. The shear valve motor draws less than 1 A except at a stall, so I think that we could get by with no margin. But we wouldn't want to use less than a 16W device in order to avoid compounding the problem of a locked mechanism by ruining the RPM. We clearly don't want a 16W resistor. I would recommend changing this to 0.1 Ohm, in which case a more reasonable 2W resistor could be used. However, this would develop a sense voltage of only 400mV at 4 A. The input offset voltage of the LM339 is approximately +/- 7mV and the two inputs should track each other fairly closely, so I think the comparator would operate correctly down this low. We should get Ed Sewall's opinion on this issue.
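The resistor figures above follow from P = I^2 R and V = I R. A quick check in Python (the function name is made up for illustration; the 4 A, 1 Ohm, and 0.1 Ohm values are from the discussion):

```python
def sense_resistor(current_a: float, resistance_ohm: float):
    """Return (power in watts, sense voltage in volts) for a series sense resistor."""
    power_w = current_a ** 2 * resistance_ohm
    voltage_v = current_a * resistance_ohm
    return power_w, voltage_v


for r_ohm in (1.0, 0.1):
    p, v = sense_resistor(4.0, r_ohm)
    print(f"{r_ohm} Ohm at 4 A: {p:.1f} W, {v * 1000:.0f} mV")
```

At 0.1 Ohm the dissipation is 1.6 W, so a 2W part carries only a modest margin at the full 4 A, and the 400 mV sense voltage is still well above the comparator offset quoted above.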
Fax from David McCracken to Coleman 6/8/00
John has gotten feedback from service (Dave Bannister and Marcus Reid) regarding a solenoid winding test as discussed in system design reports 16 and 17. The most common failure mode is a sticky plunger due to corrosion and/or contamination. It might be possible to detect a jammed plunger (or even one that just responds slowly) from the change, or lack thereof, in the reactance of the solenoid as the iron plunger moves into the coil's field. However, our DC drivers do not afford a means to test this phenomenon. Since we can't detect the most common failure, we see no advantage to testing the solenoid windings at all. Therefore, we want to abandon any further effort toward this capability.
11/4/00
[Report 19] [Report 21]
The interim system design includes two versions of the APU board, one with four gather channels called APU1 and one with seven channels called APU3. The APU and MSM boards are similar in that each has a 68340 CPU that executes our new script language, communicates in the master-slave network, and controls I/O through a scan chain. The APU is specialized for gather and includes the necessary analog and mixed-signal circuitry for that task. The MSM is specialized for mechanical control and includes the stepper coprocessor (FPGA) and a motor scan chain.
Both APU1 and APU3 contain design and implementation errors, particularly in the scan chain and master-slave communication interfaces. Some of the physical units have been extensively modified to test design improvements, which have also been incorporated into the MSM design. We expect the MSM to remain reasonably stable for some time due to the greater diligence exercised in its design and implementation. The APU's CPU, memory, communication (including both I/O scan chain and master-slave), and basic system design have been examined and tested sufficiently to be reasonably sure that the amended design will be acceptable. The entire gather mechanism up to the DMA transfer into memory has received little scrutiny and is likely to contain design errors and, in any case, is inadequately documented. Also, we want to add two features to the design, a hardware-based gather decimation capability and a serial temperature sensor network interface.
DESIGN OUTPUTS
John V has determined that, due to the likelihood of a fifth channel being required in order to support impedance, the APU should contain no fewer than five channels. This is so close to the seven-channel APU3 that he and David McCracken have concluded that little benefit derives from continuing to support two different boards distinguished only by channel count. As with all decisions, this is subject to review, but at this time our goal is to design one APU board with seven channels. All experimental instruments, including those representative of lower classes (e.g. 3000 or 1200) can include a fully populated APU. Initial production units may include depopulated boards to save cost. For long-term production, new boards can be designed with fewer channels.
The gather sub-system requires complex wiring and, possibly, relatively simple circuitry off the APU board itself to support the analog front end, as discussed in System Design Report 12 (see [cdxsys12.doc] or [cdxsys.doc]). This aspect of the design needs to be solidified.
A complete analyzer will also require completion of the RPM (Right Panel Module), LPM (Left Panel Module), and VPM (Vacuum/Pressure Modules), all of which communicate with the APU via its I/O scan chain. None of these fall under the scope of the APU hardware design, which only needs to provide a functioning scan chain. RPM and LPM control software already exists as part of the script interpreter. Special VPM control software will be developed when VPM hardware becomes available.
TESTING AND VERIFICATION
DESIGN ANALYSIS
The interim APU design must be carefully scrutinized. Even functions that appear to work and have been reported as working are suspect. For example, cell counting was reported as having been demonstrated, tested, and verified, yet a thorough analysis revealed that the design contained an inherent error source that was not present in the design from which it was supposedly derived. Further, except by detailed analysis, the error would not be evident in any functional test, yet would cause an unacceptable counting error of as much as 5% (see [Reports.doc-BadCounting] under report 37 topic Analyzer Program Size and Functionality). System Design Report 7 topic Right Panel Module sub-topic Interim Design Errors [Cdxsys.doc-InterimDesignErrors] discusses the many design errors found in just the shear and Y-valve control circuitry on the interim RPM-- a design that was supposedly finished. That discussion should be reviewed both to suggest what kind of errors might be expected and to demonstrate the sort of analytic methods that can be used to find and correct them.
ANALOG TESTING
In addition to analysis, the APU's gather circuitry should be tested at two levels, the analog front end and the mixed-signal and digital control mechanisms. Running real blood under laboratory conditions is the only sure test but it is unrepeatable and can only provide statistical verification. For hardware and software development we need repeatable stimuli.
The analog front end is tested best using a signal generator, which can provide a signal envelope closely approximating what we see in real cells. The minimum acceptable stimulus would be provided by a single-channel arbitrary function generator. With this we can test:
Multi-channel stimuli would afford a more realistic test, particularly of the threshold circuitry. Three characteristics that could be tested more thoroughly with multi-channel stimuli are:
MIXED-SIGNAL AND DIGITAL TESTING
A digital signal generator could test the strictly digital circuitry using a square envelope to simulate cell signal input to the analog front end. This approach risks hiding real problems while creating artificial ones. The unrealistic amplifier overdriving could mask mid-range instability or skewing problems. Saturating the amplifiers could create abnormally slow response times. The steep slope of digital signals could produce temporary instability (ringing) in the amplifiers' outputs, which in turn could create metastable conditions in the threshold detectors or ADC. On the other hand, a real problem of instability in the presence of slowly or non-monotonically changing inputs would not be revealed. The arbitrary function generator could address some of these problems. However, an alternative that affords the same ability as a digital signal generator to deliver large test vectors while avoiding some of the problems of very unrealistic front end signals is to use a multi-channel analog signal generator.
8-channel ADC boards for a PC are commonly available. Typically, these afford only 12 bits of precision, which, while less than the 16 bits of our circuitry, should still be sufficient for testing. We are mainly interested in not having every event saturate the amplifiers. Unfortunately, these boards operate more slowly than the maximum speed of our circuitry. To do a full-speed test we either have to use digital signals and accept their limitations or find or build a faster analog pattern generator. Software Implementation Report 52 topic Gaussian Distribution Simulation discusses one approach to building such a device [Reports.doc-GaussianDistributionSimulation]. One motivation for building instead of buying an instrument is that we can tailor it to exactly our needs. Another is that we anticipate providing this capability to field-service. A commercial instrument meeting our most rigorous requirements would be large and expensive. John V, David McCracken, and Jack decided to initially use a commercial 8-channel, 12-bit ADC PC board.
DOCUMENTATION
APU analysis, design, and testing should be documented in a manner consistent with the system design reports. David McCracken has written most of these so they tend to be rather literary. However, Mike Y's System Design Report 12 [Cdxsys.doc-Report12] demonstrates that other engineers can also document in this style. Less literary documentation, such as diagrams and spreadsheets, is acceptable as long as it can be referenced by a report that can be merged into cdxsys.doc. However, it is difficult to capture design rationales with visual documentation, although Jack has produced some diagrams that clearly demonstrate the differences between alternative designs. Ed Sewall's explanation of the motor winding test (k:\cdx\doc\msm\windtest.doc [..\msm\windtest.doc]) is an excellent example of how to effectively use both text and diagrams in one document.
Another alternative could be to present simple comparison lists. This could be particularly effective for analog designs. For example, Mike Y claims to have improved the DC offset of the amplifiers in APU3 compared to APU1. A table of comparative results of simulation and/or testing could document this.
Another particularly analog issue is the rationale for choosing components. To a lesser extent this is also true of digital components but analog components have many more parameters, many of which are nearly irrelevant to any particular use. A correct design will surely have considered all of the relevant parameters. Simply listing these and their requirements in each instance will substantially help long-term maintenance. It would also be reasonable to note the motivation for the more important parameters. For example, "this op amp must have an input bias current under 3uA because it is driven by a PMT with 10K output impedance".
The best test of documentation is to pretend to know nothing about the design in question and to imagine being tasked with, for example, determining whether a component can be replaced by a similar one, debugging a functional problem, or adapting the design for a similar application. Documentation is adequate only if its author can honestly say that he wouldn't mind receiving it.
APU3 SCHEMATIC
This is the primary input, but the design, particularly the digital portions, cannot be trusted. In addition to any corrections discovered by further analysis and testing, the following changes are required:
DOCUMENTS
PEOPLE
11/9/00
[Report 20] [Report 22]
MSM BIRTHING
MSM ICD32 TEST MACRO
COLEMAN'S 11/6/00 RESPONSE TO MSM ICD32 TEST MACRO
(1) FPGA use of A0 -- As far as the CPU to FPGA interface is concerned,
A0 is not used because all transfers are on even addresses.
At one time the FPGA used A0 to control access to the SRAM because
the number of bytes accessed from the SRAM (controlled by the
data written to the EC memory) is not always an even number, so the
FPGA was reading the SRAM byte by byte.
But, this has been changed so that the FPGA now always reads
a word from the SRAM and then determines which byte to use
internally.
So the external A0 signal could be eliminated.
(2) CPU access to SRAM -- I believe that you are correct to use the internal
16-bit DSACK with no waits to control accesses between the CPU
and SRAM without involving the FPGA. When the FPGA gets the bus
to access the SRAM it does not require any DSACK signal.
1/22/01
[Report 21] [Report 23]
The interim design ostensibly provided a complete system, including two intelligent units (APU and FPM), VPM, status board, left and right panels, etc. However, none of these were fully functional or complete and they were based on poor general architecture. We have been able to patch some of the boards to serve as breadboards for developing a more usable system design. The MSM is the only board designed to our more rigorous (e.g. actually working instead of pretending to work) standards.
Using modified APU boards and a hand-wired prototyping board, we have been able to develop the basic IMB (Inter-Module Bus) unit program, including IMB, host (data station), and I/O scan chain communication, and to improve the gather capabilities of the interim APU. New requirements for the APU as well as the need for more physical boards (especially ones less fragile than the heavily patched ones) dictate at least one more iteration before a prototype analyzer can be built.
The general architecture has been corrected and a new left panel board has been designed. Redesigns of the VPM and right panel board are in progress. The tower board has been subsumed into the MSM. No work has been done on the Status and Loader boards and, although the interim boards are incompatible with our new architecture, with considerable patching they could be used.
Coleman and Dave Billman have been verifying the hardware. Complete verification is feasible only after integration with the software that has been developed on the APU. This software was originally developed for the CD3200 CPUDCM; porting to the APU took less than a week. Given the substantial similarities of the APU and MSM, porting itself should be quick. However, the program doesn't currently support stepper control, as we were waiting for the MSM to provide hardware for testing. The script compiler also is involved in this, as its motor and ramp message generation remain untested and we want to change the motor position reference from signed to unsigned for greater consistency between literal and VAR-based move commands. The script debugger will also be extended to support interactive stepper control for system development and testing.
In addition to motor control code, the basic program will require a change in use of the second serial port. Currently, the program assumes that the second port is used as a downstream (slave) IMB link, for example for the APU to communicate with the MSM. However, the MSM uses it for communicating with the barcode reader. This uses an RS232 interface with clock extraction from the data, which is significantly different from the IMB interface. It may not be feasible to fully test this hardware until the program has been written.
It is unlikely that all of this new software development can be completed in much less than four weeks. Until it is completed, the hardware design may be in flux.
UNRESOLVED ITEMS FROM SYSTEM DESIGN REPORT 20
System design report 20 [Cdxsys.doc/ApuRedesignInputs] discusses many of the issues involved in APU redesign. Some issues need additional clarification and discussion.
CHANNEL COUNT AND PERFORMANCE
System Design Report 20 specifies that the next iteration of the APU will contain seven channels. John's theoretical instrument contains eight. The difference between these two forms is actually much greater than the channel count alone indicates. The 7-channel form was proposed as a CD4000-class instrument with only one laser and no more than five channels simultaneously active. The 8-channel form with two lasers would, at times, operate all channels simultaneously and could experience as much as 60% greater coincidence due to conversion times. The total coincidence increase would be less than this, because conversion time is only part of the total and whether conversion time results in any coincidence is determined by the time between cells.
Coincidence due to clumping can affect accuracy, but coincidence due to conversion time does not necessarily hurt performance. It only reduces the amount of list data; cell count is not affected. A large increase in conversion coincidence could force method developers to compensate by slowing down the cell stream and increasing the count time. Alternatively, we could use a faster ADC or ping-pong two slower ones, but there has to be some limit to hardware improvements, as we can never achieve zero conversion time. For example, if we decided that 5% potential data loss were reasonable then, by watching the difference between the raw count and list data count during instrument development, we could determine whether we needed to take remedial action.
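The data loss check described above can be sketched as follows. This is a minimal illustration, not part of the existing APU program; the function names and the 5% default limit are assumptions taken from the example in the text.

```python
def conversion_loss_fraction(raw_count, list_count):
    """Fraction of counted cells that produced no list data."""
    if raw_count <= 0:
        return 0.0
    return (raw_count - list_count) / raw_count


def needs_remedial_action(raw_count, list_count, limit=0.05):
    """True when list data loss exceeds the agreed limit (5% assumed here)."""
    return conversion_loss_fraction(raw_count, list_count) > limit
```

For example, a raw count of 100,000 with 94,000 list entries represents 6% loss and would be flagged; 96,000 list entries would not.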
CDNext is intended to support the entire range of CellDyn instruments. It would not be cost effective to provide simultaneous eight channel sampling at the low end, but if we don't provide it in the next APU, we will not be able to even test it. A reasonable approach would be to provide the capability in a parameterized form in hardware, firmware, and software, as we are doing in the stepper motor control sub-system. If we could scale down simply by recompiling FPGA code and the CPU program, one very high end APU could serve to validate any reduced form. Then, reduced APU boards would serve only to reduce instrument cost.
Assuming that the next APU iteration does support eight simultaneous gather channels, we will need to be prepared not only to detect a coincidence problem but also to remedy it. We should investigate the speed range of available ADCs and determine, as part of the new design, whether they are drop-ins for each other or if other components have to change. We should also review the possibility of ping-ponging two relatively slow converters. The FPGA gather control code should be written to be easily adapted to faster conversion rates or structural variations (e.g. ping-pong).
REQUIREMENTS
The interim APU's gather control is based on the CD3200, which has only the one optical flow cell. Both the CD3500 and CD4000 would have served as better models, since they support the asynchronous impedance cell. We may want to review them for ideas, but most of the characteristics are fairly easy to derive from requirements.
The assignment of channels to groups should not be fixed but programmed on the fly based on definitions in the analyzer configuration (analyz.ini file). Given that channels are interchangeable, it would be reasonable to not require the system to support interleaved groupings, for example channels 0, 3, and 5 to one group and 1, 2, and 4 to another. However, sparse groupings should be supported. Just as the interim APU supports a gather definition such as channels 0 and 2, the new design should support, for example, channels 0 and 2 as one group and 3, 4, and 6 as another.
It would be reasonable to limit the number of groups to three. All existing instruments have at most two and John's theoretical instrument would be the top of the line. The groups can be identified simply as 0, 1, and 2. If it would reduce FPGA complexity, channel assignment could be monotonic, that is all channels in group 1 are higher than those in group 0 and all in group 2 are higher than group 1.
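The grouping rules proposed above (at most three groups, sparse groupings allowed, optional monotonic assignment) could be checked when the configuration is loaded. The following is a hedged sketch only; the function name, the list-of-lists representation, and the channel count of eight are assumptions, not the analyz.ini format.

```python
MAX_GROUPS = 3      # proposed limit on simultaneous groups
NUM_CHANNELS = 8    # assumed channel count (the 8-channel form)


def validate_groups(groups, require_monotonic=False):
    """Check a channel-to-group assignment.

    groups: list of channel lists; the list index is the group number.
    Returns a list of error strings (empty when the assignment is valid).
    """
    errors = []
    if len(groups) > MAX_GROUPS:
        errors.append("too many groups")
    seen = set()
    for g, channels in enumerate(groups):
        for ch in channels:
            if not 0 <= ch < NUM_CHANNELS:
                errors.append(f"group {g}: channel {ch} out of range")
            if ch in seen:
                errors.append(f"channel {ch} assigned to more than one group")
            seen.add(ch)
    if require_monotonic:
        highest = -1  # highest channel used by any lower-numbered group
        for g, channels in enumerate(groups):
            if channels:
                if min(channels) <= highest:
                    errors.append(f"group {g} is not entirely above lower groups")
                highest = max(highest, max(channels))
    return errors
```

The sparse example from the text, channels 0 and 2 in one group and 3, 4, and 6 in another, passes the monotonic check, while the interleaved example (0, 3, 5 and 1, 2, 4) fails it.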
The current analyzer program's gather process supports decimation in software and we have discussed a hardware version. The hardware solution affords much better performance. The software alternative is provided only to reduce FPGA requirements if necessary. There is only one concrete example of decimation, in the RBC-PLT gather. Method developers are unlikely to discover a need for simultaneous decimation in two groups in the near future. Therefore, it would be reasonable to provide this only for one group at a time. We probably could further restrict this to a specific group (0, 1, or 2) but given that the channels that constitute a group are specified dynamically, it seems unlikely that such a restriction would appreciably reduce complexity.
There is no reason to support simultaneous multiple gather types, such as RBC and WBC. CDNext abstracts gathers to simply a combination of channels, count limits, and decimation factors. A name is assigned to this combination only to identify it in flow scripts and to the data station. Algorithms in the data station ascribe meaning to the channels irrespective of gather name and are free, for example, to use some channels as RBC indicators and others from the same gather as WBC.
Each channel group represents a single cell event. CDNext requirements specify that an event trigger occurs when the input signal to each channel exceeds its dynamically programmed threshold. Inactive channels have a threshold of 0, thereby always satisfying the condition. Extending this to multiple simultaneous groups is conceptually obvious but possibly somewhat complicated in execution. Each group needs its own trigger. In the single-group interim APU, each channel's threshold comparator output could simply provide one input to an AND gate with 0 thresholds in effect masking channels that didn't participate in the trigger. With each comparator routed to three AND gates, the 0 threshold masking technique cannot be used, because it would allow unrelated events to interfere with each other. If a channel were assigned to one group and participated in that group's trigger then it would automatically participate in the triggers of the other two groups by virtue of its non-0 threshold for the group in which it actually resides. Consequently, some other masking technique is required. Conceptually, each of the eight comparator outputs would be routed to the three (8-input) AND gates through three (2-input) OR gates. To prevent a channel from participating in a group's trigger, the CPU would write a 1 to the connecting OR gate, making the channel appear to always exceed its threshold.
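The OR-gate masking scheme described above can be modeled in a few lines. This is a behavioral sketch, not FPGA code; the names and the bit-per-channel encoding are assumptions. The guard that keeps a fully masked (unused) group from triggering is also an assumption, since the text does not say how an unused group is disabled.

```python
NUM_CHANNELS = 8
ALL_ONES = (1 << NUM_CHANNELS) - 1


def mask_for(channels):
    """Mask word for one group: bit n = 1 excludes channel n from its trigger."""
    member_bits = 0
    for ch in channels:
        member_bits |= 1 << ch
    return ALL_ONES & ~member_bits


def group_triggers(comparator_bits, masks):
    """Evaluate each group's trigger.

    comparator_bits: bit n = 1 when channel n exceeds its threshold.
    masks: one mask word per group, as written by the CPU to the OR gates.
    """
    triggers = []
    for mask in masks:
        and_inputs = (comparator_bits | mask) & ALL_ONES  # the 2-input OR gates
        active = (mask & ALL_ONES) != ALL_ONES            # assumed: unused group never fires
        triggers.append(active and and_inputs == ALL_ONES)  # the 8-input AND gate
    return triggers
```

With group 0 holding channels 0 and 2, group 1 holding channels 3, 4, and 6, and group 2 unused, an event on channels 0 and 2 fires only group 0 regardless of the other channels' thresholds.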
When a group's threshold trigger condition is satisfied, all of its channels' analog values are captured. At this point, ADC conversion can begin if the ADC is not already being used for another group. It would not be useful to interleave conversion of channels of coincident groups, as this would complicate data storage and couldn't reduce analog hold time (in order to reduce droop) since the total number of channels holding at once can't be reduced by any means. We don't want to simply discard coincidental events, as this would exacerbate coincidence. Consequently, each group will have to wait its turn for conversion. The worst case droop occurs in the last converted channel when all three groups simultaneously detect events. This is a product of the overall number of active channels and would be the same even if all channels were assigned to one group. Worst case droop could be slightly reduced by remembering the order of events. For example if an event occurs in group 2 followed by group 0 then 1, when group 2 conversion finishes, it would be better to process group 0 before 1. More importantly, groups should be processed on a first-come-first-served basis.
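The first-come-first-served conversion order described above can be sketched as a small queue model. This is a hypothetical behavioral model, not the FPGA gather controller; the class and method names are illustrative. Refusing a second event in a group that is already holding is an assumption made here for the sketch.

```python
from collections import deque


class ConversionArbiter:
    """Serve coincident group events in arrival order through a single ADC."""

    def __init__(self):
        self.pending = deque()  # groups holding analog values, oldest first
        self.busy = None        # group currently being converted, if any

    def event(self, group):
        """Record a trigger. Returns False if the group is already holding."""
        if group == self.busy or group in self.pending:
            return False  # assumed: a holding group cannot capture again
        if self.busy is None:
            self.busy = group           # ADC idle: convert immediately
        else:
            self.pending.append(group)  # ADC in use: wait in arrival order
        return True

    def conversion_done(self):
        """Finish the current conversion and start the oldest waiting group."""
        finished = self.busy
        self.busy = self.pending.popleft() if self.pending else None
        return finished
```

In the example from the text, events arriving in the order group 2, group 0, group 1 are converted in that same order, so the group that has held longest is always served next.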
CHANNEL GROUP TAGGING
Asynchronous channel groups must be identified (tagged) by the gather mechanism. This obviously means to write something into memory, but it isn't obvious whether to interleave the tags with the list data itself or to write the tags into another area of memory. Using another area could save a substantial amount of memory, because list data memory must be consumed in 16-bit words in order to keep the data aligned for the CPU. However, it would require the gather controller to maintain a second address (essentially another DMA). It could also significantly complicate the job of the CPU's DMA parse function, which manages the list data's circular DMA buffer. Tag memory would require another circular buffer, unless list data and tags were produced in lock step. This doesn't naturally occur. There is no reason to write a tag for every channel but only for each group. The relationship between the two DMA buffers would depend on the number of channels assigned to each group and the relative event rates of the groups.
Jack has suggested a use for the memory that would otherwise be wasted by tagging in a unified DMA buffer. He has modified the interim APU's FPGA gather code to measure the width (time) of the event in order to filter cell fragment noise by aborting the capture cycle when the event is too narrow for a real cell. The measurement could also be useful as an indicator of clumps when the event is too wide. He has proposed storing this measurement in the unused portion of the tag word. However, we need to answer why we wouldn't just use the upper as well as the lower width bound to immediately abort the capture, i.e. a band pass instead of a low pass filter. We know we can't use clumped events. Saving them and their width would seem to just waste time (and likelihood of conversion coincidence) and memory.
Another possible use for otherwise wasted tag memory would be to store timing information. One approach would be to store an absolute time stamp with every cell. Only two bits are needed for the group tag, leaving 14 for a time stamp. This could be relative to the start of gather. With the standard gather duration of seven seconds, this would afford 427 usec resolution. Many cells might have the same time stamp but this resolution is nevertheless adequate for detecting flow errors and declining rates. However, we don't actually need this information for each cell. The APU program's gather process reads the DMA destination address every half second (this is the default-- the actual sampling rate is programmable) when it reads the raw count. Since cells in the DMA buffer are in order, (linear) interpolation between the two sample times affords an adequate per cell time stamp. Having the FPGA time stamp each cell would afford one advantage. The current gather program is based on a direct correlation between the DMA address and cell count. If we use the single DMA buffer for both group tags and list data, this will no longer be true. The FPGA could count cells in a CPU-readable register to address this problem, but it seems that this would be almost as much work for the FPGA as time stamping each cell.
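The time stamp arithmetic and the interpolation alternative described above can be illustrated with a short sketch. The bit layout (group number in the upper two bits, time stamp in the lower 14) and all names are assumptions for illustration; the report does not specify a layout.

```python
GROUP_BITS = 2
STAMP_BITS = 14
STAMP_MASK = (1 << STAMP_BITS) - 1


def stamp_resolution_usec(gather_seconds=7.0):
    """Time represented by one count of a 14-bit stamp spanning the gather."""
    return gather_seconds * 1e6 / (1 << STAMP_BITS)


def pack_tag(group, stamp):
    """Build a 16-bit tag word (assumed layout: group in bits 15-14)."""
    return ((group & ((1 << GROUP_BITS) - 1)) << STAMP_BITS) | (stamp & STAMP_MASK)


def unpack_tag(word):
    """Return (group, stamp) from a 16-bit tag word."""
    return word >> STAMP_BITS, word & STAMP_MASK


def interpolate_time(cell_index, index0, time0, index1, time1):
    """Linear estimate of a cell's time between two count snapshots."""
    if index1 == index0:
        return time0
    return time0 + (time1 - time0) * (cell_index - index0) / (index1 - index0)
```

A seven-second gather divided into 2^14 counts gives about 427 usec per count, matching the figure above. The interpolation function assumes cell index and DMA address remain directly related, which is the property a unified tag and list data buffer would break.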
A variation of the absolute time stamp would be to store the time between events. Assuming that this was done in a way to avoid error accumulation, the advantage of this is that the FPGA could use a smaller timer, e.g. 8-bit instead of 14. This approach could also provide greater resolution, although 427 usec seems adequate. The only way to determine the absolute time of any cell is by parsing the entire data stream up to that cell. This would increase software effort if a snapshot scheme were used to reduce communication bandwidth or storage requirements.
SOFTWARE ISSUES
From the software viewpoint, the unified DMA buffer is significantly better than separate list data and tag buffers. The existing program to manage only one circular buffer is quite complicated and required several weeks just to debug. Wasting 14 bits of every channel group word may be a reasonable price to pay for limiting the program's complexity.
As already mentioned the existing list data time stamping by interpolating between snapshots would not work correctly with a unified buffer. There are at least three possible solutions to this.
Decimation by software requires a significant effort even without the added complication of multiple channel groups. The existing DMA parse program implements several parsing strategies, optimizing by number of channels and whether decimation is required. Even if only one channel group is decimated, all groups are affected because the program can't tell which data to discard without parsing the source stream. In effect, if one group is decimated, all groups pay a significant price in performance. If decimation is implemented in hardware, the program's complexity can be significantly reduced and performance increased. Whether hardware or software decimation is used impacts the DMA data stream handling in ways not apparently related to decimation. If the program has to parse the data stream for decimation, any additional parsing functions, such as to compress the channel group tags or convert count snapshots, have relatively little performance impact. If, on the other hand, decimation is done by the FPGA then any parsing functions change a simple copy process to a much more expensive parsing process.
Unless all of the sources are in series (which could be the case for two optics benches but is definitely not the case for the impedance cell) each group will need its own raw cell counter. Note that this is not the same as list data cell count. The interim APU uses two of the CPU's timer/counters for this purpose. Only one is really needed, because the 16-bit counter can be extended to 24 bits using its prescaler. However, assuming that one of the timer/counters would be freed up, software has already taken one for generating the IMB clock when not communicating with the data station via HSSL (contradicting system design report 17 topic MsmTimerCounters [Cdxsys.doc-MsmTimerCounters] and Conclusion 6 but for good cause). Consequently, the CPU could provide the counter for only one of the three channel groups.
Jack has proposed that the FPGA provide the raw cell counters. This is reasonable given that the FPGA controls the raw cell trigger. Although the CPU could provide one of the counters, the program would be simplified if the FPGA provided all three. As described in software implementation report 29 [Reports.doc-CountingMechanism] an accurate counting period is achieved by having the CPU's timer ISR start and stop counting on a delay derived from the timer interrupt. This consumes little CPU time and is quite accurate. There is no reason to burden the FPGA with the entire process. It is sufficient for the FPGA to provide a CPU-readable 20-bit (minimum would be 18) counter for each of the three groups. The counters are clocked by cell events but started and stopped (i.e. gated) under CPU control. Starting a counter could automatically clear it.
DATA STATION INTERFACE
The native IMB interface is essentially HSSL and the APU can talk to a HSSL-equipped data station. However, unless the PC's HSSL card is redesigned for PCI (instead of ISA) it will become increasingly difficult to find PCs that can support HSSL. We now use the APU's CPU expansion bus to connect an ECP mezzanine board, but we don't want to generally depend on this approach. Including native ECP and USB on the APU board would afford a less expensive and more reliable solution. During the development phase, it would be very convenient for all three interface types to be natively available. For initial production, the same board could be used but with the unused interfaces not populated to save a little cost.
There is no reason not to always provide the HSSL interface. We only need to include RS422 receivers and transmitters and a connector to take advantage of the UART that already exists in the CPU. We don't envision using the UART for another purpose at this time.
There are two unresolved questions concerning USB. One is whether to design for USB version 1 or 2, which is nominally forty times faster (480 Mbit/s versus 12 Mbit/s). Version 2 interface components are now available and should be plentiful and inexpensive before we go into production. The specification and components are not as mature as for v1 and we currently have no means to test a v2 interface. However, v2 PCI cards should soon become available and we can always operate v2 components under v1 protocol and speed conditions. A v2 design would entail more preparatory effort than v1 for which we already have a design, components, and support (e.g. Philips prototyping board) from interim APU3. We also need to consider the cost difference between the two at our expected production time and whether the v1 components may be obsolete at that time.
The second USB issue is the fact that we have not developed any software for it, either in the APU or data station, and don't know how easily (if at all) it fits the model suggested by HSSL and ECP. From a driver point of view, HSSL and ECP are very similar to each other and very different from USB. We proposed USB because it is and will continue to be the most common PC interface that appears (there is some question about how effectively the bandwidth can be utilized) to support our list data requirements. Rather than implementing it on the APU under the assumption that we can develop the software, we might want to implement it on a mezzanine card. However, even if the APU ends up with an unused USB interface, if unpopulated it would consume only a small amount of board space and not interfere with developing mezzanine-based alternatives.
The ECP interface presents several unresolved design issues. We have a functioning design that is implemented in a relatively large and expensive CPLD (Altera EPM7064). One way to reduce the real estate would be to put the logical functions into the main FPGA, but we may be pushing this beyond its capacity. Further, this approach would require additional buffers, as we can't expose our main FPGA to the hostile environment of the ECP interface.
Another issue relates to the question of moving the design into the main FPGA. One of the problems with the EPM7064 is that we feel compelled to use a package that can be socketed in case it is destroyed by the current surges and high static voltages that can occur in the interface. We need to determine how best to protect whichever programmable device implements the logic. It may be that the only safe approach is to use separate buffers that are both more robust and easier to replace, in which case safety of the programmable device becomes moot. We might also determine that Schottky array voltage snubbers afford reasonable safety for an EPM7064 (socketed or not) but not for the main FPGA.
The only other problem of the ECP is that our current design has not functioned reliably with all PCs, laptops in particular. As explained in system design report 18 topic Parallel Port Link Design [Cdxsys-ParallelPortLinkDesign] we are not adhering to the standard ECP protocol as designed by Intel, because it causes both firmware (CPLD or FPGA) and software (especially PC driver) to be inefficient. Our implementation has been tested on desktops including a 90 MHz Pentium (Dell), a 200 MHz Pentium Pro (Dell), 450 MHz Pentium II (one Dell and two Compaqs), an 800 MHz Pentium III (Soyo motherboard), and a 900 MHz Athlon (on Tyan and Asus motherboards). The 800 and 900 MHz PCs experienced an occasional operating system (Win95 and Win98SE) bug in the form of a spurious DMA terminal count interrupt. It is possible that Intel's inefficient ECP protocol slows down most drivers sufficiently to keep the OS bug from being exposed. It may also be that most drivers don't use the ECP's DMA because Intel's ECP hardware design is so bad that DMA is nearly unusable anyway. Our driver's ISR detects and corrects this bug.
We have tested the ECP on only two laptops, a Compaq (John V's) and a Toshiba Satellite 2805-S201 (two new ones). The Compaq did not work at all. Diagnostics in our (Windows VxD) driver revealed that hardware failed to automatically generate a required signal when switching to ECP mode. No other computers that we tested exhibited this behavior, but Compaq may have designed its ECP hardware to depend on side effects of the standard protocol. The Toshiba computer occasionally generated an impossible FIFO count of 128; our ISR had to read the FIFO 128 times to clear it even though there was no input and the FIFO can't hold more than 16 bytes. Our unique ECP protocol can't possibly be the cause of this problem. It is possible that the Toshiba's problem is caused by our replacing its standard Windows ME, for which we don't have a driver, with Win98SE and Win95 (both were tested). ME has a uniquely intimate relationship with its hardware, e.g. the infamous WinModem, and there may be hardware-OS dependencies that no one (except possibly Microsoft by design) knows about.
We may want to find at least one laptop that works with our design before committing the APU to it. Although our design fits into a 7064, Intel's less efficient protocol might require a larger device. However, there are pin-compatible versions with more internal resources.
VPM, RPM, LPM, STATUS PANEL, LOADER BOARD
VPM
The Vacuum Pressure Module represents a unique situation. It is located on the APU's scan chain like dumb I/O yet requires significant local intelligence. We have considered two general approaches to its implementation, in an FPGA or in a specialized micro-controller. The FPGA easily handles the scan chain requirements but is over-kill for the control functions, being able to effect them hundreds of times faster than necessary. We pay a price for this extra performance; the FPGA costs more than a low-end micro-controller, which could easily do the control functions, and designing the FPGA requires more specialized skills than programming the controller (for these simple functions). No general-purpose low-end controller can provide the flow-through shift register scan chain interface, but certain specialized ones can. For example, a low-end DSP can simulate the scan chain using its UARTs, DMA, and very fast interrupt service and program execution. Unfortunately, programming the DSP requires specialized skills roughly equivalent to those needed to design the FPGA.
I (David) have shown that an Analog Devices ADSP2105 can provide a scan chain interface, but no one else on our team has experience with this device. Meanwhile, Coleman has made significant progress implementing the VPM in an FPGA and we have decided that this will be our direction.
There is a third approach that we haven't discussed because the required components have only recently become available. Both Cypress Semiconductor (through its subsidiary Cypress Systems) and Atmel are releasing low-end micro-controllers with on-chip FPGA. The Atmel AT94K looks especially promising, because its CPU constituent, the 8-bit AVR, is relatively inexpensive and because it comes in a variety of FPGA sizes down to 5K gates, which is appropriate for our scan chain. We wouldn't have to pay for unused capability as we do in both the pure FPGA and DSP approaches.
At this point, it may be best to stay with the components and tools that we know, which means finishing the FPGA-based VPM. However, although the VPM represents a unique situation in the current instrument configuration, other similar instances may appear in the future. We should at least be aware of the two alternatives for this situation.
RPM
The Right Panel Module is our most poorly defined module. It provides the APU's scan chain access to all of the fluid controls located on the right panel of the instrument. This includes many special devices that have not been finalized, particularly shear valves, y-valves, and fluid sensors.
The interim RPM contains many technical errors. In fact, none of the special device interfaces function at all. The design was reviewed in System Design Report 7 topic Right Panel Module / Interim Design Errors [Cdxsys-InterimDesignErrors]. Based on this report, Maha redesigned the board, correcting both the original errors and errors in the review. System Design Report 14 topic Right Panel Module [Cdxsys-RightPanelModule] contains some of her findings and suggestions, particularly regarding the shear valve interface.
In hindsight, we spent too much time on the new shear valve interface. We aren't even sure that shear valves will be used and, if they are, the old circuit is adequate and needs only a few bits of the scan chain for control. However, the design work has already been done and the only penalties are a somewhat larger PLD and the drivers, which don't have to be stuffed. Maha nearly finished the complete redesign. Unless we find some problem that could be solved by eliminating the new shear valve interface, we may as well keep it. No matter what we do with the RPM at this point, it can't be optimal because we don't know what the final instrument configuration will be. Since any instrument needs the RPM, we should probably produce a limited number of Maha's version, knowing that another redesign is likely.
LPM
The Left Panel Module is a redesign that is compatible with our new scan chain architecture and corrects the circuit design errors of the interim version. Given its simplicity and that it has been carefully redesigned, it should not have to change again.
STATUS PANEL
The interim Status Panel has the problems associated with all of the interim designs. It uses obsolete HC594, 75175, and 75174 components and its scan chain cabling is incompatible with the new architecture. It also exhibits unique problems. One is that it depends on a Dallas Semiconductor DS1267 serial dual potentiometer, which may not have a reliable long-term source. The other is that it wastes APU output scan chain bits. The DS1267 consumes two bytes of the scan chain to provide significantly greater volume and frequency resolution than the buzzer needs. It isn't clear that any frequency control is needed but, in any case, five or six bits each would easily suffice for both, leaving six or four bits of the two bytes free for buzzer on/off and the three LEDs. There are several ways to achieve such a division of two scan chain bytes. One would be to replace the DS1267 and HC594 scan registers with a PLD. An all-digital circuit in the PLD using PWM for volume control could replace the entire mixed signal buzzer circuit. This would require replacing the LM4864 audio amp with a fast H bridge driver, and the resulting total cost would probably be higher than the interim design.
We can tolerate losing one APU scan chain output byte but we can also tolerate a small increase in the cost of the Status Panel. More importantly, we don't want to waste design time on this module. If the DS1267 source is reliable then we probably should just update the interim circuit, replacing the obsolete components and rewiring the scan chain cabling. Otherwise, we should determine what the buzzer requirements really are and generate a more optimized design.
LOADER BOARD
References:
The interim Sample Loader Module contains the obsolete components and cabling of the interim scan chain design. It is also incompatible with the new architecture in unique ways. The items that need redesign are as follows.
The interim circuit contains four solenoid drivers, which don't have power down capability. No reason has been given for the absence of power down, but we can surmise that the designer was reluctant to use the 15V power for solenoid hold because it is the analog supply source (after stepping down to 12V to make it more stable). Clearly, the 12V output derived from the 15V supply should not be used for solenoids. We also don't want to use additional DB37 pins to beef up the 15V. The most effective way to use the DB37 pins would be to include only 5V, 24V and ground and derive the +12 and -12 analog as well as a separate 12V solenoid voltage from the 24V. The analog power requirements are low enough to be easily provided by inexpensive capacitor-only DC-DC converter/inverters. The 12V solenoid hold power is too great to be effectively generated without magnetics. It would be feasible to use the pins freed up by eliminating the +/- 15V supplies as 12V solenoid power, but this would not afford the most effective use. If the 24V supplies adequate solenoid activation current then it would be underutilized when the 12V supply took over. Efficient use of the DB37 pins is probably more important than solenoid power down, especially in the loader, which can easily dissipate the heat. However, we should at least investigate the feasibility of generating a 12V holding supply. In any case, it seems reasonable to replace the +/-15V supplies and linear regulators with capacitor DC-DC converters to reduce the power cabling requirements and increase the current capacity of the 24V (and possibly 12V solenoid hold power).
The tube detector circuit (two in the module) has been designed by Mike Y based on the CD3200 version. We need a written theory of operation and test procedure for this circuit.
1/26/2001
[Report 22] [Report 24]
The interim CDNext APU (APU1 and APU3) design contains many flaws and anachronisms and must be redesigned for compatibility with our latest hardware architecture. While doing this, we are taking the opportunity to reconsider the functionality and capability of the unit. The following items are under review.
These topics are all intertwined and overlap hardware, firmware, software, methods, and (hematology) algorithm domains. All of these domains were represented in this meeting.
LIST DATA MEMORY AND COMMUNICATION BANDWIDTH
Dave Billman asked why we would be concerned about wasting memory with channel ID tags. Assuming an instrument with three asynchronous groups gathering from eight channels and a full 16-bit tag for each group, the tags consume one of five plus one of four plus one of two words. Thus, tags consume 27% of the total list memory. In a single-pass method, which requires 100,000 cells, tags would consume 594,000 bytes out of a total of 2,200,000 bytes. The interim APU has 4,096,000 bytes of RAM. The application program plus stack and heap consumes about 600,000 bytes, script memory 50,000, and message buffers 130,000. The total memory consumption including list data and tags would be 2,980,000 bytes.
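The memory figures above can be reproduced with a short calculation. This sketch assumes group sizes of four, three, and one channels (inferred from "one of five plus one of four plus one of two words"), 16-bit words, and 100,000 events in each group; the function and variable names are illustrative only.

```python
BYTES_PER_WORD = 2


def list_memory(group_sizes, cells, tag_words=1):
    """List memory consumed when every group event carries its own tag."""
    data_words = sum(group_sizes)              # one word per channel
    tag_total = tag_words * len(group_sizes)   # one tag per group event
    total_words = data_words + tag_total
    return {
        "tag_fraction": tag_total / total_words,
        "tag_bytes": cells * tag_total * BYTES_PER_WORD,
        "total_bytes": cells * total_words * BYTES_PER_WORD,
    }


usage = list_memory((4, 3, 1), cells=100_000)
# Other consumers quoted in the text: program/stack/heap, scripts, messages.
overall = usage["total_bytes"] + 600_000 + 50_000 + 130_000
```

The exact tag fraction is 3/11, or about 27.3%, and the exact tag consumption is 600,000 bytes; the 594,000 figure in the text corresponds to applying the rounded 27% to the 2,200,000 byte total. The overall total of 2,980,000 bytes is unaffected.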
The existing APU, with 4 Mbytes of RAM, clearly can support the 27% list memory consumption of 16-bit channel group tags. The problem is communication and/or processing bandwidth. The most efficient (in terms of CPU time) means for the 68340 to transfer DMA buffer data to the data station is to simply fill message packets directly from the DMA buffer without parsing. In this case, 16-bit tags would increase communication time by 27%. Alternatively, there are several ways that the 68340 could reduce the transmitted data. For example, it could simultaneously fill three message buffers, each with the data from a different group. One byte at the beginning of each message could identify the channel group, completely eliminating communication bandwidth consumed by tags. However, this could consume considerable CPU bandwidth. One of the two CPUs, the APU's 68340 or the data station's Pentium, has to parse the raw DMA data, but we would rather it be the 500 MHz (at least) Pentium than the 25 MHz (at best) 68340.
Software Implementation Report 27 topic Gather Status Report and Communication Optimization [Reports-GatherCommunicationOptimization] discusses communication bandwidth issues. Testing of simulated WBC list data transmission via HSSL showed that the analyzer cannot keep pace with a seven-second gather and only the storage capacity of the DMA and transmit buffers prevents data loss. ECP communication exhibits very nearly the same results, indicating that the bottleneck lies in the 68340's message generation (the CRC calculation is a prime suspect). WBC is cited as an example of communication issues because, unlike RBC, it involves very little parsing of the raw data. These results don't necessarily portend a problem. A 120 sample per hour instrument affords 30 seconds per sample, and sample handling overlaps processing by the data station. Further, these tests were done with a 16 MHz 68340. They do reveal that communication bandwidth may be a concern even if memory consumption is not.
Jack reported his observation of real RBC and WBC cells through the flow cell of the CD3200. The typical cell speed is 5 meters per second and the laser beam width is 0.1 mm. The cell is in the beam for 20 usec and its 0-degree signal is above the trigger threshold for 6 usec. He noted that the laser beam width can vary and the cell speed depends on the vacuum, which can also vary. Consequently, any use of cell width for noise filtering must use an adjustable width threshold.
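The transit time quoted above follows directly from beam width and cell speed, and shows why the width threshold has to be adjustable. The sketch below is illustrative only; the function names are assumptions, and the upper bound in the width test reflects the band pass idea raised earlier as an open question, not a decided design.

```python
def transit_time_usec(beam_width_um, cell_speed_m_per_s):
    """Time a cell spends in the beam; micrometres / (m/s) gives microseconds."""
    return beam_width_um / cell_speed_m_per_s


def width_acceptable(width_usec, lower_usec, upper_usec=None):
    """Programmable width test: reject events narrower than lower_usec and,
    if an upper bound is supplied (assumed option), events wider than it."""
    if width_usec < lower_usec:
        return False
    if upper_usec is not None and width_usec > upper_usec:
        return False
    return True
```

A 0.1 mm (100 micrometre) beam and a 5 m/s cell speed give 20 usec. If vacuum variation slowed the cells to 4 m/s, the same beam would give 25 usec, so fixed width limits tuned for one condition would misclassify cells under the other.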
In addition to being able to filter signal noise, Jack described one kind of coincidence problem that the width measurement can improve. When one cell is following closely behind another, it can lie in the other's conversion time "shadow". We have traditionally lumped all such shadowing together along with cell clumps as "coincidence". Jack pointed out that even conversion coincidence alone has two forms, differentiated entirely by subtle timing differences. In one, the second cell is counted but not converted at all, i.e. no list data is generated, which is what we have always assumed. In the other, list data is gathered at the wrong point in the cell because signal sampling is delayed until completion of the conversion of the previous cell. In the latter case, the list data is entirely wrong. Jack explained that the width measuring process that he has created would report the second cell as not wide enough to be a real cell. Bodo and Diana both suggested that the width information would be valuable to hematology algorithms in this case but expressed concern that not generating the list data would deny the algorithms information necessary to properly analyze and adjust cell population counts. Jack argued that such data is erroneous and inconsistent and couldn't be of real value. Thus, it is better to force this situation to be handled like the other form of conversion coincidence, i.e. to not produce any list data. Bodo and Diana agreed with Jack's argument. Bodo also pointed out that the CD4000's slope-based event detector avoids this problem.
Mike Y argued that Jack's width measure, using the time that the size signal (0-degree in most instruments) is above the trigger threshold, does not yield an accurate measurement. A "small" cell in terms of signal size may, in fact, be nearly as wide as a "large" cell but will appear to have a smaller width because a smaller portion of its body generates a size signal above the threshold. He explained that accurate width is typically obtained by processing the same cell signal twice (using a delay line to produce a delayed image) using a relatively crude picture of the envelope from the first analysis to project where the cell really begins and ends.
David McCracken asked Mike whether a post-gather adjustment of the perceived width according to the height could improve the width accuracy. Mike explained that to accurately detect the trailing edge of the cell it is necessary to compensate for the height in the edge detector itself, i.e. in real time. Otherwise, the envelope information is lost and cannot be recreated. David asked whether it would be feasible to measure only the second half of the cell, using the position of the peak as the begin time and its height as a compensating factor in the trailing edge detector. Mike explained that the threshold and peak detectors don't provide very accurate time positions because of noise in the signal. He suggested that one way that might be used to generate a more accurate position of either the peak or the cell envelope directly would be with a very fast, relatively low resolution ADC possibly in combination with a DSP dedicated to analyzing the envelope.
Mike pointed out another possible pitfall of the width measurement. A typical good cell width is 5 um while a bad width is 2.5 um. Theoretically, we should be able to detect a two-to-one difference. However, the beam (or detector?) aperture is typically 17 um wide. If the cell's width were measured by its time in the beam, there would only be a 15% width differential between these two. Jack had described his width measure as having a resolution of 125 ns over a typical 6 us event or approximately 2% (1 out of 48). Clearly, a 2% resolution can accurately reflect a 15% differential. However, it isn't clear that this is an apples to apples comparison. Jack claims the beam aperture is .1mm, nearly six times Mike's 17 um. Jack also claims a typical cell width of .6mm, 120 times Mike's 5 um. Also, the time that a cell spends in the beam is not directly significant in Jack's scheme, but rather the time that its size signal is above the trigger threshold. The two may be substantially different.
Diana and Bodo both expressed the opinion that cell width could be useful to algorithms. We don't have a lot of data for each cell and any extra can be valuable if it correlates to some morphological phenomenon. For this purpose, the width doesn't have to be accurate as much as consistent. Mike's technical objections can help us understand the limitations of Jack's current approach and suggest ways that we might measure its performance and improve it. That Jack's relatively simple scheme may prove adequate and that we have feasible means of improving it suggest that we should, at this point, plan to use cell width for real-time noise and coincidence filtering and to report it with the list mode data to the data station.
Jack reported observing 10 to 20 mV of noise in our APU gather circuitry. The 16-bit ADC yields a resolution of about 0.15 mV over the 10 V signal range. Clearly, the roughly 2.4 mV resolution of a 12-bit converter is adequate given this noise floor.
Mike explained that the CD4000 uses an 18-bit converter in order to avoid empty bins (at the low end) due to quantization when the linear measurement is transformed to a log histogram. This is not a good reason. First, an algorithm that can't deal with empty bins is flawed. For example, CD3200 valley finding avoids such dependency by using fuzzy logic instead of searching for an inflection point. Second, if the non-empty bin requirement is inescapable, it can serve no purpose to use noise to jiggle real values into all the bins when this random effect could be as easily realized using a random number generator in the data station (interpolating might be better anyway).
The CD4000's 18-bit ADC serves only as an example of what not to do. We originally agreed to the 16-bit ADC for CDNext because these parts are reasonably priced and fast enough for the originally proposed instrument, which would have no more than five simultaneously active channels. Now that we are considering eight channels, the conversion speed may assume added significance. However, this is difficult to determine. Mike Y pointed out that list data only needs to provide a statistically significant sampling of the cells. Diana and David McCracken argued that, while our current instruments have little trouble providing this for RBCs and WBCs, they typically do not provide enough PLT data. Mike countered that there just aren't very many PLTs. However, this means that we want to lose as few as possible to hardware coincidence. We don't know the extent to which this occurs. It may or may not significantly impact our ability to provide accurate results.
It would be a mistake to use Jack's noise report alone to limit list data resolution. The current APU boards are not designed correctly for high resolution. 16-bit data acquisition boards do exist even for operation from an ISA bus in a noisy PC, but they scrupulously shield the sensitive analog portions not only in the board layout but also in the environment, typically surrounding this area with a grounded can. The next APU design should pay much more attention to shielding. Whether we actually have a use for greater resolution than 12 bits is a separate question; we clearly have no legitimate use for noise.
An overarching goal of CDNext is to provide general solutions to problems throughout the CellDyn product line. A very high end analytical instrument may need greater resolution than a bench top clinical unit. Since we have suggested not using log amps, 12-bit resolution is the minimum acceptable. Given the inherent noise floor, greater than 16 bits is useless. Therefore, we only have to accommodate 12, 14, and 16 bits for the entire product line. The configuration of the first CDNext instrument is still not fully defined. At this point, it may be best to try to achieve the best resolution practicable. However, in any case we should try to parameterize the hardware, firmware, and software to support 12 and 16 bit data. We should at least review the alternative ADCs to determine whether they can be interchanged and how best to take advantage of their differences, particularly conversion speed.
During the meeting, Mike proposed transmitting to the data station list mode data with lower resolution than 16 bits in order to use some bits as channel tags. In this case, it would make no sense to use a 16-bit ADC. After the meeting, he elaborated a scaling scheme where 16 bit data is represented by its lower 12 bits if its upper nibble is 0 or by its upper 12 bits otherwise. One of four remainder bits (of a 16-bit word) would be a shift indicator and the remaining three could indicate the channel. It should be pointed out that, in general, we only need two bits to indicate channel group instead of specific channels, because the data station will know the channel constituency of each group in each gather type.
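Mike's scaling scheme can be sketched as below. The bit positions (shift flag in bit 15, channel in bits 14-12, mantissa in bits 11-0) and the function names are assumptions for illustration; the proposal only states how many bits each field gets.

```python
def pack_sample(value, channel):
    """Pack a 16-bit ADC value and a 3-bit channel number into one 16-bit word.

    Assumed layout (the report does not fix bit positions):
      bit 15      shift flag (1 = mantissa holds the upper 12 bits)
      bits 14-12  channel number
      bits 11-0   12-bit mantissa
    """
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value must fit in 16 bits")
    if not 0 <= channel <= 7:
        raise ValueError("channel must fit in 3 bits")
    if value >> 12 == 0:
        shift, mantissa = 0, value       # upper nibble is 0: keep lower 12 bits
    else:
        shift, mantissa = 1, value >> 4  # otherwise keep upper 12 bits
    return (shift << 15) | (channel << 12) | mantissa

def unpack_sample(word):
    """Recover (value, channel); the low 4 bits of shifted values are lost."""
    shift = word >> 15
    channel = (word >> 12) & 0x7
    mantissa = word & 0xFFF
    return (mantissa << 4 if shift else mantissa), channel

small = unpack_sample(pack_sample(0x0ABC, 5))  # small values survive exactly
large = unpack_sample(pack_sample(0x1234, 2))  # large values lose their low nibble
```

Small values keep full 16-bit resolution while values of 0x1000 and above are quantized in steps of 16.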
Regarding any formatting scheme that substitutes some form of scaling for a flat value, we should recognize that it is impossible to achieve the same range and resolution in a smaller space. Scaling affords higher resolution of small values at the expense of resolution of large values. Whether we can afford this sacrifice depends on whether we need the same resolution in all parts of the list mode data range. In fact, a case can be made for this sacrifice. We have experience with the low resolution of PLT data making it difficult to analyze the population, but we have little experience with high resolution higher in the data range. Only the CD4000 has had this capability and its algorithms do not use the multi-angle analysis or fuzzy logic that we are promoting for CDNext. Further, Bodo points out, it only uses 15 bits of its 18-bit potential. However, it is likely that a single-pass analyzer must have the full 16 bits throughout its range and, although we are not planning such an instrument for the first member of the CDNext family, we would like to be able to at least experiment with it. Consequently, depending on the instrument we may or may not be able to utilize Mike's scaling suggestion.
If data were transmitted as 12 bits, by means of scaling or because the ADC actually produces this output, three or four bits would be available for channel group tagging, but cell width would have to be transmitted in another byte and stored in the DMA buffer in a word. This affords no reduction in memory or bandwidth consumption. The width that Jack is currently measuring resolves only one part in 48, which can be represented by six bits, leaving two bits for a group tag in compressed (DMA width + tag word to byte) messages.
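The width byte arithmetic can be checked with a short sketch. The field order (tag in the top two bits, width in the low six) and the names used here are assumptions; only the 125 ns tick, the 6 us typical event, and the six-plus-two bit split come from the discussion above.

```python
TICK_NS = 125
TYPICAL_EVENT_NS = 6000
WIDTH_STEPS = TYPICAL_EVENT_NS // TICK_NS  # 48 ticks in a typical event
WIDTH_BITS = WIDTH_STEPS.bit_length()      # 6 bits hold counts up to 63
TAG_BITS = 8 - WIDTH_BITS                  # 2 bits left for the group tag

def pack_width_tag(width_ticks, group):
    """Combine a width count and a channel group tag into one byte."""
    if not 0 <= width_ticks < (1 << WIDTH_BITS):
        raise ValueError("width does not fit in the width field")
    if not 0 <= group < (1 << TAG_BITS):
        raise ValueError("group does not fit in the tag field")
    return (group << WIDTH_BITS) | width_ticks

def unpack_width_tag(byte):
    """Return (width_ticks, group) from a packed byte."""
    return byte & ((1 << WIDTH_BITS) - 1), byte >> WIDTH_BITS

packed = pack_width_tag(48, 2)
```

An event wider than 63 ticks would not fit; this sketch rejects it rather than guessing whether the hardware would clamp or wrap.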
Mike Y addressed the issue of decimation not always being effective, suggesting an adaptive form that predicts a suitable gather ratio based on the ratio of cell species seen at the beginning of counting. This is similar to what the DmaParse function now does to preset the DMA counter when decimating. Mike's proposal was not further discussed in the meeting.
After the meeting, Diana and I (David McCracken) discussed whether decimation represents an unnecessary legacy function. The existing instruments suffer a common problem in mixed species gather (i.e. RBC/PLT). As soon as one species meets its count quota, all gathering ceases because continuing to count one and not the other would skew the relative counts. If one species substantially outnumbers the other, the less represented species would not have enough list data to form an accurate picture of its population. Decimation counters this.
CDNext implements an improved gather irrespective of decimation. Each gather type is described by a definition in the analyzer configuration file (analyz.ini) that includes the maximum count of each of two cell species. When either species meets its quota, the raw and list counts of both are recorded, but the laggard species continues to be gathered until it too meets its quota (or until timeout). Like decimation, this method provides an accurate relative count of the two species as well as sufficient (if possible) list mode data. Unlike decimation, it is inherently adaptive. It is also simpler than even non-adaptive decimation.
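The quota method described above can be sketched as follows. The event stream, the function name, and the returned structure are hypothetical; the real gather runs in the FPGA and firmware against the definition in analyz.ini, and timeout is modeled here simply by the event stream running out.

```python
def quota_gather(events, quotas):
    """Gather list data for two species under per-species quotas.

    events: iterable of species labels in arrival order.
    quotas: dict mapping each species label to its maximum list count.
    Returns (snapshot, listed): snapshot holds the raw and list counts of
    both species at the moment the first quota is met; listed holds the
    final list counts.
    """
    raw = {s: 0 for s in quotas}
    listed = {s: 0 for s in quotas}
    snapshot = None
    for species in events:
        raw[species] += 1
        if listed[species] < quotas[species]:
            listed[species] += 1  # store list data only while under quota
        if snapshot is None and listed[species] >= quotas[species]:
            # First quota met: record both counts for the relative count.
            snapshot = {"raw": dict(raw), "list": dict(listed)}
        if all(listed[s] >= quotas[s] for s in quotas):
            break                 # the laggard has caught up
    return snapshot, listed

# Hypothetical stream: nine RBCs for every PLT.
stream = (["RBC"] * 9 + ["PLT"]) * 10
snapshot, listed = quota_gather(stream, {"RBC": 18, "PLT": 5})
```

The snapshot preserves the relative count at the instant the dominant species fills its quota, while the final list counts show the laggard still reaching its own quota.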
The only possible advantage of decimation is that it spreads the species selection over a broader time period. If the decimating process accurately predicts the ratio, both species are sampled over the same period. This could be significant if the blood is not uniformly mixed or if it deteriorates during the gather time. Thus, it would be superior to the quota system for gathering WBCs, which can experience a declining rate due to on-going cell lysing. The RBC/PLT aliquot is not lysed and the population should not change during the gather period. The CDNext gather definition provides decimation in addition to the quota scheme for generality, but, in reality, decimation is used only for RBCs and PLTs, which are adequately served by quotas. Testing the gather program showed that decimation consumes substantial CPU time (as predicted). At this point decimation (especially adaptive) is an unnecessary luxury that should be discarded.
Feb. 5, 2001
[Report 23] [Report 25]
GATHER SYSTEM DESIGN MEETING COMMENTS
Report 23 discusses Jack's cell width measurement experiments [Cdxsys.doc-CellWidth]. Diana warned that these may all be normals and that, in fact, any average or typical measurements would mostly reflect normal cells. Real-time use of cell width must treat abnormals identically to normals. If this isn't possible, the width may still provide valuable information, but it must be analyzed by data station algorithms in the context of a complete view of the sample population.
Report 23 discusses list data decimation [Cdxsys-GatherDesignReview] [Cdxsys-DecimationReview]. Although Gather Design Review item 5 states that decimation has been used "on most of the CellDyn instruments," Diana felt that the report might have, nevertheless, erroneously intimated that all CellDyn instruments do decimation. It is important to note that decimation is not a requirement and may not even be the preferred solution to the problem of mixed populations.
Report 23 conclusion 7, that decimation will be entirely eliminated in favor of the quota system, is based on the assumption that a burst sampling of the over-represented cell type is as good as periodic sampling [Cdxsys-DecimationSampling]. Diana reviewed this analysis and concurred, adding that decimation works only if there is a large separation between two populations, which is only true for RBC and PLT. Unlike RBC/PLT, WBC gather is subject to declining rates due to on-going lysing, but there is no real-time means of distinguishing between WBC cell types. Only complex algorithms based on an overview of all list data can do this.
The topic of extended count times was not discussed in report 23. Because it can be effected entirely in software, it does not impact our system and hardware design, but Diana felt that it was relevant to the discussion of decimation. The intent of both is to gather sufficient list data of an under-represented cell type to form an accurate picture of the population. In the case of decimation, we assume that there is a sufficient number of all cell types but one population dominates the count. Extended count is used when not enough cells are seen in the normal counting period. The two situations may occur together, for example if the PLT population is very small; or alone, for example a sample with sufficient numbers of both RBC and PLT but a high percentage of RBC, or a Leukopenic sample, which simply has few cells of any type.
We have always been concerned about how extending the count might complicate parallel operations. For example, since lysing requires an incubation time, we count RBCs while WBCs are lysing. If the RBC count period is extended, the WBCs will be lysed for a longer time than normal because they will have to wait for the flow cell(s) to become available. A brute force solution would be to reduce sharing by duplicating fluidic components. John's proposed two-laser system represents a more subtle approach, taking advantage of the separate paths to optimize for the specific cell types expected on the paths. However, this still requires extra hardware and can't necessarily be applied across the entire product line. A less expensive solution is to predict whether the count period needs to be extended before parallel operations are started. This costs nothing but is unlikely to yield the fastest throughput in all circumstances. Another potentially useful technique might be to finish processing one sample even while starting the next as long as there is no hardware conflict. Existing instrument programs may not be able to handle such overlap but CDNext will be able to if it could be useful.
Clearly, the question of count extension must be answered at some point in the design of any particular instrument. For generality, at this point we are simply developing support for both multiple asynchronous paths and adaptive scheduling and not suggesting that either one be used in lieu of the other.
In the meeting described by Report 23, Mike Y mentioned that the CD4000 provides a special analog front end for gathering PLT. Supposed RBCs, as indicated by their signal strength in one channel, are amplified by one gain while PLTs are amplified by a larger gain. Nearly all of the gain and threshold circuitry is duplicated as well as a portion of the conversion and storage circuitry.
Jack examined the CD4000 circuits that Mike provided and compared them to the interim APU. He found that the APU does provide some separate circuitry like the CD4000 but not separate gain control. He explained that separate gain control could be valuable, not only to provide better resolution in the PLT range, thereby reducing the significant quantizing that we see in the 3000 series instruments, but also to improve triggering on PLT. Increasing the gain of small events might eliminate one of the major cell width problems that Mike warned about.
Initially, Jack and I (David McCracken) felt that the lack of separate RBC and PLT gain control in the current APU renders any other separate circuitry useless. An analog upper threshold detector would serve no purpose if every event above the lower (PLT) threshold were converted regardless of the upper threshold. However, if we are decimating (or using the quota system and the RBC quota is reached before the PLT) the FPGA could use an upper threshold to determine whether to convert the event, thereby reducing hardware-based coincidence. The FPGA could reduce coincidence without help by converting the reference channel and then using its value to determine whether to convert the remaining channels, but the analog detector would afford skipping even the reference channel conversion.
Considering the importance of PLT (number one requested improvement) and that the main problem seems to be insufficient list mode data, anything that improves gathering them should not be underestimated. The analog upper threshold detector clearly can reduce coincidence if used properly. A separate gain circuit can also improve the chances of getting good PLT data. Thus, the CD4000 approach appears to be best. However, it is not the most efficient. Jack and I suggest a more efficient architecture, in which the upper threshold detector is used to turn off a gain multiplier (similar to a counter prescaler). The lower threshold detector alone determines whether to gather the event. The upper serves only to enable the gain multiplier on signals below the programmed upper level and to indicate to the FPGA whether the event is small or large. The FPGA can be told to convert all events in this channel group, only the ones below the upper threshold, or only the ones above. The main gain and threshold circuitry is not duplicated.
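The proposed selection logic can be sketched in software form. This is an illustration only: the three modes are read here as all events, only small events, or only large events, and the gain multiplier of 4, the threshold values, and all names are hypothetical.

```python
from enum import Enum

class ConvertMode(Enum):
    ALL = "all"
    BELOW_UPPER = "below"  # convert only small events (e.g. PLT)
    ABOVE_UPPER = "above"  # convert only large events (e.g. RBC)

def handle_event(peak, lower, upper, mode, gain_multiplier=4):
    """Return None if the event is not converted, else (value, scaled_flag).

    The lower threshold alone decides whether an event is gathered. The
    upper threshold only marks the event small or large and enables the
    gain multiplier for small events.
    """
    if peak < lower:
        return None                  # below trigger threshold: no event
    is_small = peak < upper
    if mode is ConvertMode.BELOW_UPPER and not is_small:
        return None                  # skip large events, reducing coincidence
    if mode is ConvertMode.ABOVE_UPPER and is_small:
        return None
    value = peak * gain_multiplier if is_small else peak
    return value, is_small           # flag corresponds to the stored scaling bit

plt_like = handle_event(100, 50, 400, ConvertMode.ALL)
rbc_like = handle_event(1000, 50, 400, ConvertMode.ALL)
rbc_skipped = handle_event(1000, 50, 400, ConvertMode.BELOW_UPPER)
```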
It is likely that we will want to boost the gain of all channels in the RBC/PLT group. The gain multiplier approach is significantly less costly (in board space and components) than full analog path duplication but it is not free. Therefore, it should be provided on no more than four channels. To take advantage of the improved ability to gather RBC/PLT data, these channels must be used. One channel can be dedicated as the upper threshold detector but, of course, it can be attached to any measurement detector. It can turn the gain multiplier on in the other channels in the group as well as its own. This should be selectable, as a particular instrument might not assign all four channels to RBC/PLT. The upper threshold itself needs to be programmable and the FPGA must store the scaling bit with the data in the DMA buffer.
If this mechanism can be successfully implemented, it can substantially reduce ADC requirements in terms of both speed and resolution; speed because our primary motivation for increasing speed is to reduce the effect of RBC coincidence on PLT list data; resolution because we need greater resolution mainly to avoid quantizing in the low end while still covering the full RBC/PLT range (we may still want a large range plus resolution for a one-pass instrument).
Jack will explore this idea more fully and determine whether it has any serious flaws. It would also be helpful for Mike Y to review and critique this suggestion.
APU AND MSM HARDWARE DESIGN REVIEW
David McCracken and Coleman reported finding the following bugs in the MSM.
Coleman reported that the APU's CPU configuration, which has served as the model for the MSM, incorrectly (for the MSM) configures CS3, the FPGA chip selector, as requiring external DSACKS, which report the size of the target device. DSACK should be internal byte. David McCracken will correct this in the MSM BIOS program (as well as in ICD32 macros) and, after also applying the fix for the RAM byte write problem, retest NoHau at 25 MHz.
Coleman questioned the significance of the CS1 RAM access configuration indicating 16-bit internal DSACK. This is confusing and none of us could clearly interpret the Motorola 68340 hardware manual's explanation. However, we know that bytes can be individually read and, after Coleman's fix, written. Therefore, we can assume that the 16-bit internal DSACK simply means that the device supports 16-bit access, not that it exclusively supports 16-bit transfers. Presumably, it would be the programmer's responsibility to not attempt to access bytes from a device that supports only words. In fact, this is a problem that Jack and David recently had to address. Jack had implemented a word-wide (actually 12-bit) control register in the APU's gather control FPGA and was unable to write correctly to it using the script debugger's direct memory write function, which used a byte stream to write to multiple-byte locations. The script language was changed to support word and long streams.
BYTE WRITE FAILURE
After the meeting, Coleman, Dave Billman, David McCracken, and Robert D investigated the reported bugs. Coleman determined that the RAM byte write problem was caused by RAM control signals not returning to inactive states. The CPU and FPGA can both access RAM, as explained in system design report 14 topic Msm; Bus Mastering [cdxsys-BusMastering]. Five control signals, OE, UB, LB, and the two OE (one for each RAM chip) are shared by wire-or, which requires passive pullup to the inactive state. This requirement was met with resistors R156, R192, R193, R113, and R157. These were 10K, which did not provide a fast enough signal return to inactive. Thus, some of the signals from the program read cycle remained active in the subsequent write. When the resistors were replaced with 1K, the problem was corrected. This also appears to have corrected the memory access warnings from ICD32 on dump and fill commands.
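The effect of the resistor change can be illustrated with the RC time constant of a passively pulled-up line. The 30 pF load is purely an assumed figure for illustration; the report gives no capacitance for these signals, and the function name is hypothetical.

```python
def rc_time_constant_ns(resistance_ohms, capacitance_pf):
    """RC time constant in nanoseconds (ohms times picofarads, scaled)."""
    return resistance_ohms * capacitance_pf / 1000.0

ASSUMED_LOAD_PF = 30                # assumption, not a measured value
CLOCK_PERIOD_NS = 1000.0 / 25       # 40 ns at 25 MHz

tau_10k = rc_time_constant_ns(10_000, ASSUMED_LOAD_PF)  # original resistors
tau_1k = rc_time_constant_ns(1_000, ASSUMED_LOAD_PF)    # replacement resistors

# Under this assumption the 10K pullup needs several clock periods to
# release a line, while the 1K pullup releases it in under one period.
slow_in_clocks = tau_10k / CLOCK_PERIOD_NS
fast_in_clocks = tau_1k / CLOCK_PERIOD_NS
```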
7-SEGMENT LED WRITE FAILURE
David McCracken investigated the NoHau 68340 BDM debugger write to 7-segment LED failure at 25 MHz. Changing the FPGA/IO CS3 DSACK to 8-bit internal (see MC68340 User's Manual p. 4-32, 4.3.4.2 Address Mask Registers) as Coleman suggested, did not correct this problem but the resistor change did.
BDM DEBUGGERS WITH 25 MHZ 68340
It should be noted that NoHau's EMUL300 68340 BDM debugger does not work correctly at 25 MHz in the configuration that we have been using for CD3200 CPUDCM and CDNext APU development. We have been using the POD connection NoHau calls "via BERG connector", which doesn't require a separate clock connection from the BDM adapter to the target. When this is used at 25 MHz, single-stepping actually double-steps. The CPU correctly executes the apparently skipped instruction, but the result is very confusing. This is what P&E Microsystems said would happen with their old ICD32 adapter. However, they also explained that the problem could be corrected by replacing the 500 ohm timing resistor in the adapter with a 220 ohm (or buy their new Rev. E adapter). NoHau technical support says that we have to use the "direct (with CLKOUT)" configuration. Unfortunately, this requires the yellow wire on the pod to be connected to the CPU's CLKOUT signal, which complicates attaching the debugger. Fortunately, the MSM (and future APU) CPU bus extension connector provides this signal (on pin B20, which has been pointed out with a white mark on some boards). For any future boards (including the APU in progress) this signal should also be brought to a one-pin (two if that is easier) jack near the BDM connector.
FLASH MEMORY ACCESS
I (David McCracken) mentioned in the meeting that flash memory did not appear accessible. The byte write problem was a factor, but fixing this did not correct the flash problem. The flash problem resulted from a hardware-software incompatibility and an unfortunate sequence of events. First it should be pointed out that virgin Atmel flash parts can be written at will, subject to the obvious sector limitations. Thus, Coleman and Dave Billman were able to initially demonstrate that the flash could be written and read. Our programs lock the flash after writing it to prevent its being accidentally overwritten. In my investigation of the RAM byte write problem, I tried to load the BIOS into flash, because this process would have the CPU read its program from RAM but write (bytes) to flash, not RAM. Writing to flash failed, perhaps because of the dangling control signals, but the handshake that the program used to open and then close the flash to write succeeded, locking out the sort of testing that Coleman and Dave had used but not leaving behind any evidence of success.
After the dangling control signal problem was corrected, the flash still did not appear to be written even though the BIOS write program superficially appeared to be executing correctly. The problem was that the MSM flash chip is an AT29C020, which has a 256-byte sector, while the programs were written for the APU, whose AT29C512 has a 128-byte sector. Using NoHau to program one (128-byte) sector at a time revealed that the first 128 bytes were correctly written and then promptly wiped out by writing the second 128 bytes in the same (256-byte) sector. Previously, I looked at only the first few bytes of the flash to confirm that writing had occurred. Looking at the second half-page would have exposed this problem without resorting to the debugger.
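The sector size mismatch can be reproduced with a toy model. The model assumes that each programming operation erases the whole sector containing the target address and leaves unloaded bytes as 0xFF; the class and function names are hypothetical and this is not the BIOS write routine.

```python
class SectorFlash:
    """Toy model of sector-programmed flash.

    Each program operation erases the entire sector that contains the
    target address, then writes the supplied bytes. Bytes not supplied
    stay erased (0xFF), which is an assumption of this model.
    """
    def __init__(self, size, sector_size):
        self.sector_size = sector_size
        self.mem = bytearray(b"\xFF" * size)

    def program(self, addr, data):
        base = addr - addr % self.sector_size
        if addr + len(data) > base + self.sector_size:
            raise ValueError("write crosses a sector boundary")
        self.mem[base:base + self.sector_size] = b"\xFF" * self.sector_size
        self.mem[addr:addr + len(data)] = data

def write_image(flash, image, chunk):
    """Write image to flash starting at address 0, chunk bytes at a time."""
    for offset in range(0, len(image), chunk):
        flash.program(offset, image[offset:offset + chunk])

image = bytes(range(256))

# APU-style program (128-byte transfers) run against a 256-byte sector.
apu_style = SectorFlash(size=256, sector_size=256)
write_image(apu_style, image, chunk=128)

# Program matched to the 256-byte sector.
msm_style = SectorFlash(size=256, sector_size=256)
write_image(msm_style, image, chunk=256)
```

In the mismatched case the second half of the sector holds the right data while the first half reads back erased. This matches the observation that checking only the first few bytes showed no sign of a successful write.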
The program writes to flash in two distinct ways. The BIOS program writes itself into flash and the application program writes "nonvolatile" data piece-meal into several pages reserved for this purpose. We already were using a 256-byte page in order to support the 16-bit flash pair in the CD3200 CPUDCM. The 128-byte and 128-word programs are different not only because one writes the page using two successive 128-byte transfers while the other uses one 128-word transfer but also in the use of a byte-wide magic handshake vs. word-wide (to open up two devices at once). The new 256-byte mechanism shares elements of both of the older ones. Now we have three mechanisms. The program only supports one type, selected at compile time by definition in analyz.h, like most target-specific configurable features. This is as far as the program can be extended without extensive changes.
We currently use three hardware design languages: VHDL (Coleman), AHDL (Jack), and Verilog (Dave Billman and the rest of Dallas). To reduce this variety, we agreed to try to converge on Verilog. This places the greatest burden on Jack, who has experience with VHDL and AHDL but not Verilog.
Jack will investigate the possibility of using Verilog for designing Altera devices, both the larger main ones used in the APU and MSM and the small devices like 7032, '64, etc. Even if we have to continue using AHDL for the smaller devices, Jack needs to become familiar with Verilog in order to collaborate with Dallas. Robert D and Dave Billman will provide information about Verilog classes in the San Jose area that could provide appropriate training for Jack.
Robert D brought up the possibility of alternative logic devices, mentioning that Xilinx has offered some very aggressive pricing to compete with Altera. Since our designs will be done in Verilog, we should be able to move readily between the two companies (this is another reason to get away from AHDL). However, we need the appropriate fitter programs even if we are using Cadence programs for generic Verilog design and simulation. It would be appropriate for Robert to investigate the Xilinx tool situation.
ECP ON APU
David McCracken expressed concern about the drain on main FPGA resources to implement the ECP in it. Coleman responded that anything that fits into a 7064 (as does our developmental ECP) would not meaningfully impact our larger FPGAs. There is still a question of pin usage, however. All of the CPU interface signals are already available to the FPGA, so only the cable and buffer control signals would consume additional device pins. The cable interface comprises 8 bi-directional pins, 4 outputs, and 4 inputs. If all latching is done in FPGA registers, the only additional pin is for direction control. This would consume 17 I/O pins, which is fairly significant. The impact of this was not discussed.
The MSM's master interface hardware supports only the HSSL-like IML (Inter-Module Link). However, for off-line development of instrument modules, we would like it to support ECP and USB. We intend to provide this with adapters that plug into the CPU bus extension connector. We have used this approach to add ECP to the interim APU. The interim connector is not compatible with the MSM's but the next version of the APU will match the MSM, so they will be able to share any new master interfaces that we develop.
David stated that the MSM's FPGA, mapped to 0x00500000, would interfere with the APU's mapping of the ECP interface chip. Jack argued that the ECP, mapped to 0x00570000, would not interfere. Reviewing MSM.ICD, we see that the actual FPGA/IO mapping is 0x005xxxxx, so there is a conflict. If the MSM doesn't require the full range that it has been given then its range could be reduced to clear the conflict. Alternatively, the ECP could be remapped.
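The conflict can be checked by treating the FPGA/IO mapping 0x005xxxxx as a 1 Mbyte block. In this sketch the reduced 64 Kbyte block size is a hypothetical example of shrinking the MSM range, not a proposed value, and the names are illustrative.

```python
def in_block(address, base, size):
    """True if address falls inside the block [base, base + size)."""
    return base <= address < base + size

MSM_FPGA_BASE = 0x00500000
MSM_FPGA_SIZE = 0x00100000  # 0x005xxxxx, i.e. 0x00500000 through 0x005FFFFF
ECP_ADDRESS = 0x00570000

conflict_now = in_block(ECP_ADDRESS, MSM_FPGA_BASE, MSM_FPGA_SIZE)

# Hypothetical reduced FPGA range of 64 Kbytes (0x00500000 through 0x0050FFFF).
REDUCED_SIZE = 0x00010000
conflict_reduced = in_block(ECP_ADDRESS, MSM_FPGA_BASE, REDUCED_SIZE)
```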
Coleman suggested using CS2 to select the ECP. If the APU's ECP is folded into the FPGA then CS3 would still be used for the APU, but this would present no problem to software. The APU and MSM programs would see the ECP at different addresses but, while very similar, the programs already differ in a number of ways, particularly address mapping. CS2 is brought out to the MSM's CPU bus extension connector. This represents a good general solution to mapping new master interface hardware. Non-native hardware will be selected by CS2. Native hardware will be selected by CS3 through the FPGA. Addresses will be determined at compile time; i.e. the source code can be shared but not the executables (which would be the case for other reasons anyway).
APU NATIVE MASTER INTERFACES
Unlike the MSM, we don't expect the APU to use its HSSL master interface for a production instrument. It is too slow for the increased volume of list data and it requires the data station to have an ISA connector. At this time, we are not sure whether ECP or USB will be used. David McCracken suggested that both ECP and USB be native, because neither consumes much board space and their relatively low cost can be reduced to practically nothing by not stuffing the components. Dave Billman, who will be working on the APU's USB interface design, agreed and will be looking into the components and circuitry needed for USB2. David McCracken presented the Philips ISP1581 USB2.0 controller as evidence of hardware availability. This particular device would be used in the PC, but Philips is likely to be producing a peripheral interface device as well. Cypress Semiconductor has announced an interface controller, the SX2, which should be usable on the peripheral end (APU).
Robert D asked about the speed of USB2, to which David McCracken erroneously replied 100 Mbits/sec. It is actually 480 Mbits/sec. For a single APU system, USB1 is adequate, even given that its inefficient pseudo-interrupt protocol severely degrades performance in our application (where the peripheral generates a lot of unrequested data). We are considering a multiple APU system in the future, for which USB1 is unlikely to be adequate. ECP cannot support such a system because of cabling and protocol restrictions. If a USB2 interface is provided on the APU, a PC with only USB1 support will still be able to talk to it.
In keeping with the decision to access native master interface hardware through the FPGA, the CPU should control the USB interface through the FPGA, leaving CS2 for exclusive use by hardware attached to the CPU bus extension connector.
Dave Billman agreed to investigate the various ADC options, ranging from half-flash 12-bit to our current 16-bit successive approximation. He will also review the ways that replaceable conversion circuitry might be provided, including pin-compatible components, component sockets, and plug-in modules. David McCracken indicated a preference for the latter because it affords the greatest flexibility at only a slight increase in cost. However, it does require a small board and, therefore, requires more design and development effort than pin-compatible components.
We are only concerned with hardware variation, as we can easily reprogram the FPGA and software. However, there may be significant variation in the ADC access means, for example a parallel-accessed half-flash vs. the serial 16-bit part. Some variations may be difficult to accommodate efficiently.
The speed of the channels' analog front ends, particularly the sampling mechanism, may be a limiting factor. None of us at the meeting felt qualified to analyze this aspect. The limit is a relatively straightforward question that Mike Y should be able to answer. If he is not available, we would like to get Ed Sewall's analysis. In fact, because of Ed's proven documenting and analytical ability, we would like to get his analysis of the entire analog front end. However, Robert suggested that he might be tied up with other assignments for a while.
Coleman reported that he is nearly ready to simulate his FPGA-based VPM design. Therefore, we all agreed not to further consider alternatives at this time. The status of the RPM and LPM boards is unclear. Robert D offered to check on the LPM and volunteered Coleman to check the RPM. David McCracken offered the opinion that the LPM has already been laid out but provided no evidence of this. The RPM will require functional review. We left off on this with Maha perhaps still unclear about the strobed fluid sensors. David and Jack can review the RPM and suggest how to finish the design if Robert or Coleman can provide the latest documentation.
In system design report 22, David McCracken expressed concern about the long-term availability of the Dallas Semiconductor DS1267 dual serial potentiometer and also about how the Status Panel circuit overall misuses the APU's I/O scan chain. Robert agreed to investigate the availability issue. John said that, while volume control is needed, there is no frequency control requirement, so a single potentiometer would be acceptable. John suggested that the Status Panel is not important enough to take up design time. David suggested that it could be made compatible with the new scan chain by patching, but only if the dual pot consumes a multiple of 8 scan chain bits. It turns out that the part consumes 17 bits and, therefore, cannot be used without some other shift register on the board consuming 7 bits instead of 8. This is not a consequence of the new scan chain design but of the interim design simply being wrong in the first place.
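The byte-alignment arithmetic behind this conclusion is small enough to sketch. The function name is hypothetical. The 17-bit figure and the 8-bit unit come from the text.

```python
def pad_bits_needed(bits, unit=8):
    """Bits that must be added to bring `bits` up to a multiple of `unit`."""
    return (-bits) % unit

DS1267_BITS = 17  # scan chain bits consumed by the dual pot, per the text
padding = pad_bits_needed(DS1267_BITS)
print("DS1267 needs", padding, "more bits to end on a byte boundary")
```

A part that is already byte-aligned returns 0, so the same check can be applied to any device being considered for the chain.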
During the meeting, John suggested that the status board is not important enough to spend time redesigning. David volunteered to provide Johnny Lo with instructions for modifying the Status Panel to make it compatible with our new scan chain without redesigning the circuit. Given the new information about the DS1267's scan chain bit consumption, this is not feasible. The circuit must be redesigned or not used. John had suggested that the Status Panel doesn't provide any critical function, but it does provide the panel switch. We can easily add the switch to any position on the APU's scan chain.
This is an ideal situation to demonstrate the value of the scan chain prototyping board. It can easily stand in for the Status Panel, consuming a jumper-selectable number of bytes in both the output and input scan chains. The prototyping board provides a flow-through scan chain interface but this can easily be rewired to match the loopback of the Status Panel. We can start with just the panel switch and add functionality later (this would be a good project for a less senior engineer).
As with the Status Panel, John suggested that we are too busy with more important boards to do much with the Loader Board. However, we need it in order to control a loader. Also, since it is located at the end of both the MSM's I/O and motor scan chains, it affords the only opportunity to implement scan chain feedback registers.
Unlike the Status Panel, the Loader Board contains substantial circuitry that we want to keep and we don't want to redesign. Replacing this board with a prototyping board would require hand wiring the tube detector circuitry, which would be more difficult than reworking the interim board's scan chain interface. David McCracken volunteered to develop a set of instructions for Johnny Lo to adapt the interim board. The changes (from most to least likely to be implemented) comprise:
3/13/2001
CONFIGURATOR SELECTION MISSING RESISTOR
The BIOS program needs to know whether to wait for the configurator to program the FPGA or do the configuration itself. The CPU has no means to determine whether the configurator or a Bit Blaster is connected. Instead, we have to tell it via PA3 (CPU pin 133), which is connected to JU3 pin 1. JU3 pin 2 connects to ground, so installing a jumper grounds PA3. However, opening the jumper floats PA3. A 10K pull-up resistor should be added so that PA3 reads 1 if JU3 is open or 0 if closed.
When the BIOS starts, it will configure PA3 as input and read it. If it reads 0, the FPGA configuration program will be called. Otherwise, the BIOS will wait for the FPGA to indicate that it has been configured by EPROM or Bit Blaster. It signifies the completion of this process by bringing both DONE and STATUS high. The CPU reads these on PA0 (CPU pin 130) and PA1 (CPU pin 131).
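The start-up decision described above can be sketched as follows. This is an illustration only, not the BIOS source: `read_port_a` and `configure_fpga` are hypothetical callables standing in for the Port A read and the FPGA configuration program, and the poll limit is an added assumption (the text only says that the BIOS waits).

```python
PA0_DONE = 1 << 0     # DONE is read on PA0 (from the text)
PA1_STATUS = 1 << 1   # STATUS is read on PA1 (from the text)
PA3_JUMPER = 1 << 3   # jumper sense on PA3 (from the text)

def bios_fpga_startup(read_port_a, configure_fpga, max_polls=1000):
    """Return 'cpu' if the BIOS configured the FPGA itself, 'external' if it
    waited for the EPROM configurator or Bit Blaster, or 'timeout' if the
    FPGA never reported completion (the timeout is an assumption)."""
    if (read_port_a() & PA3_JUMPER) == 0:
        configure_fpga()              # jumper closed: CPU does the configuration
        return "cpu"
    ready = PA0_DONE | PA1_STATUS     # both must be high
    for _ in range(max_polls):
        if (read_port_a() & ready) == ready:
            return "external"
    return "timeout"
```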
RAM OVERWRITE DURING FPGA CONFIGURATION BY CPU
RAM Overwrite
Often during FPGA configuration by the CPU, static RAM contents are changed. This should not happen. One of the motivations for providing the CPU with access to RAM independently of the FPGA (via PLD U7 A1010) is to make it possible for the CPU to configure the FPGA by executing a program in RAM. Obviously, it can't do this if the process itself changes RAM contents. While there are several feasible workarounds for this particular situation, we currently can't explain the phenomenon and, therefore, can't say that it won't occur in less tolerant situations.
It is unlikely that this problem is rooted in software, because it doesn't always happen. Further, it appears to occur less often at certain temperatures. If the MSM unit is maintained between 74F and 75F, the FPGA configuration process can be repeated many times without incident. When the board is colder or warmer by even a few degrees, the problem is prevalent. However, experiments also suggest some kind of system memory effect, because after bringing the board to the ideal temperature and configuring the FPGA without RAM overwrite, the phenomenon occurs much less frequently at other temperatures as long as power is maintained. If power is turned off, the problem returns and again exhibits temperature sensitivity.
Adding to the complexity of this phenomenon is what the logic analyzer tells us about it. It shows that the RAM write signal, NWE_SRAM from the PLD (pin 5 of U7) does not go low or even glitch while the CPU configures the FPGA. Deliberately writing to RAM by program and interactively through the BDM (using ICD32) both show that the logic analyzer is properly configured to trigger on NWE_SRAM going low. In other words, the RAM can be written without an active write signal when the FPGA is being configured.
The CPU also throws in some anomalous behavior, for example, generating an external write (RNW measured at J11 pin A11-- it might be relevant that the CPU pin was not tested) when the FPGA configuration program writes to Port A. This is not supposed to happen when the CPU's Module Configuration Register (MCR) SHEN1/0 bits are both 0, as they are automatically at CPU reset (see MC68340 User's Manual section 4.3.2.1 p. 4-21). All four possible bit combinations for the two bits were tested, in case of documentation error, with no effect. However, this hardware bug does not contribute to the RAM write problem, because the logic analyzer shows that no RAM write signal is generated.
Test Program
To help investigate this problem, the ICD32 macro fpgaram.icd was developed. The CPU can't execute the configuration program from RAM even for testing, because the random program changes cause the CPU to behave erratically, sometimes actually writing to RAM in what appears to be the cause rather than the effect of the original problem.
To support testing this problem and increase our normal options, the FPGA configuration program, originally from the APU, has been divided into two parts, the program and the FPGA image. The program takes an optional argument (in A1) that tells it where to find the FPGA image source. If A1 is 0 the program uses the image bound to it when it is built.
The fpgaram.icd macro first loads fpga.s, the normal configuration program with FPGA image, into RAM. Previously, fpga.s was loaded into RAM and copied from there into flash using the CopyToFlash program, which is part of the ftools suite of utilities loaded into flash when a unit is first initialized. Thus, fpga.s in RAM is identical to its counterpart (in fact its clone) in flash. The program is position-independent and can execute in any location. After loading fpga.s, fpgaram invokes the ComputeChecksum (another ftools utility) to calculate a simple checksum of the FPGA image in RAM. The result appears in D0.L. The macro pauses (using the capture file trick) to let the user record this value. Then fpgaram.icd invokes the FPGA configuration program in flash passing it the address of the FPGA image in RAM. Execution is unaffected by any changes to the program in RAM. When the configuration program completes, the macro recomputes the checksum of the FPGA image in RAM. If this doesn't match the original value, RAM has changed.
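The before-and-after comparison can be modeled on a host as follows. This is a sketch of the technique, not of the ftools code: it assumes the checksum is a plain byte sum truncated to 32 bits (to match a result held in D0.L), and the buffer contents are made up for illustration.

```python
def compute_checksum(mem, start, end):
    """Sum the bytes mem[start:end], truncated to 32 bits."""
    return sum(mem[start:end]) & 0xFFFFFFFF

# Stand-in for the FPGA image in RAM (contents are illustrative only).
ram = bytearray(range(256)) * 4
before = compute_checksum(ram, 0, len(ram))

ram[-1] ^= 0x01   # simulate an overwrite of the last byte of the image
after = compute_checksum(ram, 0, len(ram))

changed = before != after
print("RAM changed:", changed)
```

A simple sum detects a single altered byte but cannot detect compensating changes, such as two bytes swapping places, so a matching checksum is weaker evidence than a byte-for-byte comparison.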
fpgaram.icd only tests for change to the FPGA image. To test for changes to the program, the ICD32 VERIFY command can be used to compare the entire program in RAM to the fpga.s file from which it was loaded. VERIFY reports the earliest point of change, which appears to always be at x8007 (the program begins at x8000) when there is any change.
There is an ulterior motive for fpgaram.icd besides testing this problem. Testing new FPGA firmware using the CPU for configuration is easier than programming an EPROM or using a Bit Blaster, both of which require equipment and unautomated procedures. Even if both the configuration program and the FPGA image have to be first programmed into flash before the CPU can configure the FPGA, this still affords a faster and more convenient approach than the other two. However, if the FPGA image in RAM could be used, the process would be even faster and less wearing on the flash.
When it became apparent that the program plus image in RAM was not feasible, fpgaram.icd was developed to test whether it would be possible to embed only the configuration program in flash and use RAM for the FPGA image. The test in fpgaram.icd doesn't fully answer this question, because the FPGA seems to function correctly even when RAM is overwritten (as long as the program executes from flash).
When configuration causes a checksum change, dumping the RAM and flash FPGA images afterward shows that only the last byte of the image has changed. (The dumps are captured to the 340stat file, which is subsequently trimmed in Brief to create matching files for comparison.) This would suggest that the change occurs only after the program has consumed all of the image. However, this is not consistent with the fact that the fpga.s program can also be overwritten even when it is not read.
Absent some explanation for this phenomenon, we have to assume that it is not safe to read the FPGA image from RAM. Therefore, all FPGA configuration (ICD32) macros, other than fpgaram.icd, first program the image into flash.
Confirmation
Coleman recreated the test using the same software but with a different MSM board and repeatedly found that the RAM was not damaged. I (David McCracken) repeated the test on another board and found it to also function correctly. I tested the originally bad unit again and it continued to exhibit the problem. The bad board has many wires attached for probing and these may be contributing to the problem. Both the good and bad boards were connected to the same power supply, but the bad board was mounted near other boards while the good one was more isolated mechanically.
MSM CONFIGURATION METHODS AND TOOLS
The MSM affords considerably more FPGA configuration flexibility than the APU. As with the APU, the FPGA may be configured by Bit Blaster or by CPU. The MSM's FPGA additionally can configure itself via a serial EPROM chip. Also, because the MSM can access its RAM without first configuring its FPGA, a new unit can be bootstrapped without a Bit Blaster (or pre-programmed flash). Even if the problem of RAM being overwritten while configuring the FPGA can't be corrected, having viable RAM and flash allows us to copy general-purpose utilities and an FPGA configurator program into flash through the BDM before the FPGA is configured.
HARDWARE PREPARATION
Each of the three FPGA programming methods needs to drive the FPGA's clock (pin 155), data (pin 156), and configure (pin 105) inputs. Note that the current MSM schematic shows an incorrect FPGA package pinout; the first pin on each side going counter-clockwise should be 1, 53, 105, and 157, not 1, 55, 107, and 161.
The Bit Blaster absolutely does not work when the EPROM configurator U5 is installed. The CPU may be able to configure the FPGA even with U5 installed by connecting Bit Blaster connector J6 pin 7 to ground (J6 pin 2 or 10), as this should force U5 outputs to high impedance. Considering the problems we already have with RAM overwrite, it is best to be conservative and assume that U5 must be removed for FPGA configuration by CPU as well as by Bit Blaster.
Only the CPU's BIOS program reads PA3 to determine whether to invoke the configuration program. All of the ICD32 macros are hard-coded to either invoke it or not, depending on the purpose of the macro. Therefore, whether JU5 is open or closed has no effect on the macros. If JU5 is open, the BIOS will not try to program the FPGA, so whether the configurator is installed or the Bit Blaster is connected is immaterial in this case. If JU5 is closed and, therefore, the BIOS configures the FPGA, a connected Bit Blaster probably doesn't interfere, because the Bit Blaster tristates its outputs when not configuring the FPGA.
METHOD SELECTION
Each of the three methods has advantages and disadvantages. The CPU-based method requires the least amount of hardware-- the ICD32 BDM debugger, which is needed anyway for hardware testing. It also can be used to reduce production costs and improve reliability by eliminating the separate programming and stuffing of U5. However, it is currently unproven and the RAM overwrite is potentially a problem. Both the CPU and Bit Blaster methods are slower than the configurator, not only in the configuration process itself but also in ancillary procedures. The CPU method may require a separate configuration phase (like the APU) for debugging sessions and the Bit Blaster requires multiple manual steps, including interaction with Altera's control program (a free but mostly crippled version of MAX Plus II). Unlike the APU, a virgin MSM can be brought up without a Bit Blaster. However, the Bit Blaster can still be useful as a known reliable method that doesn't require burning EPROM configurators or writing to flash.
To sum up method selection:
PROGRAMS
The FPGA fitter produces three files that define the program image. These contain identical information but in different formats suited to subsequent operations. The EPROM programmer accepts the pof file; the Bit Blaster control program accepts the sof; and our own programs convert the ttf file to forms that can be incorporated into the CPU method. The EPROM programmer's use of the pof file and the Bit Blaster controller's use of the sof file are obvious and require no explanation or support.
The procedures and tools for using the ttf file are proprietary and require explanation. The explanations and many of the batch and macro files assume that the FPGA program file is called MSMFPGA.TTF and that files derived from this have the same root name but different extensions. All batch and macro files can be edited to change this assumption, either using just a different root name or changing the root as well as the extension of derived files. Programs that cannot be simply edited make no such assumptions.
Ftools
Most of the programs listed above are either editable text files, which can easily be understood by inspection, or compiled/assembled/linked programs, which should be treated as black boxes. Ftools is unique in that, although it is an assembled/linked program, the utilities that it contains can't be used without knowing some internal details.
The utilities are invoked directly by address. To provide some stability without unnecessarily restricting program changes, vectors at the beginning of the program point to each utility. Whether by other compiled/assembled programs, by ICD32 macros, or by ICD32 command, the utilities should be called through their vectors. Only UnWriteProtect, which unlocks flash for any subsequent writing, has no vector. The following excerpts from ftools.asm explain how the current tool set is used.
; Execute from:
XDEF ConfigureFpga ;00807E00 Flash
XDEF CopyToFlash ;00007E04 RAM
XDEF CopyTools ;00007E08 RAM
XDEF ProtectFlash ;00007E0C RAM
XDEF ComputeChecksum ;00007E10 RAM or flash.
XDEF BgndHere ;00807E14 Flash
XDEF LoopHere ;00807E16 Flash
XDEF UnWriteProtect ;00007E18 RAM or flash.
FLASH EQU $00800000
FPGALOADER EQU FLASH+$8000
LAST_SECTOR_BYTE EQU 255 ; 127 if 128 bytes/sector like AT29C512.
; 255 if 256 bytes/sector like AT29C020.
BeginTools:
; Use vectors to allow programs to change without affecting addresses.
; CopyToFlash/doCopyToFlash and CopyTools/doCopyTools must be operated from
; RAM because they write into flash and the flash can't be read while being
; written. doCopyToFlash is copied into flash, because it might prove useful
; in an emergency (it would be copied from flash into RAM for execution).
ConfigureFpga:
BRA doConfigureFpga
CopyToFlash:
BRA doCopyToFlash
CopyTools:
BRA doCopyTools
ProtectFlash:
BRA doProtectFlash
ComputeChecksum:
BRA doComputeChecksum
BgndHere:
BGND ; Return gate when RAM control is lost.
LoopHere: ; Safety net for BgndHere and jump target if no debugger.
BRA LoopHere
;------------------------------------------------------------------------
; UnWriteProtect opens the flash to unlimited writing. There is no vector
; to it, because we don't want anyone going to here by accident. The preceding
; instruction is a jump, so the CPU can't just fall into this. Don't use this
; just for writing but for development of flash utilities or to bail out of
; a locked up memory situation.
;.......................................................................
; ---------------------------------------------------------------------
; ProtectFlash (re)establishes write protection on flash. This should be
; invoked once when initializing a new unit, typically immediately after
; invoking CopyTools.
;.....................................................................
;------------------------------------------------------------------------
; ConfigureFpga programs the fpga by calling the embedded programmer. The
; programmer is hard-coded to run at 8000. A1 passes the source address. If
; this is 0, the FPGA programmer uses its own image. A2 passes the address
; that the programmer should return to when done. This is intended to run from
; flash but may run in RAM. However, it is likely to be damaged by the
; programming process and act weird upon return, although the FPGA should be
; properly configured at this time.
;.......................................................................
; -----------------------------------------------------------------------
; ComputeChecksum computes a simple sum of all of the bytes from A0 through
; A1-1, storing the result in D0.L. Use this to determine equivalence between
; two ranges or one range before and after some potentially damaging activity.
;.........................................................................
;-------------------------------------------------------------------------
; CopyToFlash copies bytes from A0 through A1-1 to A2 in flash. This pauses
; at every sector rollover, for the flash to program itself, and then
; executes another magic handshake to enable writing to the next sector. The
; destination doesn't need to begin or end on a sector boundary. Whether the
; destination ends on a boundary or not, this function waits for the final
; sector to finish programming before jumping to the BGND instruction used to
; return control to the debugger.
;........................................................................
;-------------------------------------------------------------------------
; CopyTools copies these tools from RAM into flash memory. It doesn't copy
; itself. This is usually the first step to setting up a new unit, because it
; can be used to bootstrap the rest of the system. It doesn't write protect
; flash because it jumps to doCopyToFlash, which may be called repeatedly and
; we don't want to wear out the protector (presumably it uses flash).
; Therefore, normally after invoking CopyTools, ProtectFlash should be
; invoked. This only needs to be done once to establish software-based write
; protection.
;........................................................................
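The sector-rollover behavior described for CopyToFlash can be illustrated by splitting a copy into per-sector pieces. This is a host-side sketch, not the ftools implementation: the 256-byte sector size follows from LAST_SECTOR_BYTE EQU 255, the function name and example addresses are hypothetical, and the flash handshake and programming wait are not modeled.

```python
def sector_chunks(dest, length, sector_size=256):
    """Yield (address, count) pieces of a copy, split at sector boundaries.
    A real copy would pause after each piece while the sector programs."""
    addr, remaining = dest, length
    while remaining > 0:
        room = sector_size - (addr % sector_size)   # bytes left in this sector
        n = min(room, remaining)
        yield addr, n
        addr += n
        remaining -= n

# A destination that neither begins nor ends on a sector boundary.
chunks = list(sector_chunks(0x008080F0, 0x120))
for addr, n in chunks:
    print(hex(addr), n)
```

Passing sector_size=128 models a part with 128 bytes per sector, such as the AT29C512 mentioned in the excerpt.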
CONFIGURATION SCENARIOS
ICD32 MACRO USAGE
The ICD32 program automatically executes any local file called startup.icd. This can be very confusing if startup.icd does anything other than simple configuration. The best policy is to have no startup.icd file or one that is just a copy of msm.icd.
Many of these macros load S-record files into flash. There is little user notification as to success or failure and what there is goes by very quickly. Even if the source (S) file doesn't exist, the macro seems to execute without any particular problem. You can tell that something went wrong by subsequent usage failure but, at that point, the cause of the problem isn't obvious. ICD32 is a fairly crude program and affords no means to correct this deficiency. Therefore, before invoking any of the macros that load S-record (or binary image) files, be sure that they do exist.
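Because ICD32 gives so little feedback, a check on the host before starting the debugger can catch a missing or damaged file. The following is a minimal sketch of such a check and is not part of the existing tool set. The function names are hypothetical. The record validation relies only on the standard Motorola S-record layout, in which the count byte covers everything after it and all bytes from the count through the checksum sum to 0xFF in the low byte.

```python
import os

def srec_line_ok(line):
    """True if one S-record line has a consistent byte count and checksum."""
    line = line.strip()
    if len(line) < 10 or line[0] != "S" or not line[1].isdigit():
        return False
    try:
        body = bytes.fromhex(line[2:])   # count, address, data, checksum
    except ValueError:
        return False
    if body[0] != len(body) - 1:         # count excludes the count byte itself
        return False
    return (sum(body) & 0xFF) == 0xFF

def srec_file_ok(path):
    """True if the file exists, is not empty, and every record checks out."""
    if not os.path.isfile(path):
        return False
    with open(path) as f:
        lines = [ln for ln in f if ln.strip()]
    return bool(lines) and all(srec_line_ok(ln) for ln in lines)
```

Running srec_file_ok on the S file named in a macro before typing the macro command reports a missing file immediately instead of through a later usage failure.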
BUILD AND LOAD TOOLS INTO FLASH
This requires the same environment as for building the MSMBIOS (or APUBIOS) program. In the MSM directory type "m f" or "m ftools.s". Start ICD32. Type macro msm, followed by macro pgmtools.
CREATE AND LOAD A NEW NORMAL FPGA VERSION
After a new FPGA version has been tested, it can be merged with the loader program to become the standard configuration. This is really a configuration management issue, because fbinpgm2.icd can be used to load the new version over the old one to produce the same result. Overwriting the old version in this manner is similar to patching object code in a program-- useful for testing, but not for program maintenance.
The procedure is as follows:
LOADING A NEW BIOS
For a new unit, there is only one way to load the MSMBIOS.S program into flash: start ICD32 and type macro program, first verifying that MSMBIOS.S exists. We also have a batch file called CPU.BAT, which presents a menu of options, including to program the BIOS, but this is just a shell over the ICD32 with macro program.icd.
To update the BIOS in a unit, the ICD32 with program.icd macro method can be used or the BIOS can be updated through the master interface. The latter is generally simpler because it doesn't require a BDM debugger. After downloading MSMAPP, the script debugger affords a dialog to configure the target, including loading a new BIOS. Open the top-level menu Target and select the target (if attached to more than one IML unit, e.g. APU and MSM) if it isn't already the selected one. Select the Configure item, which opens the Configuration dialog. Pressing the Replace BIOS button opens a file dialog. The default file is the local (script debugger's default directory) MSMBIOS.S but you can select any file from any location. MSMAPP will only accept real BIOS programs (it is pretty smart about this) but since this will reprogram the BIOS in flash, it is a good idea to be reasonably sure of the program. This method is convenient but is available only if the existing BIOS works well enough to download the application program.
BUS ERRORS UNDER BDM DEBUGGERS
In system design report 24 topic MSM Bugs [cdxsys-MsmBugs] item 3, Coleman asked why the ICD32 debugger issues multiple bus error warnings at every program step. The cause lies in the memory display windows, F3, F6, and STACK. At every step or break, the debugger tries to fill these windows with current data from the target. For every access that fails, the debugger displays an XX in the window and reports "BERR Terminated bus cycle -- Debugger Supplied DSACK".
Jack, Coleman, and I had all concluded that this error report was nothing more than an annoyance and that the condition had to be benign, because the target seemed to function correctly. However, just as compiler warnings are useless if we become complacent and ignore them, seeing this bus error all the time may lead us to ignore it when it actually indicates something useful.
We can easily get rid of the warning from F3 and F6 simply by pointing them (by MDF3 and MDF6 commands) to good memory. By default they point to 0, which is nearly always good. The STACK window is more of a problem. CPU register A7, the stack pointer, points to the end of the stack, which is normally the last word in RAM. The ICD32 debugger displays the stack from this address forward, which explains why the first word in the STACK window has a value while the rest display as XX. To reduce this misleading warning from the debugger, the BIOS program has been changed. It has little use for the stack initially, so the SP vector has been changed to point to a lower address. Just before the first stack use, A7 is initialized to the end of RAM to avoid wasting memory and at this point the warnings will resume. No doubt we will all forget this explanation but program comments will explain it and even from the object code alone, it will be obvious that changing the stack pointer is what causes the problem.
This same problem also occurs when using the NoHau debugger if the watch or data window contains inaccessible addresses. However, instead of warning, the debugger just steps incredibly slowly. If you find this happening, redirect, clear, or close the offending window.
APU REDESIGN ELEMENTS READY FOR SCHEMATIC ENTRY
Several major aspects of the APU redesign are still under investigation, including ADC subsystem, gather decimation and bubble detection FPGA firmware, ECP and USB master interfaces, and serial temperature network. However, the APU3 design, which will provide the basis for the next design, contains elements that are ready now for schematic redesign. These are as follows.
5/29/01
[Report 25] [Report 27]
PLACEMENT-FREE CHANGES
CHANGES THAT REQUIRE REPLACEMENT
PLACEMENT-FREE CHANGES
CHANGES THAT REQUIRE REPLACEMENT
PLACEMENT-FREE CHANGES
CHANGES THAT REQUIRE REPLACEMENT
PLACEMENT-FREE CHANGES
CHANGES THAT REQUIRE REPLACEMENT
6/28/01
[Report 26] [Report 28]
DESIGN MEETING
PARTICIPANTS
John V, Robert D, David B, Jack W, David McCracken. With additional input from Jerry S.
VENUE
ADD (Santa Clara) June 26, 2001. Primary purpose to review project work partitioning; secondarily to review hardware corrections.
System design report 26 describes two categories of corrections to the MSM, RPMD, APU1 and APU3, replacement-free and changes requiring component replacement. The motivation for this division is that component placement requires extra skill, experience, and oversight. Replacement-free changes can be effected by a relatively inexperienced operator with modest oversight requirements by David B. The design changes will be quick, leaving David with time to work on more challenging issues.
The MSM hardware is essentially done and the RPM and LPM corrections are fairly minor (reviewed later in this report). John V suggested that we build 20 more MSM boards, as we are reasonably confident in its design. Since all of the proposed instrument forms require one each of the MSM, APU, and VPM, this implies that we would want to build 20 APU and VPM boards as well. Both of these other boards require substantially more rework than the MSM.
The basic design of the APU has been largely determined by the APU3. We have still not decided whether to use a DSP or FPGA approach to the VPM. Since David McCracken is the only team member with significant DSP experience, we decided that he should take responsibility for the VPM. We agreed that David B, in addition to shepherding the MSM, RPM, and LPM PCB corrections, will share responsibility for the APU with Jack W.
Jack has concentrated on cell data gather mechanisms. We decided that he should continue to be responsible for this portion of the APU design while David B will work on previously described changes and on a revised (parallel data) ADC. The APU's FPGA program is currently written in Altera's AHDL. David McCracken suggested that Jack convert this to Verilog. However, both Jack and David B argued that this should only be done when and if the need arises for B to do substantial reprogramming. McCracken argued that converting the ADC interface from serial to parallel represents such a situation. Jack and B will decide this issue.
After investigating an as yet unexplained ADC error, Jack's time will not be fully consumed by the APU redesign and he can help with deploying the new technology in the breadboard instrument, which is a revamped CD3200. David McCracken will continue helping this effort and developing the script debugger. Jack, David, and Johnny Lo will all work on the breadboard's analyz.ini file and test scripts.
APU1 and APU3 use a serial data out ADC. To avoid introducing noise, there is no overlap of conversion and readout so 100% of the readout time accrues to the overall access time. The cheapest way to decrease coincidence is to use a parallel output ADC. The only downside is that the ADC's instantaneous digital power requirement goes up as a result of the need to switch 16 outputs at once without creating ground (or power) bounce sufficient to cause apparent control signal changes.
While the change from serial to parallel data represents a major improvement, there are additional minor improvements to consider. David McCracken suggested fly-by data transfer mode as one of these. With a serial output ADC, the FPGA has no choice but to read the output into its own register (SIPO) and then write the register to the shared CPU bus. With parallel output, the FPGA can effect a two-stage read/write transfer, which resembles the serial to parallel version, or a fly-by transfer, which is theoretically twice as fast. In a fly-by, when the ADC is ready to post data, the FPGA negotiates for the CPU bus (as a bus master or DMA-- the differences are minor) and then drives the ADC output data directly (possibly through a buffer) onto the CPU bus. In addition to being faster, the fly-by approach eliminates a 16-bit register in the FPGA. The two approaches should have approximately the same control complexity. If there is any difference it would be that the fly-by is slightly simpler. David B will include a review of these two transfer modes in his APU circuit redesign.
Jack described another means of reducing cell data conversion coincidence in the special (and important) case of RBC/PLT gather. This topic is discussed in numerous places in hardware/system and software implementation reports; e.g. Software Implementation Report 31 [Reports.doc-HardwareDecimation]; System Design Report 22 topic Multiple List Data Sources [cdxsys.doc-MultipleListDataSources]; System Design Report 23 topic Decimation Review [cdxsys.doc-DecimationReview]. To reiterate, our instruments gather RBC and PLT data from a single fluid stream, distinguishing the two by size. Theoretically, we could simply gather all cell data and send it to the data station for analysis. Practically, conversion coincidence prevents some cells from being measured. Typically, there are many more RBCs than PLTs and gathering unspecific cell data results in much more RBC data being collected than needed in order to collect sufficient PLT data. CD3xxx instruments don't distinguish between the two types during the low-level conversion process but they do use a software "parse" to decimate RBC data that is sent to the data station. This only reduces communication bandwidth; it doesn't improve conversion coincidence. The CD4000 implements a separate size-controlled amplifier/gain/threshold channel for PLT vs. RBC.
Conversion coincidence can be reduced for PLTs by decimating RBCs prior to conversion. There are two ways to do this. One way is to convert a reference channel and not convert the others if the reference channel value is above the programmable RBC threshold. The other does the equivalent function in the analog domain using a separate PLT channel, skipping all conversions for RBC. In either case, the FPGA would allow only the programmable fraction of RBCs to be converted. For example, if the fraction were 1/5 then 4 of every 5 RBCs would not be converted.
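The counting rule for the reference-channel variant can be sketched as follows. This is only an illustration under stated assumptions, not the FPGA logic: the function name, the argument names, and the rule that a cell strictly above the threshold counts as an RBC are all hypothetical, and which RBC in each group of N gets converted is an arbitrary choice here.

```python
def decimate(sizes, rbc_threshold, keep_one_in):
    """Return one boolean per cell: True if the cell should be fully converted.

    sizes          -- reference-channel size values, one per detected cell
    rbc_threshold  -- cells above this value are treated as RBCs (assumed rule)
    keep_one_in    -- N: convert one of every N RBCs; PLTs are always converted
    """
    convert = []
    rbc_count = 0
    for size in sizes:
        if size > rbc_threshold:
            # RBC: count it and convert only every Nth one.
            rbc_count += 1
            convert.append(rbc_count % keep_one_in == 0)
        else:
            # PLT (at or below the threshold): always convert.
            convert.append(True)
    return convert


if __name__ == "__main__":
    # Hypothetical size values; large values stand for RBCs, small for PLTs.
    cells = [80, 90, 10, 85, 95, 88, 12, 91]
    print(decimate(cells, rbc_threshold=50, keep_one_in=5))
```

With keep_one_in set to 5, four of every five RBCs are skipped while every PLT is converted, which is the behavior described above.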
Implementing the decimation filter in the analog domain affords the best coincidence reduction but requires substantial hardware. Jack explained that it also could correct a separate problem that we have previously discussed, which is the cell width error caused by signal height variation. See System Design Report 23 topic Cell Width [cdxsys.doc-CellWidth]. If a complete channel, including gain as well as threshold, is available to PLT separately from RBC, the width will be, in effect, normalized. APU3 circuitry does not correct this problem because it only provides a separate PLT threshold and not a separate gain block. Whether the width measure affords significant utility remains to be proven. John also pointed out that the main signal problem with PLT is that the typical PLT measurement is only slightly above the noise level. Whether additional amplification affords better separation is unknown. Nevertheless, we decided that it would be beneficial to provide the hardware support that Jack has requested, giving him the opportunity to prove its advantages. The only downside is circuit cost, and this can easily be reduced by eliminating elements that don't prove valuable.
Jack showed how he has been able to steal an unused channel on APU1 by routing (via a selectable multiplexer) the expected RBC/PLT size reference channel input to it. This approach assumes the ultimate usage of two channels, which is something we would prefer to avoid. McCracken suggested implementing an eighth channel (the APU3 has seven) whose input is selected from any of the other seven or from an independent source via an 8-channel multiplexer. This preserves the independence of the seven channels that John has requested for a 2-laser system, while allowing any of the channels to serve as the RBC/PLT reference. We also considered a full crossbar matrix, allowing any input to route to any channel, but this affords little benefit. In particular, it doesn't address the fact that John's 2-laser system cannot be implemented if one of the channels is dedicated to PLT.
The CPUXBUS (mezzanine connector) interface is primarily intended to support alternate data station communication means. We have previously discussed whether any link other than HSL, which is essentially native to the CPU, should be provided on board the APU. The ECP link has been an attractive candidate because it is mature and fairly stable. However, we have encountered problems with both Compaq and Toshiba notebook computers' ECP implementations. USB has been rejected because the 1.0 specification affords terrible performance for instrument communication (mainly because of the poor design of the fake interrupt capability) and the 2.0 specification is in chaos. FireWire is much better for instrument communication, a point that is apparently not lost on peripheral manufacturers, many of whom have refused to develop USB implementations. Microsoft has finally decided to support FireWire (in XP) and is not supporting USB 2.0. In consideration of this chaos, John and David McCracken decided not to include any native link, other than HSL, in the APU (no other native link was ever considered for the MSM).
The mezzanine-based ECP interface, which we have implemented, and any USB, FireWire, or raw Ethernet interfaces that we develop will be peripherals of the main MC68340 CPU. Data is transferred to and from the CPU's bus programmatically by the CPU itself or via one of its DMA channels. McCracken discussed the possibility of another kind of interface, in which the communication controller is an intelligent CPU. Two examples of this are a direct memory interface to a single-board computer data station located very close to the APU and a TCP/IP coprocessor, which would also be essentially a direct memory interface. One way to implement a memory interface is via a dual-port RAM on the mezzanine card, mapping one side into the data station or coprocessor address space and the other into the 68340's address space. A potentially cheaper approach is to give the coprocessor (or data station) bus master capability, allowing data to be exchanged through main RAM.
We have experience with bus-mastering on the 68340 bus, as the MSM's motor control coprocessor (FPGA) cannot function any other way. The MSM represents a simpler hardware design situation because it uses SRAM. The DRAM used in the APU complicates interfacing both in its multiplexed address and refresh requirements. McCracken explained that Colemen multiplexed memory control signals simply by wire-OR and the only problem was that the original pull-up resistors didn't provide enough current to terminate certain transactions before the next access cycle. This particular problem probably would not have occurred in the APU, as the DRAMs' RAS precharge time would overlap the control signals' recovery time.
McCracken asked B to consider adding the capability of bus mastering through the CPUXBUS. This capability is not essential, as we can always use the DPRAM approach, but it might cost practically nothing and prove useful in the future.
Jack reported that, in the APU3, when the CPU reads the ADC the first reading is 20mV high. Subsequent readings are correct but if no readings are taken for approximately seven seconds, the next reading will again be high. He and Mike Y have examined this phenomenon theoretically and have no explanation. B opined that we should not build another APU until the source of this error is at least revealed and corrected in hardware if possible. While a software "fix" may afford a means of covering up the problem, it is not a viable long-term solution.
McCracken and B suggested a practical method for finding the source of the problem by using a digital scope to test both ends of the signal path (between the external source and the ADC input) and progressively working toward the middle to find some signal that coincides with the anomalous reading. B further suggested writing a script to continually recreate the problem by, in an endless loop, repeatedly taking two readings and then waiting for seven seconds. This is a good demonstration of the power of the scripting system to simplify test procedures. Automating an optimum stimulus allows undivided attention to be concentrated on probing the system under test.
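The shape of such a stimulus loop is sketched below in Python rather than in our script language, purely for illustration. The names run_stimulus, read_adc, and FakeAdc are hypothetical, the loop is bounded instead of endless so that it terminates, and the fake ADC merely imitates the reported symptom (first reading high after a long idle); it is not a model of the cause.

```python
import time


def run_stimulus(read_adc, cycles, idle_s=7.0, sleep=time.sleep):
    """Repeat: take two back-to-back readings, then idle.

    Returns a list of (first, second) pairs so the first-reading offset
    can be compared with the settled second reading.
    """
    pairs = []
    for _ in range(cycles):
        first = read_adc()
        second = read_adc()
        pairs.append((first, second))
        sleep(idle_s)  # idle long enough for the anomaly to reappear
    return pairs


class FakeAdc:
    """Stand-in for the real ADC: reads 20 mV high after a long idle."""

    def __init__(self):
        self.rested = True

    def read(self):
        value = 1.020 if self.rested else 1.000
        self.rested = False
        return value

    def rest(self, seconds):
        # Replaces time.sleep so the demonstration runs instantly.
        if seconds >= 7.0:
            self.rested = True


if __name__ == "__main__":
    adc = FakeAdc()
    for first, second in run_stimulus(adc.read, cycles=3, sleep=adc.rest):
        print(f"first={first:.3f} V  second={second:.3f} V")
```

Because the stimulus repeats on its own, the scope can be moved along the signal path without anyone having to trigger readings by hand.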
OTHER APU ITEMS
See System Design Report 23 topic APU Status [cdxsys.doc-ApuStatus].
System Design Report 26 topic MSM PCB Corrections That Require Replacement item 6 [3-4-Wire Motor Flags] describes one possible solution to the problem that we currently have two kinds of motor flag opto-interrupters. The most obvious difference is that one type has a 4-wire connector while the other has three wires. The difference is deeper than this, however. The 3-wire device has an integral resistor to limit the LED's current, in addition to combining the LED's (cathode) and receiver's (NPN emitter) ground leads to eliminate one wire. The 4-wire device simply presents the four raw connections, LED anode and cathode and receiver collector and emitter.
If the 4-wire device's LED's anode were driven directly by 5 V, as we do with the 3-wire device, the LED would be immediately destroyed. Consequently, we can't provide a universal jack by some wire trick, such as pins 3 and 4 both ground and pin 4 NC for all 3-wire devices. The suggestion in Report 26 doesn't attempt to make a truly universal jack. It only provides a means of adapting a 3-wire jack to a 4-wire device without having to change the PCB (other than cutting a link). If a 4-wire device were plugged into such an unmodified 3-wire jack, it would be destroyed, and a 3-wire device would not function in a modified jack.
The only way to make a safe universal jack would be to limit the LED current (the circuit has to limit the phototransistor receiver's current in both cases). This would have to be done with a very low dropout current regulator because the 3-wire device is designed for 5V. Therefore, standard techniques like an adjustable 3-terminal regulator configured to regulate current will not work. Anything more complicated would consume too much board space. However, an even simpler approach is to use a junction FET with its gate shorted to its source. In this circuit, an N channel FET's drain is connected to 5V while its gate and source are connected together and to the load, the LED's anode. The FET's Idss (drain-source current with gate connected to source) is limited by its pinchoff voltage to somewhere between 2 and 20 mA, depending on the device and the operating temperature. Such a broad range prevents this feature from being used in all situations, but some FETs exhibit significantly less variation than others. The problem is finding the right device. This is clearly the best approach if we can find the right device. If we can't and no one offers an alternative, we should implement the brute force suggestion from report 26.
The MSM, RPM, LPM, and VPM all drive solenoids. For all of these except for the VPM, which has not been built yet, we have used two-pin .1" OC jacks. The CellDyn standard solenoid jack is a much more robust 5/32" OC with .045" pins. John pointed out that we will be changing to a plug at each solenoid to allow individual replacement without having to dismantle the wiring harness. Therefore, the jacks on the boards will see considerably reduced mating cycles. Nevertheless, we decided that the more robust jack would be preferable, at least where there is no size constraint. The LPM can easily accommodate the larger jack just by moving the feedback LEDs. The current RPM layout is much less amenable but, unlike the LPM, the RPM board itself is not size-constrained. David B will review the MSM to determine which jacks it should use for the few solenoids that it supports directly.
Jerry S examined the LPM and RPM boards and opined that they must be fused for the per-board current limit allowed either by UL et al. standards or by Abbott's own requirements. He said that these should not be self-resetting, e.g. Polyfuse, because that introduces an additional safety consideration. He said that, although the fusing requirement is dictated by the potential current draw of all 32 LPM or 24 RPM solenoids turning on simultaneously, we should not allow this to occur, thus avoiding nuisance tripping. We have not fully addressed how to prevent this at turn-on. In fact, one purpose of B's trip was to try to understand why all of the solenoids on his LPM under test in Dallas come up in the on state while ours in Santa Clara pulse very briefly and then turn off. We suspect that his APU's FPGA program is out of date. In any case, we need to determine whether the brief pulse draws sufficient current to blow the required fuse. We might have to add brute force current limiting devices, such as MOV varistors or thyristors. Jerry suggested dividing the solenoids into separately fused banks, but this only reduces the size of the fuse and doesn't address the issue of total current draw. He also reminded us that the high voltage (supposedly 24V but he says it is really 32V in the CD3200) and low voltage (not 12V, as indicated, but 17V according to Jerry) need to be separately fused. An alternative might be to limit the current in the return (ground) power to reduce the number of fused points. This could also simplify any current limit circuit that uses a MOSFET for the pass element, as N-channel MOSFETs are cheaper and more efficient than P-channel but can only be used in the return leg without gate voltage boosters. Jerry suggested Pico as a source for small soldered-in-place fuses.
We obviously could benefit from Jerry's experience in power control and requirements. John is going to try to get some additional time from him to review all of our circuits and to possibly provide on-going oversight.
The current versions of the RPM and LPM terminate all the scan chain's flow through signals, CLK, NPCS, and RESET. This is incorrect. The APU's scan chain travels through the RPM, STATUS board, and LPM, ending at the VPM. If every board terminated these signals with a fixed 130 Ohm, as do the RPM and LPM, each signal would see a 32.5 Ohm termination, which is too low to operate reliably even at the slow speeds we are using to avoid having to be very careful with our treatment of the lines. Eventually, we may want to increase the signaling rate, in which case, we certainly can't tolerate even minor termination discrepancies.
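The 32.5 Ohm figure follows from four 130 Ohm terminators (one each on the RPM, STATUS board, LPM, and VPM) appearing in parallel on each flow-through signal. A minimal check of that arithmetic, with a hypothetical helper name:

```python
def parallel_resistance(resistors_ohm):
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistors_ohm)


if __name__ == "__main__":
    boards = ["RPM", "STATUS", "LPM", "VPM"]
    # Every board terminated with a fixed 130 Ohm: four terminators in parallel.
    all_terminated = parallel_resistance([130.0] * len(boards))
    # Only the end board (VPM) terminated: the signal sees the intended value.
    end_only = parallel_resistance([130.0])
    print(f"all boards terminated: {all_terminated:.1f} Ohm")
    print(f"end board only:        {end_only:.1f} Ohm")
```

Terminating only the end of the chain leaves each signal with the intended 130 Ohm load, which is why the RPM and LPM terminations need to be optional.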
As discussed in System Design Report 5 topic End Board [cdxsys.doc-EndBoard] a terminating plug affords the most convenient optional termination approach, terminating all three signals while simultaneously looping the MOSI output back to the end of the MISO input. Only the RPM and LPM need the optional plug. The status board loops back to the RPM, which can provide the termination when necessary. The VPM is always terminated, as it is located at the end of the chain no matter how it is used. The RPM or LPM is terminated only if used as an end board during development. In production, both are unterminated and don't need the plug adapter. Therefore, a few hand-made adapters (it is a very simple circuit) will suffice.
STATUS BOARD LOOPBACK
In testing the new LPM, we were at first confused by the fact that the scan chain functioned correctly if the APU's chain went through the LPM to the RPM but not vice versa, the intended topology. David B explained this by pointing out that there was no STATUS board connected and the chain was, therefore, broken. John has stated that the STATUS board is very low priority. Therefore, we will plug a passive loopback adapter into the STATUS board jack of every RPM.
SHEAR VALVE INTERFACE
We attempted to test the shear valve interface but encountered a problem in the FPGA design. The design originally proposed in System Design Report 7 [Cdxsys.doc-ShearValvePld], with corrections in report 14 topic RPM PLD [cdxsys.doc-RpmPld], is very flexible and, therefore, complicated. It enables one circuit to support three alternative shear valve control implementations. We have subsequently decided to support only the old implementation, in which power and low-level signal interpretation (conversion of overtorque and CW/CCW signals to CWEOT and CCWEOT) is provided by a board attached to the mechanism rather than on the RPM. One major motivation for this decision is that, while the switched-mode L298 driver may be more efficient, it is also much more sensitive to over-current failure. As Jack and David McCracken showed in the CD3200, the linear drivers in the old circuit will not fail even if the motor stalls for an extended period. Also, the motor is not damaged. Nothing is known for sure about the L298 circuit but McCracken reports experiencing uncontrollable L298 overcurrent failures on previous projects (it is really not a very robust device).
The problem with the current shear valve interface in the PLD is that it retains some vestiges of the more complex design. In particular the CW/CCW and ON/OFF outputs are mutually dependent. For the old shear valve, we want two independent scan chain outputs and two independent inputs. Further, we can permanently enable this mode of operation.
YVALVE INTERFACE
We were going to test the Yvalve interface, but the jacks were missing on the RPM and we didn't have time to add them. B and McCracken will independently check out the Yvalve.
10/10/01
SCAN CHAIN SCLK RELATIVE TO NPCS
We previously discussed the fact that NPCS is now going high coincident with SCLK falling edge but goes low in the middle of the low half-period of SCLK. We agreed that both NPCS transitions should occur in the same relationship to SCLK and that this should not be coincident with SCLK. I would like to suggest a further refinement. For the loader board, I have included 74HC165 shift registers for the scan chain integrity test register in both the I/O and motor scan chains. This part has a positive clock inhibit, which can be connected directly to NPCS. Regardless of whether the other registers shift during NPCS high, the feedback register must not shift or else the output byte that it captures will be lost. The 165's inhibit is fairly crude. If it goes high (inhibit) while SCLK is low, a shift will occur, which would cause the register to lose one bit of the feedback byte. Unless it creates a problem for the 595s or 597s, I suggest that NPCS change states in the middle of SCLK high half-period. It appears that this would mean moving the rising edge of NPCS to occur 62.5 nsec (1/4 SCLK period) sooner than currently and its falling edge 125 nsec (1/2 SCLK period) sooner. Would these changes negatively affect either the internal (to FPGA) scan chains or the external (595, 597, and RPM PLD) registers?
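The 165's inhibit behavior can be illustrated with a small model. It assumes, per the 74HC165 data sheet, that the register's internal clock is the OR of CLK and CLK INH and that a shift occurs on each low-to-high transition of that gated clock. The function name and the sampled-level representation are hypothetical conveniences, not part of any design file.

```python
def count_shifts(samples):
    """Count shifts seen by a 74HC165 for a sequence of (clk, inh) levels.

    The gated clock is modeled as CLK OR CLK_INH; the register shifts on
    every low-to-high transition of the gated clock.
    """
    shifts = 0
    prev = samples[0][0] | samples[0][1]
    for clk, inh in samples[1:]:
        gated = clk | inh
        if gated and not prev:
            shifts += 1
        prev = gated
    return shifts


if __name__ == "__main__":
    # Inhibit (NPCS) goes high while SCLK is low: the gated clock rises,
    # so the register shifts once even though SCLK itself did not rise.
    inh_rises_clk_low = [(0, 0), (0, 1), (1, 1), (0, 1)]
    # Inhibit goes high while SCLK is high: the gated clock never rises.
    inh_rises_clk_high = [(1, 0), (1, 1), (0, 1), (1, 1)]
    print(count_shifts(inh_rises_clk_low), count_shifts(inh_rises_clk_high))
```

In this model, raising the inhibit during the SCLK high half-period produces no extra shift, which is the reason for suggesting that NPCS change state in the middle of SCLK high.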
In our ongoing discussion of the APU's analog inputs, John and I were not entirely clear about how many independent questions need to be resolved. It seems to me that there are three questions: how to arrange input jacks; whether to use the same trigger input amplification for all channels or to tailor some channels for specific presumed uses; and what, if any, special circuitry to provide for RBC/PLT separation.
1. I think we have agreed that the input jack question is best resolved by the following. The input jack is identical for all channels. Next to the input jack of three of the channels will be a 2-pin jack for PMT control output and test input. The most convenient cabling therefore results from using these channels for PMT input, but the option exists to use other channels or fewer channels for PMT.
2. Regarding the trigger input differences, Mike Y has explained his reasoning as primarily a comfort issue for methods developers, who "think in log terms" for certain signals. John and I have suggested that this is not a very persuasive argument. Nevertheless, the possibility remains that having some control over the treatment of these signals could be useful. This isn't strong enough to warrant sacrificing input generality but there is a reasonable middle ground, which is to create a common circuit that can accommodate all variations by changing resistor values (possibly using 0-Ohm resistors in some places). The initial boards should all be stuffed for the more common arrangement. If in practice we find a need for Mike's specialized treatment, we can easily recreate his circuits by changing resistors. If not, we can decide whether to keep the configurable circuit or replace it with the non-configurable standard one.
3. Regarding the specialized analog support accorded RBC/PLT, Mike Y was the major proponent of a completely separate channel, with its own gain, for PLT vs. RBC, but in our most recent discussions he has voiced reservations about this himself, citing certain advantages to using the same channel for both. In particular, as Mike points out, unpredictable differences between the RBC and PLT hardware (both in signal levels and input timing) may cause uncorrectable errors. I suggest that we not provide even the option of using a separate channel for PLT vs. RBC. A second question related to specialized RBC/PLT hardware is whether to provide an analog RBC/PLT discriminator. Jack likes this because it can reduce coincidence due to ADC availability by allowing the FPGA to determine whether to collect list mode data for a particular cell without having to convert any of its signals. Without this, decimation requires a size signal to be converted to digital so that the FPGA can determine whether the cell is above or below the PLT/RBC threshold. Considering the low cost to provide such a signal, it seems reasonable to do so.
NEAR-ROOM TEMPERATURE CONTROLLER
We previously mentioned the need for a near-room temperature controller for one of our reactions. We asked Ed Sewall to look into this and I provided a possible starting point in an article that describes a binary (heating and cooling) temperature controller originally intended for use with a communication laser. It should be reiterated that we don't need this for controlling our lasers' temperatures-- but some of the requirements appear similar. Ed suggested that he needed more specific requirements before embarking on this. The fluidic design is still being formulated and, in fact, partly depends on what kind of temperature control we can provide. I think it would be good to provide some kind of controller even if we don't know all of the details. We are mostly short on thermal mass specifications and required times for reactants to be brought to the set temperature. The reactants themselves comprise less than 10 ml and, therefore, play only a small role in the requirements. Also, given the relatively slow transfer of energy through the glass or plastic reaction chamber, the reactants will have to be heated by the chamber itself. Consequently, it would be preferable for the chamber to have a large thermal mass and to be maintained at a steady temperature by the controller. The injection of reactants would have no immediate effect on the error term. The major error sources would be in the chamber's energy transfer to its external environment and in bringing it back to the set temperature after rinsing. We don't currently have an answer to all of these questions. However, we do have the following specifications:
1. The external instrument environment can vary between 15 and 35 C. The instrument's internal temperature may vary between 15 and 40 C.
2. The reaction temperature may be set between 22 and 27 C.
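These two specifications already show why a binary (heating and cooling) controller is needed: the set point range of 22 to 27 C lies inside the 15 to 40 C internal range, so the chamber may need to be driven in either direction. A minimal sketch of such a decision rule follows. The function name and the 0.2 C deadband are assumptions made only for illustration; no deadband or control law has been specified.

```python
def binary_control(chamber_c, setpoint_c, deadband_c=0.2):
    """Return 'heat', 'cool', or 'off' for a two-direction on/off controller.

    chamber_c   -- measured chamber temperature in degrees C
    setpoint_c  -- reaction temperature, which must lie within 22..27 C
    deadband_c  -- assumed tolerance around the set point (hypothetical value)
    """
    if not 22.0 <= setpoint_c <= 27.0:
        raise ValueError("set point must be between 22 and 27 C")
    error = setpoint_c - chamber_c
    if error > deadband_c:
        return "heat"   # chamber is colder than the set point
    if error < -deadband_c:
        return "cool"   # chamber is warmer than the set point
    return "off"


if __name__ == "__main__":
    # Extremes of the internal instrument temperature range with a 25 C set point.
    for chamber in (15.0, 25.1, 40.0):
        print(chamber, binary_control(chamber, setpoint_c=25.0))
```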
2/18/02
BACKGROUND
The RSH (Random Sample Handler) transport unit ("robot") contains the vertical (Z-axis) and theta motors plus nine sensors and sensor conditioning circuitry. The two motors are connected directly to the control unit via a 9-wire cable. Sensor signals drive 18 wires of a 25-wire cable through differential voltage (RS422) drivers. The remaining wires in this cable are used for driving the picker rack detector LED and for power (DGND and +5V) and shield. The two cables are subject to constant movement as the transport unit traverses the track. They are enclosed in a plastic chain to reduce abrasion and eliminate kinking. In prototype quantities the cable assembly costs $288.
It has been determined that, as currently designed, the RSH is too expensive for our instruments. Our overall goal is to reduce its cost while maintaining or improving its reliability. We have two ways to do this: by adapting it to less rigorous performance requirements and by identifying details that can be improved.
ANALYSIS
Our performance requirements for the transport unit are not significantly different from the original requirements. However, the design contains several details that may be modified to reduce cost and/or improve reliability. They are:
All of the signals except for one, theta align, have very relaxed timing requirements. The motor home signals are only used during homing, while the rack and transport position detectors have tens of milliseconds to respond. In any case, our I/O scan chain period is between 20 and 68 usec. (depending on number of bits) so any serializer period under 10 usec. would have no effect. Therefore, the parallel transmission of sensor signals is a conceptual error, considering the high cost and reliability issues associated with cabling in a moving assembly. If we were starting this design fresh we would never consider this approach. Serial transmission is, without a doubt, cheaper and more reliable. However, the cost saving is relatively modest. The most radical serializing approach would use just one differential pair, eliminating 16 wires and saving only 16 dollars. The reliability improvement is real but difficult to quantify.
Eliminating the motor cable would yield a significant cost savings. Reducing the 25-wire cabling requirements by serializing the sensor signals affords a means to merge the motor wires into the cable. However, there is considerable fear that the differential signals would be affected by the high current motor drive signals. Don Walker expressed both this fear and a concern that the AWG 26 wires used in the 25-wire cable would not be adequate for the motor current. He cited an "EIA Normal Load" of 1 amp for AWG 26 wires vs. 2 amps for the AWG 24 currently used in the 9-wire cable. This rule-of-thumb is probably based on signal voltage drop and is not relevant to current-driven stepper motor control. In any case, we drive each of the two motors with a maximum current of 0.6 amps. The induced signal error issue requires thorough analysis and testing.
The decision whether to change from the flex circuit to a single rigid PCB with wired sensors lies with our manufacturing and service. We will review the situation with them before making any other changes (signal serializing and cable unification) to the PCB.
CABLE UNIFICATION
The purpose of differential signaling is to reject common mode noise. In theory, if both wires in a differential pair are affected similarly by a noise source, the signal remains unchanged. In practice, the noise source may induce such a large voltage step in both wires that the receiver saturates and cannot respond to the differential signal. It is also possible for the two wires to be unequally affected by a noise source but this problem is entirely controlled by proper cable design.
An accurate analysis of the effect of a noise source on a differential pair requires accurate data on every aspect of the cable and the signals. Principal elements are the wiring topology, for example whether both the subject signal and the noise signal are carried by twisted pairs and, if so, the number of twists per inch; the physical relationship of the subject signal wires to the noise source wires; the subject signal termination impedance; the subject signal driver impedance; etc. Even given all required data, testing is still necessary.
Alternatively, a test can be devised that is provably more error-prone than the real implementation. If run for a sufficient length of time without error, such a test should demonstrate that the real implementation would be reliable. In the real implementation, the signal and motor drive wires will be twisted pairs, ensuring that the noise is common mode. A significantly more error-prone cable would simply twist the motor and signal wires together in one bundle. A pathological cable could be composed of two twisted pairs, each composed of one signal and one motor drive wire.
We could test the actual application with a deliberately bad cable but this would require thousands of hours to achieve any assurance. For an induced error to be detected in the system, it would have to occur at the moment that the subject signal was being used. For example, even if noise induced a glitch in the serial data clock, causing all sensor signals to be wrong, the error would be revealed only if the controller were to sample the signals at this time. In fact, the signals are sampled infrequently. To widen the error-detection aperture, we can make a test bench in which any signal error is latched. This was done using two transmitters from a 26C31, two receivers from a 26C32 and a 74LS74 dual flip-flop. One transmitter input was connected to DGND and the other to +5. Each receiver end was terminated by 130 ohm between the signal pair. The receiver with the low (DGND) signal was connected to the clock of one flip-flop so that a high-going glitch would latch the data input, which was connected to DGND. The receiver with the high (+5) signal was connected to CLR of the other flip-flop so that a low-going glitch would clear. Both flip-flops were armed by presetting. A high-going error would latch one flip-flop's Q low while a low-going error would latch the other's Q low.
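The latching behavior of the test bench can be modeled in a few lines. This is a behavioral sketch only, with hypothetical class and attribute names; it assumes sampled logic levels at the two receiver outputs and treats the 74LS74 CLR input as active low.

```python
class GlitchLatch:
    """Behavioral model of the two-flip-flop error latch described above."""

    def __init__(self):
        self.arm()

    def arm(self):
        """Preset both flip-flops (Q high) before a test run."""
        self.q_high_going = 1    # flip-flop clocked by the nominally low line
        self.q_low_going = 1     # flip-flop cleared by the nominally high line
        self._prev_low_line = 0

    def sample(self, low_line, high_line):
        """Apply one pair of receiver output levels (0 or 1)."""
        # A rising edge on the nominally low line clocks D (tied to DGND) into Q.
        if low_line and not self._prev_low_line:
            self.q_high_going = 0
        self._prev_low_line = low_line
        # A low level on the nominally high line asserts CLR and forces Q low.
        if not high_line:
            self.q_low_going = 0


if __name__ == "__main__":
    latch = GlitchLatch()
    # Quiet, one high-going glitch, quiet, one low-going glitch, quiet.
    for low, high in [(0, 1), (1, 1), (0, 1), (0, 0), (0, 1)]:
        latch.sample(low, high)
    print(latch.q_high_going, latch.q_low_going)
```

Once either Q goes low it stays low until the latch is re-armed, so a single glitch anywhere in a two-hour run is still visible at the end.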
A pathological cable was tested to verify the operation of the test system. One of the transport unit motors was tested in full-step mode. Every motor move latched both error indicators. An LC filter was added at the source of each motor winding drive pair. Following the suggestion of the manufacturer of the 3717 motor drivers, the motor's winding inductance was measured as 2.4 mH; a 240 uH inductor and 2 uF capacitor were chosen for the filter (Lf = .1 * Lm and C = (4 * 10 ^-10) / Lf). With these filters in place, errors were observed only when the motor was driven at low power. It is reasonable that low power created more noise than high power because the 3717 chops the motor drive current to reduce power. This testing also revealed that we cannot generally rely on such filters to reduce noise because they reduced the maximum step rate of the motor by 60%. Don Walker has suggested that we include low-pass filters on all inputs to reduce their sensitivity to noise. Along those same lines, we could include low-pass filters on drivers to reduce their radiated noise. However, we can't know how severe to make the filtering until the motors' maximum required speeds have been determined. In some cases, we may not be able to provide any filtering.
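The filter values quoted above can be reproduced from the two rules of thumb. In the sketch below the function names are hypothetical, and the corner frequency is computed from the standard LC resonance formula only for orientation; it is not a figure taken from the driver manufacturer.

```python
import math


def motor_filter(lm_henry):
    """Apply the rules Lf = 0.1 * Lm and C = 4e-10 / Lf (SI units)."""
    lf = 0.1 * lm_henry
    c = 4e-10 / lf
    return lf, c


def corner_frequency_hz(l_henry, c_farad):
    """Resonant frequency of an ideal LC low-pass section."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))


if __name__ == "__main__":
    lf, c = motor_filter(2.4e-3)          # measured winding inductance, 2.4 mH
    print(f"Lf = {lf * 1e6:.0f} uH, C = {c * 1e6:.2f} uF (a 2 uF part was used)")
    print(f"corner with calculated C: {corner_frequency_hz(lf, c):.0f} Hz")
    print(f"corner with 2 uF part:    {corner_frequency_hz(lf, 2e-6):.0f} Hz")
```

The calculated capacitance comes out slightly below 2 uF, consistent with the 2 uF part chosen for the test.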
A bad cable, with all wires twisted together, was built. It contains two sets of motor drive wires. One set is connected to a lightly loaded transport unit motor and the other to a heavily loaded syringe drive motor. Three distinct tests were developed. A script was written for each test. The motors are driven alternately clockwise and counter-clockwise. In one test both motors are driven simultaneously with high power at various rates ranging from very slow to approximately 75% of the motors' maximum speeds. In another test the motors are driven with low power, first one motor at all speeds, then the other, and then both together. In the third test, one motor is driven using micro-stepping at various speeds. Like low-power, micro-stepping creates the most noise since both windings are constantly chopped. Each test was run for two hours without error.
Each pair of motor winding drive wires should be twisted to reduce radiated EMI and we should consider LC filters to further reduce this irrespective of the effect on the signal wires. Therefore, using a cable without twisted pairs in order to save cost is not an option. A good cable will exhibit less coupling between the motor wires and the signals than the bad cable, which testing revealed to have good coupling immunity due solely to impedance (terminating resistance and transmitted output) and differential signaling effects. Therefore, it may be reasonable to proceed with unifying the cable. First, the cable test should be independently reviewed and repeated.
A hidden cost associated with the unified cable is skepticism on the part of system developers. It will be easy to blame any error on the unified cable. Even if the cable is not at fault, developers may insist on building a separated cable system to test this hypothesis instead of looking for the real error. To prevent this, for modest cost we can build in a means to support both unified and separated cables. We will serialize the sensor signals in any case, freeing enough pins on the DB25 connector for the motor wires. On the transport PCB, we can route the motor traces both to these pins and to the DB9 connector. On the controller PCB we can use the same routing but include a convenient means of disconnecting the DB25. To use separate cables, we only need to open the DB25 links and connect the DB9 cable.
SERIALIZER DESIGN
There are four ways that the sensor signals might be serialized, each characterized by how many signals it requires and circuit complexity. The least complex circuit requires four signals, the most complex one signal. Each signal requires two wires, as they are all transmitted as differential pairs.
The least complex circuit would be simply to extend the MSM's I/O scan chain to the transport unit. The standard I/O scan chain requires five signals, data out (MOSI), data in (MISO), clock, load, and reset. Reset is not needed but both data in and data out are needed, even though the transport unit only transmits data, because the scan chain is a loop. In all of the other approaches a deserializing circuit on the loader controller (actually interface board for us, as the MSM is the controller) presents the reconstituted parallel signals to the controller (in our case to scan chain input registers). The serializer-deserializer system is entirely independent of the scan chain and could be used in a system without a scan chain. One goal of the design is to provide a communication means that would be available as an improvement of the RSH for Architect, which doesn't have scan chains. For this reason the scan-chain-based approach was rejected (it could also be rejected as the biggest consumer of wires).
The three-signal approach is similar to the scan chain but independent of it. A circuit on the transport PCB serializes data while generating its own clock and load signals, transmitting all three to a scan-chain-independent receiver (HC595 or PLD). The two-signal approach is a variant of this. A more complex receiver doesn't require the bit clock but can generate its own by phase-locking onto the periodic load signal. This is the standard approach used by the LVDS SERDES (serializer-deserializer) circuitry promoted by TI, National Semiconductor, and others for replacing costly and error-prone parallel signals (like the RSH transport cable). In this approach, the circuit on the transport PCB is identical to the three-signal approach-- the two clock wires are simply not included in the cable.
The one-signal approach is to recover clock and load (word sync) from the data signal. An asynchronous or synchronous protocol is required. The former is less complex but requires more time for synchronization. Neither one is simple to implement in a PLD. This approach was rejected as too difficult to implement.
With the rejection of the four- and one-signal approaches, the choice is between the two middle solutions. The three-signal approach was chosen over the two-signal for the simplicity of its receiver. Without changing the transport PCB we can convert to the two-signal solution by changing the receiver.
SERIALIZER IMPLEMENTATION
A Xilinx XC9536XL PLD was selected for the serializer due to its low cost ($1 in any quantity). It requires 3.3V Vcc but accepts and generates TTL-compatible signals, a requirement for interfacing with the sensor signals and 26C31. Icc ranges from 20 to 50 mA, which makes it easy to generate the 3.3V from 5V using a resistor (33 ohm, .25W) and zener diode (3.3V .4W).
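The resistor and zener ratings can be checked against the quoted Icc range with a few lines of arithmetic. This is a sketch under the assumption of an ideal 3.3 V zener and an exact 5 V supply; the function and variable names are illustrative only:

```python
V_IN, V_Z, R = 5.0, 3.3, 33.0          # supply volts, zener volts, series ohms (from the text)
ICC_MIN, ICC_MAX = 0.020, 0.050        # PLD supply current range, amps (from the text)

def shunt_budget(v_in=V_IN, v_z=V_Z, r=R):
    i_total = (v_in - v_z) / r             # current through the series resistor
    p_resistor = (v_in - v_z) ** 2 / r     # resistor dissipation
    # The zener absorbs whatever the PLD does not draw; worst case is minimum Icc.
    p_zener_max = v_z * (i_total - ICC_MIN)
    headroom = i_total - ICC_MAX           # zener current remaining at maximum Icc
    return i_total, p_resistor, p_zener_max, headroom

i_total, p_r, p_z, headroom = shunt_budget()
print(f"resistor current {i_total * 1e3:.1f} mA, dissipation {p_r * 1e3:.0f} mW")
print(f"zener worst case {p_z * 1e3:.0f} mW, headroom at max Icc {headroom * 1e3:.1f} mA")
```

Under these assumptions both parts run well inside their ratings (roughly 88 mW in the .25W resistor and 104 mW in the .4W zener), but only about 1.5 mA of zener current remains when the PLD draws its maximum 50 mA.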
A clock is needed for the bit rate and serializer state machine. This does not have to be accurate or stable so I tried to build an RC oscillator in the PLD. I tried several oscillator and multivibrator topologies and, although some did oscillate, none yielded a clock that the PLD was able to use. One multivibrator design yielded a decent looking clock but when used as a clock input, the PLD apparently doubled the frequency. When I asked Xilinx about this, the FAE said that he too could not make a usable oscillator in the part. Since an external clock would be needed, I chose a single-component 2 MHz crystal oscillator instead of making an RC oscillator from discrete components that might be difficult to source (e.g. 74HC14). A silicon oscillator, such as Maxim/Dallas DS1077L, could also be used.
As currently designed, the transport PCB has nine sensors. We will probably have only eight as we do not need the rail guide sensor. This sensor is located at the end of the rack picker and tells when a rack is engaged. This is an expensive assembly with a moving cable and is not necessary. The reflective rack sensor at the tray (Tray Section Carrier Detector) tells whether the picker has lifted the rack while the sensor at the belt (Carrier Positioner Carrier Detect) tells whether the rack was dropped in moving from the tray to the belt. If the picker fails to engage the rack at the tray, we will not hunt for the rack so the rail guide sensor would only serve to allow us to report a mechanical failure to the user one second sooner than without it.
To facilitate using the serializer in other applications, the PLD's Verilog code parameterizes the number of inputs using defines for the number of signals (MaxSig) and the number of bits in the control state machine (MaxStateBit). Although we won't be using the rail guide sensor, provision was made for it for the benefit of Architect by defining MaxSig as 8 (the nine signals are 0 through 8). MaxStateBit was defined as 3, which means that there are 4 state bits and, therefore, as many as 16 inputs could be supported. The state machine is clocked by the external oscillator, which also drives an independent inverter (realized in the PLD). The inverted clock is used as the transmitted bit clock. Both the PLD and the receiver clock on the positive edge so they are in anti-phase, avoiding timing problems.
The operation of the serializer is straightforward. The serial output continuously shifts with no delay for loading inputs or for the receiver's parallel shift out. The signals are shifted out most significant bit (sig8) first. At state MaxSig, the serial output data, sdat, has the value of input sig0. On the next rising clock edge, pload, the receiver's parallel load signal, goes from low to high, causing a broadside load of the receiver's serial shift registers to its parallel outputs. Also on this clock edge the serializer state changes to 0 and all of the input signals are latched. Since sdat is combinatorially driven from the latched MSB, it reflects this value after a gate delay. The change in sdat coincident with pload does not result in a race condition because the receiver's serial shift registers were clocked on the previous invclk. At state 0, sdat shows the MSB (sig8). After a one-half clock delay, the inverted clock, invclk, causes the receiver to shift in this bit. At the next clock, the serializer's input registers are shifted left (dat << 1), causing sig7 to appear at sdat. This process repeats until, at state MaxSig / 2, pload goes low, arming it for the next parallel receiver loading. For any even number of signals, pload is symmetrical, which may simplify phase-locking to it in case we decide to implement the two-signal receiver.
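The sequence described above can be followed in a small behavioral model. The sketch below is not the PLD source; it is a Python stand-in that mirrors the state counter, shift register and pload behavior of sersig, paired with an idealized receiver. The class and function names are assumptions made for illustration.

```python
MAX_SIG = 8                        # count of signals - 1, as in sersig.v
MASK = (1 << (MAX_SIG + 1)) - 1    # keeps registers 9 bits wide

class Serializer:
    """Behavioral stand-in for sersig: one clock() call is one rising edge of clk1."""
    def __init__(self):
        self.cnt, self.dat, self.pload = MAX_SIG, 0, 0

    def clock(self, sig):
        cnt = self.cnt                         # decisions use the pre-edge state
        if cnt == MAX_SIG:                     # last state: latch inputs, raise pload
            self.cnt, self.dat, self.pload = 0, sig & MASK, 1
        else:                                  # otherwise shift the latched word left
            self.cnt, self.dat = cnt + 1, (self.dat << 1) & MASK
            if cnt == MAX_SIG // 2:
                self.pload = 0                 # re-arm for the next parallel load
        return (self.dat >> MAX_SIG) & 1, self.pload   # (sdat, pload)

class Receiver:
    """Idealized receiver: loads outputs on pload rising, shifts on invclk."""
    def __init__(self):
        self.shift, self.prev_pload, self.loads = 0, 0, []

    def clock(self, sdat, pload):
        if pload and not self.prev_pload:      # broadside load of bits already shifted in
            self.loads.append(self.shift)
        self.prev_pload = pload
        self.shift = ((self.shift << 1) | sdat) & MASK   # invclk edge, half a period later

def transfer(words):
    tx, rx = Serializer(), Receiver()
    for word in words + [0]:                   # one extra frame flushes the last word
        for _ in range(MAX_SIG + 1):
            rx.clock(*tx.clock(word))
    return rx.loads[1:len(words) + 1]          # drop the power-up load of an empty register

print([format(w, "09b") for w in transfer([0b101001110, 0b100011010])])
```

Each word appears at the receiver's parallel outputs one frame after it was latched, which is why the model clocks one extra frame before collecting results.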
With only nine input signals, the PLD is under-utilized. Unused input pins have to be pulled to Vcc or DGND and unassigned pins default to input. To keep unused pins physically uncommitted, all of them are automatically made outputs by the declaration output [27:`MaxSig] unused; The name "unused" has no intrinsic meaning but is simply the name arbitrarily chosen for the vector.
The Verilog test bench, sersigt, is wired to all of the inputs and outputs of the rsh instance of sersig. It also duplicates sersig's internal state counter to display the state in waveforms. For most simulators this is not really necessary, as they can show the internal workings of all simulated components. For simulation, the signal vector (sig0 through sig8) is preloaded with the pattern 101001110. After generating exactly enough clocks to shift this word, the vector is loaded with the pattern 100011010, which is then clocked out. Microsoft Word document sersim.doc [sersim] contains a bitmap of the simulation waveform produced by ModelSim.
Low level timing is very loose. The only possibility of criticality is in the delay of sdat relative to invclk. A 74HC595's input data setup time is 24 nsec. (@<85C). The worst case combined delay of a 26C31 transmitter and 26C32 receiver is 56 nsec. (including both propagation delays and differential signal rise time). Sdat is delayed in the PLD by two gate delays, the sequential shift and the combinatorial out, which is maximum 16 nsec. Therefore, the total worst case time from clock to sdat required valid at the receiver is 96 nsec. Sdat shifts at the PLD on the clock but at the receiver on invclk, which reduces the 500 nsec. (2 MHz) period to 250 nsec. Thus, even if invclk experienced no delay, there would be a 154 nsec. margin.
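The margin figure can be reproduced as a simple budget. A minimal sketch using only the worst-case numbers quoted above; the constant and function names are illustrative:

```python
SETUP_74HC595_NS = 24      # receiver data setup time
LINE_DELAY_NS = 56         # 26C31 transmitter plus 26C32 receiver, worst case
PLD_DELAY_NS = 16          # sequential shift plus combinatorial output in the PLD
CLOCK_HZ = 2_000_000       # serializer clock

def sdat_margin_ns():
    """Slack between sdat becoming valid and the receiver's invclk edge."""
    half_period_ns = 1e9 / CLOCK_HZ / 2                 # receiver samples on invclk
    required_ns = SETUP_74HC595_NS + LINE_DELAY_NS + PLD_DELAY_NS
    return half_period_ns - required_ns

print(sdat_margin_ns())    # 154.0
```

Any delay in invclk through its own transmitter and receiver only adds to this margin, since it moves the sampling edge later.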
sersig.v (signal serializer)
`define MaxSig 8 // Count of signals - 1 (note 0-based indexing).
`define MaxStateBit 3 // 4 bits supports up to 16 signals.
module sersig( clk1, sig, sdat, pload, xclk, invclk, unused );
    input clk1;
    input [`MaxSig:0] sig;
    output sdat;
    output pload;
    reg pload;
    input xclk; // Connect to same source as clk1.
    output invclk;
    output [27:`MaxSig] unused;
    reg [`MaxSig:0] dat;
    reg [`MaxStateBit:0] cnt;

    initial begin // Simulation assistance.
        cnt = `MaxSig;
        pload = 0;
        dat = 0;
    end

    assign unused = 0;
    buf( sdat, dat[`MaxSig]); // Serial data is the MSB of the shift register.
    not( invclk, xclk ); // Transmitted bit clock: inverse of the external clock.

    // State counter: 0 through MaxSig, then wrap.
    always @(posedge clk1)
        if( cnt == `MaxSig )
            cnt <= 0;
        else
            cnt <= cnt + 1;

    always @(posedge clk1) begin
        /* At trailing edge of last state, load the input/shift registers. At all
           other times shift the loaded data. */
        if( cnt == `MaxSig )
            dat <= sig;
        else
            dat <= dat << 1;
        // pload rises with the load and falls at mid-frame.
        case( cnt )
            `MaxSig:
                pload <= 1;
            `MaxSig / 2:
                pload <= 0;
        endcase
    end
endmodule
sersigt.v (test bench)
`include "sersig.v"
module sersigt;
    reg osc;
    reg [`MaxSig:0] sig;
    wire sdat;
    wire pload;
    wire invclk;
    wire [27:`MaxSig] unused;
    reg [`MaxStateBit:0] cnt;

    sersig rsh( clk1, sig, sdat, pload, xclk, invclk, unused );

    initial begin
        osc = 0;
        sig = 'b101001110;
        cnt = `MaxSig;
    end

    always
        #1 osc = ~osc;

    buf( xclk, osc );
    buf( clk1, osc );

    always @(posedge clk1)
        if( cnt == `MaxSig )
            cnt <= 0;
        else
            cnt <= cnt + 1;

    initial begin
        $monitor( $time,,, "clk1 = %b, cnt = %d, pload = %b, sdat = %b",
            clk1, cnt, pload, sdat );
        #(`MaxSig * 2 + 2) sig = 'b100011010;
        #(`MaxSig * 2 + 4) $finish;
    end
endmodule
2/22/02
MOTOR FLAG SIGNALS
System Design Report 29 states "All of the signals except for one, theta align, have very relaxed timing requirements. The motor home signals are only used during homing, while the rack and transport position detectors have tens of milliseconds to respond." Don Walker responded to this, saying that we should monitor theta's position during operation in order to detect position faults. While not stated in the report, this is the purpose of theta align. Theta home could also be used, as Don suggested, but theta align is slightly more efficient for the script.
Both theta home and theta align flags are opto-interrupters. However, theta align's aperture (actually, there are several) is a hole in the flag wheel, while theta home's aperture is half of the wheel. To adequately test position using the home flag, a script must test on both sides of the transition point. A single test affords resolution of only half the wheel. In contrast, to verify the position using the align flag, the script only needs to confirm that the opto-interrupter is not blocked, i.e. a single reading.
As long as theta align can be used, it affords more efficient position verification than theta home. However, there are two reasons why it might not be usable. One is that to provide a fair degree of positional accuracy the hole in the wheel is small. The most efficient means available to the script for testing theta align is to wait for the theta motor to reach the position of the hole (see Software Implementation Report 62 topic Wait For Stepper [Reports- WaitForStepper]). As explained in Report 62, waiting for a local motor is efficient and fast. The APU would not be expected to execute scripts controlling the transport unit, so the additional overhead involved in remote motor wait is not a concern. However, the latency of script command execution is unpredictable. At high theta speeds, the hole may pass through the opto-interrupter too quickly for the script to reliably sample it, in which case the script would have to sample the home flag (and we would have to accept greater positional uncertainty).
The other reason why we might not use theta align is that the mechanical design requires an unusual double-decker opto-interrupter, with one aperture for home and the other for align. We may want to remove this part to reduce cost and/or to have a multiple-source component.
It should also be mentioned that Don's suggestion applies to other motors. While Theta is the only one that has the possibility of checking position by sampling a flag dedicated to this purpose, all of the motors have home flags, which can be checked if the normal range of movement reaches these flags.
As explained in System Design Report 29, regardless of position flagging, the update period of our I/O scan chains establishes a maximum input sampling rate ([Cdxsys- ScanChainInputSampling]). Presenting the inputs at any rate more than twice as fast as the sampler would serve no purpose. This is not to say that the sampling rate is fast enough for everything that we may want to do with the transport unit; just that for speeding up the presentation of inputs to have any effect, we would have to redesign the entire scan chain system.
MOTOR POWER WIRES
System Design Report 29 stated that awg 26 wires, with their "normal" load of 1 amp, would be adequate to carry the .6 amp maximum current of our theta and Z (vertical) motor drives. Don Walker responded that Architect provides more than 1 amp for each of these motors. He suggested that we test our system with twice the expected mechanical load to determine whether the lower current will provide reliable operation.
Don's comment was based on the report's assumption that the 25-wire cable would continue to be built with awg 26 wires. Engineers from Cable Connection, a cable design and fabrication company based in Los Gatos, have reviewed the cable assembly. We asked them if there would be any difficulty building the cable with awg 24 wires. They say that there would be no problem. Awg 24 wires could carry higher current. To provide higher current, we would change the current sense resistors used in the driver circuits. We would also have to change the driver ICs, currently TEA3717, which get hot under continuous operation even at .6 amp. The TEA3718 is pin-compatible and should run much cooler, as it uses DMOS instead of bipolar power transistors.
Don's comment was also addressed by verifying operation with a heavier mechanical load. The total weight of the new rack with five 3-ml. Vacutainer tubes filled with water is 93 grams. For testing, the rack was populated with larger tubes filled with enough lead shot to achieve a total weight of 186 grams. The demonstration script was run. In this script, a rack is picked off the tray and deposited on the (simulated) belt. Then the rack is picked off the belt and returned to the tray. Both theta and Z are fully exercised at high speed. Initially, Z power was high and theta power was medium at all ramp segments. Z had no trouble with the heavier load. Theta's acceleration was not reliable. Theta's up ramp power was increased to high. Operation was superficially reliable but there was noticeable position error after 25 cycles. Power for theta's slew and down segments was increased to high. There was no error in 100 cycles.
The test was stopped because the Z motor driver, which full-steps, was hot. Since Z will not run continuously in the actual system, we could probably safely continue to use TEA3717 but we should replace this with TEA3718 for long-term reliability and to be able to run tests of longer duration. Nevertheless, 100 cycles without error at twice the normal load affords reasonable assurance that .6 amp is adequate.
CABLE DESIGN
In their review of the transport unit cable, the engineers from Cable Connection identified several questionable practices. One thing in particular stood out. The metal hood was too small for the cable and the tail of the hood was cracked and bent to accommodate the cable. The reviewers stated that the proper way to secure the cable was not by pressure from the hood but from a grommet sized to fit both the hood's tail opening and the cable. They questioned the construction of the cable itself. The signal wires are encased in a sheath surrounded by a woven shield. This assembly is encased in a very thick outer casing. Both the inner sheath and outer casing appear to be much thicker than necessary, adding bulk with no apparent value. They criticized the layout of the wires inside the hood. The wires are essentially crammed into the hood. The wires all appear to be cut to the same length so that the ones attached to pins nearer the tail are folded into a U, while the ones further out lie flat, which is what the reviewers said all the wires should do.
SERIALIZER PLD
System Design Report 29 presents the design of a serializer for the transport unit. The design is based on a Xilinx XC9536XL. The data bit clock, invclk, is combinatorially derived from the same external clock as the global clock, clk1 ([cdxsys- SerializerInvclk]). Jack W reviewed this and suggests that there is no need for the xclk input because invclk can be derived from clk1. The separate xclk input was created under the erroneous assumption that the clock input could not serve any other purpose. Jack explains that the clock input pin enters the global routing array and is available like any other input.
The xclk input was deleted and the definition of invclk changed to the following:
not( invclk, clk1 );
The design recompiled without complaint (at least not related to this change). It was loaded into the PLD and tested. It worked.
The X motor is large and has unusually high torque. Architect is able to drive it at 1200 steps per second with some margin, using microstepping. We have been attempting to drive it using a new IC, L6207, from STMicroelectronics. The test circuit only provides full stepping. At 36 volts, we have been able to accelerate up to 1300 steps per second but at 24 volts we lose control above 700 steps per second. There are two versions of the motor, a large one ("black") and a smaller one ("red") that uses rare-earth magnets to reduce the size. We have tested both motors at 24 volts both unloaded and driving the loader. In all cases, speed limits are identical, i.e. the mechanical load makes no difference; the motor itself limits the speed.
We have monitored the drive circuit's current sense voltages (one for each of the two windings) and found that, at 700 steps per second, the maximum instantaneous current achieved before the end of a step is 2.3 amps. The circuit is designed to limit current at 2.8 amps and has been observed limiting at 2.6 amps (at 300 steps per second). In other words, the speed limit is not caused by the inability of the circuit to deliver sufficient current but by the slow current rise into the motor.
Current rise is determined by drive voltage, winding inductance, motor-generator back EMF, and source impedance. Thus, it makes sense that increasing the drive voltage would increase the maximum step rate. The source impedance is composed of the current sense resistor in series with the Rdson of two DMOS transistors (upper and lower bridge elements). STMicro claims a typical Rdson of .3 ohm. Since they claim this for all of their motor driver ICs, some of which have been in production for several years, we have no reason to doubt this figure. The sense resistors were .3 ohm, yielding a total source impedance of .9 ohm. To determine whether this was a factor in limiting current rise, the sense resistors were reduced to .1 ohm, reducing the total to .7 ohm. No difference in the maximum step rate was observed.
Don Walker observed that the current traces (recorded by a digital scope) show current rising and then falling before rising again within each step, i.e. that the current does not rise monotonically. Back EMF is the only current limiting element that is capable of a non-monotonic effect. The most plausible explanation for this phenomenon is that the current is being limited by back EMF and the rotor is resonating.
Don suggested that we run the motor continuously while varying the speed to find resonant points. This was done, but resonance was not apparent at any speed. However, slowly increasing the speed did reveal that we could get the motor speed up to 1300 steps per second (even at only 24 volts). This is consistent with other observations. The ability to achieve a step change is directly related to the torque, which is directly related to the current, which has been observed to decrease at higher step rates. For any given torque, reducing the size of step changes decreases the chance of step loss.
While testing did not reveal any significant influence of resonance on the maximum achievable step rate, it did suggest that the ramps being used were too steep at the high end. The 700 steps per second speed was being realized by an up ramp defined as 50 to 700 linear 10%. The ramp rate can be as shallow as 1% but, as explained in Software Implementation Report 62 topic Rampdef Message [reports- RampdefMessage] the command message protocol limits the number of steps in a ramp to 118, limiting the range-rate combination. Thus, the shallowest ramp possible starting at 50 and ending at 700 steps per second is 3%. The shallowest ramp from 50 to 1200 is, by coincidence (there is a quantum effect-- 50 to 700 at 2% requires 133 steps while 50 to 1200 at 2% requires 160 steps) also 3%. An up ramp of 200 to 1200 linear 2% was tested. The motor consistently achieved the end rate. The largest range possible with a 1% ramp would be 375 to 1200. This was not tested. 200 to 1300 at a 2% grade was tested. It was not reliable.
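The step counts quoted in this paragraph are consistent with a ramp in which each step raises the rate by the stated percentage of the previous rate. That model is inferred here from the quoted figures (133 and 160 steps at 2%), not taken from Report 62, so the sketch below should be read as an assumption-laden check rather than the controller's actual algorithm. The function names are illustrative.

```python
import math

MAX_RAMP_STEPS = 118   # limit imposed by the command message protocol

def ramp_steps(start, end, grade_pct):
    """Whole steps needed if each step raises the rate by grade_pct percent (assumed model)."""
    return math.floor(math.log(end / start) / math.log(1 + grade_pct / 100))

def shallowest_grade(start, end):
    """Smallest whole-percent grade whose ramp fits within MAX_RAMP_STEPS."""
    grade = 1
    while ramp_steps(start, end, grade) > MAX_RAMP_STEPS:
        grade += 1
    return grade

print(ramp_steps(50, 700, 2), ramp_steps(50, 1200, 2))        # 133 160
print(shallowest_grade(50, 700), shallowest_grade(50, 1200))  # 3 3
print(ramp_steps(375, 1200, 1))                               # 116, within the limit
```

Under this model, raising the starting rate is what makes the shallow grades fit: 200 to 1200 at 2% needs about 90 steps, well inside the 118-step limit.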
The CDNext stepper control system was originally developed for slower mechanisms, such as syringes and peristaltic pumps. For example, one of our more common ramps is 50 to 200 at a 20% grade. Most ramps contain fewer than 10 steps. For these cases, 118 steps did not seem to be too restrictive. We may want to revisit this decision even though it would not be trivial to increase the number, as more steps can be accommodated only by allowing a ramp segment to be split across multiple messages. However, without increasing the number of steps we can still increase the top end step rate with a shallow grade by starting at faster rates. The X motor seemed to have no problem starting at 200 steps per second. The 50 steps per second we often use has been selected only for historical reasons. We have not generally investigated faster starting points. For a test, the RSH transport unit test script was modified to start both the upward and downward ramps at 200, respectively 200 to 700 @ 50% and 200 to 900 @ 50%. With both the standard and double load, there was no step loss. The mechanism was noticeably more quiet during the up and down ramps.
We may want to reconsider the motor that has been selected for X, the K31HRLK-LNK-NS-00 from Pacific Scientific. As Don Walker has pointed out, this motor has very high torque but only at low speeds. A motor with less torque at low speed but that allows a faster current rise time (i.e. less L and more I) might provide better performance at our operating speeds. Such a motor might also cost less, as the very high torque motor is unusual. PacSci describes its POWERPAC motors as having the "highest torques per frame size in the industry." We may be paying for a feature that we aren't really using. A conventional motor, like PacSci's E31NXLA-LXX-XX-00, that has substantially lower holding torque (349 oz-in vs. 845 oz-in) might provide greater usable torque in our application. Comparing the two motors' drive characteristics, the rated current/phase of the K-type motor is 3.3 amp compared to 4.1 amp of the E-type while its phase inductance is 8.3 mH compared to 4.9 mH. It should also be noted that Architect may have less trouble getting the motor up to speed because it uses microstepping, which reduces the current change at each step but also sacrifices 30% of the motor's torque. If we can use full stepping, we automatically recover that lost torque.
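The effect of phase inductance on current rise can be roughed out by treating the winding as an ideal inductor, so that t = L * I / V. This deliberately ignores winding resistance, source impedance and back EMF, all of which slow the real rise, so the results are optimistic lower bounds. The inductances and the 2.8 amp limit are from the text; the function name is illustrative.

```python
def rise_time_ms(inductance_h, current_a, supply_v):
    """Time for an ideal inductor to ramp from zero to current_a: t = L * I / V."""
    return 1e3 * inductance_h * current_a / supply_v

MOTORS = {"K-type": 8.3e-3, "E-type": 4.9e-3}   # phase inductance, henries
I_LIMIT_A = 2.8                                  # drive circuit current limit

for name, inductance in MOTORS.items():
    for volts in (24, 36):
        t = rise_time_ms(inductance, I_LIMIT_A, volts)
        print(f"{name} at {volts} V: {t:.2f} ms to reach {I_LIMIT_A} A")

print(f"step period at 700 steps/s: {1000 / 700:.2f} ms")
```

Even in this idealized form, the K-type winding at 24 volts needs roughly two thirds of a 700 steps per second step period to reach the current limit, while the lower-inductance E-type needs well under half of it.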
3/8/02
The FPGA (Altera EPF10K30RC208-3) on the new MSM (version 2) was being successfully configured by the serial EPROM with FPGA version 7 (MSM7). The CPU was unable to configure the FPGA from the same image stored in flash. Chris Steidle identified a problem in a missing wire between the CPU pin 139, port A7, and DCLK, the clock used for programming the FPGA. After adding the wire, the CPU was sometimes successful but not always.
In most cases when the CPU fails to program the FPGA, pressing the reset button so that the CPU can try again fixes the problem. Also, the FPGA bail program in flash memory seems to always work. This program is invoked under a debugger (we usually use ICD32). Significantly, we invoke the FPGA programmer a long time after applying power. Both of these situations suggest that if the CPU were to wait longer after power is applied before programming the FPGA, it would be successful. However, I tested this hypothesis by adding a half-second delay before calling the programmer. It had no effect.
In the previous MSM version the CPU always succeeds in configuring the FPGA. Occasionally the APU does not succeed in booting up from power-on. Resetting the CPU always corrects the problem so I assumed that the power-on reset circuit was faulty. However, the APU may be experiencing the same FPGA programming problem as the new MSM but not as severely. After adding the missing wire, there is no apparent difference regarding the CPU-FPGA interface between revision 2 and revision 1 of the MSM. The date/manufacturer code on the FPGA differs. In MSM revision 1, the code is U DCA560019A while in revision 2 it is U DCA560137A. Altera claims that the two have no functional differences. If most users of these devices use a serial configurator instead of programming by CPU, it is possible that the latter is not reliable and may be sensitive to fabrication differences. Altera doesn't own its fabs and may not be able to control minor variations that could impact a marginal design.
An alternative hypothesis to the suggestion that the CPU is starting to program the device too soon after power-on is that the device gives the CPU faulty status information, indicating, for example, that it is ready to be programmed when, in fact, it is not or that it has not finished programming itself when it actually is done. Rather than trying to distinguish between the various status errors, I modified the FPGA loader program to repeat the entire process if the device doesn't say that it is done after a reasonable delay. The modified code is as follows:
h:\cdx\an\msm\fpga\fpgaldr.asm
LoadFpga: MOVE.L A1,A3 ; Save in case we have to repeat.
CMP #0,A1
BNE.B haveSrcPtr
LEA fpgadatasize(PC),A1 ; Source pointer.
...
; Wait for FPGA to release Done, indicating configuration done.
MOVE.L A3,A1 ; Restore source pointer in case we need to repeat.
MOVE.W #0,D2
waitDoneHi:
BTST #DONEBIT,(A0)
DBEQ D2,waitDoneHi
CMP.W #0,D2
BEQ LoadFpga ; Try again.
Given this program (see System Design Report 25 [Msm Configuration Methods And Tools]) the CPU is able to configure the FPGA every time. However, it can take as long as 30 seconds compared to the normal two seconds, indicating that the program has to try as many as 15 times before the FPGA is finally configured. This is unacceptably long. Unless someone can find a relevant difference between revisions 1 and 2 of the MSM, I will conclude that the fault lies in some characteristic of the device that Altera doesn't understand, in which case we will have no choice but to only configure with the serial EPROM. The EPROM is more convenient for software development but not for manufacturing, which will be required to program two devices, flash and EPROM, instead of only one.
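The retry strategy itself, separated from the 68K details, is simple enough to state in a few lines. The sketch below is a hypothetical Python illustration, not a translation of fpgaldr.asm: load_image and done stand in for the download routine and the Done status read, and max_attempts is an added bound (the listing above retries without limit) set to the worst case observed.

```python
def configure_with_retry(load_image, done, max_attempts=15):
    """Repeat the whole download until done() reports success; return attempts used."""
    for attempt in range(1, max_attempts + 1):
        load_image()
        if done():
            return attempt
    raise RuntimeError("FPGA did not report Done")

class FakeFpga:
    """Test double that reports Done only after a given number of downloads."""
    def __init__(self, loads_needed):
        self.loads_needed, self.loads = loads_needed, 0

    def load(self):
        self.loads += 1

    def done(self):
        return self.loads >= self.loads_needed

fpga = FakeFpga(loads_needed=3)
print(configure_with_retry(fpga.load, fpga.done))   # 3
```

Returning the attempt count makes it easy to log how often retries occur, which would help quantify the difference between board revisions.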
PREPARATION OF NEW ANALYZER BOARDS
MSM VERSION 2
RPM VERSION 3
RPMS1 = 248
RPMS2 = +1
RPMS3 = +1
RPMS4 = +1
RPMS5 = +1
RPMS6 = +1
RPMS7 = +1
RPMS8 = +1
SENSE_STROBE = FluidSense ; Device name for initialize command
STROBE = FluidStrobe ; Strobe bit.
ACTIVATE = 1 ; Strobe activation 1 = 0->1
REFRESH = 0.1 ; Viability of a reading in seconds.
FluidSens1 = 240 0 = Fluid 1 = NoFluid
FluidSens2 = +1 0 = Fluid 1 = NoFluid
FluidSens3 = +1 0 = Fluid 1 = NoFluid
FluidSens4 = +1 0 = Fluid 1 = NoFluid
FluidSens5 = +1 0 = Fluid 1 = NoFluid
FluidSens6 = +1 0 = Fluid 1 = NoFluid
FluidSens7 = +1 0 = Fluid 1 = NoFluid
FluidSens8 = +1 0 = Fluid 1 = NoFluid
RPMLPM.F
define delay wait for 0.1
begin RpmLpm unit APU
echo "Begin RpmLpm"
write 510000h 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
close RPMV11
delay
...
close RPMV18
delay
close RPMV21
delay
...
close RPMV34
delay
close RPMV36
delay
close RPMV37
delay
close RPMV38
delay
close RPMV41
delay
close LPMV1
delay
...
close LPMV32
delay
end
LPM VERSION 3
ACTUATORS
UNIT = APU
SPACE = APU_ACTS
MULTIBIT = BYTE
LPMV1 = 65
LPMV2 = +2
LPMV3 = +2
...
LPMV32 = +2
If the LPM is connected directly to the APU then define LPMV1 address as 0. All subsequent actuator addresses are relative and are automatically offset appropriately.
CDNEXT ANALYZER SYSTEM DESIGN REPORT 32
12/22/02
REQUIREMENT
During development of the RSH (Random Sample Handler) we have experimented with different X-axis motors and mechanisms. A motor control system based on a commercial product (Oregon Microsystems) is able to drive the robot faster than the MSM. The RSH is intended for use in two products, a clinchem/IA instrument, which will use the Oregon Microsystems-derived controller, and CDNext, which will use the MSM (or a board derived from the same technology). The MSM has been unable to drive any of the motors tested in the X-axis at the speeds required by the clinchem/IA instruments. The sample handling speed requirements of CDNext are not yet completely determined and may vary depending on how much complexity we can tolerate in flow scripts. For example, a series of mechanical operations that depends on faster X-axis movement might be replaced by interleaved operations requiring only slower movement. Nevertheless, we need to determine the cause of the performance difference and to develop at least general ideas for improving the MSM's performance should that become necessary. We will immediately implement any improvements that have no associated costs.
ANALYSIS
An obvious difference between the two controllers is that the commercial system affords fast microstepping at high power, whereas the MSM can microstep only relatively slowly and at low power. The commercial system was able to drive the X-axis considerably faster with a smaller motor than with a larger motor with more torque. Testing with the MSM showed only minor differences between the top speeds of several motors with significantly different torque ratings. Further, the attainable speed depended on the X-axis position of the robot and the temperature of the unit. When the unit was cold, the top speed was reduced. The motor is driven by a MOSFET bridge, which delivers more current at lower temperatures, and there is no lubrication, which might become more viscous at lower temperatures. Therefore, the most likely explanation is that the top speed was limited by mechanical resonance, which changed with temperature. This could explain why the commercial system was unable to get better performance from the more powerful motor. Its stronger permanent magnet detents cause less uniform microstepping torque and, therefore, greater sensitivity to resonance.
The design of the robot invites resonance problems. The stiff rods that it rides on are suspended at their ends like violin strings. The motor and drive gear are cantilevered out significantly beyond the upper rod middle bushing. Acceleration will tilt the robot, essentially plucking the rod. To test the hypothesis that this was a major source of problems, the middle bushing was removed and replaced by a Teflon slider located at the right end of the robot. The top speed increased by 25%; motors with higher torque ratings had higher top speeds; and the attainable speeds were no longer sensitive to robot location or temperature.
While the mechanical design clearly could be improved, testing also revealed shortcomings of the MSM. The most obvious is that, in not providing higher power microstepping, the MSM depends on proper mechanical design. It is difficult to predict resonance problems. Since mechanical iterations are both time-consuming and expensive, it may be more cost-effective to solve resonance problems in the control system rather than in the mechanical design. Unfortunately, the cost of microstepping is higher than simply the cost of the driver circuit. The MSM's CPU has to calculate every step of every motor. To move a certain distance with divide-by-8 (the MSM's microstepping circuit only supports divide-by-8, 4, and 2) microstepping requires eight times as many steps as with whole-stepping.
Analysis revealed a problem with the MSM's ramping capability. The MSM's smallest step is approximately 32 usec, which might imply that a motor could be driven at 31,250 steps per second. However, a motor can't be accelerated to anywhere near this rate because the 32-usec minimum step also sets the minimum step change during acceleration. To reach 31,250 steps per second, a motor's last up ramp step change would have to double the motor's speed. At high speeds, steppers have very reduced torque. A more achievable step change might be 1%, suggesting a top attainable speed of approximately 3000 steps per second. If microstepping by 8, the top whole-step equivalent speed might be as low as 375 steps per second. Microstepping might allow a larger step change during acceleration that could increase this somewhat but not enough to compensate for the reduction in slewing speed.
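The ramping limit can be illustrated with a short calculation. The sketch below assumes that step periods are whole multiples of the 32-usec scan cycle, so the smallest possible change during a ramp is one cycle. The function names are illustrative only.

```python
TICK_US = 32  # minimum step period and timing resolution, from the text


def step_rate(period_ticks):
    """Steps per second for a step period of period_ticks scan cycles."""
    return 1_000_000 / (period_ticks * TICK_US)


def last_ramp_gain(period_ticks):
    """Fractional speed increase when the period shrinks by one tick
    to reach period_ticks."""
    return step_rate(period_ticks) / step_rate(period_ticks + 1) - 1.0


for ticks in (1, 2, 4):
    print(f"period {ticks} tick(s): {step_rate(ticks):.0f} steps/s, "
          f"final ramp step adds {last_ramp_gain(ticks):.0%}")

# Whole-step equivalent of a 3000 step/s limit when microstepping by 8.
print(3000 / 8)
```

The first line of output corresponds to the case in the text: reaching 31,250 steps per second requires a final ramp step that doubles the speed.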
DESIGN ALTERNATIVES
SCAN CHAIN
As explained in Cdnext Stepper Motor Control topic Architecture- I/O Hardware- Output Scan Chain [motor.doc] the MSM's motor scan chain contains 88 bits, comprising four control bits for each of 20 motors plus one byte for the scan chain test register. The scan chain shifts at 4 MHz, so all 88 bits are shifted in 22 usec. A parallel load period is required. A 10-usec period is used for the motor scan chain. Thus, the entire shift cycle consumes 32 usec, which establishes the minimum step size.
The 10-usec parallel load period is much longer than necessary. It was selected to produce a 32-usec cycle to match the MPM motor controller used in previous instruments. Even the four-usec period used for the I/O chain is much longer than the standard hardware needs and exists only to provide smart I/O devices, like the VPM, with a long code processing period (e.g. for an ISR triggered by the parallel load strobe). For the motor scan chain this could be reduced to much less than one usec. If a 250-ns period were used, the minimum step size would be reduced to approximately 22 usec, potentially increasing the maximum step rate from 3000 to 4400.
The 26C31/32 differential transmitter/receivers implement the RS422 electrical standard, which is designed to support a maximum signaling rate of 10 Mbps at 40 feet. We have been using 4 Mbps because this rate is more forgiving of poor connections, such as missing terminators, even though all of our scan chains are less than 20 feet long. We should be able to double the rate to 8 Mbps. With this change (and a 250 nsec parallel load period) the minimum step size would be approximately 11 usec and the maximum step rate 8800. These changes would require a detailed analysis of signal skew in the current hardware and FPGA firmware implementation. System Design Report 10 topic MSM [cdxsys- ScanChainSkew] discusses some of the issues involved in skew but does not fully clarify the FPGA design or consider the effect of skew with an 8 MHz shift clock.
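The cycle times quoted so far follow from the chain length, the shift clock, and the parallel load period. A minimal sketch of that arithmetic, using only figures from the text (the function and dictionary names are illustrative):

```python
CHAIN_BITS = 20 * 4 + 8  # 20 motors x 4 control bits + 1 test byte = 88


def scan_cycle_us(shift_mhz, load_us, bits=CHAIN_BITS):
    """One complete scan cycle: serial shift time plus parallel load period."""
    return bits / shift_mhz + load_us


options = {
    "current (4 MHz, 10 usec load)": scan_cycle_us(4, 10),
    "shorter load (4 MHz, 250 nsec load)": scan_cycle_us(4, 0.25),
    "faster shift (8 MHz, 250 nsec load)": scan_cycle_us(8, 0.25),
}
for name, cycle in options.items():
    print(f"{name}: {cycle:.2f} usec")
```

The text rounds the second and third results to approximately 22 usec and 11 usec.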
The actual increase in top motor speeds that can be attained due to these two changes in the scan chain would not be as dramatic as the simple calculations suggest, primarily because as the motors are driven faster, the step change as a percentage of the step size must continue to decrease. At the faster rates, it is likely that a 1% change would be too large. We might be able to double the top speed but not triple it as the simple calculations suggest.
The scan chain signaling rate could be radically increased by using LVDS (Low Voltage Differential Signaling) which supports rates up to 655 Mbps. The maximum clock rate of the 74HC595 output registers is 25 MHz. Adding margins for clock/data skew, the maximum safe rate is approximately 22 MHz, yielding a maximum (ideal) step rate of 24,000. For higher rates, we could implement the shift registers in a PLD, such as Xilinx XC9536, which supports frequencies up to 100 MHz. At this rate, the maximum step rate would be approximately 100,000. Again, these are ideals that cannot be realized in practice. However, the 60,000 microsteps/second rates that the commercial stepper controller is capable of delivering could theoretically be achieved by the MSM, even while continuing to support 20 motors, using an LVDS scan chain.
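The ideal step rates quoted for the scan chain alternatives appear to scale the 3000 steps per second baseline inversely with the scan cycle time. The sketch below reproduces that simple scaling as an interpretation of the quoted figures; it is not a model of real motor behavior, and the names are illustrative.

```python
BASE_CYCLE_US = 32.0  # current scan cycle
BASE_RATE = 3000.0    # attainable rate estimated for the current cycle


def scaled_rate(cycle_us):
    """Ideal step rate if the attainable rate scales inversely with cycle time."""
    return BASE_RATE * BASE_CYCLE_US / cycle_us


for label, cycle in (("RS422, 4 Mbps, short load", 22.0),
                     ("RS422, 8 Mbps, short load", 11.0),
                     ("LVDS, 22 MHz, 74HC595", 4.0)):
    print(f"{label}: about {scaled_rate(cycle):.0f} steps/s")
```

The results land near the 4400, 8800, and 24,000 figures in the text, which are stated there as rough ideals.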
MOTOR CONTROL
In the current MSM design, motor steps are described as events in a page of 256 32-usec periods. The page represents 8.192 msec of motor activity. The FPGA contains two of these (event count) pages and presents them alternately to the CPU. It interrupts the CPU every 8.192 msec to prepare the next page. If these page sizes were to be retained while the scan chain cycle were reduced from 32 usec to 11 usec, the interrupt period would be reduced to 2.8 msec. With the 4-usec cycle that could be achieved with 74HC595 registers and LVDS signaling, the interrupt period would be reduced to one msec. At the 220 nsec scan chain cycle (LVDS with XC9536) the interrupt period would be 56 usec. The 25-MHz MC68340 CPU has a motor interrupt overhead of 7 usec (60% of this is due to register stacking and unstacking for the C-language ISR). Thus, ISR overhead alone consumes .08% of the 8.192 msec period, .25% of the 2.8 msec period, .7% of the 1 msec period, and 12.5% of the 56 usec period. The ISR visits each motor (descriptor) one more time than the motor has steps on the page being prepared, i.e. each motor is visited at least once whether it has any steps or not.
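The interrupt periods and overhead percentages above follow directly from the page size, the scan cycle, and the 7-usec interrupt overhead. A short sketch of the arithmetic, using the cycle times as given in the text (names are illustrative):

```python
PAGE_EVENTS = 256      # events per page, from the text
ISR_OVERHEAD_US = 7.0  # fixed interrupt overhead on the 25-MHz MC68340


def interrupt_period_us(cycle_us):
    """Time covered by one page, which is also the interrupt period."""
    return PAGE_EVENTS * cycle_us


def overhead_percent(cycle_us):
    """Share of each interrupt period consumed by fixed ISR overhead."""
    return 100.0 * ISR_OVERHEAD_US / interrupt_period_us(cycle_us)


for cycle in (32.0, 11.0, 4.0, 0.22):
    print(f"cycle {cycle:5.2f} usec: period {interrupt_period_us(cycle):8.2f} usec, "
          f"overhead {overhead_percent(cycle):.2f}%")
```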
The exact amount of time spent by the ISR on step processing depends on many factors but, in all cases, constitutes considerably more work for the CPU than the interrupt overhead. The interrupt rate could be kept constant for all of the alternative scan chain cycle rates by increasing the size of the pages. For example, the 8.192 msec rate could be achieved with an 11-usec scan chain cycle by using a 745-event page. This would prevent increasing scan chain rates from increasing the percentage of ISR time spent on interrupt overhead and visiting motors that have no steps in the page. However, the page size has no impact on the amount of work the ISR must do for each step. While increasing the page size decreases the interrupt rate, it simultaneously increases the number of steps that the page is likely to contain. The ISR's load difference between a 745-event page and a 256-event page is probably less than 5%. Thus, speeding up the existing scan chain circuitry to produce an 11-usec step period could be achieved by minor changes in the FPGA with little impact on ISR performance. Enlarging the page would slightly improve performance. For the two LVDS alternatives, the 1 msec period might impose as much as 10% additional ISR burden, which should probably be mitigated by a larger page, while the CPU cannot keep up with the 56 usec period under any scheme.
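The page size needed to hold the interrupt period at 8.192 msec is the target period divided by the scan cycle, rounded up. A minimal sketch (the function name is illustrative, and the 4-usec case is added only to show the same formula applied to the LVDS cycle mentioned earlier):

```python
import math

TARGET_PERIOD_US = 8192  # current interrupt period


def events_per_page(cycle_us, target_us=TARGET_PERIOD_US):
    """Smallest page size whose duration is at least the target period."""
    return math.ceil(target_us / cycle_us)


print(events_per_page(32))  # current design
print(events_per_page(11))  # 8 Mbps RS422 with short load period
print(events_per_page(4))   # LVDS with 74HC595 registers
```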
The rate at which motor step changes can be delivered to the drivers sets an absolute maximum stepping rate. The actual performance that can be realized depends on how much simultaneous activity is expected of the CPU, which not only controls the motors but also executes scripts and communicates with its master as well as the barcode reader. Even with a 32-usec step rate limit, the CPU probably can't drive all 20 motors at the fastest rate in addition to doing its other jobs. However, the purpose of increasing the step rate is not to drive (most) motors faster than the current maximum but to improve ramp resolution. This implies that we want to drive at least one motor faster but even doubling the slew rate of any one motor (e.g. the X-axis) only slightly increases the overall CPU workload.
CPU ALTERNATIVES
The MSM was designed to replace two MPM and two SHM units. We know that it can do at least the work of the units it replaces because of its faster CPU speed, reduced communication overhead (due to running scripts and motor operations locally instead of as master-slave processes), and more efficient control algorithms (particularly in the motor ISR). We do not yet know whether it has the power to control the RSH, which demands significantly greater low-level motor control and higher level processing intelligence (presumably implemented in scripts but possibly with C-language support). There are several ways that the greater demands of the RSH might be met, including:
REVIEW AND SUGGESTIONS
MINIMAL IMPACT PLAN
The RSH robot mechanical design is demonstrably wrong and should be corrected. If the motor is placed between the two bushings instead of cantilevered and 36 volts is used for motor drive, the current MSM can consistently accelerate the robot (on the X-axis) up to 1600 steps per second. For production reliability this would be reduced to perhaps 1200, which would move the robot the full length of the X-axis in less than two seconds. This change should be implemented under any overall plan.
The motor scan chain cycle time should be reduced. The parallel load period could be reduced from 10 usec to 1 usec without any analysis, reducing the minimum step size by 28%. A detailed analysis of skew would not be much trouble and might reduce the minimum step size to 11 usec at no implementation cost and only a slight increase in the CPU's workload. This change should allow reliable X-axis traversal in less than 1.5 seconds.
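The 28% figure can be checked from the 22-usec shift time and the two parallel load periods. A brief sketch with illustrative names:

```python
SHIFT_US = 22.0  # 88 bits at 4 MHz


def reduction_percent(old_us, new_us):
    """Percentage by which the minimum step size shrinks."""
    return 100.0 * (old_us - new_us) / old_us


current_cycle = SHIFT_US + 10.0  # 10-usec parallel load period
shorter_cycle = SHIFT_US + 1.0   # 1-usec parallel load period
print(f"{current_cycle:.0f} usec -> {shorter_cycle:.0f} usec: "
      f"{reduction_percent(current_cycle, shorter_cycle):.1f}% smaller")
```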
Even with a two-second X-axis traversal it is likely that we can meet our requirements for the RSH. Reducing this to less than 1.5 seconds may simplify script design.
MAXIMAL UTILITY PLAN
The MSM has been proposed as a universal building block that could be used in a variety of systems, including lab automation systems. However, comparisons with the commercial stepper controller reveal that its motor control capabilities may be insufficient for demanding mechanical systems. Even if improper mechanical design is partly or wholly to blame, some problems, such as resonance, are very difficult to anticipate and potentially expensive to fix by mechanical design and redesign. The MSM clearly can perform the job for which it was originally intended and probably can control the RSH as well, but to be a truly universal building block it needs to provide higher stepping rates (and/or resolution) than can be achieved in its current design.
The only means of achieving step rates similar to the commercial system is to move the step generating function from the CPU's ISR to the FPGA and to use an LVDS scan chain or none at all (direct FPGA to motor). A less aggressive approach would be to move only the step generator but continue to use an RS422 scan chain, albeit at the increased cycle rate. This would ensure the MSM CPU sufficient bandwidth for processing time-critical scripts while also providing adequate motor ramping capability for all of our current mechanical requirements. It is possible that this could even be implemented in the current MSM hardware. The FPGA already has bus-master capability, which provides it with a means to read tables and commands set up for it by the CPU, and the CPU has a means to write directly to the FPGA. The FPGA resources now used to implement event count pages could instead be used as control registers and step counters. Ramps could continue to be stored in shared RAM. Instead of (or in addition to) storing ramp pointers in each motor's descriptor in memory, the CPU could program the FPGA's control registers with this information.
INTERMEDIATE PLANS
Of all of the alternatives described above, only the option to place an actual MSM in the RSH should be eliminated without further consideration. Locating in the RSH an IML unit derived from the MSM but including all necessary I/O is viable. The only downsides to this are 1) cabling is not compatible with the Sequential Sample Handler and 2) CPU, FPGA, RAM, and flash memory add approximately 60 dollars to the cost of the board. The upsides are 1) better chance that both the RSH and MSM CPUs will perform adequately and 2) opportunity for local optimization, such as a dedicated X-axis motor controller coprocessor.
Replacing the MSM's 68340 with a faster CPU requires much hardware and software redesign, but we may eventually have to do this anyway due to availability concerns. However, Motorola has not indicated that the 68340 is due to be obsoleted and we shouldn't be redesigning based on vague concerns. Eventually, every part we use will be obsolete.
Rewriting the ISR in assembler would not be a particularly difficult task but it should probably not be undertaken unless we have ruled out replacing the 68340 in the near future. Doubling (assuming that conversion to ASM could do this) the speed of the ISR would not improve motor control but would reduce its consumption of processing bandwidth, affording the CPU greater opportunity to process scripts. Assuming that in the worst motor processing hot spots (e.g. simultaneously slewing X, theta, vertical, belt, and mixer) the ISR consumes 75% of the CPU bandwidth, doubling the speed of the ISR would cut its consumption to about 37%, freeing roughly 37% of the CPU's bandwidth. This is approximately the same bandwidth gained by using a dedicated RSH CPU, which costs us 60 dollars, but which also affords an opportunity for other means of improving bandwidth.
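The bandwidth estimate for a faster ISR can be worked through as follows. The 75% load and the factor-of-two speedup are the assumptions stated in the text; the function name is illustrative.

```python
def isr_load_after_speedup(isr_load, speedup):
    """CPU fraction consumed by the ISR once it runs 'speedup' times faster."""
    return isr_load / speedup


before = 0.75                               # assumed worst-case ISR load
after = isr_load_after_speedup(before, 2)   # ISR rewritten to run twice as fast
freed = before - after

print(f"ISR load: {before:.1%} -> {after:.1%}")
print(f"Bandwidth freed for scripts: {freed:.1%} of the CPU")
print(f"Script bandwidth: {1 - before:.1%} -> {1 - after:.1%}")
```

On these assumptions the bandwidth left for scripts rises from a quarter of the CPU to well over half.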
REVIEW
At this point, with no changes whatsoever, we don't know that there is anything wrong with our original plan to use the one MSM to control the RSH as well as all other steppers (peristaltic pumps and syringes) and to manage sample handling. We engaged this review only because the commercial motor controller was able to move the robot (on the X-axis) significantly faster than the MSM and we needed to know why. Our sharing of the RSH with a clinchem instrument doesn't mean that we have to share that instrument's speed requirements. The RSH robot design should be corrected, however. Experiments with the commercial motor controller show the lower torque motor moving the robot faster than the higher torque motor at speeds well below the maximum of either motor, proving that the mechanical design is wrong. Our experiments with a modified robot show that we will be able to move a corrected robot the full length of the X-axis in less than two seconds, which is adequate for our application.
The question that we can't answer at this point is whether the MSM CPU will have adequate bandwidth to do all of the jobs assigned to it, including controlling the RSH motors. We won't be able to answer this until we design and run scripts for the new instrument, including RSH. The simplest approach would be to continue with the original plan and see if CPU bandwidth is adequate. If the application proves marginally overtaxing, converting the motor ISR to assembler would probably provide sufficient correction.
Somewhat more severe problems may be corrected by a dedicated RSH CPU, perhaps in combination with the assembly language ISR. The problem with this correction is that we can't delay making a decision about the architecture of the RSH controller. The RSH must contain either a dumb I/O board (for control by the MSM) or its own intelligent controller including I/O. Changing between these two architectures is less difficult than one might think. The I/O could be the same for both and the scripts would be similar, although there would be some differences due to the need to coordinate the MSM and RSH controllers. Nevertheless, changing would require a new design and PCB iteration.
The major impetus toward making the RSH control-I/O board intelligent is that doing so initially may avoid a redesign and PCB iteration. Considerations suggesting initially making a dumb board are:
Given these considerations, it seems reasonable to continue with the original plan and initially make a dumb RSH I/O board controlled by the MSM.
Moving step generation from the ISR to the FPGA uniquely solves all problems with no downside other than the fact that we don't know if it can be done. If it can be done, the RSH control intelligence question is answered. The MSM would certainly have the bandwidth to control the RSH. For this reason alone, we should investigate this further. Additionally, this opens a path to the ideal global solution, a distributed motor controller with the performance of the commercial system.
PROPOSED PLAN
CDNEXT ANALYZER SYSTEM DESIGN REPORT 33
4/2/03
In MSM version 2, CPU Tout1 connects to U28 26C32 input pin 15 but the corresponding output pins 13 and 14 are not connected. The Tout1 connection is a vestige from an early approach to providing a local IML clock using jumpers. A local clock is needed when the IML unit connected directly to the PC uses a link other than HSL, for example ECP. When a HSL link is used, the PC's HSSL card provides the clock for all communication between IML units.
The old jumper-based approach has been superseded by an automated one in the APU. Now, if the communication link that the APU establishes with its master (PC or another IML unit) is not HSL (or if the HSL clock is not present) then software enables a local IML clock driver. In APU4, the local IML clock is provided by the CPU's TOUT2, which connects to one of four inputs of U87 26C32. The corresponding outputs connect to JSLAVE and JMASTER pins 9 and 10. None of the other three sections of U87 is used. Software enables the driver using CPU PA2 (pin 132), net signal TOUT2_EN (this connection was inadvertently omitted in early versions of the APU4 schematic).
With the APU able to provide a local IML clock, the MSM was relieved of this responsibility, because only one unit needs to do this. In fact, it would be difficult to provide the local clock without jumpers if any unit other than the first one in the IML chain were to do it because the local clock driver should not be enabled until master communication has been established, in order to determine whether the local clock is needed, but IML communication can't proceed without the clock.
Although the MSM may serve as the first unit in the IML chain, since it has not had a slave port, it unquestionably has not needed to support a local clock capability similar to the APU's. The new MSM version 3 will have a slave port, which brings up the question of whether it should support a local clock. This capability is clearly not needed for any analyzer instrument where, without a gather function, the MSM cannot serve as the first IML unit. However, the MSM has been conceived as a fairly general-purpose unit to be used in systems other than blood analyzers. In these other systems, an MSM might easily be the first IML unit in a chain and need to provide the local IML clock. This capability would go unused in an analyzer, but it wouldn't hurt anything, since the driver would never be enabled.
Adding the local IML clock driver capability to the MSM would require adding one driver IC 26C32. All of the CPU facilities needed to exactly duplicate the APU's local clock capability are available in the MSM. Neither of the CPU's timer outputs, TOUT2 and TOUT1, is currently used for anything. Its PA2 is attached to the NDSCHMWT net, which doesn't connect to anything. Whether to add the capability is a difficult question. On the one hand, it is easy and cheap but, on the other, it is not terribly important.
Easy to answer is whether to delete the Tout1 connection to U28. Since that connection serves no purpose, it should be eliminated. Since Tout1 is not under consideration for any function, it should, like all unused CPU I/O pins, be routed to a pad with a hole to facilitate any experimental use we might have for it.
MSM FPGA CONFIGURATION OPTIONS
To reduce manufacturing cost, we will use the FPGA serial flash/EPROM configurator only during development and switch to configuration by the CPU for production. All APU versions prior to 4 use this method exclusively without problem. The MSM affords three options, but its CPU-based programming of the Altera EPF10K30 has not been very reliable. A Xilinx part will be used in the new version and we will have to rewrite the programming code for it. We hope that it will respond better to this method than the Altera part but if it doesn't we will have to manufacture with a serial configurator. In any case, we will always want to be able to use a configurator during development because it is faster and more reliable when working on software.
EPROM configurators are significantly cheaper than flash parts but require a socket, whereas the flash parts can be surface mount. Similarly to the Altera Bit Blaster, the Xilinx part can be programmed via a JTAG interface. If the JTAG is used for development of the FPGA itself then the only advantage of a flash configurator over EPROM is that if programming by the CPU (from an FPGA image stored in flash) is not functioning then the flash configurator can be easily reprogrammed without a specialized programmer. However, for development, we could provide this same convenience by making an adapter to let the flash configurator emulate the EPROM part. The adapter would take the flash configurator's signals that are the same as the EPROM to a set of pins matching the EPROM's footprint. The JTAG programming signals would be taken to a separate connector to which a programming cable would attach. Consequently, there is no need to put any effort into supporting the flash configurator in the circuit. We should design for the EPROM and it should be socketed. If we decide to use the configurator in production, we should leave the part socketed to support upgrading. An 8-pin DIP socket is preferred for this. The pads for a surface mount EPROM could also be included to increase our options if this is easy.
DUART, BARCODE READER, RJ45 CONNECTOR
The new DUART will provide two serial comm channels, one or both of which will be used to communicate with a bar code reader. Previously, an RJ45 connector was used for this. The RJ45 was selected when it was not clear how the MSM would be connected to the loader. Questions still remain about the possibility of supporting loader options other than the RSH. However, for the RSH, the BCR connections are now clear. There is no direct connection from the BCR, now located on the robot, to the MSM. The BCR will connect to the RSH I/O board through the flat flex cable. The RSH I/O board will have a DB25 or DB37 connector. The MSM's 10-pin JLDRIO and 14-pin JLDRMTR connect to this through either separate ribbon cables or a unified harness.
Since we have already made the decision to separate the I/O and motor groups, to be consistent we should keep the BCR cable separate from the others. This is especially true for the second channel, which currently appears to not be needed for a BCR (a single BCR now moves between two positions) but might be used for a different purpose. In MSM version 2, the RJ45 is not in a good location relative to JLDRIO and JLDRMTR, since all of these connect to the same cable bundle. A better arrangement would be to replace the RJ45 with a 10-pin box-type connector located near JLDRIO and JLDRMTR. The connectors could be placed and the cable made in such a way that it would be very difficult to accidentally exchange JLDRIO and the BCR connector. What to do with the second channel is more challenging. We may find a use for it in some loader arrangement, in which case we would want it located near the others. However, if we end up using it for some other purpose, locating it near the others would just increase the cable congestion around this area of the board. A reasonable compromise might be to locate it near the others but on the periphery of the group so that its cable could conveniently either merge with the loader bundle or remain separate.
The new FPGA is a 3.3V part with Vout high of 2.4V. This provides adequate margin for the 26C32 and 26C31 parts, which require 2.0V. It is not high enough for the ULN2003 used for GS2Release and DoorRelease. These are not really digital parts but transistor arrays. To keep the output transistor saturated (2V CEsat) the input must be 2.4V at Ic = 200mA and higher for greater currents (e.g. 3V at Ic = 300mA). The SSCHOUT signals that drive these originate in the FPGA's internal scan chain. Thus, the FPGA serves as a scan chain output register, like the HC595, in this function. We could replace this function with an external register. However, the HC595 suffers the same control signal problem as the ULN2003, expecting a minimum Vin high of 3.3V (3.7V with margin). Although an HCT595 would be compatible with the FPGA, it would not provide high enough outputs for the ULN2003. The most feasible solutions are to either use external parts to level-shift the FPGA's outputs or to replace the ULN2003 with either a similar part with lower input voltage requirements or a serial input-parallel output driver with lower input voltage requirements. Texas Instruments makes such parts but they may not have second sources and are relatively expensive (all solutions cost much less than a 5V FPGA).
The L298 spinner motor driver requires 2.3V input high. The 0.1V difference between this and the FPGA output high doesn't afford sufficient margin. This situation can be corrected in the same manner as for the ULN2003 inputs.
The motor chopper driver ICs 3717/18 have 2V logic high inputs, which can be driven directly by the new FPGA.
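The compatibility findings above can be summarized as a margin check against the FPGA's 2.4V output high. In the sketch below the input thresholds are the ones quoted in the text, while the 300 mV minimum margin is an illustrative assumption chosen only because the text treats 0.4V as adequate and 0.1V as insufficient. The names are hypothetical.

```python
FPGA_VOH_MV = 2400   # new FPGA output high, from the text
MIN_MARGIN_MV = 300  # assumed acceptance threshold, not a figure from the text

# Minimum input-high voltages quoted in the text, in millivolts.
REQUIRED_VIH_MV = {
    "26C31/26C32": 2000,
    "ULN2003 at Ic = 200 mA": 2400,
    "ULN2003 at Ic = 300 mA": 3000,
    "HC595": 3300,
    "L298": 2300,
    "3717/18": 2000,
}


def margin_mv(part):
    """Noise margin between the FPGA output high and the part's input threshold."""
    return FPGA_VOH_MV - REQUIRED_VIH_MV[part]


def can_drive_directly(part):
    """True if the FPGA can drive the part without level shifting."""
    return margin_mv(part) >= MIN_MARGIN_MV


for part in REQUIRED_VIH_MV:
    verdict = "direct" if can_drive_directly(part) else "needs level shift or substitute"
    print(f"{part}: margin {margin_mv(part)} mV, {verdict}")
```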
MSM version 2 contains a spare 8-bit parallel output port, JSPO, and a spare parallel input port, JSPI. The clock, load, and serial data inputs of the HC595 and HC597 shift registers used for these are not compatible with the FPGA's outputs. Since we aren't using these, they can be eliminated. MSM version 2 also provides one two-bit spare output on J27. This is driven by U38 ULN2003, which is driven by the FPGA. This also may be eliminated. If we have any spare level-shifters we should reinstate as many spare outputs as we can since these don't cost much and may prove useful during development.
The JXBUS affords a general interface to the CPU in most respects. An adapter module that can function as a bus master will have no trouble controlling CPU bus signals. However, some master communication adapters (e.g. ECP) require the use of the CPU's DMA channel 2, which is otherwise used by the CPU's UART-A for master communication. The CPU provides no internal connection capability between TXRDYA and DREQ2, so this needs to be made by a wire. If a direct wire is used, the adapter module will not be able to use the DMA channel. TXRDYA and DREQ2 could both be routed to JXBUS so that the adapter could determine whether the two are connected, but this necessitates a jumper when the adapter is not plugged in.
One alternative to routing both TXRDYA and DREQ2 to JXBUS and jumpering the two when no adapter is present is a wired-OR arrangement in which DREQ2 has a passive pull up connected to an open-collector buffer driven by TXRDYA on the MSM and (possibly) a similar device on any adapter that wants to use DMA2. Only DREQ2 would be routed to JXBUS. The CPU would not enable DMA2 until the link has been discovered and can disable UART-A if this is discovered to not be the link. In this case, the UART would not assert TXRDYA and the adapter would have exclusive control. If an adapter board were not installed, the only pull up for DREQ2 would be the resistor on the MSM. Therefore, this resistor must source enough current to meet the timing requirements of the DREQ signal.
Adapter modules that don't need the CPU's DMA can ignore DREQ2. Those that do need the DMA have several alternatives for driving DREQ2. In no case do they need to pull up the signal because, as mentioned, the MSM must provide adequate pull up when the adapter is not attached. If we were willing to accept that whenever an adapter is attached, the native master HSL interface could not be used, then the adapter could actively drive DREQ2 (both high and low) at all times, because the CPU would not enable its UART to pull the signal down. However, such a restriction would complicate development. It would be much better for the adapter to drive DREQ2 with an open-collector or tri-state device, which the CPU can disable in order to allow the UART to use the DMA channel. If a tri-state driver is used, it does not have to mimic an open-collector driver, i.e. logic high = tri-state, because there would be no conflict with the UART over control of DREQ2. An active high drive might be desirable to improve the rise time, which may be degraded somewhat by the extra wiring and XBUS connector.
The CPU's TXRDYA is a permanent output (whether the pin is configured to serve this function or general I/O-- OP6). It is one of the few CPU signals that are not even tri-stated when the CPU is reset. Therefore, it cannot be connected directly to DREQ2. TXRDYA and DREQ2 are designed to be functionally compatible. TXRDYA (output) low indicates that the UART's transmit buffer is empty, while DREQ2 (input) low tells the DMA to read the next byte (from memory into the UART). There is no way to reprogram the active state of either of these signals. Whatever circuit is used to convert TXRDYA to an open-collector signal must not invert. There are no common non-inverting OC devices, although the old 74LS09 and LS15 gates provide this capability. If two inverter stages were used (with the second providing the OC drive and the first correcting signal polarity) the delay time through the entire signal path, including passive rise time should be compared to DREQ2's requirements. Given the relatively slow operation of the UART, unless there are some CPU microstate issues, there shouldn't be much of a timing problem.
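The wired-OR behavior of DREQ2 can be modeled as simple boolean logic. This is a sketch of the logic levels only, assuming the MSM pull-up resistor is fitted and ignoring rise time and propagation delay. The function names are illustrative.

```python
def oc_buffer_pulls_low(input_high):
    """Non-inverting open-collector buffer: sinks the line when its input
    is low and releases it when its input is high."""
    return not input_high


def dreq2_high(txrdya_high, adapter_pulls_low=False):
    """Level of the wired-OR DREQ2 net. The pull-up wins only when
    no driver is sinking the line."""
    msm_pulls_low = oc_buffer_pulls_low(txrdya_high)
    return not (msm_pulls_low or adapter_pulls_low)


def two_inverters(level_high):
    """Two inverting stages in series preserve polarity."""
    return not (not level_high)


# UART transmit buffer empty (TXRDYA low): DMA is asked for the next byte.
print(dreq2_high(txrdya_high=False))
# UART disabled (TXRDYA stays high), adapter requests DMA by pulling low.
print(dreq2_high(txrdya_high=True, adapter_pulls_low=True))
# Nobody requesting: the pull-up holds DREQ2 high.
print(dreq2_high(txrdya_high=True))
```

Because either driver can only sink the line, the MSM buffer and an adapter can share DREQ2 without contention, which is the point of the arrangement.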
An alternative to the open-collector driver between TXRDYA and DREQ2 would be a tri-state driver implemented in the FPGA or PLD. In this arrangement, the CPU would have to control the buffer just as it would the DREQ2 driver on any adapter that would selectively use the DMA channel. The CPU would discover the link that the master wants to use with either or both DREQ2 drivers disabled and then enable the one for the chosen link.
CDNEXT ANALYZER SYSTEM DESIGN REPORT 34
4/7/03
[Report 33] [Report 35]
Chris Steidle
Also see Report 15 [Cdxsys.doc-SystemReset]
The reset circuit on the MSM 2 board is described as follows:
The HSSL interface reset is driven into the pushbutton reset input of the TL7705A circuit. The output of the TL7705A is input to the CPLD.
The CPLD can receive resets from 3 sources:
Pushbutton
BDM CPU port connector
Output of TL7705A
The CPLD outputs resets to the FPGA, CPU and Dual Uart. Resources in the CPLD are implemented to debounce the pushbutton and TL7705A output signals using a 7-bit counter.
The reset from the BDM board is not debounced by the counter, but is ORed with the output of the debounce circuit and driven to an output pin.
I propose making the following change:
MOTOR WINDING TEST CONTROL SIGNALS
See Report 17 [Motor And Solenoid Winding Test].
See Motor Winding Test Circuit by Ed Sewall [Windtest] or [Windtest].
REVIEW BY CHRIS
In [the previous] report you [David] mention that the NDSCHMWT signal is not connected to anything. Actually, it is connected to a 74HC00 on page 16. It looks like it is a reset signal to the motor current detection circuitry. If this signal is not used, I will use the signal to control the IML clock capability. If it is used, will it be ok to use an unused pin on the FPGA to control the IML clock? Will the IML clock control be required before the FPGA is programmed? Is the motor current detection circuit used?
NEW WINDING TEST CIRCUIT
David McCracken
As Chris has pointed out, the MSM's CPU PA2 is not currently available for enabling a local IML clock, as suggested in Report 33. Since NDISCH and this local clock control are both simple outputs, either one could be implemented by an FPGA or PLD pin. However, as a general design rule we want to locate fundamental system signals, such as FPGA configuration and IML communication, on the CPU's port pins in order to have core capabilities available even if application level functions are not functioning due to circuit problems. Following this line of reasoning, PA2 should be used for the local IML clock control and an FPGA pin for NDISCH.
The winding test circuit was tested for the first time recently. It does not work. The LF356 op amps cannot be used without an input offset voltage nulling circuit. Because we only are looking for a difference between the integrator charge time when a winding bridge is driven one way vs. the other, we really don't need a very accurate circuit. Newer op amps than the LF356 provide adequate performance without trimming. The circuit has been redesigned using a TL074 quad op amp, replacing the HC00 (U20) discharge driver and LM311 (U22) comparator with two of the four amplifiers while improving performance of the current-to-time circuit without trimming.
For the new circuit, NDSCHCMP is renamed DSCHCMP and moved from TGATE2 (CPU pin 73) to TGATE1 (CPU pin 30). TGATE1 pullup R99 must be removed. As described in Analyzer Software Implementation Report 76 [Motor Cpu Bandwidth] timer 1 is used as an occasional timer. It could be temporarily reprogrammed to time a period established by the gate signal. Because timer 2 might now be used for a local IML clock, it cannot be used in this manner. The gate input of either timer could be tested as a simple input by a programmatic (instruction cycle) counter but this would require disabling all interrupts, which should be avoided if possible.
Chris Steidle
I have connected the configuration mode pins on the Xilinx to select one of the following 2 modes:
Master Serial with pullups 1,0,0 (M2, M1, M0)
Slave Serial with pullups 0,1,1 (no jumper installed, default)
The jumper is connected to the CPU input, PA3, the Xilinx configuration mode pins M1 and M0 and the inputs of a 74HC00 gate. I will build the board so that Slave Serial is the default (no jumper installed), allowing the CPU to program the part. If the jumper is installed, then the configurator must also be installed in the board.
David McCracken
Report 33 [Dreq2 And Xbus Daughter Cards] suggests the possibility of using an FPGA or PLD output controlled by the CPU's TxRdyA to provide a non-inverting pseudo-open collector DMA2 request driver for the HSL master link. The PLD would be preferable to the FPGA because it is permanently programmed and, therefore, more likely to be functioning than the FPGA when there are hardware problems. Since master communication is a core function, we would like it to be as reliable as is feasible.
One potential problem that Report 33 identifies with the wired-OR circuit is that the DMA request signal deassertion rise time will be relatively slow if only a resistor provides the pull up. Chris has suggested an improved implementation where the PLD's DMA request signal output drives high for one clock period after TxRdyA deasserts (goes high). After this period, the output tristates and the signal remains high due to the external (to the PLD) pullup resistor. A similar approach could be used on daughter cards to ensure a fast rise time in all cases.
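The drive-then-release behavior described above can be sketched as a small behavioral model. This is only an illustration in C of logic that would actually live in the PLD; the type and function names are hypothetical, and the model assumes TxRdyA is sampled once per PLD clock.

```c
#include <stdbool.h>

/* Possible states of the PLD's DMA request pin (hypothetical names). */
typedef enum { DREQ_DRIVE_LOW, DREQ_DRIVE_HIGH, DREQ_TRISTATE } dreq_pin_t;

typedef struct {
    bool prev_txrdy_high;   /* TxRdyA level sampled on the previous clock */
} dreq_driver_t;

/* Evaluate the pin state for one clock period.
 * txrdy_high: current TxRdyA level (low = asserted, transmit buffer empty). */
dreq_pin_t dreq_driver_clock(dreq_driver_t *d, bool txrdy_high)
{
    dreq_pin_t pin;

    if (!txrdy_high)
        pin = DREQ_DRIVE_LOW;      /* request asserted: pull the line low */
    else if (!d->prev_txrdy_high)
        pin = DREQ_DRIVE_HIGH;     /* first clock after deassertion: fast rise */
    else
        pin = DREQ_TRISTATE;       /* afterwards the external pullup holds it high */

    d->prev_txrdy_high = txrdy_high;
    return pin;
}
```

Starting from `prev_txrdy_high = true`, a low-to-high transition of TxRdyA yields exactly one `DREQ_DRIVE_HIGH` period followed by `DREQ_TRISTATE`.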
During the link discovery process, the native HSL link will not use DMA. A daughter card could use the shared DMA channel although this would present an additional software complication that we would prefer to avoid during link discovery (which is rather complicated even if all links are individually simple). Currently, the APU and MSM BIOS programs configure the CPU's OP6/TxRdyA signal to serve as TxRdyA. In this configuration, the signal will assert (low) whenever the master HSL (CPU UARTA) transmit register is empty. This prevents any daughter card from using DMA. Further, depending on a daughter card's DMA request signal implementation, there could be a signal conflict, even if the card doesn't use DMA. If, like the native HSL, the daughter card were to hold or episodically drive the signal low then the signal could conflict with the PLD output every time TxRdyA deasserted, which occurs fairly often during communication. A relatively large amount of current would flow during these conflicts but each event would persist for only one clock period.
It serves no purpose for the BIOS to configure OP6/TxRdyA to serve as TxRdyA and, in fact, it complicates the APP program. The APP program assumes that the BIOS has configured nothing except memory and reconfigures all I/O, including the master communication link. But it only configures the link selected during discovery. If this is not the native HSL, the program does not configure UARTA. This means that the OP6/TxRdyA configuration established by the BIOS remains in effect. This will not be acceptable no matter how we implement the wired-OR DMA2 request. The simplest solution is to configure OP6/TxRdyA as a simple always-high output in the BIOS. This simultaneously eliminates any potential signal conflict, allows the daughter card to use DMA if needed, and relieves the APP program of any responsibility for disabling the HSL DMA2 request.
CDNEXT ANALYZER SYSTEM DESIGN REPORT 35
4/20/03
[Report 34] [Report 36]
SCHEMATIC REFERENCE
MSM-2500. SCH 74773-102 rev. A. Distributed by Chris Steidle 4/15/03.
PROBLEMS AND SUGGESTIONS
INTERRUPTS
The barcode reader UART has two interrupt outputs, DUART_INTA and DUART_INTB (sheet 17). These connect to FPGA pins 57 and 58 (sheet 4). The FPGA has two interrupt outputs, SPI_IRQ and MPI_IRQ (pins 138 and 114) which connect to the CPU's IRQ5 (CPU pin 101) and IRQ6 (CPU pin 100) respectively (sheet 1). The disposition of the UART interrupts is not evident.
MPI_IRQ must have the highest priority of all maskable interrupts because the step motor ISR has hard real-time requirements. System Design Report 7 [cdxsys- ScanChainInterrupt] suggests using a scan chain interrupt to indicate the completion of a scan to support sequential outputs, for example to communicate with the VPM. As described in Vpm Requirements [vpm.doc] the controller (APU or MSM) and VPM synchronize sequential command transfers using handshake flags "next command" and "command received" without the support of such an end-of-scan interrupt. Similarly, as described by Implementation Report 59 [Strobed Fluid Sensors] sub-topic Implementation- Script Interpreter, a foreground command function generates the scan chain-based fluid sense strobe sequence, consulting the 5-msec global ticker to ensure that a scan has been completed. These high-level handshake mechanisms operate nearly 150 times slower than the scan chain itself (10 msec / 68 usec). However, both afford adequate performance.
Because of the motor page interrupt's higher priority and the fact that the motor ISR may consume as much as 8 msec, an end-of-scan interrupt cannot guarantee real-time tracking of the scan chain. It would be possible to fold the processing of this interrupt into the motor page ISR with interrupt status flags in an FPGA register (readable by the CPU) indicating the interrupt source. This could not be used to speed up any scan chain transfers sequenced by script command interpreters, which execute in the foreground, because foreground processes are all suspended while the motor page ISR executes. However, an interrupt-driven sequential transfer facility is feasible. Such a facility would execute very quickly, as it only needs to write a block of source data into the FPGA's scan chain output image. Therefore, it would not impinge significantly on the motor page time. But, only one scan could be serviced in a single interrupt unless the motor process were to repeatedly check the end-of-scan interrupt flag, complicating the motor step generator function. Nevertheless, this would be feasible if high-speed sequential scan chain data transfer capability were required.
System Design Report 17 [Scan Chain Integrity Testing] describes the continuous feedback approach to testing the integrity of the I/O and motor scan chains. The FPGA is expected to interrupt the CPU upon detecting a failure at least in the I/O chain and preferably in the motor chain as well. Whether the integrity test is realized depends on the scan chain not being fully consumed by application devices and the existence of a feedback register in the end unit. Although this interrupt is important, it does not require a particularly fast response.
The two COM device (DUART) interfaces include transmit and receive enable signals. If the external devices use these then the CPU's response to a UART interrupt does not have to be very fast. If they don't then communication will be reliable only if no message is larger than the internal buffers of the 16C2552 because the motor page ISR will not be interrupted for a COM device. Thus, under any circumstances, the UART interrupts are not high priority.
A reasonable assignment of sources to interrupts is to associate the motor page interrupt with IRQ6 and all other sources to IRQ5. Given this, the FPGA's SPI_IRQ name is misleading. We might simply want to call it IRQ5 but this describes the CPU's relationship to the signal rather than what the FPGA does with it. Associating the name MPI_IRQ with IRQ6 is more appropriate but it too has confusing connotations. This is actually the motor page interrupt, as a motor scan chain error should cause an interrupt on IRQ5 rather than IRQ6. Also, the possibility still remains of merging a high-speed I/O scan-chain sequential data transfer utility with the motor page ISR, in which case, IRQ6 would not even be exclusively associated with the motor scan chain. Names based on our current thinking about how these signals will be used would make the schematic more self-documenting but would also risk causing confusion if we modify the relationships inside the FPGA. The safest naming, precisely because it doesn't tell very much, would be to simply identify the two signals as high- and low-priority, e.g. HP_IRQ and LP_IRQ. Other alternatives are simply IRQ6 and IRQ5 or MOTOR_PAGE_IRQ and SCAN_UART_IRQ.
INSPECTED NO PROBLEMS
DESIGN REFERENCE
APU.SOF/POF/TTF version 1.7B.
GATHER CONTROL REGISTER
The Gather Control Register (500040 for bench A and 500050 for bench B) command structure is inconvenient for programming and doesn't work correctly as it is. For programming, ideally count and gather would be entirely independent of each other since they are independent commands. As explained in Analyzer Software Implementation Report 68 [Separate Count And Gather] FPGA documentation implies that independent control can be achieved by the following:
Count and gather interpreters were developed for previous APUs based on this information. However, the validity of the control mechanism was not fully verified. The same controls have been tested on APU4 and it does not function as implied. The gather command interpreter starts by stopping any gather in progress. It does this by setting the control register's bit 6. If this is done (regardless of bit 7) and then an exact number of pulses are generated using the circuit's test facility, the FPGA's cells counter indicates more pulses than are actually generated. If gather is enabled, the correct number of cells is indicated by list mode data. Only the hardware counter malfunctions. Thus, APU4 cannot count accurately independently of gather.
APU2 was never similarly tested and it is not known whether it also malfunctions. The design is wrong anyway. As explained in Report 68, gather and count control should simply be entirely independent. For example, if b7 = 1 then gather is enabled, regardless of count; if b6 = 1 then count is enabled, regardless of gather. There seems to be no reason to ever disable the trigger mechanism itself but if there is then the trigger can be disabled if both b7 and b6 are 0.
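As a sketch of the independent control proposed above, the following C helpers treat b7 and b6 as unrelated enable bits. The macro and function names are hypothetical; only the bit positions and their proposed meanings come from the text, and the code operates on a shadow value rather than on the real register.

```c
#include <stdbool.h>
#include <stdint.h>

#define GCR_GATHER_ENABLE 0x80u  /* b7 = 1: gather enabled, regardless of count */
#define GCR_COUNT_ENABLE  0x40u  /* b6 = 1: count enabled, regardless of gather */

/* Return the register value with gather switched on or off; count is untouched. */
uint8_t gcr_set_gather(uint8_t gcr, bool on)
{
    return on ? (uint8_t)(gcr | GCR_GATHER_ENABLE)
              : (uint8_t)(gcr & ~GCR_GATHER_ENABLE);
}

/* Return the register value with count switched on or off; gather is untouched. */
uint8_t gcr_set_count(uint8_t gcr, bool on)
{
    return on ? (uint8_t)(gcr | GCR_COUNT_ENABLE)
              : (uint8_t)(gcr & ~GCR_COUNT_ENABLE);
}

/* The trigger mechanism is needed whenever either function is enabled. */
bool gcr_trigger_enabled(uint8_t gcr)
{
    return (gcr & (GCR_GATHER_ENABLE | GCR_COUNT_ENABLE)) != 0;
}
```

With this arrangement the gather interpreter can stop a gather by clearing b7 without any effect on a count in progress.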
IMAGE LIMITERS
The image limiters, for bench A and bench B, are described as follows: "when the cells counter = limit, stop gather but continue count." This incorrectly ties count and gather together. This register should be more like the count register (BTC = Byte Transfer Count) of the CPU's DMA, which the FPGA replaces. In the case of the CPU DMA, bytes are counted. In the current APU4 FPGA design, four channels are dedicated to each bench and all four are collected at each event even if the gather requires fewer. As long as this is the basic architecture, the gather counter can count cells. The CPU would load the required cell count before starting a gather. The FPGA would decrement the counter and stop gathering when it reaches 0 (the CPU's DMA BTC stops at -1). Even with this 8-to-1 (one cell vs. four times 2 bytes) reduction compared to the DMA BTC, a 16-bit counter is not large enough. Although no scripts currently gather more than 65535 cells, some of the experimental ones have gathered as many as 100,000. Therefore, this counter should comprise at least 17 bits and 18 bits might be better.
Used in this manner, a name such as "gather counter" would be more appropriate than "image limiter".
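The counter width argument can be checked with a few lines of C. The sketch below computes the smallest width able to hold a given cell count and models a down-counting gather counter; the names are hypothetical and the model says nothing about how the FPGA would actually implement the counter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Smallest counter width, in bits, able to hold max_value. */
unsigned bits_needed(uint32_t max_value)
{
    unsigned bits = 0;
    while (max_value != 0) {
        bits++;
        max_value >>= 1;
    }
    return bits;
}

/* Behavioral model of a down-counting gather counter (hypothetical names). */
typedef struct {
    uint32_t cells_left;   /* loaded by the CPU before starting a gather */
} gather_counter_t;

/* Called once per gathered cell; returns true while gathering continues. */
bool gather_counter_tick(gather_counter_t *g)
{
    if (g->cells_left == 0)
        return false;              /* already stopped */
    g->cells_left--;
    return g->cells_left != 0;     /* reaching 0 stops the gather */
}
```

`bits_needed(65535)` is 16 and `bits_needed(100000)` is 17, which matches the 17-bit minimum stated above; 18 bits would allow up to 262,143 cells.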
RAW DATA POINTERS
These now comprise 16 bits. As with the image limiters (gather counters) these should comprise 17 or 18 bits.
CELLS COUNTERS
The two cell counters, for bench A and bench B, both comprise 20 bits. They don't need to be this long because the count sampler designed for APU2 is equally effective with 16 bits. As long as the count doesn't exceed 65,535 in one sampling period (typically .5 second) an unsigned subtraction automatically compensates for rollover.
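The rollover compensation relies on modulo-65536 arithmetic. A minimal sketch, assuming the sampler keeps the previous 16-bit reading (the function name is hypothetical and this is not the APU2 sampler code):

```c
#include <stdint.h>

/* Cells counted since the previous sample, using only a 16-bit counter.
 * The cast truncates the difference to 16 bits, so a single rollover
 * between samples is compensated automatically. */
uint16_t cells_since_last(uint16_t previous, uint16_t current)
{
    return (uint16_t)(current - previous);
}
```

For example, if the counter read 65000 at the previous sample and 464 at the current one, the function returns 1000, the true number of cells counted across the rollover.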
ELECTRONIC CELLS GENERATOR CONTROL REGISTER
This is described as having only one function, normal input vs. simulated event, which should require only a single control bit. However, a full byte register is implemented with a full byte control word, 0x5A. This wastes FPGA resources.
INTERRUPT
The FPGA should assert an interrupt (IRQ6) when either gather counter reaches 0. It would be feasible for software to determine which of the benches has caused the interrupt by examining both cell counters, but it would be easier if the FPGA would set bits in an interrupt status register for the ISR to read. Such a register will be needed anyway when scan chain interrupts (specifically integrity test failure) are implemented, since the FPGA is allotted only one IRQ. Since a variety of independent processes have to share this one IRQ, it would be helpful if every source could be independently enabled by a bit-mapped IER (Interrupt Enable Register). However, both bench A and bench B gather terminal count interrupts may be enabled by a single control bit, because if software wants an interrupt for one bench then it will also want this for the other.
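The status and enable registers suggested above might be modeled as follows. The bit assignments and names are assumptions made for illustration; the text specifies only that each source sets a status bit, that sources are enabled through a bit-mapped IER, and that both gather terminal counts may share one enable bit.

```c
#include <stdint.h>

/* Hypothetical interrupt status register bits. */
#define ISR_GATHER_A_TC  0x01u   /* bench A gather counter reached 0 */
#define ISR_GATHER_B_TC  0x02u   /* bench B gather counter reached 0 */
#define ISR_SCAN_FAIL    0x04u   /* scan chain integrity test failure */

/* Hypothetical interrupt enable register bits. */
#define IER_GATHER_TC    0x01u   /* one enable covers both benches */
#define IER_SCAN_FAIL    0x02u

/* Return the status bits that are allowed to assert the shared IRQ.
 * A nonzero result means the FPGA would assert its interrupt; the ISR
 * can test the returned bits to find the source(s). */
uint8_t irq_pending(uint8_t status, uint8_t ier)
{
    uint8_t enabled = 0;

    if (ier & IER_GATHER_TC)
        enabled |= ISR_GATHER_A_TC | ISR_GATHER_B_TC;
    if (ier & IER_SCAN_FAIL)
        enabled |= ISR_SCAN_FAIL;

    return (uint8_t)(status & enabled);
}
```

An ISR built on this model would read the status register once and dispatch on each set bit, rather than inferring the source from the two cell counters.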
The gather terminal count interrupt has been used in three circumstances:
The APU4 has sufficient memory to support gather buffers that are not circular if we can be sure that no gather will be started when data remains from a previous one or that the extractor will be far enough along that the inserter will not catch up to it. Since these requirements depend on other domains (scripts and data station) it is difficult to enforce them. The approach previously implemented is safer although much more difficult for software. Whether we carry forward that design or revert to a non-circular buffer can be entirely a software decision if the FPGA provides selectable (by software) interrupt capability on gather terminal count.
CDNEXT ANALYZER SYSTEM DESIGN REPORT 36
5/3/03
[Report 35] [Report 37]
OVERVIEW
The latest RPM PCB revision contains several wiring errors caused by signal naming, configurable use signals, and the complexity of the I/O scan chain. It also still contains the old Status Board scan chain interface, which has been replaced by direct I/O on JSTAT. Further, the EP1.5 instrument may require additional strobed fluid sensors and the circuits currently used for these are not correct for all fluid types, which have different conductivities. The design will be corrected and modified in three stages. Initially, we will correct the wiring errors on the existing boards. Second, to avoid further confusion, the schematic will be updated to eliminate the errors, naming confusion, and the unused Status Board scan chain interface. Finally, we will clarify the strobed fluid sensor issues, adding or modifying circuits as required, and fabricate the new version.
CORRECTIONS TO THE EXISTING PCB
STROBED FLUID SENSOR COMPONENTS
All of the eight strobed fluid sense circuits use the same set of components for the detector, .1uF coupling capacitor, .001uF filter capacitor, and 3.3K charging resistor. Hide has pointed out that, in recognition of the different conductivities of the various fluids, the CD3xxx series instruments vary the values of the filter capacitor and charging resistor as follows:
Fluid Sense Type | R# (RPM), value (CD3500) | C# (RPM), value (CD3500)
In-line sheath | R33, 330K | C8, none
In-line lyse | R37, 33K | C23, 100pF
In-line oil | R38, 3.3K | C24, .001uF
External waste | R27, 3.3K | C15, .001uF
Diluent | R28, 3.3K | C16, .001uF
Sheath | R39, 33K | C25, 100pF
Lyse | R40, 3.3K | C26, .001uF
Spare | R29, 3.3K | C17, .001uF
The RPM uses a single set of component values in order to avoid hard-wired assignment of sensors to particular fluids. Clearly, this won't work if those values only work for the most conductive fluid. However, considering that in the no-fluid state the conductivity is 0, there seems to be no reason to use any circuit but the most sensitive, except where foam must be ignored. We will test this hypothesis and determine which components, if any, need to change. For those that need to change for EP1.5, the changes will be done as part of the wiring corrections. Our testing should tell us how to handle the next revision. It may be possible to use the most sensitive circuit for all or we may need to make some less sensitive.
REMOVE OBSOLETE ELEMENTS AND RENAME SIGNALS
REMOVE OBSOLETE STATUS BOARD SCAN CHAIN INTERFACE
RENAME SIGNALS
MERGE MULTI-USE SIGNAL NETS
NAMING PROBLEM
The combination of boards in the EP1.5 instrument points out a flaw in our naming conventions, particularly of I/O PCBs and scan chain connections. Many of the names assume a specific geographical arrangement. RPM stands for Right Panel Module, LPM for Left Panel Module, and RPMD for Right Panel Motor Driver. I/O scan chain connections indicate the presumed upstream or downstream board. For example, the RPM's downstream connector is called JLPM and its upstream JAPU. However, the boards may be connected in any order. In EP1.5 we now have all of the boards in one pseudo-card-cage (this is likely to change but we are not sure to what configuration) where their names make no sense and we have LPM JRPM jacks connected to LPM JVPM jacks. In a final configuration, these names may help service technicians a little but, unless they are exactly right, they will be misleading.
We have already corrected this problem for IML (Inter-Module Link, i.e. intelligent) boards. The names APU and MSM stand for Analyzer Processing Unit and Motor Sample Module, which indicate the purpose of the boards. The IML connectors are called Master and Slave.
For IML interfaces, the terms Master and Slave are very appropriate because there actually is a functional master-slave relationship between upstream and downstream units. For scan chain interfaces, Master and Slave are not so clearly associated with upstream and downstream interfaces because all I/O boards are Slaves. However, it still makes sense to use these names to indicate whether a connection is toward the master, even if not directly to the master, or away from it. Another possibility would be to use names related to MOSI (Master Out Slave In) and MISO (Master In Slave Out). For example, the upstream interface could be called JMOSI and the downstream JMISO. These names would be shorter than JMASTER and JSLAVE (better for board legend) and could not be confused with the IML interfaces. Thus, the naming convention could extend up to the IML units, which already have JSLAVE jacks.
PROPOSAL
Although it would be inconvenient for us at this point to change names, especially of PCBs, the longer we wait, the more painful it will be. We could devise a general naming convention now and implement it piecemeal as new boards are built for functional reasons. The following name changes are proposed for review:
PCB NAMES
CONNECTOR
CDNEXT ANALYZER SYSTEM DESIGN REPORT 37
7/10/03
[Report 36] [Report 38]
CONCEPTUAL DESIGN
The FPGA design generally exhibits a conceptual design flaw. Instead of presenting relatively primitive facilities for the CPU to configure in whatever fashion is appropriate to the application and feasible given circuit restrictions, it bundles facilities into application-specific blocks. This is seen everywhere: in hard-wired main memory addressing, in fixed conversion channel selections, in bundled functions that the CPU could have assembled from primitives, etc. This design approach restricts the developers' use of circuit capabilities, complicates software design, unnecessarily ties the FPGA design to a specific circuit, and makes the FPGA the gate keeper of the application for no apparent reason.
The FPGA program is fundamentally flawed in domain separation. Any configuring that could be done by software should be. The primary responsibility of the FPGA is to do things that software can't do. This would be true in any system but it is especially true in ours. Even if the CPU program had to change to configure the FPGA differently for a change in the application, this would be better than changing the FPGA program. But in our instrument, system configuration is defined by an interpreted file (the analyzer configuration file) giving instrument developers the means to support system configuration changes nearly instantly without having to rewrite and validate software or firmware. It is inexcusable, for example, that the FPGA must now be reprogrammed (for cell width) to adapt to a change in sample flow rate.
The argument that the FPGA can't support configuration by the CPU because it is already too full is wrong. It is likely that application-level bundled functions consume more resources than would more primitive configurable facilities.
CONFIGURATOR
The FPGA (U26: Altera EPF10K50BC356-3) is configured by a serial EPROM (U27: Altera EPC1PC8), by serial Bit Blaster (via jack JBLAST), or by the CPU. During FPGA development, the Bit Blaster is used because it affords the quickest means of trying out a new program. During software development, the EPROM is used because it simplifies operating the CPU under a BDM debugger. For production, the CPU is used to reduce parts and to support remote upgrading of the FPGA program. To avoid contention between the EPROM and the CPU, when the EPROM is used, JU5 must be open; when the CPU is used, the EPROM must be removed and JU5 closed.
Jack W has not reported any problems configuring the FPGA using the Bit Blaster. Up through version 1-7D of the program, there have been no problems configuring by EPROM. Beginning with version 1-7E, programming by EPROM has not been reliable. The version 1-7E EPROM was unable to program the FPGA. Jack reported that attaching a vertical wire to the EPROM's DCLK (pin 2) caused the FPGA to be programmed. This strongly suggested that the DCLK signal, which is supposedly an EPROM output, was actually an input, allowing ambient electrical noise to function as the programming clock.
Jack reported finding that the EPROM was not being programmed to automatically assert DCLK as an output, a problem that he claimed was corrected by his FPGA version 1-7F. However, one EPROM programmed with this version failed to configure the FPGA while another with this same version consistently succeeded. The EPROM programmer reads the same checksum from both the "good" and the "bad" devices. We need to determine what causes this difference and develop a consistently reliable procedure for generating an EPROM that does configure the FPGA.
CELL WIDTH
An event occurs when the signal on each channel selected to participate in the "trigger" exceeds the threshold programmed for that channel. As long as all trigger channels are in this condition, the event exists.
The FPGA implements an event duration low pass filter. If an event's duration is less than the lower limit of this filter, it is not counted by the hardware counter and its list mode data (ADC conversion of each channel's signal) is discarded, either by not writing to the destination (DMA) buffer in the first place or simply by not advancing the destination pointer.
The filter limit has varied between .5 and 2.5 usec. In the current FPGA design it can only be changed by changing the FPGA program. It must be programmable by the CPU at any time. Its range should be from 250 nsec to 10 usec with 250 nsec resolution. The CPU also must be able to disable it entirely so that all events are counted regardless of duration. Additionally, the FPGA program should be written in such a way that all elements of the filter can be optionally removed at compile time in order to save resources.
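One way the CPU-programmable limit could be encoded is as a count of 250 nsec steps, with 0 reserved to disable the filter. The sketch below assumes that encoding and uses hypothetical names; only the range, resolution, and disable requirement come from the text.

```c
#include <stdbool.h>
#include <stdint.h>

#define FILTER_STEP_NS    250u    /* resolution requested above */
#define FILTER_MAX_NS   10000u    /* upper end of the requested range */
#define FILTER_DISABLED     0u    /* assumed encoding: 0 = count all events */

/* Convert a minimum event duration in nsec to a step count (assumed
 * register value). The duration is rounded to the nearest step and
 * clamped to 1..40; a duration of 0 disables the filter. */
uint8_t filter_steps_from_ns(uint32_t min_duration_ns)
{
    uint32_t steps;

    if (min_duration_ns == 0)
        return FILTER_DISABLED;
    if (min_duration_ns > FILTER_MAX_NS)
        min_duration_ns = FILTER_MAX_NS;

    steps = (min_duration_ns + FILTER_STEP_NS / 2) / FILTER_STEP_NS;
    if (steps == 0)
        steps = 1;
    return (uint8_t)steps;
}

/* True if an event of the given duration would be counted and gathered. */
bool event_passes_filter(uint8_t steps, uint32_t event_duration_ns)
{
    return steps == FILTER_DISABLED ||
           event_duration_ns >= (uint32_t)steps * FILTER_STEP_NS;
}
```

Under this encoding the historical limits of .5 and 2.5 usec correspond to step counts of 2 and 10, and the full range fits in six bits.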
Some versions of the FPGA program have included "clump detection", which is a high-pass filter of the event duration. This and any remnants of facilities to record cell duration as a list mode data element should be either permanently removed or controlled by a compile time option that completely frees all associated FPGA resources.
LIST DATA DESTINATION ADDRESS
In the first functioning version of the APU (version 2), one of the CPU's (MC68340) DMA channels served to transfer ADC data into list mode data buffers (in main memory). The CPU programmed the destination address and data count by programming the DMAC, not the FPGA. Through its control of the DMAC, the CPU could program the location of the list mode data buffer (there is only one in APU2 because it has only one bench) and its size. The current version of APU4's FPGA hard-wires the location and size of the two list mode data buffers. The CPU can program the starting position and cell count within the fixed buffer spaces through the "RawData Cell Ptr" and "Gather counter", which function analogously to the DMA's DAR (Destination Address Register) and BTC (Byte Transfer Count).
The problems with the new FPGA approach are lack of control and efficiency. The CPU should be able to program the FPGA in the same way that it did the DMAC. The FPGA doesn't need to know about buffers and their limits. The CPU should simply tell the FPGA where to start storing data for each bench and the maximum number of cells or channels (cell count * channels-per-cell) to collect. The CPU can take responsibility for not overrunning the available memory. This should be simpler for the FPGA, which now has to synthesize the address for each word stored by multiplying the cell ptr by the number of channels per cell and adding this to the buffer base address. The current version of the FPGA simplifies this task for itself by collecting a fixed count of four channels per cell, but this must be changed. Rather than complicating the FPGA by trying to maintain its current data destination address synthesis even while changing to a variable collected channel count, we should go back to a DMA-like approach in which the CPU loads a destination register and cell or channel counter and assumes full responsibility for overall memory management. This also entails less work for the CPU, which the current FPGA design now forces to continually resynthesize the data insertion address in order to extract data from the circular buffers without overrunning the FPGA.
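The difference between the two addressing schemes can be illustrated in C. This is a behavioral sketch with hypothetical names; it assumes byte addressing and the 2-byte channel word mentioned in the image limiter discussion, and it does not represent the actual FPGA logic.

```c
#include <stdbool.h>
#include <stdint.h>

#define BYTES_PER_CHANNEL 2u   /* each converted channel is stored as one 2-byte word */

/* Cell-pointer scheme: the address of every word is synthesized from the
 * buffer base, the cell pointer, and the channel index within the cell. */
uint32_t cell_ptr_address(uint32_t base, uint32_t cell_ptr,
                          uint32_t channels_per_cell, uint32_t channel)
{
    return base + (cell_ptr * channels_per_cell + channel) * BYTES_PER_CHANNEL;
}

/* DMA-like scheme: the CPU loads a destination and a word count, and the
 * inserter only increments and decrements. */
typedef struct {
    uint32_t dest;         /* next address to write */
    uint32_t words_left;   /* remaining words allowed by the CPU */
} raw_dma_t;

/* Store one word. Returns false, writing nothing, once the count is used up. */
bool raw_dma_store(raw_dma_t *d, uint32_t *addr_out)
{
    if (d->words_left == 0)
        return false;
    *addr_out = d->dest;
    d->dest += BYTES_PER_CHANNEL;
    d->words_left--;
    return true;
}
```

Both schemes visit the same addresses for a fixed channel count, but the DMA-like version needs no multiplication and no knowledge of buffer limits, which is the simplification argued for above.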
The CPU program for managing the DMA-like design already exists, having been developed for APU2, so we know that this is feasible. However, the simple DMA interface presents a burden on the CPU similar to the FPGA's destination address synthesis requirement. The list mode data buffers are circular. To avoid overrunning the inserter (FPGA/DMA) the extractor (CPU) compares its next extraction address to the current insertion address. The insertion address must be on a cell boundary. If the raw insertion address advances after each channel is written, it will not be on a cell boundary for much of the time that all the collected channels of one event are being written. The APU2 program compensates by calculating the last/next integral cell data address using a channel-count-modulus. This calculation is of similar complexity to the new FPGA's address synthesis requirement and both yield the same integral cell address. The CPU's work would be reduced by merging the two approaches in the FPGA. In this proposed approach, the FPGA has a full memory width (i.e. 18-bit, as the DRAM CS bits are not needed and A0 is always 0) destination address register for each bench. The CPU can write and read this register. The FPGA advances the register only after writing all channels for one event. The FPGA might do this with a hidden 3-bit counter that it adds to the exposed register to synthesize each address. Another approach would be to copy the exposed register to a hidden 19-bit register, used for the destination address and incremented after each word is written, and write the hidden register back into the exposed one after transferring a full cell.
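The cell-boundary calculation and the proposed merged approach can be sketched as follows. The first function illustrates a channel-count-modulus adjustment of the kind described for APU2 (it is not the APU2 code); the second models the hidden-counter variant of the proposed FPGA register. Names are hypothetical, and byte addressing with 2-byte channel words is assumed.

```c
#include <stdint.h>

#define BYTES_PER_CHANNEL 2u   /* each converted channel is stored as one 2-byte word */

/* Round a raw insertion address down to the start of the cell being written. */
uint32_t integral_cell_address(uint32_t base, uint32_t raw_insert,
                               uint32_t channels_per_cell)
{
    uint32_t cell_bytes = channels_per_cell * BYTES_PER_CHANNEL;
    uint32_t offset = raw_insert - base;

    return raw_insert - (offset % cell_bytes);
}

/* Merged approach: the exposed register advances only after a full cell. */
typedef struct {
    uint32_t exposed;            /* register the CPU reads and writes */
    uint32_t hidden_channel;     /* hidden counter, not visible to the CPU */
    uint32_t channels_per_cell;
} smart_dma_t;

/* Store one channel word and return the address used for it. */
uint32_t smart_dma_store(smart_dma_t *d)
{
    uint32_t addr = d->exposed + d->hidden_channel * BYTES_PER_CHANNEL;

    if (++d->hidden_channel == d->channels_per_cell) {
        d->hidden_channel = 0;
        d->exposed += d->channels_per_cell * BYTES_PER_CHANNEL;
    }
    return addr;
}
```

With the merged approach the CPU can compare its extraction address directly against `exposed`, since that value is always on a cell boundary.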
To recap, the three possible list mode data insertion means are as follows:
The Cell Pointer approach affords no benefits whatsoever. Raw DMA is simpler for the FPGA while the Smart DMA reduces CPU work. Both afford the CPU greater freedom in controlling the FPGA. Raw DMA is the most compatible with the approach that must be taken with APU2, which may be helpful during the transition from APU2 to APU4, and could also be used in any system where CPU DMA channels are used to reduce FPGA resource consumption.
COLLECTED CHANNELS
The analyzer has two light benches, supporting two independent event streams. The APU4 board has two high-speed ADCs for converting list mode data, with each dedicated to a bench. The assignment of ADC to bench is not determined by the APU circuit but by which transducers are attached to which parameter channels on the PCB. A main memory buffer is reserved for the data from each ADC. Thus, a "bench" comprises the physical flow cell, the ADC to which the transducers around that flow cell are attached, and a dedicated block of main memory.
The FPGA now collects list mode data from channels 1 through 4 for bench A and channels 5 through 8 for bench B. Exactly four parameters are collected for each bench. Ideally, all channels could be freely assigned to either bench, but the hardware doesn't allow this. Only channels 1, 2, 3, 4, and 8 connect to the bench A MUX (U36) while only channels 5, 6, 6A, 7, and 8 connect to the bench B MUX (U101). This is a design error; each MUX has two unused analog inputs, which should have been used for additional cross-coupled channels, for example channels 5 and 6 to MUX A and 3 and 4 to MUX B. This will be done in the APU5 design and could be added to APU4 boards. In any case, the FPGA should allow any combination of the eight analog MUX inputs to be collected on the associated bench. The FPGA should not blindly collect exactly four channels but rather the number of channels selected by the CPU, which could be more or less than four.
The FPGA should not assume anything regarding the analog bench inputs. For each "bench" it should present to the CPU an 8-bit register, into which the CPU writes a channel select bit-map with each bit corresponding to an input. The event trigger for each bench should be a similar bit-map, which selects an arbitrary set of Trigger Comparator outputs. When all selected triggers are true, the selected collection channels are converted and stored in memory. The trigger and collection sets are often not identical and the FPGA should not try to enforce any relationship between them. The CPU will not make illogical arrangements such as including in one bench's trigger set a signal that only connects to the other bench's MUX.
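The bit-mapped trigger and collection sets described above can be illustrated with a short Python sketch. The function names and the assignment of bit n to input n are assumptions made for the example; the report does not fix a bit order.

```python
def event_triggered(trigger_mask, comparator_outputs):
    """True when every comparator selected in trigger_mask is currently true.

    Note: an empty mask (0) trivially returns True; how the FPGA should treat
    that case is not specified in the report.
    """
    return (comparator_outputs & trigger_mask) == trigger_mask


def channels_to_collect(collect_mask):
    """Return the MUX input indices (0-7) whose bits are set in collect_mask."""
    return [bit for bit in range(8) if collect_mask & (1 << bit)]


trigger_mask = 0b00000011      # hypothetical: trigger on inputs 0 and 1
collect_mask = 0b10001101      # hypothetical: collect inputs 0, 2, 3 and 7

partial = event_triggered(trigger_mask, 0b00000101)   # input 1 not true
full = event_triggered(trigger_mask, 0b00000111)      # inputs 0 and 1 both true
selected = channels_to_collect(collect_mask)
```

The two masks are deliberately unrelated in the example, reflecting the point that the trigger and collection sets need not coincide.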
It should be noted that the APU4's concept of benches is misleading. There is no fixed relationship between the transducers and the board's input channels and no overall design relationship between the input channels and the two ADCs. In fact, there is no general reason for having two ADCs. Analog channel signals are captured (peak and hold) for presentation to the ADC. If events occur simultaneously in the two flow cells, the FPGA could process the two sets of channels sequentially as long as the peak and hold droop of the delayed event's signals is not significant. Therefore, having two ADCs really only serves to reduce conversion time. That each ADC is fed by a MUX with a nearly unique (both see channel 8 and the test signal) set of inputs is an implementation decision. As already mentioned, four additional channels could easily be cross-coupled between the two MUXs simply by adding wires. An alternative design could feed all channel signals to each ADC. However, this much flexibility is not required. It is sufficient that each ADC have four dedicated inputs, one test input, and three cross-coupled inputs. The FPGA should not be aware of any of this (unless APU5 increases the number of MUX inputs). It should simply implement the bit-mapped channel collection and trigger sets.
HARDWARE-BASED DECIMATION
Decimation is a data compression process in which the list mode data of a percentage of events that fit certain criteria is discarded. Only the list mode data is discarded; these events are still counted (in the "hardware" count). In previous instruments (as well as in APU2) decimation is done by the CPU. A typical decimation scheme is to not collect the data for four out of every five events that have a signal above a certain level in one channel. In older instruments that support this capability, the channel, level, and decimating factor (e.g. four out of five) are hard-coded in the program and associated with a particular cell type (Reds) and intended for a specific purpose (to reduce the ratio of RBC to PLT list data). The APU2 program allows these characteristics to be changed on the fly and adds that the level is a band pass rather than low pass or high pass filter. Setting the low threshold equal to 0 creates a low pass filter and the high threshold to the maximum (65535) a high pass.
Software-based decimation is expensive. Not only does it entail substantial CPU work, it also requires the FPGA to collect data for all events, including those that will later be discarded. Implementing decimation in the FPGA could be much more efficient. The current APU4 FPGA supports hardware-based decimation but only in a fully hard-wired form similar to the software-based version in older instruments. This is not acceptable. Due to APU4 hardware constraints, certain characteristics of decimation may not be programmable but decimation must be as programmable as the hardware will allow.
To provide decimation, the FPGA tests the signal in a selected "discriminating" channel against one (high or low pass) or two (band pass) thresholds. Theoretically, the comparison can be done in the analog or digital domain. If digital, the FPGA would have to convert (ADC) at least the discriminating channel and then compare it to the one or two thresholds. Converting the discriminating channel only one time and not all the channels in the collection set could significantly complicate the FPGA. But if the FPGA were to convert all of the channels in the collection set only to determine that the event should be discarded, the only benefit would be a reduction in the CPU's work (in addition to giving the CPU a little more bus access time by reducing the cycles stolen by the FPGA to write the list mode data). If the decimation decision could be made in the analog domain, the ADC would be immediately free to convert another event (that is not discarded).
Every event parameter channel in the APU has at least one analog comparator. However, if the parameter participates in the event trigger, this comparator must be dedicated to the trigger level and cannot be used for a decimation threshold detector. Unfortunately, any parameter that is significant enough to be used for the decimation decision will also be important in the trigger determination. Channels 6 and 8 are unique. Channel 8 has two comparators, "Trigger" and "Upper Gate". The channel 6 input is split into a standard circuit and a "high gain" channel 6A. Thus, both of these channels inherently have a second comparator. However, any of the other parameters can have a second comparator by connecting the transducer to two channels, one for trigger and collection and the other just for a decimation comparator. Channels 6 and 8 are unique in other ways that might interfere with their dual use for general trigger and collection plus decimation. Channel 8 has an extra amplification stage between the Fine Gain Adj and Baseline Restore. Whether this prevents its use for any input other than Impedance, which we have concluded will not be needed, is not clear. Channel 6A does not merely provide a second comparator but is also routed to the bench B MUX and not (now) to the bench A MUX. That channel 6A has higher gain than the others probably does not interfere with its potential use as a decimation threshold detector but if it is used for this purpose, it will not be available for its intended purpose, to amplify a particular parameter of small cells (specifically platelets).
The definition of the decimation decision should be altered slightly to support a practical analog-domain decision mechanism. Instead of a band pass, the filter should be selectable high or low pass so that a single comparator output XORed with a pass-type select bit determines whether an event lies in the decimated region. While APU4 hardware clearly constrains the selection of an analog comparator for the decimation decision, it does not impose any other restrictions and there is no reason for the FPGA to impose any. The decimation mechanism can be easily generalized like the trigger and collection channel sets. One of 10 comparators-- 1, 2, 3, 4, 5, 6, 6A, 7, 8, and 8-upper-- is selected as the discriminator. When the input signal exceeds the discriminator threshold and high pass filtering is selected or the input signal is below the threshold and low pass is selected then the event is subject to decimation. Note that this is a two-bit XOR function. The FPGA discards such events up to a "discard" count of 0 to 255 programmed by the CPU. It converts and saves the next such event and resets the discard counter. The FPGA presents to the CPU a means to select the discriminating comparator, to select the pass type (high or low), and an 8-bit discard count. This mechanism is simple enough that it may be duplicated for each bench. The FPGA does not need to know whether any particular configuration is practical or feasible. The CPU will take responsibility for this.
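The decision and discard counter described above can be modeled in a few lines of Python. This is a behavioral sketch only, not FPGA code; the class name, the method names, and the polarity of the pass-type bit are assumptions chosen for the example.

```python
class Decimator:
    """Behavioral model of one bench's decimation logic (hypothetical names)."""

    def __init__(self, high_pass, discard_count):
        assert 0 <= discard_count <= 255   # 8-bit discard count set by the CPU
        self.high_pass = high_pass         # True: decimate above threshold
        self.discard_count = discard_count
        self._discarded = 0

    def in_decimated_region(self, comparator_out):
        # comparator_out is True when the signal exceeds the threshold.
        # XOR of the comparator output with a "low pass" select bit.
        low_pass = not self.high_pass
        return comparator_out != low_pass

    def keep_event(self, comparator_out):
        """Return True if the event's list mode data should be converted and stored."""
        if not self.in_decimated_region(comparator_out):
            return True                    # outside the region: never decimated
        if self._discarded < self.discard_count:
            self._discarded += 1
            return False                   # discard, but the event is still counted elsewhere
        self._discarded = 0                # keep this one and start over
        return True


d = Decimator(high_pass=True, discard_count=4)
kept = [d.keep_event(True) for _ in range(10)]
```

With a discard count of 4 the model keeps every fifth event in the decimated region, matching the "four out of five" scheme mentioned earlier. A discard count of 0 keeps every event.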
Hardware-based decimation determined in the digital domain appears to be too complicated to implement effectively.
TEST MODE (ECELL)
Event counting ("hardware" count) and list mode data conversion and storage can be tested using a test mode built into the APU. We call this ECELLs (electronic cells) for the fact that a test voltage is fed into each channel instead of normal input. In ECELL mode, every time the CPU requests a cell, the FPGA asserts NTEST_STB, causing one-shot U67 to generate a fixed duration pulse (TEST_MAM) that gates the test voltage to the Fine Gain Adj block of every channel. The only thing that ECELL testing does is to substitute the test voltage for each channel's normal input. Trigger thresholds and channel set are configured as for normal operation. If the channel collection set and decimation could be configured (they should be in a redesign of the FPGA) they would be configured as for normal operation.
The current FPGA design presents an 8-bit register to the CPU for controlling ECELL operation, which is documented only by example. The example shows writing 0x5A to set ECELL mode and again 0x5A to generate each event and then 0 to return to normal mode. To enable ECELL mode the FPGA reverses the TPEN/NTPEN signals that control the input selection to all channels' Fine Gain Adj. Since the mode can be selected by one bit and the pulse generated by another, the specific 8-bit control word is apparently unjustified. It would seem that the control should have only two bits and the purpose of each should be clearly documented.
CDNEXT ANALYZER SYSTEM DESIGN REPORT 38
10/27/03
RSH COVER, ANNUNCIATOR, LIGHT CURTAIN PCBS
EMAIL FROM DAVID MCCRACKEN TO KEN BRIGGS 7/15/03
Assuming that the tray latch PCB is made as one PCB with alternating hard panels and flexible transition sections, all flags can share a digital ground and +5V wire and all solenoids can share an analog/power ground. Each solenoid and flag pair requires one unique solenoid driver wire and one unique flag sense. The one connector, which is located at the edge of the leftmost hard panel requires 12 pins: 4 solenoid drivers, 2 power grounds, 1 digital ground, 1 +5V, and 4 flag sense. If we use a 14-pin connector, the two extras can be used to beef up the power ground.
Both the transmitter and receiver light curtain boards can either matrix or serialize their opto elements. For 40 devices per board, matrix requires 13 wires but no components other than the opto devices themselves, while serial requires only 3 control wires plus ground and +5V in addition to 5 ICs and 40 resistors on the board. The lower board (transmitter) also requires annunciator LEDs. Assuming three of these per bay, the 12 devices can be matrixed with 7 wires or serialized with 3 plus 2 ICs and 12 resistors. It isn't necessary to use the same addressing architecture for the three groups but, assuming that we do, matrixing requires a 14-pin connector on the top/cover (receiver) board and a 20-pin connector on the lower (transmitter) board. Serializing would require only 5 pins for the receiver board but the smallest connector available has 10 pins. The transmitter board would require a 10-pin connector (2 ground, 2 +5V, 1 clock, 2 data, 2 load) but 7 ICs and 52 resistors. There is little difference between serialized and matrixed as far as the control and electrical systems are concerned. Matrixing might be a few dollars cheaper than serializing but not enough to drive the decision. Mechanical considerations may be the most important. Do the PCBs have space for the ICs and resistors? Does the lower PCB have room for a 20-pin connector?
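The wire and IC counts quoted above can be checked with a short Python calculation. The function names are hypothetical, and the figure of 8 devices per serial IC is an assumption (it is consistent with the 5 and 2 IC counts in the text but is not stated there).

```python
import math


def matrix_wires(devices):
    """Fewest row + column wires for a rows x columns matrix that holds `devices`."""
    best = None
    for rows in range(1, devices + 1):
        cols = math.ceil(devices / rows)
        if best is None or rows + cols < best:
            best = rows + cols
    return best


def serial_ics(devices, bits_per_ic=8):
    """Number of serial ICs needed, assuming bits_per_ic devices per IC."""
    return math.ceil(devices / bits_per_ic)


opto_wires = matrix_wires(40)             # light curtain opto devices on one board
led_wires = matrix_wires(12)              # annunciator LEDs on the lower board
lower_board_wires = opto_wires + led_wires
lower_board_ics = serial_ics(40) + serial_ics(12)
```

The matrixed lower board total of 20 wires is what drives the 20-pin connector, while the serialized alternative trades those wires for 7 ICs.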
We have been assuming standard two-row .1" box connectors. These have a high profile that may interfere with mechanical constraints, especially on the light curtain boards. An alternative is to use a mass-terminated solder-tail connector. One end of the cable has a standard female box connector for plugging into the I/O board jack. A solder-tail MTC connector is attached to the other end of the cable and this connector is soldered into the light curtain board, creating a single board plus cable system component. One manufacturer of this type of connector is CW Industries (www.cwind.com); an example is the CWR-142-10-0003, which is listed in the Digikey catalog as CPC10T.
COMPONENT REQUIREMENTS
As measured in the first prototype system, the power requirements are:
COMPOSITE REQUIREMENTS
SCENARIO REQUIREMENTS
Total power requirements listed at 100%, 80% (switcher), and 50% (linear) efficiencies.
PDM (POWER DISTRIBUTION MODULE)
Size PDM wiring for maximum current for each PCB as follows:
The large maximum 24V current requirements of RPM and LPM may be difficult to achieve on the PDM foil. If necessary, lower maximum currents are acceptable but will force flow scripts to consider the location of solenoids when simultaneously activating more than the number supported by any one board. The 12V current requirements cannot be compromised because there may be times when all solenoids on one board must be holding.
To reduce voltage drops, the PDM should provide wire clamping connectors instead of post jacks for connecting to the power supplies. These could also be used for each board connection although it is less desirable in this case because these cables are more likely to be reworked during development.
POWER SUPPLIES
MODERATE POWER USAGE
Assuming the moderate power usage of scenario 5 plus heaters but AC instead of DC pumps, the power supply (supplies) must provide 5V@3.25A, +15V@1A, -15V@1A, 12V@15.4A, and 24V@12.2A. This will draw 655W with switchers and 1048W with linear supplies. Adding the DC pumps boosts the 24V requirement to 20.2A and the line power to 895W with switchers and 1432W with linear supplies.
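The line power figures above follow from summing the rail loads and dividing by the supply efficiency. A minimal Python check, using hypothetical names and the 80% (switcher) and 50% (linear) efficiencies stated earlier in this report:

```python
def line_power(rails, efficiency):
    """Input watts needed to deliver the given (volts, amps) rail loads."""
    load_w = sum(abs(volts) * amps for volts, amps in rails)
    return load_w / efficiency


moderate = [(5, 3.25), (15, 1), (-15, 1), (12, 15.4), (24, 12.2)]
with_dc_pumps = [(5, 3.25), (15, 1), (-15, 1), (12, 15.4), (24, 20.2)]

moderate_switcher = round(line_power(moderate, 0.8))
moderate_linear = round(line_power(moderate, 0.5))
pumps_switcher = round(line_power(with_dc_pumps, 0.8))
pumps_linear = round(line_power(with_dc_pumps, 0.5))
```

Rounded to the nearest watt, the four results agree with the 655W, 1048W, 895W, and 1432W quoted in the text.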
MAXIMUM POWER USAGE
Assuming DC pumps and 12 medium power steppers moving, the power consumption with all solenoids off is 339/424/678W (100%/80%/50% efficiency) leaving 1161/1076/822W power budget for solenoids, assuming a maximum 1500W power consumption. With switching power supplies, each solenoid requires 9W from 24V at activation and 1.92W from 12V to hold. With linear supplies, these are 14.4W from 24V and 3W from 12V. Assuming that all 125 solenoids are on simultaneously, their power consumption is S * 9 + (125-S) * 1.92 with switchers and S * 14.4 + (125-S) * 3 with linear supplies, where S is the number of devices simultaneously being activated. To stay within the power budget, with switchers 118 solenoids may be switching while the remaining 7 are holding. With linear supplies, 39 solenoids may be switching while the remaining 86 are holding. Turning on 118 solenoids would require 35.4A from 24V, while 39 would require 11.7A. 12V must still supply 16A for holding all solenoids in a maximum scenario because the 24V provides power only during switching.
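The solenoid limits above come from solving the stated power expressions for S. The following Python sketch repeats that arithmetic; the function name is hypothetical, and the 0.3A per activating solenoid is inferred from the 35.4A for 118 solenoids quoted in the text rather than stated directly.

```python
import math


def max_switching(budget_w, activate_w, hold_w, total=125):
    """Largest S with S*activate_w + (total - S)*hold_w <= budget_w."""
    s = math.floor((budget_w - total * hold_w) / (activate_w - hold_w))
    return max(0, min(total, s))


switcher_s = max_switching(1076, 9.0, 1.92)   # 80% efficient switchers
linear_s = max_switching(822, 14.4, 3.0)      # 50% efficient linear supplies

AMPS_PER_ACTIVATION = 0.3                     # inferred: 35.4 A / 118 solenoids
switcher_24v_amps = round(switcher_s * AMPS_PER_ACTIVATION, 1)
linear_24v_amps = round(linear_s * AMPS_PER_ACTIVATION, 1)
```

The results reproduce the 118 and 39 solenoid limits and the corresponding 35.4A and 11.7A draws from the 24V supply.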
The 1500W total power budget with 80% switchers is apportioned into 5V@3.25A, +15V@1A, -15V@1A, 12V@16A, and 24V@47A. The same budget with 50% linears is apportioned into the same 5, +15, -15, and 12V currents, but 24V@23.3A. Note that the apparent power totals, 1708W and 1595W, are higher than the assumed 1500W limit. This is because the 12V must support all 125 solenoids simultaneously holding, at which time they would consume no 24V power and the 1500W limit would not be exceeded. Note also that the linear supply almost seems to be a better deal than the switcher, with a 1595W apparent power compared to 1708W to support a maximum power situation. Actually, there are two situations here: one in which 118 solenoids can be simultaneously activated, which only the switcher can support; and one in which only 39 solenoids can be simultaneously activated. The purpose of these calculations is to set an upper boundary for current requirements, above which any additional capacity is useless, not to suggest a particular design solution.
THE ANSWER
The power supply must provide at least 5V@3.25A, +15V@1A, -15V@1A, 12V@15.4A, and 24V@12.2A. There is no benefit to providing more than this for 5V, +15V, and -15V. The maximum useful 12V current is 16A. The maximum useful 24V current is 47A with switchers and 23.3A with linear supplies.
ADAPTING CD4000 PMT PREAMPS FOR USE WITH APU4
REQUIREMENTS
APU4 appears to have inadequate gain control compared to the APU2. It not only doesn't have the 2/4/8 selectable prescale capability of the APU2 but also simply seems to produce weaker low-level signals. When CDX8 has an APU2, Rathna is able to collect adequate data for the standard PLT/RBC and WBC methods even though the prescale settings don't change for the different gather types. When the same instrument has an APU4, Rathna has been unable to collect adequate data for all methods without changing the gain of the APU4's front-end amplifiers for each type of gather. She has found a compromise gain for channels 1 and 2 that is high enough for PLT without clipping WBC signals but she has also shown that a similar compromise is not possible with channel 4.
Channels 1 and 2 use photodiodes while 3 and 4 use photo-multiplier tubes (PMT). We have been using the CD3xxx PMT supply/preamp unit (Abbott drawing number 9631010) with both APU2 and APU4. Norm has reported that this unit seems to produce a noisier output than the CD4000 PMT preamp #1961. The CD3xxx PMT unit provides dynode voltage control by a potentiometer that must be adjusted by hand. The CD4000 unit affords remote control. Rathna has demonstrated adequate data collection for all gather types when channel 4's dynode voltage is adjusted for each type. Manufacturing has requested the elimination of all hand-operated potentiometers. Thus, it may be possible to simultaneously address three problems by replacing the CD3xxx PMT preamps with the CD4000 version.
Norm has opined that we should try to find out the root cause of the APU4's gain deficiency before compensating for it. However, since either of the other two reasons is sufficient motivation to change the PMT unit, we may as well take advantage of the change to at least temporarily make the APU4 functional. This change doesn't prevent anyone from further analyzing, characterizing, and attempting to improve APU4.
ANALYSIS
The two most critical signals from the standpoint of noise are the dynode control and the PMT output. The CD4000 PMT circuit appears to try to provide a differential control signal input with VDYN and VDYNRET but whatever benefit this affords is lost by the fact that the circuit ties the high-voltage converter's power ground and control return together. Thus, any noise or level shifting that appears at the power ground (VPSRET) will change the control reference point and, therefore, the output (dynode) voltage.
The proper interface to the high-voltage converter would be a differential analog output from the control board (APU4 in our case) tied directly to the converter's control and control ground signals. The power and control ground would not be tied together although a common practice would be to tie them together through a relatively low value resistor (e.g. 100 ohm) to prevent them from drifting too far apart.
The APU4 provides three single-ended dynode control signals taken from MUXDAC channels 16, 17, and 18. Since these are not differential, superficially it would seem that they don't interface properly to the PMT circuit. However, since that circuit is already wrong, a functional differential interface isn't feasible anyway. In fact, the most effective interface would appear to violate all design rules. Because the control reference is tied to power ground, common mode noise rejection would be enhanced by deliberately driving the noise from the power ground into the control signal. This can be done by providing the power ground (VPSRET) through a coaxial sheath around the control signal, in effect producing a driven shield, deliberately driven in a manner that would normally be the worst way possible. Both ends of the shield are connected, one to the APU's analog ground and the other to the preamp board's converter power ground, and power flows through this connection.
The CD4000's PMT preamp circuit also improperly handles its PMT signal output (PMTSIG). Although all of the APU's channel inputs are differential, the PMT preamp drives sig+ from the output of a single-ended, ground-referenced amplifier. The analog ground itself provides sig-. While this isolates the signal from ground shifts, it provides little common-mode noise rejection. A better approach would be to convert the single-ended output to differential and transmit the fully differential signal in a twisted pair. Since we don't have the luxury of completely redesigning the circuit at this time, the best we can do is to transmit in a twisted pair with sig- tied to the analog ground, which is isolated from the high-voltage converter ground (VPSRET) by LC filters.
With such poor treatment of the two most critical signals, one might wonder why this circuit is reportedly less noisy than the CD3xxx version. That circuit treats its output similarly so this signal suffers in either case, but the CD4000's dynode voltage control is an additional noise problem that the CD3xxx version doesn't have. The Hamamatsu high-voltage converter used in the CD4000 circuit might be quieter than the homemade converter in the CD3xxx circuit. However, the one thing that the CD4000 unit does have that would produce an apparently less noisy output is an LR filter on the PMT signal itself. Filtering the signal of interest represents a last-ditch effort to get rid of noise that should have been prevented from superposing on the signal in the first place.
The CD4000 PMT preamp provides no means of measuring the dynode voltage, which can exceed the 1KV limit of most multi-meters (some can't even take inputs above 500V and there is always a user safety issue when probing high voltages). The CD3xxx circuit contains a divide-by-minus 100 test point that at least can be tested with an ordinary meter and doesn't expose the operator (as readily) to high voltages. The APU4 provides inputs for measuring the dynode voltage remotely. We should take advantage of this to provide remote control and testing through the data station.
IMPLEMENTATION
The CD4000 PMT preamp is simply a bad design that should be redone before CDNext goes into production. To meet the immediate needs, the circuit can be patched to provide both remote control and reading of the dynode voltage. Reading the voltage requires an amplifier to reduce and invert the 0 to -1000V to the 0 to +12V that the APU analog inputs require. This amplifier can be taken away from the dynode control voltage, which is excessively complicated for no purpose. Even the first amplifier isn't really needed but to reduce rework it will be retained. The second amplifier is disconnected entirely from the dynode voltage control circuit and used for the voltage readback circuit.
DYNODE VOLTAGE CONTROL
The differential VDYN/VDYNRET circuit is converted to single-ended input by changing the U3 op amp (pins 1, 2, 3) to a voltage follower.
DYNODE VOLTAGE READBACK
The second amp of U3 pins 5,6,7 is connected to the high-voltage output as a divide-by-minus 100. The output is connected to the previously unused JP1 pin 7.
CABLING
Two PMT preamp boards will be connected to APU4. The Channel 3 (90-degree) unit will connect to APU4's JPMT1 for dynode voltage control and high-voltage converter ground, JCH3 for PMT signal, and JPMT for +/-15V power. The Channel 4 (90-degree depolarized) unit will connect to JPMT2, JCH4, and to the same JPMT pins as the other unit. The shared JPMT connections may be either star or daisy-chain.
TESTING
CDNEXT ANALYZER SYSTEM DESIGN REPORT 39
2/1/04
PREPARATION INSTRUCTIONS
System Design Report 36 [Rpm Version 5 Corrections] describes PCB corrections required by RPM version 5. These are inaccurate because some of the items refer to erroneous changes that had already been made to one of the boards as if they were part of the original design and because the fluid sense system had not been fully analyzed. A new document [rpmv5.doc] accurately and completely describes the required PCB corrections, PLD and jumper configuration, testing, and wiring of RPM5. That document is intended to be used as an instruction sheet, which can only briefly mention rationales and must be allowed to change. Therefore, it exists as a stand-alone document rather than being merged into the design history.
DESIGN HISTORY
Previous reports (Report 31 [Preparation Of New Analyzer Boards], Report 27 [Other RPM Issues], Report 24 [Vpm, Rpm And Lpm] [Status Panel], Report 22 [Vpm, Rpm, Lpm, Status Panel, Loader Board]) document the evolution and design history of the RPM. RPM5 affords increased configurability, more complete utilization of scan chain resources, and a more efficient Status Board interface. These are discussed here. Additionally, the strobed fluid sensor system is analyzed and tested here for the first time.
SOLENOIDS
RPM5 potentially has three more solenoids than RPM4. SOL1 and SOL2 are available if the electric shear valve 0 and shear valve 1 are disabled. This is the case in the EP1.5 and later instruments. SOL3 is available in all configurations. The remaining 24 solenoids, SOL4 through SOL27, are located in the same bit addresses (in space 2) as SOL1 through SOL24 in RPM4. We could retain the same device addresses by shifting all connections three places, for example the SOL1 wire to SOL3. Alternatively, we may retain the same wire connections and change the device addresses in the analyzer configuration file.
Initially, the analyzer configuration definitions are retained and the three additional solenoids are defined as RPMSOL1, RPMSOL2, and RPMSOL3, located at bit addresses 1, 3, and 15. This is feasible only because the existing configuration files don't use the actual solenoid numeric names but rather names inherited from the CD3200 flow panel and duplicated on the EP1.5 flow panel. For example, the first solenoid on RPM4 is called RPMV11 (a declared alias) or SOL11 (a name that may be used in documentation but is not actually declared in the configuration file). These names were inherited from the CD3x series labeling based on electrical arrangement; SOL11 is the first solenoid on the first driver board. This naming convention has nothing to do with CDNext or any version of the RPM.
STROBED FLUID SENSORS
FLUID PARAMETERS
The strobed fluid sensor circuit operates by measuring the time required to charge a fixed on-board capacitor plus an unknown sensor, which may provide a resistive and/or capacitive path to ground, up to a fixed threshold voltage. Reaching the threshold within 10 usec (after charging begins) indicates that the sense path load is small, i.e. that there is no fluid. If the sensor resistance is low or the capacitive loading is high, the threshold will not be reached in the allowed time period, indicating that there is fluid.
Whether the sensor load is primarily resistive or capacitive, a path between the grounded and active electrodes exists only through either air or the fluid. The dielectric constant of air is, presumably, sufficiently less than that of any fluid that the circuit can sense the difference. Previously, we had assumed that the sensor load, i.e. the measured parameter, was primarily resistive. We recently tried to measure the resistance of the waste and found it higher than the range of a standard multimeter. Since the circuit detects this fluid, its capacitance must be the main parameter. To determine the actual load of each fluid, its resistance and capacitance need to be tested as normally seen by the sensing circuit. This is especially true if the primary parameter is capacitance, which is more sensitive to electrode geometry than is resistance. The fluids currently used by CDX9 were tested as they would be normally monitored, waste with its cap-mounted electrodes inserted into a beaker of waste fluid and Hgb Lyse and Woc Lyse in their reservoirs. A conductance meter, accurate to 100 MOhm, was used to measure the resistance. Capacitance was measured with a rather imprecise multimeter. Additionally, a source container electrode probe was fabricated from stainless steel rods in polyethylene tubing. This has been used to measure the capacitance of some CD3200 fluids in their source containers. Those results are also reported here.
The capacitance of the down-hole probe in distilled water was also tested. The meter reported less than 100pF but readings below 500pF are not trustworthy.
SENSOR CIRCUITS
As designed, all versions of the RPM have eight identical strobed fluid sensors, with a 3.3K resistor and .001uF capacitor. This is the same as the least-sensitive circuit used in 3000-series instruments. Because of an error in the APU2's FPGA scan chain implementation, the RPM has been sampling the sense comparators after 20 usec charging time instead of the intended 10 usec. This approximately doubles their sensitivity. But even with this, they have been unable to detect certain fluids, particularly the Woc Lyse. RPM5 corrects the timing, which reduces the sensitivity. Sensitivity is regained by changing the resistor and capacitor in each sense circuit.
The circuit as used in 3000-series instruments has one sense circuit with a 330K resistor and no capacitor. This is not reliable. The distributed capacitance and current leakage across the PCB and between traces and wires contribute significantly to the operation of the circuit. These parameters are uncontrolled and change over time, especially in the presence of dust and humidity. The lowest on-board capacitor that will not be significantly influenced by uncontrolled parameters is 100pF. The threshold voltage is 2.45V, which is 50% of the charging voltage (5V - Vcesat of the charging transistor). The time to reach the threshold is .693 * R * C. The most effective division of the triggered vs. untriggered range occurs when the load equals the reference. Assuming that the capacitance in air is 0, the trigger load would be 100pF. The resistor sees this in parallel with the reference. 43K will charge 200pF to the 50% point in 10 usec. The JSNS1 resistor and capacitor were changed to 43K and 100pF. This was tested using fixed value capacitors and the StrobeSensTest flow script (in sstest.f). It ignored 100pF and reported 118pF as fluid present. This can detect all of the CDX9 and CD3200 fluids that we have tested. When the down-hole probe was connected to this circuit and inserted into distilled water, fluid was reported. It is very unlikely that we will have to detect any fluid with lower dielectric constant than distilled water, so we clearly don't need to push the sensitivity any lower than this.
Although the most sensitive circuit, comprising 43K and 100pF, is reasonably repeatable, a less sensitive circuit is less likely to develop problems, provided that it still easily detects the fluid. Following the same design approach, a mid-range sensitivity circuit was designed, taking 330pF as a given and calculating that R should be 22K. This was built and tested and found not to respond as closely as predicted. Experimenting with several instances of the circuit showed clearly that 15K produced the ideal response; the reason has not been determined. Therefore, 15K was designed into the circuit instead of 22K. This circuit ignored 220pF and reported 330pF as fluid. It could probably be safely used for all fluids.
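The resistor calculation described above can be reproduced with a short Python sketch. It assumes an ideal RC charge toward the charging voltage, so the time to reach a given fraction of that voltage is -R * C * ln(1 - fraction), which reduces to .693 * R * C at the 50% threshold. The function names are illustrative only, and the total capacitance is taken as the on-board capacitor plus an equal trigger load, following the design approach in the text.

```python
import math

def threshold_time(r_ohms, c_farads, fraction=0.5):
    """Time for an ideal RC circuit to charge to `fraction` of the charging voltage."""
    return -r_ohms * c_farads * math.log(1.0 - fraction)

def resistor_for(strobe_s, c_farads, fraction=0.5):
    """Resistance that reaches the threshold exactly at the end of the strobe."""
    return strobe_s / (-c_farads * math.log(1.0 - fraction))

# Mid-range circuit: 330pF on-board capacitor plus an equal 330pF trigger load.
c_total = 330e-12 + 330e-12
r_mid = resistor_for(10e-6, c_total)
print(f"calculated R = {r_mid / 1000:.1f}K")  # close to the 22K quoted in the text
```

The ideal model ignores stray capacitance, leakage, and the charging transistor. As noted above, the built circuit needed 15K rather than the calculated 22K, so the result is a starting point for bench testing rather than a final value.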
The existing 3.3K and .001uF circuit was tested. It ignored 1nF and reported 2nF as fluid. To provide a range of sensitivities, instances of all three have been specified.
For CDNext EP1.5 and later instruments, the following connections are recommended:
This arrangement provides source monitoring for only two fluids. Additional sources may be monitored if certain reservoirs can share a sense circuit. This is feasible if the reservoirs that share a circuit are never simultaneously full. There is no other means of monitoring source fluids. The resistance of all fluids is too high for any of them to be monitored by a general sensor. Strobed sensors cannot simply be added to the existing RPM, although it would be easy to add them to a revision of the board.
The fluid sensors work as specified only if JPLDI1 correctly indicates the SCK speed (see next topic). For APU2 the JPLDI1 jumper must be installed. For APU4 it must be removed.
PLD
RPM version 5 replaces the Altera PLD used in previous versions with a Xilinx XC95144XL, which is programmed in-circuit rather than in a programmer. This eliminates a socket and reduces the PLD's size and cost. Because a relatively flexible JTAG interface is used to access the device for programming, the chip may change without our having to find a new programmer or adapter. This is more convenient in some ways but less convenient in others. In particular, we cannot simply program a supply of devices for use whenever a board needs one but must program existing boards that function at least well enough to power the PLD. The PLD code is written in Verilog. It is located at cdx\an\rpm5\pld. The source files are rpmpld.v (the actual code) and rpmpld.ucf (which defines the pin-signal association).
The current version of the PLD code is 3. In previous versions for RPM5, and in the code for the Altera PLD in previous RPM versions, the program derives all timing from the scan chain SCK, which is supposed to be 4MHz. The APU2 FPGA incorrectly implements a 2MHz SCK. To continue supporting APU2 while adding support for APU4 (as well as MSM), the clock divisor internal to the PLD can be selected for a 2- or 4-MHz SCK. The one unused MOSI bit, 13, could be used to make this selection, but this would require a script command to determine which types of APU and RPM are running and to set the bit accordingly. The alternative of using a jumper to make the selection is more consistent with the overall strategy of the RPM design, but there is no spare jumper. However, since we aren't using any electric yvalves now, and any instruments that do use them have only one, the second yvalve enable jumper JPLDI1 is a reasonable candidate for making the selection. Therefore, the program was modified to use this input for selecting the SCK speed. If the jumper is off, a 4MHz SCK is assumed; if on, 2MHz is assumed. The second yvalve is disabled in all cases, and its MOSI control bits are routed as simple outputs to JPLDO.
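The relationship between the SCK rate and the strobe timer count can be illustrated with a small Python sketch. It assumes only that the PLD timer counts one step per SCK cycle, which is what the 40-count and 20-count comments in rpmpld.v imply; the function names are hypothetical.

```python
def strobe_count(sck_hz, pulse_s=10e-6):
    """Number of SCK cycles the timer must count to produce the requested pulse."""
    return round(sck_hz * pulse_s)

def pulse_width(count, sck_hz):
    """Actual pulse width produced by a fixed count at a given SCK rate."""
    return count / sck_hz

count_4mhz = strobe_count(4_000_000)  # count needed for APU4 or MSM
count_2mhz = strobe_count(2_000_000)  # count needed for APU2

# A count sized for 4MHz but clocked at 2MHz doubles the pulse width,
# which is the 20 usec charging time observed on APU2.
wrong_width = pulse_width(count_4mhz, 2_000_000)
print(count_4mhz, count_2mhz, wrong_width)
```

This is why the count has to follow the SCK rate: a single fixed count gives the intended 10 usec strobe on only one of the two APU types.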
The new type shear valve controller (with drivers on the RPM board) also uses SCK for timing. The original code, which assumes a 4MHz SCK, has not been changed. Any system that might use a new type shear valve is likely to be based on APU4. If this turns out not to be the case, this code should be updated to be aware of the SCK option.
rpmpld.v
module rpmpld(
...
SVValvesEnabled, // Enable <0,1> Shear valve, else SOL1 and/or SOL2.
SVNewEnabled, // Enable new (on-board driver) or old style shear valve.
YV0_En, // (JPLDI4) Enable Yvalve 0, else MOSI4->JPLDOp7, MOSI5->JPLDOp6.
YV1_En, /* (JPLDI1) Select 2MHz (APU2) or 4MHz (APU4 or MSM) SCK
* used for timing (e.g. of fluid strobe). This input was originally intended
* for selecting YV1 control or simple output. That use could be restored by
* using the extra MOSI bit (13) to select the clock speed. Jumper off selects
* 4MHz, on 2MHz. In either case, MOSI6->JPLDOp9, MOSI7->JPLDOp8. */
...
if (NPCS) begin
  if (LAST_MOSI_b8 != mosi_shift_registers[8]) begin
    LAST_MOSI_b8 = mosi_shift_registers[8];
    if (LAST_MOSI_b8 && timer == 0) // Start pulse on low to high transition of bit8
      if( YV1_En )
        timer = 40; // 10 usec pulse @ 4MHz SCK
      else
        timer = 20; // 10 usec pulse @ 2MHz SCK
...
//
// Output Assignments
//
assign CHARGE_SENSORS = STROBE_SENSORS; // strobed fluid sensor output
...
/*
YValve1 is permanently disabled in order to use YV1_En for selecting the
strobed fluid sensor divider. The YV1 outputs are simple scan chain.
assign YV1_CW = (YV1_En && YV1_On && YV1_CW_REQ && !YV1_OC && YV1_CWH && !PRST) ||
(!YV1_En && YV1_CW_REQ && !PRST);
assign YV1_CCW = (YV1_En && YV1_On && !YV1_CW_REQ && !YV1_OC && YV1_CCWH && !PRST) ||
(!YV1_En && YV1_On && !PRST);
*/
To provide a convenient probe attachment point for monitoring the strobe and verifying its timing, the signal has been assigned to JTEST pin 2. It may be replaced at any time by another internal signal to be monitored.
// Generic outputs
assign SOL25_HI = (GenOut[14] && !PRST);
assign SOL25_ON = (GenOut[15] && !PRST);
// JT1 is permanently assigned MOSI13 and could be used for application.
// JT2-5 are assigned internal signals as needed for testing.
assign JT1 = (GenOut[13] && !PRST);
assign JT2 = STROBE_SENSORS; // For verifying 10 usec strobe width. Note JPLDI1 use.
TESTING
The "light chase" script rpmlpm, which has previously been used to verify solenoid operation on the RPM and LPM in series, has been expanded to also test the three new solenoids. These also have to be defined in the analyzer configuration file. Rpmlpm should not have to be compiled specifically for APU2 or APU4. The script can be run to test the RPM's solenoids without an LPM. If an LPM is connected, the RPM's downstream scan chain interface is also tested.
analyz4/2.ini
; -------------- STATUS BOARD (RPM5) ----------------
ACTUATORS
UNIT = APU
SPACE = APU_ACTS
MULTIBIT = BYTE
Beep = 9 1 = On 0 = Off POWERDOWN = N
Green = 10 1 = On 0 = Off POWERDOWN = N
Yellow = 11 1 = On 0 = Off POWERDOWN = N
Red = 12 1 = On 0 = Off POWERDOWN = N
; -------------- RPM -------------------------
ACTUATORS
UNIT = APU
SPACE = APU_ACTS
MULTIBIT = BYTE
FluidStrobe = 8
RPMSOL1 = 1
RPMSOL2 = +2
RPMSOL3 = 15
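The bit numbers in this configuration correspond to positions in the RPM output scan chain, whose byte 0 the RPM5 help text places at APU address 510000. A minimal Python sketch of that mapping follows. The byte address calculation is taken from the help text; the position of a bit within its byte (least significant bit first) is an assumption, and the function name is hypothetical.

```python
APU_OUTPUT_BASE = 0x510000  # APU address of output byte 0, per the RPM5 help text

def locate_output_bit(bit, base=APU_OUTPUT_BASE):
    """Return (byte address, bit position within that byte) for an output bit number."""
    # ASSUMPTION: bit 0 of the scan chain is the least significant bit of byte 0.
    return base + bit // 8, bit % 8

for name, bit in [("FluidStrobe", 8), ("Beep", 9), ("RPMSOL3", 15)]:
    addr, pos = locate_output_bit(bit)
    print(f"{name}: bit {bit} -> byte {addr:06X}h, bit {pos}")
```

For example, FluidStrobe (bit 8) falls in byte 1, which the help text lists at APU address 510001.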
rpmlpm.f
define delay wait for 0.1
begin RpmLpm unit APU
echo "Begin RpmLpm"
loop 5
write 510000h 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
close RPMSOL1
delay
close RPMSOL2
delay
close RPMSOL3
To test inputs like the general sensors JSNSR1-8 and the fluid sensors JSNS1-8 we have previously used the debugger's general purpose Hardware test dialog (Target-Hardware menu). This is tedious for determining whether and where a particular switch is connected (to one of the JSNSRs), or which fluid sense range can detect a particular fluid/electrode combination, because no more than four inputs can be monitored at a time and the setup is not obvious. To address this, two similar scripts have been developed, GenSensTest (in gstest.f) and StrobeSensTest (in sstest.f). Both contain continuous loops that use VARs to keep track of the previous states of all eight sensors at once, reporting any that change. For example, to determine which of the three fluid sense ranges can respond to a fluid/electrode combination, the StrobeSensTest script is started. If all JSNS inputs are open, the script initially reports 1 (no fluid) for all eight. The sensor to be tested is then plugged into JSNS1 or 2 to see if the most sensitive circuit can detect it, in which case the script reports a state change to 0 for that input. This is repeated on JSNS3 or 4 for the mid-range sensitivity test and on JSNS5, 6, 7, or 8 for the least sensitive.
sstest.f
begin StrobeSensTest
  echo "Begin StrobeSensTest"
  initialize FluidSense
  VAR0 = 2
  VAR1 = 2
  ...
  loop 0
    if FluidSens1 = 0
      if VAR0 != 0
        echo "FluidSens1 is 0"
        VAR0 = 0
      endif
    else
      if VAR0 != 1
        echo "FluidSens1 is 1"
        VAR0 = 1
      endif
    endif
    if FluidSens2 = 0
      if VAR1 != 0
        echo "FluidSens2 is 0"
        VAR1 = 0
      endif
    else
      if VAR1 != 1
        echo "FluidSens2 is 1"
        VAR1 = 1
      endif
    endif
    ...
  endloop
end
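The change-reporting pattern used by StrobeSensTest can be sketched in Python for readers unfamiliar with the flow script language. This is an illustration of the technique, not a translation of the script: the function name is hypothetical, and the sample lists stand in for reads of the eight FluidSens inputs. The initial value 2 plays the same role as in the script, since it matches neither 0 nor 1 and so forces a report on the first pass.

```python
UNKNOWN = 2  # matches neither sensor state, so the first pass reports every input

def report_changes(previous, current):
    """Compare sensor states with the last reported ones and describe any changes.

    `previous` is updated in place so the next call reports only new changes.
    """
    messages = []
    for index, state in enumerate(current):
        if previous[index] != state:
            messages.append(f"FluidSens{index + 1} is {state}")
            previous[index] = state
    return messages

previous = [UNKNOWN] * 8
first = report_changes(previous, [1] * 8)                    # all inputs open
second = report_changes(previous, [0, 1, 1, 1, 1, 1, 1, 1])  # fluid on sensor 1
third = report_changes(previous, [0, 1, 1, 1, 1, 1, 1, 1])   # nothing changed
print(first, second, third)
```

The first call reports all eight inputs, the second reports only the sensor that changed, and the third reports nothing, which matches the behavior described for the script.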
DOCUMENTATION
The plain text file RPM5.TXT has been created for the debugger's on-line help system. The RPM.TXT previously used has been renamed RPM4.TXT and the help ini file has been modified to present both in the debugger's System Info menu.
sysinfo.ini
[HelpSystemInfo]
...
RPM4
RPM5
[HelpSystemFiles]
...
RPM4=RPM4.TXT
RPM5=RPM5.TXT
rpm5.txt
<RPM (Right Panel Module) version 5 (9602330-rev P14)>
------ CONFIGURATION --------
Place jumper on JSCLK pins 1-2.
For Status board 9601920 place jumpers on J1 pins 1-2 and JLED_INV.
For 2MHz SCK (APU2) place jumper on JPLDI1 (YV1 is permanently deselected).
To enable JSOL4 to 7 (SOL 1 to 4) place jumper on JBP4 pins 2-3.
To enable JSOL8 to 11 (SOL 5 to 8) place jumper on JBP3 pins 2-3.
To enable JSOL12 to 15 (SOL 9 to 12) place jumper on JBP2 pins 2-3.
To enable JSOL16 to 19 (SOL 13 to 16) place jumper on JBP7 pins 2-3.
To enable JSOL20 to 23 (SOL 17 to 20) place jumper on JBP6 pins 2-3.
To enable JSOL24 to 27 (SOL 21 to 24) place jumper on JBP5 pins 2-3.
To enable GENSNSR 1 to 8 place jumper on JBP1 pins 2-3.
----- GENERAL SCAN CHAIN ADDRESSING -----
MSM MOSI and MISO addresses shown here apply when connected to JLDRIO with
JU17 pins 1-2 and JU12 pins 1-2 connected. For all, output is space 2, input
space 3.
......Output. Total 64 bits.........
Byte 0 (APU @510000, MSM @500012)
Remove JSV0_EN jumper to enable SV0.
Remove JSV1_EN jumper to enable SV1.
Place jumper on JNEW_SV to select "new" SV control with driver on RPM.
SOL1 is enabled if JSV0_EN jumper is on (disabled) or JNEW_SV jumper is off
(SOL1 or SV0 may be connected but not both).
SOL2 is enabled if JSV1_EN jumper is on (disabled) or JNEW_SV jumper is off.
(SOL2 or SV1 may be connected but not both).
Place jumper on JPLDI4 to enable YV0, else MOSI4,5->JPLDO7,6.
0 = SV0 CW/CCW. JOSV0 pin 5. Alternately JSOL1_HI (SOL1_HI)
1 = SV0 ON. JOSV0 pin 4. Alternately JSOL1_ON (SOL1_ON)
2 = SV1 CW/CCW. JOSV1 pin 5. Alternately JSOL2_HI (SOL2_HI)
3 = SV1 ON. JOSV1 pin 4. Alternately JSOL2_ON (SOL2_ON)
4 = YV0_On. YV0 if JPLDI4 on, else JPLDOp7
5 = YV0_CW. YV0 if JPLDI4 on, else JPLDOp6
6 = YV1_On. JPLDO9
7 = YV1_CW. JPLDO8
Byte 1 (APU @510001, MSM @500013)
8 = Fluid sense (JSNS1 to 8) strobe. Posedge triggers 10us pulse.
9 = Beep (1 = on)
10 = Green (1 = on)
11 = Yellow (1 = on)
12 = Red (1 = on)
13 = JTESTp1 (PLD pin 60)
14 = JSOL3_HI (SOL25_HI)
15 = JSOL3_ON (SOL25_ON)
Byte 2 (APU @510002, MSM @500014)
16 = JSOL4 high (SOL1_HI)
17 = JSOL4 on (SOL1_ON)
18 = JSOL5 high (SOL2_HI)
19 = JSOL5 on (SOL2_ON)
20 = JSOL6 high (SOL3_HI)
21 = JSOL6 on (SOL3_ON)
22 = JSOL7 high (SOL4_HI)
23 = JSOL7 on (SOL4_ON)
...
.... Input Total 24 bits ..........
Bit space addresses for APU2/APU4/MSM
First register: byte address APU2 @51003F, APU4 @520007, MSM @500045
248/56/40 = General Sensor 1 JSNSR1 pin 2 (3 = GND, 1 = +5)
249/57/41 = General Sensor 2 JSNSR2
250/58/42 = General Sensor 3 JSNSR3
251/59/43 = General Sensor 4 JSNSR4
252/60/44 = General Sensor 5 JSNSR5
253/61/45 = General Sensor 6 JSNSR6
254/62/46 = General Sensor 7 JSNSR7
255/63/47 = General Sensor 8 JSNSR8
Second register: byte address APU2 @51003E, APU4 @520006, MSM @500044
240/48/32 = Strobed Fluid Sensor 1 JSNS1 pin 1 (2 = GND).
241/49/33 = Strobed Fluid Sensor 2 JSNS2
242/50/34 = Strobed Fluid Sensor 3 JSNS3
243/51/35 = Strobed Fluid Sensor 4 JSNS4
244/52/36 = Strobed Fluid Sensor 5 JSNS5
245/53/37 = Strobed Fluid Sensor 6 JSNS6
246/54/38 = Strobed Fluid Sensor 7 JSNS7
247/55/39 = Strobed Fluid Sensor 8 JSNS8
Third register: byte address APU2 @51003D, APU4 @520005, MSM @500043
232/40/24 = SV0 CWACK JOSV0 pin 6
233/41/25 = SV0 CCWACK JOSV0 pin 7
234/42/26 = SV1 CWACK JOSV1 pin 6
235/43/27 = SV1 CCWACK JOSV1 pin 7
236/44/28 = Y-Valve 0 CW Limit JY01 pin 2
237/45/29 = Y-Valve 0 CCW Limit JY02 pin 2
238/46/30 = Y-Valve 1 CW Limit JY03 pin 2
239/47/31 = Y-Valve 1 CCW Limit JY04 pin 2
..... Strobed Fluid Sensing ........
The strobe (output bit 8) is low normally. Writing a 1 to it triggers the
10 usec pulse to the sensors. It is a low-to-high edge triggered event. The
bit can be written back to a low at any time during or after the 10 usec.
The next low-to-high transition of the bit will cause another 10 usec pulse.
The fluid sensors are 1 when open, 0 when sensing fluid.
...