 
  
  
  
  
 Machine type: Distributed-memory multi-vectorprocessor.
 
 Models: Computing Surface 2.
 
 Operating system: Internal OS transparent to the
user, SunOS (Sun's Unix variant) on the front-end system.
 
 Connection structure: Multistage crossbar.
 
 Compilers: Extended Fortran 77, ANSI C.
System parameters:

Performance:

 
 Note:  and  for a 64-node configuration are quoted.
 
 The CS-2 features 8 to 1,024 processor elements (PEs), which can
be either scalar or vector nodes.  Apart from a separate communications module,
these PEs contain either a SuperSparc or a SuperSparc plus 2 μVP
vectorprocessors. The speed of a scalar PE is estimated to be 40 Mflop/s (at a
20 ns clock) and that of a vector PE 200 Mflop/s, both at 64-bit precision. The
μVP modules are manufactured by Fujitsu. Their speed at 32-bit precision is
double that of 64-bit operation and, unlike the earlier Fujitsu VP products,
they use the IEEE 754 floating-point format.  The memory has 16 banks and, to
avoid memory bank conflicts, the CS-2 has the interesting option of scrambled
allocation of addresses, thus guaranteeing good access at potentially
problematic strides such as 2, 4, etc.
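The effect of scrambled address allocation on bank conflicts can be illustrated with a small sketch. Only the figure of 16 banks comes from the text. The XOR-based mapping in scrambled_bank is a hypothetical stand-in chosen for illustration, since the actual scheme used in the CS-2 is not described here, and the function names are likewise assumptions.

```python
NUM_BANKS = 16  # number of memory banks quoted in the text


def interleaved_bank(addr):
    """Plain interleaving: consecutive words go to consecutive banks."""
    return addr % NUM_BANKS


def scrambled_bank(addr):
    """Hypothetical scrambling: XOR the low 4 address bits with the next 4.

    This is NOT the documented CS-2 scheme, only an example of the idea.
    """
    return (addr ^ (addr >> 4)) % NUM_BANKS


def banks_touched(bank_of, stride, n_accesses=NUM_BANKS):
    """Count the distinct banks hit by n_accesses words at a fixed stride."""
    return len({bank_of(i * stride) for i in range(n_accesses)})


if __name__ == "__main__":
    for stride in (1, 2, 4, 8, 16):
        print(stride,
              banks_touched(interleaved_bank, stride),
              banks_touched(scrambled_bank, stride))
```

With plain interleaving, power-of-two strides concentrate the accesses on a few banks (a stride of 16 hits a single bank), whereas the hypothetical scrambled mapping spreads 16 consecutive strided accesses over all 16 banks for the strides shown.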
The point-to-point communication speed is 100 MB/s (50 MB/s in each direction). Because communication passes through multi-level crossbars, called "layers" by Meiko, the aggregate bandwidth of the system scales with the number of PEs, with a very respectable latency of 200 ns per layer. As the maximum configuration of the machine contains 1,024 PEs, the theoretical peak performance at 64-bit precision is about 200 Gflop/s (1,024 x 200 Mflop/s = 204.8 Gflop/s). Each PE can be connected to its own I/O devices, so that parallel I/O scales along with the other resources.
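The peak-performance arithmetic can be checked with a short sketch. The constants are the figures quoted in the text. The helper names are assumptions, as is the simplification that latency adds up linearly over the layers a message crosses. The number of layers is left as an input because it is not stated for any particular configuration.

```python
# Figures quoted in the text.
MAX_PES = 1024            # largest configuration
SCALAR_PE_MFLOPS = 40.0   # estimated, 64-bit precision, 20 ns clock
VECTOR_PE_MFLOPS = 200.0  # 64-bit precision
LAYER_LATENCY_NS = 200.0  # latency per crossbar layer


def peak_gflops(n_pes, mflops_per_pe):
    """Theoretical peak in Gflop/s for n_pes identical PEs."""
    return n_pes * mflops_per_pe / 1000.0


def network_latency_ns(n_layers):
    """Latency for a message crossing n_layers layers.

    Assumes the per-layer latency simply adds up; n_layers is a
    caller-supplied value, not a figure from the text.
    """
    return n_layers * LAYER_LATENCY_NS


if __name__ == "__main__":
    # All-vector maximum configuration at 64-bit precision.
    print(peak_gflops(MAX_PES, VECTOR_PE_MFLOPS))
    # The same configuration built from scalar PEs only.
    print(peak_gflops(MAX_PES, SCALAR_PE_MFLOPS))
    # 32-bit speed of the vector modules is stated to be double the 64-bit speed.
    print(peak_gflops(MAX_PES, 2 * VECTOR_PE_MFLOPS))
```

The first value, 204.8 Gflop/s, is the figure that the text rounds to 200 Gflop/s.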