High Performance Computing
Clusters consist of racks of small computers (computer nodes) linked to each other by a high-speed internal network. The operation of the nodes is controlled by a master node, usually called the "head node". User data storage may be provided in several ways:
- Disk on the head node
- NFS (Network File System)-mounted disk on the SAN (Storage Area Network)
- AFS (Andrew File System) disk on the SAN
- Local disk on each node (used for temporary files)
Generally, users' storage needs can be met by one of these methods or by a combination of them.
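Node-local disk is normally used through temporary files. Under SGE, each job is usually given its own scratch directory, advertised in the `TMPDIR` environment variable; whether that directory sits on the node's local disk depends on how the queues are configured, so treat this as an assumption to verify on your cluster. Python's `tempfile` module honors `TMPDIR`, so a minimal sketch (the function name `process_with_scratch` is hypothetical) could look like this:

```python
import os
import tempfile
from pathlib import Path
from typing import List


def process_with_scratch(records: List[str]) -> int:
    """Write intermediate data to a scratch directory, then read it back.

    tempfile creates the directory under $TMPDIR if it is set (assumed here
    to be SGE's per-job, node-local scratch area), else the system default.
    Returns the number of lines read back from the scratch file.
    """
    # The directory and everything in it are deleted when the block exits.
    with tempfile.TemporaryDirectory(prefix="job_") as scratch:
        tmp_file = Path(scratch) / "intermediate.txt"
        tmp_file.write_text("\n".join(records))
        # A real job would do its heavy intermediate I/O on tmp_file here.
        return len(tmp_file.read_text().splitlines())


if __name__ == "__main__":
    print("scratch root:", os.environ.get("TMPDIR", tempfile.gettempdir()))
    print(process_with_scratch(["a", "b", "c"]))
```

Because the temporary directory is removed automatically, results that must survive the job should be copied back to the head-node, NFS, or AFS storage before the job ends.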
Clusters are designed to support parallel processing, using Message Passing Interface (MPI) software for rapid communication between nodes. However, clusters can also be used to run serial jobs on a number of nodes at once, with no communication between the computer nodes.
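In an MPI program, every process (rank) usually works on its own share of the data, and the ranks exchange messages only where the algorithm requires it. The sketch below shows one common way to divide the work, a block decomposition. It is plain Python so it runs without MPI installed, and the function name `block_range` is hypothetical. In a real MPI program, `rank` and `size` would come from the MPI library (with mpi4py, `MPI.COMM_WORLD.Get_rank()` and `MPI.COMM_WORLD.Get_size()`). For serial jobs with no communication, each independent job could call the same function with its own task index in place of the rank.

```python
def block_range(n_items: int, rank: int, size: int) -> range:
    """Return the contiguous block of item indices assigned to `rank`.

    Items are split as evenly as possible among `size` processes:
    the first `n_items % size` ranks each receive one extra item.
    """
    if size <= 0 or not 0 <= rank < size:
        raise ValueError("need size > 0 and 0 <= rank < size")
    base, extra = divmod(n_items, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


if __name__ == "__main__":
    # Simulate 4 processes splitting 10 work items among themselves.
    for rank in range(4):
        print(rank, list(block_range(10, rank, 4)))
```

Running the script prints the block each of four simulated ranks would process: ranks 0 and 1 get three items each, ranks 2 and 3 get two, so the load differs by at most one item.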
The scheduling and running of jobs on the computer nodes is handled by a scheduler (Sun Grid Engine, or SGE) running on the head node. All jobs, whether parallel or serial, must be submitted by the user through SGE, which then runs them on the computer nodes. Users are therefore not permitted to run computational jobs on the head node, i.e., the node they log into to submit jobs via SGE (cappl, hydra, or kong).
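Jobs are submitted to SGE with `qsub`, which treats lines beginning with `#$` in the job script as embedded options. The minimal sketch below generates such a script in Python. The parallel-environment name (`mpi`), the program names, and the helper `make_sge_script` are hypothetical placeholders: parallel-environment names and any additional required options are defined by each site's administrators, so check them before use.

```python
from typing import Optional


def make_sge_script(
    job_name: str,
    command: str,
    pe_name: Optional[str] = None,
    slots: int = 1,
    array_size: Optional[int] = None,
) -> str:
    """Return the text of an SGE job script suitable for `qsub`.

    pe_name is a site-specific parallel environment (placeholder here);
    array_size turns the job into an array of independent serial tasks.
    """
    if slots < 1:
        raise ValueError("slots must be at least 1")
    if array_size is not None and array_size < 1:
        raise ValueError("array_size must be at least 1")
    lines = [
        "#!/bin/bash",
        "#$ -S /bin/bash",    # interpret the job script with bash
        f"#$ -N {job_name}",  # job name shown by qstat
        "#$ -cwd",            # run in the directory qsub was called from
        "#$ -j y",            # merge stderr into stdout
    ]
    if pe_name is not None:
        # Parallel (e.g. MPI) job: request `slots` slots in the PE.
        lines.append(f"#$ -pe {pe_name} {slots}")
    if array_size is not None:
        # Array job: tasks 1..array_size each get their own $SGE_TASK_ID.
        lines.append(f"#$ -t 1-{array_size}")
    lines.append(command)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical parallel job; "mpi" and ./my_mpi_program are placeholders.
    print(make_sge_script("mpi_test", "mpirun -np $NSLOTS ./my_mpi_program",
                          pe_name="mpi", slots=8))
    # Hypothetical serial array job: 20 independent tasks, no communication.
    print(make_sge_script("serial_batch",
                          "./my_serial_program input_$SGE_TASK_ID.dat",
                          array_size=20))
```

Save the generated text to a file such as `job.sh` and submit it from the head node with `qsub job.sh`. SGE sets `$NSLOTS` to the number of slots granted to a parallel job and, for array jobs, `$SGE_TASK_ID` to each task's index.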
There are currently three clusters in operation:
- cappl.njit.edu - Restricted access only.
- hydra.njit.edu - Restricted access only.
- kong.njit.edu - General and restricted access.
For detailed specifications of the three clusters, see Cluster Specifications.
The operating system running on all clusters is Linux. Users who are unfamiliar with Unix or Linux can contact ucssys@njit.edu for guidance.
Faculty and researchers who want access to kong.njit.edu for themselves or their students, or who want to purchase nodes on kong for their own dedicated use, should send an email request to hpc@oak.njit.edu. Faculty/staff accounts are automatically removed, without notification, when employment ends. All other types of accounts are automatically removed, without notification, upon separation or graduation from NJIT, or after one year of inactivity. Removed accounts can be reinstated at the request of a faculty or staff member, sent to ucssys@njit.edu. Files belonging to expired accounts are kept on backup for eleven months and can also be restored on request.
All NJIT clusters are managed by staff of IST University Computing Systems. Access to any of these resources requires a Highlander AFS account.
Computational Grids vs. Computer Clusters
Computational grids enable the sharing, selection, and aggregation of a wide variety of geographically distributed computational resources (such as supercomputers, computer clusters, storage systems, data sources, instruments, and people) and present them as a single, unified resource for solving large-scale compute- and data-intensive problems.
NJIT is in the process of standardizing on groups ("clusters") of commodity computers as the vehicle for providing high performance computing services to researchers. A cluster consists of a relatively large number of processors connected by very high-speed interconnects and controlled by specialized scheduling and resource management software. Clusters are designed for parallel processing: their computer nodes can act independently or in parallel to handle large-scale, computationally demanding tasks. The user interacts with the cluster through the scheduling and resource management software, which can perform the same functions for groups of clusters (or other computational elements). Scheduling and allocation of resources are determined by policies that depend on various factors, including individual ownership of nodes in a cluster.
The key difference between clusters and grids is in the way resources are managed. In a computer cluster, all resources are managed by a centralized resource manager, and the nodes work cooperatively as a single unified resource. In a grid, each node has its own resource manager, and the grid engine software finds available resources on the grid, aggregates them, and delivers them as a service. Grids allow small computational resources to contribute to the solution of a task that is far too large for any one of them to handle. The idea is analogous to an electric power grid, where generators are distributed but users can draw power without concern for where or how it was generated.
For further information on grids, visit the Grid Computing Information Centre.


Information Services and Technology