Quad-way Symmetric Multi-Processing Server
SATPURA
System Details
| Sl. No. | Description | Quantity | Hostname | User group | Remarks |
|---|---|---|---|---|---|
| 1 | Quad-way SMP Supermicro rack server (2U) with 4 x AMD Opteron 2.1 GHz 12-core processors, 192 GB DDR3 memory, 1 Gigabit LAN, InfiniBand QDR interface, 4 x 2 TB SATA disks, running Debian wheezy (kernel 3.2.0-1-amd64) O/S. The nodes are connected through the 18-port InfiniBand switch to form a cluster. 48 cores per node. | 7 | satpura, satpura1 | General use | Available for general use |
| | | | satpura2, satpura3, satpura4, satpura5 | Project use (Dr. Satyavani Vemparalla) | Reserved |
| | | | satpura6 | Project use (Dr. Rahul Sinha) | Reserved |
HPL Benchmark Report for satpura:
HPL was compiled using gcc-4.6 and OpenMPI-1.4.3 and linked with ACML-5.1.0 (LAPACK & BLAS). The benchmark was run on each node individually and on all nodes together, with various options. The results are as follows:
| | Computed (Giga Flops) | Theoretical Peak (Giga Flops) | HPL Variables Values | Efficiency % |
|---|---|---|---|---|
| Single Node | 275.4 | 393 | N = 102144, NBS = 224, Ps = 1, Qs = 48 | 70.07 |
| All Nodes as a Cluster | 1848 | 2755 | N = 314496, NBS = 224, Ps = 7, Qs = 48 | 67.07 |
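
The efficiency column in the HPL results is the computed performance expressed as a percentage of the theoretical peak. The sketch below recomputes it from the two reported figures. The function name `efficiency` is only an illustration and is not part of any installed package.

```shell
# Efficiency (%) = 100 * computed / theoretical peak, both in Giga Flops.
# "efficiency" is a hypothetical helper name used only for this example.
efficiency() {
    awk -v computed="$1" -v peak="$2" \
        'BEGIN { printf "%.2f\n", 100 * computed / peak }'
}

efficiency 275.4 393     # single node
efficiency 1848 2755     # all seven nodes as a cluster
```

The two results should agree with the reported efficiencies to within rounding of the published figures.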
Installed Packages:
| Section | Name | Version | Remark |
|---|---|---|---|
| Compilers | gcc | 4.6, 4.4 | |
| | g++ | 4.6, 4.4 | |
| | gfortran | 4.6, 4.4 | |
| | f95 | 4.6 | |
| | openmpi | 1.4.3 | OpenMPI performs well over the IB network for node-to-node communication and over shared memory for communication within a node. |
| | mpicc, mpiCC, mpif77, mpif90, mpic++, mpicxx | gcc-4.6 with openmpi-1.4.3 | |
| | python | 2.7.2+ | |
| | perl | 5.14.2 | |
| Libraries | atlas | 0.6.1 | |
| | blas | 1.2 | |
| | lapack | 3.3.1 | |
| | gsl | 1.15 | |
| Job Scheduler | torque | 2.4.16 | Open PBS |
| Local Packages | Mathematica | 8.0 | |
| | Matlab | 2010b | |
| | gnuplot | 4.4.0 | |
| | NAMD2 | 2.8 | Linux-x86_64-ibverbs-smp |
| | ACML | 5.1.0 | AMD Core Math Library. Performs well compared to the native blas and lapack. |
| | MAUI | 3.3.1 | The Maui Scheduler (open source) enables torque (Open PBS) to partition the nodes, so that multiple queues are possible. |
| | MODULE | 3.2.9c | Enables dynamic loading of a given package at a specific version. |
| | C3 | 5.1.2 | Executes a command on all nodes or on selected nodes (threaded rsh, so it is effectively parallel). |
Those who want to use the above facility should send mail to service@imsc.res.in with details of the codes they intend to run, so that login can be enabled on the nodes.
Log in to the head node (satpura) to submit jobs:
Permitted users need to log in to the head node, satpura, to submit their codes for execution.
Submitting Jobs:
To submit a job after building the executable, the user needs to create a PBS script file and submit it to the queue.
Sample PBS script file
Parallel jobs (pbs_scriptp.sh)
(Sample script for running an 8-processor job)
#!/bin/bash
#PBS -l nodes=1:ppn=8
#PBS -N testrun
#PBS -q satpuraq
#PBS -V
#PBS -M username@imsc.res.in
#PBS -m abe
#PBS -j oe
cd $PBS_O_WORKDIR
cat $PBS_NODEFILE |grep satpura | sort | uniq -c | awk '{print $2" slots="$1}' > hostfile_$PBS_JOBID
mpirun -v -hostfile hostfile_$PBS_JOBID /executable/path/executable [options_if_any] >& out_$PBS_JOBID
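
The hostfile line in the parallel script turns the node list that torque provides in $PBS_NODEFILE (one line per allocated core) into the "hostname slots=N" form that mpirun reads. The sketch below runs the same pipeline on a made-up node list so the result can be inspected outside a job. The temporary file, the two-node allocation and the function name `make_hostfile` are assumptions for illustration only.

```shell
# Stand-in for $PBS_NODEFILE: a hypothetical allocation of 3 cores on
# satpura2 and 2 cores on satpura1, one line per core.
nodefile=$(mktemp)
printf '%s\n' satpura2 satpura1 satpura2 satpura2 satpura1 > "$nodefile"

# Same pipeline as in the sample script: keep satpura hosts, group
# identical names, count them, and print "host slots=count".
make_hostfile() {
    grep satpura "$1" | sort | uniq -c | awk '{print $2" slots="$1}'
}

make_hostfile "$nodefile"
# prints:
# satpura1 slots=2
# satpura2 slots=3

rm -f "$nodefile"
```

For the sample job above (nodes=1:ppn=8) the node list names a single host eight times, so the resulting hostfile has one line with slots=8.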
Serial jobs (pbs_scripts.sh)
(Sample script for running a single-processor job)
#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -N testrun
#PBS -q satpuraq
#PBS -V
#PBS -M username@imsc.res.in
#PBS -m abe
#PBS -j oe
cd $PBS_O_WORKDIR
cat $PBS_NODEFILE > hostfile_$PBS_JOBID
/executable/path/executable [options_if_any] >& out_$PBS_JOBID
Commands to manage the queue:
* To submit a job to the queue
qsub pbs_script.sh
Once the job is accepted by torque, it returns a job ID, which can be used to check the status of the job or to delete it. Each operation takes 15 seconds to start executing.
* To check the status of a job
qstat -an1
It displays the status of all jobs. The output has the columns Job ID, Username, Queue, Jobname, SessID, NDS, TSK, S and Elap Time.
The most important column is 'S', the single-letter job status. Its values are given below:
Q - job is queued, eligible to run or routed.
R - job is running.
E - job is exiting after having run.
H - job is held.
T - job is being moved to a new location.
W - job is waiting for its execution time (-a option) to be reached.
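
When checking many jobs from a script, it can help to translate the 'S' letter back into words. The following is a minimal sketch of such a lookup covering only the letters listed above. The function name `describe_state` is hypothetical and is not a torque command.

```shell
# Hypothetical helper: map the single-letter status from the 'S' column
# of qstat to the description given in the list above.
describe_state() {
    case "$1" in
        Q) echo "queued, eligible to run or routed" ;;
        R) echo "running" ;;
        E) echo "exiting after having run" ;;
        H) echo "held" ;;
        T) echo "being moved to a new location" ;;
        W) echo "waiting for its execution time to be reached" ;;
        *) echo "unknown state: $1" ;;
    esac
}

describe_state R
describe_state H
```

Any letter outside this list falls through to the "unknown state" branch rather than being guessed at.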
* To delete a job
qdel <jobid>
Repeat the above command, with a gap of a few seconds between attempts, until torque reports that the job ID does not exist.
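
The repeated deletion can be wrapped in a small loop. This is only a sketch under stated assumptions: it treats a non-zero exit status from qdel as the sign that the job ID is no longer known, and the function name `delete_job` and the variables QDEL, DELAY and MAX_TRIES are hypothetical names introduced here (QDEL exists only so the loop can be tried with a stand-in command).

```shell
# Hypothetical helper: call qdel repeatedly until it fails, which is
# assumed here to mean that torque no longer knows the job ID.
# DELAY is the gap in seconds between attempts (default 5).
# MAX_TRIES caps the number of attempts (default 10).
delete_job() {
    jobid=$1
    attempts=0
    while [ "$attempts" -lt "${MAX_TRIES:-10}" ]; do
        if ! "${QDEL:-qdel}" "$jobid"; then
            return 0    # qdel reported an error: the job is taken to be gone
        fi
        attempts=$((attempts + 1))
        sleep "${DELAY:-5}"
    done
    return 1            # job still known after MAX_TRIES attempts
}

# Usage: delete_job <jobid>
```

Because qdel can also fail for other reasons, such as a mistyped job ID, it is worth confirming with qstat that the job has really gone.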
Wish you all the best