Quad-way Symmetric Multi-Processing Server

SATPURA


System Details

Sl. No.     : 1

Description : Quad-way SMP Supermicro rack server (2U) with 4x AMD Opteron
              2.1 GHz 12-core processors (48 cores per node), 192 GB DDR3
              memory, Gigabit LAN, InfiniBand QDR interface and 4x 2 TB SATA
              disks, running the Debian wheezy (kernel 3.2.0-1-amd64) OS.
              The nodes are connected through an 18-port InfiniBand switch
              to form a cluster.

Quantity    : 7

Hostname                      User group    Remarks
satpura, satpura1, satpura2,  General use   Available for general use
satpura3, satpura4
satpura5                      Project use   Dr. Satyavani Vemparalla - Reserved
satpura6                      Project use   Dr. Rahul Sinha - Reserved

All the nodes are available as a cluster and are shared among the different
user groups.

HPL Benchmark Report for satpura:

HPL is compiled using gcc-4.6 and OpenMPI-1.4.3 and linked against ACML-5.1.0 (LAPACK & BLAS). The benchmark was run on each node individually and on all nodes together with various options; the results are as follows:


                         Computed       Theoretical Peak   HPL Variable     Efficiency
                         (Giga Flops)   (Giga Flops)       Values           (%)

Single Node              275.4          393                N   = 102144     70.07
                                                           NBS = 224
                                                           Ps  = 1
                                                           Qs  = 48

All Nodes as a Cluster   1848           2755               N   = 314496     67.07
                                                           NBS = 224
                                                           Ps  = 7
                                                           Qs  = 48
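For reference, the HPL variables in the table map onto entries in the HPL.dat input file. The excerpt below is a minimal sketch of the relevant lines for the single-node run (the remaining HPL.dat lines and any local tuning are omitted; this is an assumption about the input, not the exact file used for these runs):

1            # of problems sizes (N)
102144       Ns
1            # of NBs
224          NBs
1            # of process grids (P x Q)
1            Ps
48           Qs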

Installed Packages:


Section          Name           Version      Remark

Compilers        gcc            4.6, 4.4
                 g++            4.6, 4.4
                 gfortran       4.6, 4.4
                 f95            4.6

                 openmpi        1.4.3
                 mpicc          gcc-4.6 with openmpi-1.4.3
                 mpiCC          ''
                 mpif77         ''
                 mpif90         ''
                 mpic++         ''
                 mpicxx         ''

                 python         2.7.2+
                 perl           5.14.2

                 OpenMPI performs well over the InfiniBand network for
                 node-to-node communication and uses shared memory for
                 communication within a node.

Libraries        atlas          0.6.1
                 blas           1.2
                 lapack         3.3.1
                 gsl            1.15

Job Scheduler    torque         2.4.16       OpenPBS

Local Packages   Mathematica    8.0
                 Matlab         2010b
                 gnuplot        4.4.0
                 NAMD2          2.8          Linux-x86_64-ibverbs-smp build
                 ACML           5.1.0        AMD Core Math Library; performs well
                                             compared to the native blas and lapack
                 MAUI           3.3.1        The Maui scheduler enables torque
                                             (OpenPBS) to partition the given nodes,
                                             so that multiple queues are possible
                                             (open source)
                 MODULE         3.2.9c       Enables dynamic loading of a given
                                             package with a specific version
                 C3             5.1.2        Executes a command on all nodes or on
                                             selected nodes (threaded rsh, so it
                                             runs in parallel)
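As an illustration of how MODULE and C3 are typically used, a minimal sketch follows; the module name openmpi/1.4.3 is hypothetical, and the C3 line assumes a cluster definition exists in /etc/c3.conf, so both depend on the local setup:

module avail                  # list the packages/versions known to modules
module load openmpi/1.4.3     # load a specific version into the environment (hypothetical name)
module list                   # show what is currently loaded
module unload openmpi/1.4.3   # remove it again

cexec uptime                  # C3: run a command on every node defined in /etc/c3.conf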

Mathematica and Matlab jobs should be submitted in the non-interactive background mode only.

Those who want to use this facility should send a mail to service@imsc.res.in with details of their codes, so that login access can be enabled on the nodes.

Login to the head node (satpura) to submit jobs:

Authorised users need to log in to the head node satpura to submit their codes for execution.
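For example (a minimal sketch; username is a placeholder, and it is assumed that satpura is reachable over ssh from within the institute network):

ssh username@satpura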

Submitting Jobs:

To submit jobs after creating the executables, the user needs to write a PBS script file that submits the job to the queue.

Sample PBS script files

Parallel jobs (pbs_scriptp.sh)

(Sample script for running an 8-processor job)

#!/bin/bash

#PBS -l nodes=1:ppn=8
#PBS -N testrun
#PBS -q satpuraq
#PBS -V
#PBS -M username@imsc.res.in
#PBS -m abe 
#PBS -j oe

cd $PBS_O_WORKDIR

# Build an OpenMPI hostfile of the form "<node> slots=<n>" from the node list allocated by torque
cat $PBS_NODEFILE | grep satpura | sort | uniq -c | awk '{print $2" slots="$1}' > hostfile_$PBS_JOBID

mpirun -v -hostfile hostfile_$PBS_JOBID /executable/path/executable [options_if_any] >& out_$PBS_JOBID
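The executable given to mpirun must be built against the same OpenMPI installation, normally through the compiler wrappers listed in the package table. A minimal sketch, where myprog.c and myprog.f90 are hypothetical source files:

mpicc  -O2 -o myprog myprog.c       # C source
mpif90 -O2 -o myprog myprog.f90     # Fortran source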

Serial jobs (pbs_scripts.sh)

(Sample script for running a single-processor job)

#!/bin/bash

#PBS -l nodes=1:ppn=1
#PBS -N testrun
#PBS -q satpuraq
#PBS -V
#PBS -M username@imsc.res.in
#PBS -m abe 
#PBS -j oe

cd $PBS_O_WORKDIR
# Record the node allocated to this job
cat $PBS_NODEFILE > hostfile_$PBS_JOBID

/executable/path/executable [options_if_any]    >& out_$PBS_JOBID

Commands to manage the queue:

* To submit a job to the queue

qsub pbs_script.sh

Once the job is accepted by torque, it returns a JOBID that can be used to check the status of the job or to delete it. Each operation takes about 15 seconds to take effect.
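For example (the returned job id is hypothetical; torque prints whatever id it actually assigns, and that id is what the qstat and qdel commands below operate on):

$ qsub pbs_scriptp.sh
1234.satpura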

* To know the status of the job

qstat -an1

It displays the status of all jobs. The output shows Job ID, Username, Queue, Jobname, SessID, NDS, TSK, S, and Elap Time.

The important column is 'S', the single-letter job status; its possible values are given below:

Q - Job is queued, eligible to run or be routed.

R - Job is running.

E - Job is exiting after having run.

H - Job is held.

T - Job is being moved to a new location.

W - Job is waiting for its execution time (-a option) to be reached.

* To delete a job from the queue

qdel <jobid>

Run the above command more than once, with a gap of a few seconds, until torque reports that the job id no longer exists.
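A minimal sketch of doing this in a loop (1234 is a placeholder job id; the loop stops once qdel fails because torque no longer knows the job):

while qdel 1234 2>/dev/null; do
    sleep 5    # wait a few seconds before trying again
done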

Wish you all the best