
Login and sending jobs to CBLab

The Computational Biology Lab (CBLab) cluster is a set of computers and programs that run many calculations simultaneously. It can be divided into two main groups: the nodes, which perform all the calculations, and the support system. All cluster applications run on Bio-Linux 8 (Ubuntu 14.04 LTS).

User data are stored in the /home/usuaris/<user_name> directory, and most of the programs that run on the cluster are in /home/soft/<program>. The /home/db directory is intended for the databases that users need for their work, as well as others that may be useful to all users.

By default, each submitted process is assigned one processor (out of the 284 installed) and a maximum of 4 GB of RAM; the qsub options described further down change these limits.

You can download a presentation here: CBLab tutorial (Español)

How to log in

We connect to the cluster over SSH. On Linux and Mac, simply type in the terminal:

ssh <user_name>@cluster-ceab.ceab.csic.es

Windows users can also log in using PuTTY or another SSH client.

To upload or download files we recommend FileZilla or, from the terminal, scp.
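As an illustration of the scp route, the sketch below builds one upload and one download command. The account name user_name and the file names data.csv and results.out are hypothetical, and the commands are only printed so they can be checked before being run by hand.

```shell
#!/bin/bash
# Hypothetical account and file names; replace them with your own.
user="user_name"
host="cluster-ceab.ceab.csic.es"
remote_dir="/home/usuaris/${user}"

# Upload: local file first, remote destination second
upload_cmd="scp data.csv ${user}@${host}:${remote_dir}/"

# Download: remote source first, local destination (here the current directory) second
download_cmd="scp ${user}@${host}:${remote_dir}/results.out ."

# Print the commands instead of running them (dry run)
echo "$upload_cmd"
echo "$download_cmd"
```

Adding -r after scp copies a whole directory instead of a single file.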

How to send jobs to the cluster

The first step is to create a plain-text file (e.g. a .txt file, not HTML, .docx, etc.) containing the commands or instructions (a script) with all their options, together with the paths of the file or files that the cluster will need.

#!/bin/bash
 
### R code
 
cd ~/research/gTiger/tiger_risk_estimation/scripts
 
/home/soft/R-3.2.1/bin/R CMD BATCH --no-save --no-restore t002.0_sampling_effort_overlay_rsample.r t002.0_sampling_effort_overlay_rsample.out

We can complete the script with redirections that give us useful information:

< <input_file_name>

Standard input: the program reads its input from this file.

> <output_file_name>

Standard output: this file will contain all the output generated by the program.

2> <error_file_name>

Standard error: if the job fails, this file will contain information about the errors.

Written in our script it would look like this:

#!/bin/bash
 
### R code
 
(...)
 
/home/soft/program_name < <input_file_name> > <output_file_name> 2> <error_file_name>
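The three redirections can be tried locally before submitting anything. In this minimal sketch, sort stands in for /home/soft/program_name and every file name is hypothetical; a second command that fails on purpose shows where the error message ends up.

```shell
#!/bin/bash
# Work in a throwaway directory so nothing in the home directory is touched.
workdir=$(mktemp -d)

# Hypothetical input file with three unsorted lines
printf 'node102\nnode100\nnode101\n' > "$workdir/input.txt"

# stdin comes from input.txt, stdout goes to output.txt, stderr goes to error.txt
sort < "$workdir/input.txt" > "$workdir/output.txt" 2> "$workdir/error.txt"

cat "$workdir/output.txt"        # the sorted lines
wc -c < "$workdir/error.txt"     # size of the error file, 0 when nothing went wrong

# A command that fails: its stdout file stays empty and the message lands in the error file
ls "$workdir/no_such_file" > "$workdir/ls_output.txt" 2> "$workdir/ls_error.txt" \
    || echo "ls failed; see $workdir/ls_error.txt"
```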

Once the script is written, save it as a .sh file. Make sure all the files that the cluster will need are in the paths you indicated.

From the console, we submit the job with the qsub command:

qsub script_name.sh

By default (with no options), each submitted process is assigned one processor (out of the 284 installed) and a maximum of 4 GB of RAM; the options below change this. Each job, when it finishes, also generates two files: <script.name>.o<job_id> (output file) and <script.name>.e<job_id> (error file, empty if there are no errors).

qsub options

These are some useful options:

qsub -l h_vmem=<x>G

This option assigns a different amount of RAM. Attention: RAM is assigned per core (processor), not per job.

qsub -l h_vmem=10G ... <script_name>.sh

qsub -pe make <n_processors>

This option activates the 'make' parallel environment. It is an intra-node parallelization, so the maximum number of processors we can use is limited by the hardware (64 cores max). Example:

qsub -pe make 10 <script_name>.sh

E.g. to send a job with 20 cores and a total of 100G of RAM, we request 5G per core (100G / 20 cores):

qsub -pe make 20 -l h_vmem=5G <script_name>.sh

Please bear in mind that requesting more RAM or processors than necessary will “block” resources that other users could use.
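Because h_vmem is requested per core, the value to pass is the total memory divided by the number of cores. A minimal sketch of that arithmetic follows; the helper name vmem_per_core is made up for illustration, and rounding up is an assumption rather than a site rule.

```shell
#!/bin/bash
# vmem_per_core TOTAL_GB CORES -> GB to request per core (hypothetical helper)
vmem_per_core() {
    local total_gb=$1 cores=$2
    # Integer division rounded up, so the total is never below what was asked for
    echo $(( (total_gb + cores - 1) / cores ))
}

cores=20
per_core=$(vmem_per_core 100 "$cores")

# Print the resulting command line for the 20-core, 100G example
echo "qsub -pe make ${cores} -l h_vmem=${per_core}G <script_name>.sh"
```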

qsub -m bea -M <user_mail>

With this option we receive an email when the job begins (b), ends (e), or is aborted (a).

qsub -m bea -M x.roijals@ceab.csic.es

qsub -q ceab@nodexxx

Sends the job to a chosen node, where xxx is the number of the node (100 to 112).

qsub -e error -o output

These files are created by default; with this option we can choose their names:

qsub -e <error_file_name.txt> -o <output_file_name.txt> <script.name>.sh

You can combine all these options in one command:

qsub -pe make 10 -l h_vmem=10G -M <an_email> -m bea -q ceab@nodexxx <script_name>.sh
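As an alternative to a long command line, Grid Engine also reads options from script lines that start with #$. The sketch below assumes the cluster keeps that default prefix; the email address is a placeholder and the echo line is only an illustrative payload.

```shell
#!/bin/bash
# Lines starting with "#$" are read by qsub as if typed on the command line;
# bash itself treats them as ordinary comments.
#$ -pe make 10
#$ -l h_vmem=10G
#$ -m bea
#$ -M user@example.org

# NSLOTS is set by Grid Engine inside a job; default to 1 when run outside the scheduler.
slots="${NSLOTS:-1}"
echo "Running with ${slots} slot(s)"
```

Such a script is submitted with a plain qsub <script_name>.sh. Options given on the command line override the ones embedded in the script.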

Logging in to a node

Sometimes it is useful to log in to a node, for example one where a job is running. We use the qlogin command for this. With no options the node is chosen at random, and at least one processor of the node has to be free. qlogin accepts the same options as qsub, but in this case you do not need to specify a script.sh.

qlogin <options>

Check job status

To see which of our jobs are running on the cluster we use the qstat command. It shows the job ID and the node where each job is running.

qstat

To see another user's jobs:

qstat -u <login_user_name>

Or all users' jobs at the same time:

qstat -u "*"

For more information about one particular job:

qstat -j <job_ID>

To see how many cores are running jobs on each node:

xavier.roijals@cluster-ceab:/home/soft$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
ceab@node100                   BIP   0/0/16         0.08     linux-x64     
---------------------------------------------------------------------------------
ceab@node101                   BIP   0/16/16        15.87    linux-x64     
---------------------------------------------------------------------------------
ceab@node102                   BIP   0/0/16         0.08     linux-x64     
---------------------------------------------------------------------------------
ceab@node103                   BIP   0/0/16         1.01     linux-x64     d
---------------------------------------------------------------------------------
ceab@node104                   BIP   0/0/16         0.05     linux-x64     
---------------------------------------------------------------------------------
ceab@node105                   BIP   0/0/20         0.00     linux-x64     
---------------------------------------------------------------------------------
ceab@node106                   BIP   0/20/20        19.84    linux-x64     
---------------------------------------------------------------------------------
ceab@node107                   BIP   0/0/20         0.00     linux-x64     
---------------------------------------------------------------------------------
ceab@node108                   BIP   0/0/20         0.02     linux-x64     
---------------------------------------------------------------------------------
ceab@node109                   BIP   0/10/20        10.03    linux-x64     
---------------------------------------------------------------------------------
ceab@node110                   BIP   0/62/64        3.97     linux-x64     
---------------------------------------------------------------------------------
ceab@node111                   BIP   0/6/20         5.97     linux-x64     
---------------------------------------------------------------------------------
ceab@node112                   BIP   0/0/20         0.00     linux-x64     
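The resv/used/tot. column can be added up to get a cluster-wide picture. The sketch below is one possible way to do it with awk; the function name summarize_slots is hypothetical, and it assumes the column layout shown in the listing above. On the cluster it would be fed with qstat -f | summarize_slots; here a few lines copied from the listing serve as sample input.

```shell
#!/bin/bash
# Sum the used and total slots found in the third column of "qstat -f" output.
summarize_slots() {
    awk '$1 ~ /@/ && $3 ~ /^[0-9]+\/[0-9]+\/[0-9]+$/ {
             split($3, s, "/")      # s[1]=reserved, s[2]=used, s[3]=total
             used += s[2]; total += s[3]
         }
         END { printf "used=%d total=%d free=%d\n", used, total, total - used }'
}

# Sample input: three queue lines taken from the listing above
summarize_slots <<'EOF'
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
ceab@node100                   BIP   0/0/16         0.08     linux-x64
---------------------------------------------------------------------------------
ceab@node101                   BIP   0/16/16        15.87    linux-x64
---------------------------------------------------------------------------------
ceab@node110                   BIP   0/62/64        3.97     linux-x64
EOF
# prints: used=78 total=96 free=18
```

Nodes flagged in the states column (such as the d on node103) are still counted, so the free figure is an upper bound.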

For a more detailed report that also lists the jobs running on each node:

xavier.roijals@cluster-ceab:/home/soft$ qstat -f -u "*"
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
ceab@node100                   BIP   0/0/16         3.21     linux-x64     
---------------------------------------------------------------------------------
ceab@node101                   BIP   0/16/16        9.28     linux-x64     
  89406 0.58975 Rsnow      jgarriga     r     01/19/2021 16:47:07    16        
---------------------------------------------------------------------------------
ceab@node102                   BIP   0/0/16         3.20     linux-x64     
---------------------------------------------------------------------------------
ceab@node103                   BIP   0/0/16         1.00     linux-x64     d
---------------------------------------------------------------------------------
ceab@node104                   BIP   0/1/16         2.96     linux-x64     
  89391 0.50500 QLOGIN     fbartu       r     01/19/2021 13:29:51     1        
---------------------------------------------------------------------------------
ceab@node105                   BIP   0/0/20         0.04     linux-x64     
---------------------------------------------------------------------------------
ceab@node106                   BIP   0/20/20        11.64    linux-x64     
  89406 0.58975 Rsnow      jgarriga     r     01/19/2021 16:47:07    20        
---------------------------------------------------------------------------------
ceab@node107                   BIP   0/1/20         0.07     linux-x64     
  89405 0.50500 QLOGIN     m.pardo      r     01/19/2021 16:30:07     1        
---------------------------------------------------------------------------------
ceab@node108                   BIP   0/0/20         0.79     linux-x64     
---------------------------------------------------------------------------------
ceab@node109                   BIP   0/10/20        6.19     linux-x64     
  88997 0.50500 FEElnc_spu c.pegueroles r     12/19/2020 11:38:07     1        
  89406 0.58975 Rsnow      jgarriga     r     01/19/2021 16:47:07     9        
---------------------------------------------------------------------------------
ceab@node110                   BIP   0/62/64        3.94     linux-x64     
  88954 0.50500 QLOGIN     j.palmer     r     12/17/2020 13:31:41     1        
  89348 0.60500 scktjob    rlloret      r     01/17/2021 08:48:01    60        
  89387 0.50500 R          pol.fernande r     01/19/2021 13:16:53     1        
---------------------------------------------------------------------------------
ceab@node111                   BIP   0/6/20         3.74     linux-x64     
  89406 0.58975 Rsnow      jgarriga     r     01/19/2021 16:47:07     6        
---------------------------------------------------------------------------------
ceab@node112                   BIP   0/0/20         0.00     linux-x64     
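The job lines in this report can be totalled per user to see who holds how many cores. A minimal sketch, assuming the eight-column job layout shown above (job ID, priority, name, user, state, date, time, slots) and job names without spaces; the function name slots_per_user is hypothetical. On the cluster it would be fed with qstat -f -u "*" | slots_per_user.

```shell
#!/bin/bash
# Add up the last column (slots) of every job line, grouped by user name.
slots_per_user() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 8 { slots[$4] += $8 }
         END { for (u in slots) printf "%s %d\n", u, slots[u] }' | sort
}

# Sample input: the node109 and node111 entries from the report above
slots_per_user <<'EOF'
ceab@node109                   BIP   0/10/20        6.19     linux-x64
  88997 0.50500 FEElnc_spu c.pegueroles r     12/19/2020 11:38:07     1
  89406 0.58975 Rsnow      jgarriga     r     01/19/2021 16:47:07     9
---------------------------------------------------------------------------------
ceab@node111                   BIP   0/6/20         3.74     linux-x64
  89406 0.58975 Rsnow      jgarriga     r     01/19/2021 16:47:07     6
EOF
# prints:
# c.pegueroles 1
# jgarriga 15
```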

To check whether a job is running in parallel (or to see the number of busy cores in a node):

$ export TERM=xterm;htop

or, if we prefer, in two steps:

$ export TERM=xterm 

and then:

$ htop

"Kill" jobs

To end a job before it has finished:

qdel <job_ID>
users/introduction/login_and_sending_jobs_to_the_cluster.txt · Last modified: 2021/01/19 16:12 by admins_ceab