Table of Contents
Login and sending jobs to CBLab
The Computational Biology Lab cluster is a set of computers and programs that lets us run many calculations simultaneously. We can divide this set into two big groups: the nodes (which do all the calculations) and the support system. All the applications of the cluster run on Bio-Linux 8 (Ubuntu 14.04 LTS).
User data are stored in the /home/usuaris/<user_name>
directory, and most of the programs that run on the cluster can be found in the /home/soft/<program>
directory. There is also a /home/db
directory where we want to store all the databases needed for users' work, as well as others that can be useful for all users.
By default, a maximum of 4 GB of RAM and one processor (out of the 284 installed) are assigned to each submitted process, but we will see how to modify this.
You can download a presentation here: CBLab tutorial (Spanish)
How to log in
We use an SSH connection to the cluster. On Linux and Mac we simply type in the terminal:
ssh <user_name>@cluster-ceab.ceab.csic.es
Windows users can log in too, using PuTTY or another SSH client.
If you want to upload or download files, we recommend FileZilla, or scp from the terminal.
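For example, from a Linux or Mac terminal (the local file names here are just placeholders):

```shell
# Upload a file from our machine to our home directory on the cluster
scp my_data.csv <user_name>@cluster-ceab.ceab.csic.es:/home/usuaris/<user_name>/

# Download a results file from the cluster to the current local directory
scp <user_name>@cluster-ceab.ceab.csic.es:/home/usuaris/<user_name>/results.out .
```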
How to send jobs to the cluster
The first thing we have to do is to create a plain-text file (e.g. a .txt file, not HTML, .docx, etc.) where we will write the commands or instructions (a script), with all the options and also the path of the file or files the cluster will need for the job.
- snippet.bash
#!/bin/bash

### R code
cd ~/research/gTiger/tiger_risk_estimation/scripts
/home/soft/R-3.2.1/bin/R CMD BATCH --no-save --no-restore t002.0_sampling_effort_overlay_rsample.r t002.0_sampling_effort_overlay_rsample.out
We can complete our script by adding other options that give us useful information:
<  <input_file_name>  : standard input, the file the program will read.
>  <output_file_name> : standard output; this file will contain all the output generated by the program.
2> <error_file_name>  : standard error; if the job fails, this file will give us information about the errors.
Written in our script it would look like this:
- snippet.bash
#!/bin/bash

### R code (...)
/home/soft/program_name < <input_file_name> > <output_file_name> 2> <error_file_name>
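As a quick local illustration of these three redirections (using sort as a stand-in for a cluster program; the file names are just placeholders):

```shell
#!/bin/bash
# Create a small input file
printf '3\n1\n2\n' > input.txt

# Read stdin from input.txt, send stdout and stderr to separate files
sort < input.txt > output.txt 2> error.txt

cat output.txt    # the sorted lines: 1, 2, 3
# error.txt is empty because sort succeeded
```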
Once the script is written, we save it as a .sh
file. Make sure all the files the cluster will need are in the paths indicated in the script.
From the console, we submit our job using the qsub command:
qsub script_name.sh
By default (with no options), a maximum of 4 GB of RAM and one processor (out of the 284 installed) are assigned to each submitted job. Now we will see how to modify this.
Also, when a submitted job finishes, it generates two files: <script_name>.o<job_id>
(output file) and <script_name>.e<job_id>
(error file, empty if there are no errors).
qsub options
These are some useful options:
qsub -l h_vmem=<x>G
This lets us request a different amount of RAM. Attention: RAM is assigned per core (processor), not per job.
qsub -l h_vmem=10G ... <script_name>.sh
qsub -pe make <n_processors>
With this option we activate the 'make' parallelization environment. It is an intra-node parallelization, so the maximum number of processors we can use is limited by the hardware (64 cores max). Example:
qsub -pe make 10 <script_name>.sh
e.g. we want to submit a job with 20 cores and 100 GB of RAM in total. Since RAM is assigned per core, we request 5 GB per core (20 cores x 5 GB = 100 GB):
qsub -pe make 20 -l h_vmem=5G <script_name>.sh
Please consider that requesting more RAM or processors than necessary will “block” resources that other users could use.
qsub -m bea -M <user_mail>
Using this option we will receive an email when the job begins, when it ends, and if the job is aborted.
qsub -m bea -M x.roijals@ceab.csic.es
qsub -q ceab@nodexxx
Sends the job to a chosen node, where xxx is the number of the node (100 to 112).
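For example, to send the job to node105:

```shell
qsub -q ceab@node105 <script_name>.sh
```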
qsub -e error -o output
These files are created by default; with this option we can choose their names. Just write this:
qsub -e <error_file_name.txt> -o <output_file_name.txt> <script_name>.sh
You can combine all these options in one command:
qsub -pe make 10 -l h_vmem=10G -M <an_email> -m bea -q ceab@nodexxx <script_name>.sh
Logging in to a node
Sometimes it can be useful to enter a node, for example one where our job is running. We use the qlogin command for this. With no options, the node is chosen at random, and at least one processor of the node has to be free. You can also use the same options as with qsub. In this case you don't need to indicate a script.sh
qlogin <options>
Check job status
If we want to know which of our jobs are running in the cluster, we use the qstat command. It shows us the job ID and the node where each job is running.
qstat
To see another user's jobs:
qstat -u <login_user_name>
Or all users' jobs at the same time:
qstat -u "*"
If we want more information about one particular job:
qstat -j <job_ID>
If we want to know how many cores are running jobs on each node:
xavier.roijals@cluster-ceab:/home/soft$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
ceab@node100                   BIP   0/0/16         0.08     linux-x64
---------------------------------------------------------------------------------
ceab@node101                   BIP   0/16/16        15.87    linux-x64
---------------------------------------------------------------------------------
ceab@node102                   BIP   0/0/16         0.08     linux-x64
---------------------------------------------------------------------------------
ceab@node103                   BIP   0/0/16         1.01     linux-x64     d
---------------------------------------------------------------------------------
ceab@node104                   BIP   0/0/16         0.05     linux-x64
---------------------------------------------------------------------------------
ceab@node105                   BIP   0/0/20         0.00     linux-x64
---------------------------------------------------------------------------------
ceab@node106                   BIP   0/20/20        19.84    linux-x64
---------------------------------------------------------------------------------
ceab@node107                   BIP   0/0/20         0.00     linux-x64
---------------------------------------------------------------------------------
ceab@node108                   BIP   0/0/20         0.02     linux-x64
---------------------------------------------------------------------------------
ceab@node109                   BIP   0/10/20        10.03    linux-x64
---------------------------------------------------------------------------------
ceab@node110                   BIP   0/62/64        3.97     linux-x64
---------------------------------------------------------------------------------
ceab@node111                   BIP   0/6/20         5.97     linux-x64
---------------------------------------------------------------------------------
ceab@node112                   BIP   0/0/20         0.00     linux-x64
Maybe we want a more detailed report:
xavier.roijals@cluster-ceab:/home/soft$ qstat -f -u "*"
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
ceab@node100                   BIP   0/0/16         3.21     linux-x64
---------------------------------------------------------------------------------
ceab@node101                   BIP   0/16/16        9.28     linux-x64
   89406 0.58975 Rsnow      jgarriga     r     01/19/2021 16:47:07    16
---------------------------------------------------------------------------------
ceab@node102                   BIP   0/0/16         3.20     linux-x64
---------------------------------------------------------------------------------
ceab@node103                   BIP   0/0/16         1.00     linux-x64     d
---------------------------------------------------------------------------------
ceab@node104                   BIP   0/1/16         2.96     linux-x64
   89391 0.50500 QLOGIN     fbartu       r     01/19/2021 13:29:51     1
---------------------------------------------------------------------------------
ceab@node105                   BIP   0/0/20         0.04     linux-x64
---------------------------------------------------------------------------------
ceab@node106                   BIP   0/20/20        11.64    linux-x64
   89406 0.58975 Rsnow      jgarriga     r     01/19/2021 16:47:07    20
---------------------------------------------------------------------------------
ceab@node107                   BIP   0/1/20         0.07     linux-x64
   89405 0.50500 QLOGIN     m.pardo      r     01/19/2021 16:30:07     1
---------------------------------------------------------------------------------
ceab@node108                   BIP   0/0/20         0.79     linux-x64
---------------------------------------------------------------------------------
ceab@node109                   BIP   0/10/20        6.19     linux-x64
   88997 0.50500 FEElnc_spu c.pegueroles r     12/19/2020 11:38:07     1
   89406 0.58975 Rsnow      jgarriga     r     01/19/2021 16:47:07     9
---------------------------------------------------------------------------------
ceab@node110                   BIP   0/62/64        3.94     linux-x64
   88954 0.50500 QLOGIN     j.palmer     r     12/17/2020 13:31:41     1
   89348 0.60500 scktjob    rlloret      r     01/17/2021 08:48:01    60
   89387 0.50500 R          pol.fernande r     01/19/2021 13:16:53     1
---------------------------------------------------------------------------------
ceab@node111                   BIP   0/6/20         3.74     linux-x64
   89406 0.58975 Rsnow      jgarriga     r     01/19/2021 16:47:07     6
---------------------------------------------------------------------------------
ceab@node112                   BIP   0/0/20         0.00     linux-x64
How to know if a job is parallelizing (or to see the number of cores in use on a node):
$ export TERM=xterm;htop
or if we want:
$ export TERM=xterm
and then:
$ htop
"Kill" jobs
If we want to end a job before it is finished:
qdel <job_ID>
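If we have several jobs running, Grid Engine's qdel can also delete all jobs belonging to a user at once:

```shell
qdel -u <login_user_name>
```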