Let's create a new directory to hold the files we will need.
~$ mkdir Structure_job
~$ cp /home/usuaris/miguel.omullony/Structure_files/* Structure_job/
In Structure_job we will see three files: structure.R, gen_joblist.py and structure.sh
With FileZilla we will copy the data table that we want to analyze into the same directory.
structure.R
The script works by calling Structure from the path where it is installed, and we set the parameters in the function call instead of using a mainparams file. It needs a 'joblist.txt' file (indicating K, the burn-in length and the number of repetitions; see the example further below) and a directory where the output results are saved. We will generate the joblist(s) easily with gen_joblist.py.
Here is the R script, the same one you copied before. Just modify the path lines and the 'for' loop so that it matches the number of joblists you want to run.
### Using Structure from R with the parallel package
library(ParallelStructure)

setwd("/home/usuaris/user/Structure_job/") ### All files have to be here (data.txt, joblists...)
system('mkdir structure_results')          ### Directory to save the results

# Path of Structure. Don't modify this.
my_path = "/home/soft/Structure/console/"

# Function to call Structure, in this case ten times because we have ten joblists.
# Modify the number to equal the number of joblist files.
# Here you specify the parameters like in the mainparams file and how many processors
# you'll use to parallelize.
for (i in 1:10) {
  parallel_structure(structure_path=my_path, joblist=paste('joblist', i, '.txt', sep=''),
                     n_cpu=20, infile='tabla.txt', outpath='structure_results/',
                     numinds=680, numloci=17, printqhat=1, plot_output=1, noadmix=0,
                     linkage=0, label=1, markernames=1, popdata=1, locdata=1,
                     missing=-9, ploidy=2, inferalpha=1, onerowperind=1)
}
To generate any number of joblists quickly I have created a Python script. Just type the numbers as they are requested when you execute the program.
Here is an example:
~$ python gen_joblist.py
###WARNING! Insert only numbers WARNING!###
how many joblists? > 5
number of populations? > 20
number of K's > 11
how many burnins? > 1000
how many reps? > 2000
You'll see five new files. Type 'ls' to list them and open one with 'nano joblist1.txt':
miguel.omullony@cluster-ceab:~/Structure_job$ ls
gen_joblist.py  joblist1.txt  joblist2.txt  joblist3.txt  joblist4.txt  joblist5.txt  structure.R

miguel.omullony@cluster-ceab:~/Structure_job$ nano joblist1.txt
T1-1 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20 1 1000 2000
T2-1 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20 2 1000 2000
T3-1 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20 3 1000 2000
T4-1 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20 4 1000 2000
T5-1 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20 5 1000 2000
T6-1 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20 6 1000 2000
T7-1 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20 7 1000 2000
T8-1 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20 8 1000 2000
T9-1 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20 9 1000 2000
T10-1 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20 10 1000 2000
T11-1 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20 11 1000 2000
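For reference, here is a minimal sketch of what a generator like gen_joblist.py could look like. This is a hypothetical reconstruction based only on the prompts and output shown above; the script you copied may differ in its details.

# Hypothetical sketch of a joblist generator; the real gen_joblist.py may differ.
print('###WARNING! Insert only numbers WARNING!###')
n_joblists = int(input('how many joblists? > '))
n_pops = int(input('number of populations? > '))
n_ks = int(input("number of K's > "))
burnins = int(input('how many burnins? > '))
reps = int(input('how many reps? > '))

# Comma-separated list of population IDs, e.g. "1,2,...,20"
pops = ','.join(str(p) for p in range(1, n_pops + 1))

for j in range(1, n_joblists + 1):
    with open('joblist%d.txt' % j, 'w') as f:
        # One job line per K: label, populations, K, burn-ins, repetitions
        for k in range(1, n_ks + 1):
            f.write('T%d-%d %s %d %d %d\n' % (k, j, pops, k, burnins, reps))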
Now we have everything we need to run Structure by sending the job with qsub.
As always, we send a job with a 'script.sh' file. We copied one into the Structure_job directory before (structure.sh). Just modify the path. If it's your first job you can check this guide: TODO (link to the page still to be added)
nano structure.sh

#!/bin/bash
cd /home/usuaris/<USER>/Structure_job   # write your user
/home/soft/R-3.3.1/bin/R --vanilla < structure.R > output.txt 2> error.txt
Press Ctrl+X to exit, and don't forget to save the changes.
Now send it to a queue with qsub, requesting the same number of processors as indicated in the R script (n_cpu=20). It does not need much RAM; the default allocation is enough.
qsub -pe make 20 structure.sh
Structure generates a lot of output files (ending in _f, _q and .pdf). Once it has finished, it is useful to compress all of them, or just the ones we need, into a zip file. If we want just the _f files:
cd structure_results/
zip files_f.zip *_f
With this we will obtain a 'files_f.zip' containing all the files ending in _f. (If we write *f instead, it will compress the .pdf files as well, since their names also end in f.)
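If zip is not available, the same can be done from Python. A minimal sketch using only the standard library, run from the Structure_job directory (this is a suggested alternative, not part of the original workflow):

# Hypothetical alternative to the zip command above: collect every file
# ending in _f from structure_results/ into files_f.zip.
import glob
import zipfile

with zipfile.ZipFile('files_f.zip', 'w', zipfile.ZIP_DEFLATED) as zf:
    for path in glob.glob('structure_results/*_f'):
        zf.write(path)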