Settings
Let's create a new directory where we will put the files we need.
~$ mkdir Structure_job
~$ cp /home/usuaris/miguel.omullony/Structure_files/* Structure_job/
In Structure_job we will see three files: structure.R, gen_joblist.py and structure.sh.
With FileZilla, copy the data table that you want to analyze into the same directory.
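Alternatively, if you prefer the command line, the table can be uploaded with scp from your own machine. This is only a sketch: it assumes your data file is called tabla.txt (as in the R script below) and that the cluster address cluster-ceab shown in the prompts later is reachable from your machine.
~$ scp tabla.txt <USER>@cluster-ceab:~/Structure_job/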
R code
structure.R
The script works by calling Structure from the path where it is installed, and the parameters are set in the function call instead of in the mainparams file. It needs a 'joblist.txt' file (where the populations, K, burn-ins and repetitions of each run are indicated) and a directory where the output results are saved. We will generate the joblist(s) easily with gen_joblist.py.
Here is the R script, the same as the one you copied before. Just modify the path lines and the 'for' loop so that it matches the number of joblists you will work with.
### Using Structure from R with the ParallelStructure package
library(ParallelStructure)

setwd("/home/usuaris/user/Structure_job/")  ### All files have to be here (data.txt, joblists...)
system('mkdir structure_results')           ### Directory to save the results

# Path of Structure. Don't modify this.
my_path = "/home/soft/Structure/console/"

# Function to call Structure, in this case ten times because we have ten joblists.
# Modify the number to equal the number of joblist files.
# Here you specify the parameters like in the mainparams file and how many processors
# you'll use to parallelize.
for (i in 1:10) {
  parallel_structure(structure_path=my_path, joblist=paste('joblist', i, '.txt', sep=''),
                     n_cpu=20, infile='tabla.txt', outpath='structure_results/',
                     numinds=680, numloci=17, printqhat=1, plot_output=1, noadmix=0,
                     linkage=0, label=1, markernames=1, popdata=1, locdata=1,
                     missing=-9, ploidy=2, inferalpha=1, onerowperind=1)
}
Generating joblists
To generate any number of joblists quickly, I have created a Python script. Just type the numbers you are asked for when you execute the program.
Here is an example:
~$ python gen_joblist.py
###WARNING! Insert only numbers WARNING!###
how many joblists? > 5
number of populations? > 20
number of K's > 11
how many burnins? > 1000
how many reps? > 2000
You'll see five new files. Type 'ls' to list them and 'nano joblistx.txt' to inspect one:
miguel.omullony@cluster-ceab:~/Structure_job$ ls
gen_joblist.py  joblist1.txt  joblist2.txt  joblist3.txt  joblist4.txt  joblist5.txt  structure.R

~$ nano joblist1.txt
T1-1 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20 1 1000 2000
T2-1 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20 2 1000 2000
T3-1 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20 3 1000 2000
T4-1 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20 4 1000 2000
T5-1 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20 5 1000 2000
T6-1 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20 6 1000 2000
T7-1 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20 7 1000 2000
T8-1 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20 8 1000 2000
T9-1 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20 9 1000 2000
T10-1 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20 10 1000 2000
T11-1 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20 11 1000 2000
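Each row is one Structure run: a label, the list of populations to include, the K for that run, the burn-in and the repetitions. In case you need to adapt the generator, here is a minimal sketch of what such a script could look like, assuming Python 3 and the prompts and joblist format shown above; the gen_joblist.py you copied may differ in its details.

# Sketch of a joblist generator (hypothetical; the copied gen_joblist.py may differ).
# Each joblist gets one row per K: label, population list, K, burn-in, reps.
n_joblists = int(input('how many joblists? > '))
n_pops = int(input('number of populations? > '))
n_k = int(input("number of K's > "))
burnin = int(input('how many burnins? > '))
reps = int(input('how many reps? > '))

pops = ','.join(str(p) for p in range(1, n_pops + 1))
for j in range(1, n_joblists + 1):
    with open('joblist{}.txt'.format(j), 'w') as out:
        for k in range(1, n_k + 1):
            out.write('T{}-{} {} {} {} {}\n'.format(k, j, pops, k, burnin, reps))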
Now we have everything we need to run Structure by sending the job with qsub.
qsub
As always, we send a job with a 'script.sh' file. We copied one earlier into the Structure_job directory (structure.sh). Just modify the path. If this is your first job you can check this page (TODO: add link to the page).
~$ nano structure.sh

#!/bin/bash
cd /home/usuaris/<USER>/Structure_job   # write your user
/home/soft/R-3.3.1/bin/R --vanilla < structure.R > output.txt 2> error.txt
Press Ctrl+X to exit; don't forget to save the changes.
Now send it to a queue with qsub. Request the same number of processors as indicated in the R script (n_cpu=20). It does not need too much RAM; the default assignment is enough.
qsub -pe make 20 structure.sh
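You can check that the job is queued or running with qstat (assuming the scheduler is Grid Engine, which the -pe option suggests):
~$ qstat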
zip file
Structure generates a lot of output files (ending in _f, _q and .pdf). Once it has finished, it is useful to compress all of them, or just the ones we need, into a zip file. If we want just the _f files:
cd structure_results/
zip files_f.zip *_f
With this we will obtain a 'files_f.zip' containing all files ending in _f. (If we write *f instead, it will compress the .pdf files as well, since they also end in f.)
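To get the zip back to your own computer you can use FileZilla again, or scp from your local terminal (a sketch assuming the same user, paths and cluster address as above):
~$ scp <USER>@cluster-ceab:~/Structure_job/structure_results/files_f.zip .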