Rerunning GOM Illumina on QIIME 1.8

I’ve spent the last few days re-parsing data to get around some issues with the way the sequencing facility handed me the GOM Illumina data (I ended up needing to separate out the primer sets and renumber the demultiplexed sequences using the renumber_rev_reads_v2.pl script in the GOM_Illumina/QIIME_files/Dec_2013 folder on Dropbox). In any case, continuing the analysis in QIIME 1.8.
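
For the record, the renumbering step itself is conceptually simple. A rough awk sketch of the idea (assuming standard QIIME-style “>SampleID_count …” labels with no underscores in the sample IDs; file names here are placeholders, and the renumber_rev_reads_v2.pl script on Dropbox remains the authoritative version):

awk '/^>/ {
        # strip the ">" and split the label into sample ID and old count
        split(substr($1, 2), parts, "_")
        sample = parts[1]
        # reassign a fresh, sequential per-sample counter (drops any extra header fields)
        printf(">%s_%d\n", sample, counter[sample]++)
        next
     }
     { print }' rev_reads_demux.fna > rev_reads_demux_renumbered.fna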

pick_open_reference_otus.py -i /home/ubuntu/data/GOM_concat1.7_allF04combo_10Jan14.fna -o /home/ubuntu/data/uclust_openref99_F04_10Jan -r /home/ubuntu/data/silva_111/99_Silva_111_rep_set_euk.fasta --parallel -O 8 -s 0.1 -p /home/ubuntu/data/qiime_parameters_18Sopenref99_GOMamazon_16sept.txt --prefilter_percent_id 0.0

pick_open_reference_otus.py -i /home/ubuntu/data/GOM_concat1.7_allR22combo_10Jan14.fna -o /home/ubuntu/data/uclust_openref99_R22_10Jan -r /home/ubuntu/data/silva_111/99_Silva_111_rep_set_euk.fasta --parallel -O 8 -s 0.1 -p /home/ubuntu/data/qiime_parameters_18Sopenref99_GOMamazon_16sept.txt --prefilter_percent_id 0.0

Got a weird warning on AWS when I started running the scripts above: “OpenBLAS : Your OS does not support AVX instructions. OpenBLAS is using Nehalem kernels as a fallback, which may give poorer performance.” Not sure if this will affect anything, but the open-reference OTU picking seems to be progressing OK regardless (for now…).
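
A quick sanity check to confirm the instance really doesn’t expose AVX; as far as I can tell the message is a performance fallback, not an actual error:

# if this prints 0, no CPU line advertises an avx flag, so the OpenBLAS fallback is expected
grep -c avx /proc/cpuinfo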

Parallel Aligning Sequences, yet again

Reattempting to align sequences so I can move forward with some cursory core diversity analyses. This time I opted for an AWS m2.2xlarge (only 32 GB of memory), since the larger instance just wasn’t willing to align anything. Same errors with this script, even after changing the qiime_config file.

parallel_align_seqs_pynast.py -i /home/ubuntu/data/18S_chimera_openref96_alldenovo_18Oct13/rep_set.fna -o /home/ubuntu/data/18S_chimera_openref96_alldenovo_18Oct13/pynast_aligned_seqs -T --jobs_to_start 4 --template_fp /home/ubuntu/data/silva_111/99_Silva_111_rep_set_euk_aligned.fasta --pairwise_alignment_method uclust --min_percent_id 70.0 --min_length 150
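
For reference, the qiime_config entries I’ve been toggling are along these lines (example values only, not necessarily what’s on the instance right now):

# $HOME/.qiime_config (roughly)
temp_dir    /home/ubuntu/data/tmp/
jobs_to_start    4
cluster_jobs_fp    start_parallel_jobs.py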

Finally posted a help message on the QIIME Google Group, because this issue has been frustrating me long enough: https://groups.google.com/forum/#!topic/qiime-forum/f2tA2a97OxE

18S_chimera – 100% subsampled OTUs

Been thinking about OTU picking: if we really want to figure out how chimeric sequences are being incorporated into OTUs, I need to cluster 100% of the reference-picking failures de novo (-s 1.0 rather than subsampling the failures). So, running another round of analyses:

96% clustering:

pick_open_reference_otus.py -i /home/ubuntu/data/chim_demux.extendedFrags_primersremoved_fastxtrimmed_chimeraslabelled.fasta -o /home/ubuntu/data/18S_chimera_openref96_alldenovo_18Oct13 -r /home/ubuntu/data/silva_111/99_Silva_111_rep_set_euk.fasta --parallel -O 8 -s 1.0 --prefilter_percent_id 0.0 -p /home/ubuntu/data/qiime_parameters_18Schimera_96_amazon.txt

99% clustering:

pick_open_reference_otus.py -i /home/ubuntu/data/chim_demux.extendedFrags_primersremoved_fastxtrimmed_chimeraslabelled.fasta -o /home/ubuntu/data/18S_chimera_openref99_alldenovo_20Oct13 -r /home/ubuntu/data/silva_111/99_Silva_111_rep_set_euk.fasta --parallel -O 8 -s 1.0 --prefilter_percent_id 0.0 -p /home/ubuntu/data/qiime_parameters_18Schimera_99_amazon.txt

Organizing GOM Illumina data

Organizing the GOM analyses run to date: downloaded the completed runs onto a 1TB external hard drive, along with the parameter files (and copied the command that was run into a comment line at the top of each parameter file). Proceeding with more AWS analyses.
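
For the bookkeeping step, a GNU sed one-liner like this is enough (the file name is just one example; the ellipsis stands in for the full command that was actually run):

# prepend the exact command that was run as a comment at the top of the parameter file
sed -i '1i # pick_open_reference_otus.py -i ... (full command as run)' qiime_parameters_18Sopenref96_GOMamazon_16sept.txt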

Forward reads at 96% (the m2.4xlarge was running out of memory, so I dropped down to 6 parallel jobs):

pick_open_reference_otus.py -i /home/ubuntu/data/GOM_concat1.7_fwd_demulti_1to12_1.fna -o /home/ubuntu/data/uclust_openref96_fwd_16Sept -r /home/ubuntu/data/silva_111/99_Silva_111_rep_set_euk.fasta --parallel -O 6 -s 0.1 -p /home/ubuntu/data/qiime_parameters_18Sopenref96_GOMamazon_16sept.txt --prefilter_percent_id 0.0
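
The memory call is mostly trial and error; while a run is going I just keep an eye on the instance with the usual tools:

free -m        # overall memory in MB
top            # per-process view; the uclust/pynast workers show up here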

(9/20/13) Forward reads at 99%, kept at 6 parallel jobs:

pick_open_reference_otus.py -i /home/ubuntu/data/GOM_concat1.7_fwd_demulti_1to12_1.fna -o /home/ubuntu/data/uclust_openref99_fwd_20Sept -r /home/ubuntu/data/silva_111/99_Silva_111_rep_set_euk.fasta --parallel -O 6 -s 0.1 -p /home/ubuntu/data/qiime_parameters_18Sopenref99_GOMamazon_16sept.txt --prefilter_percent_id 0.0

Reverse reads at 99%:

pick_open_reference_otus.py -i /home/ubuntu/data/GOM_concat1.7_rev_demulti_1to12_2.fna -o /home/ubuntu/data/uclust_openref99_rev_16Sept -r /home/ubuntu/data/silva_111/99_Silva_111_rep_set_euk.fasta --parallel -O 8 -s 0.1 -p /home/ubuntu/data/qiime_parameters_18Sopenref99_GOMamazon_16sept.txt --prefilter_percent_id 0.0

#indoorevol meta-analyses – fungi

Getting down to the meta-analysis for the indoorevol project. Running closed-reference OTU picking on the Amazon cloud with a new fungi parameter file I compiled. Command run:

pick_closed_reference_otus.py -i /home/ubuntu/data/fungi/Fungal_long_seqs.fasta -o /home/ubuntu/data/fungi/uclust_closedref_97 -r /home/ubuntu/data/fungi/its_12_11_otus/rep_set/99_otus.fasta -t /home/ubuntu/data/fungi/its_12_11_otus/taxonomy/99_otu_taxonomy.txt --parallel -O 8 -p /home/ubuntu/data/fungi/qiime_parameters_fungi.txt
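
A minimal sketch of the kind of settings in qiime_parameters_fungi.txt (assumed content, not the literal file; the reference and taxonomy paths are already passed on the command line above):

pick_otus:similarity 0.97
pick_otus:enable_rev_strand_match True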

parallel_align_seqs_pynast.py not working on EC2

The pick_open_reference_otus.py run finished for the GOM Illumina reverse reads clustered at 96%; however, the parallel_align_seqs_pynast.py step was not working. The log file indicated the script had been executed, but the top command showed nothing running. I tried running the script manually (command below), both with and without the -T flag. That didn’t seem to work either: the parallel jobs would start but then would all end for some reason.

parallel_align_seqs_pynast.py -i /home/ubuntu/gom_data/uclust_openref96_ref_27Aug/rep_set.fna -o /home/ubuntu/gom_data/uclust_openref96_ref_27Aug/pynast_aligned_seqs_manual -O 8 -t /home/ubuntu/gom_data/99_Silva_111_rep_set_euk_aligned.fasta -a uclust -p 70.0
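
To see whether the workers are actually doing anything between poller checks, I’ve just been grepping the process table and looking at what the output directory contains:

# any pynast/uclust worker processes alive right now?
ps aux | grep -E '[p]ynast|[u]clust'
# what, if anything, has been written so far
ls -lt /home/ubuntu/gom_data/uclust_openref96_ref_27Aug/pynast_aligned_seqs_manual/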

So instead of worrying about that for now, I’m going to move on to 99% clustering on the reverse reads to see how long this takes. (note: the 96% cutoff finished overnight, for a ~1.5GB file on an m2.4xlarge instance)

pick_open_reference_otus.py -i /home/ubuntu/gom_data/GOM_concat1.7_rev_demulti_1to12_2.fna -o /home/ubuntu/gom_data/uclust_openref99_28Aug -r /home/ubuntu/gom_data/99_Silva_111_rep_set_euk.fasta --parallel -O 10 -s 0.1 -p /home/ubuntu/gom_data/qiime_parameters_uclust99gom_aws28Aug.txt --prefilter_percent_id 0.0

Reverting to AWS SSHing

Running long jobs through StarCluster/IPython Notebook is giving me issues (I need to troubleshoot these on the QIIME forum), so just to get the data run I’m moving back to standard SSHing into Amazon AWS for the GOM Illumina data:

pick_open_reference_otus.py -i /home/ubuntu/gom_data/GOM_concat1.7_rev_demulti_1to12_2.fna -o /home/ubuntu/gom_data/uclust_openref96_ref_27Aug -r /home/ubuntu/gom_data/99_Silva_111_rep_set_euk.fasta --parallel -O 8 -s 0.1 -p /home/ubuntu/gom_data/qiime_parameters_uclust96gom_aws27aug.txt --prefilter_percent_id 0.0
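
Since this is now a plain SSH session, I’m wrapping long runs in nohup (or a screen session) so they survive a dropped connection; the log path below is just an example:

nohup <the pick_open_reference_otus.py command above> > /home/ubuntu/gom_data/openref96_27Aug.log 2>&1 &
# or detachable: start a screen session first, then run the command inside it
screen -S gom_otus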