Final Aquarium Project data now ready!

Got the demultiplexed data back from all 4 Aquarium project runs, and now we’re going forward with OTU picking. Starting off with closed ref, to see what we get:

pick_closed_reference_otus.py -i /Users/hollybik/Desktop/Data/aquarium_project/All4MiseqRuns_demult_30Apr14/AQUEXP_all_MPF_2.fasta -o /Users/hollybik/Desktop/Data/aquarium_project/All4MiseqRuns_demult_30Apr14/uclust_closedref_30Apr14 -r /macqiime/greengenes/gg_13_8_otus/rep_set/97_otus.fasta -t /macqiime/greengenes/gg_13_8_otus/taxonomy/97_otu_taxonomy.txt --parallel -O 2

Rerunning GOM Illumina on QIIME 1.8

I’ve spent the last few days re-parsing data to get around some issues related to the way the sequencing facility handed me the GOM Illumina data (ended up needing to separate out primer sets and renumber demultiplexed sequences using the renumber_rev_reads_v2.pl script in the GOM_Illumina/QIIME_files/Dec_2013 folder on Dropbox). In any case, continuing with analysis on QIIME 1.8.

pick_open_reference_otus.py -i /home/ubuntu/data/GOM_concat1.7_allF04combo_10Jan14.fna -o /home/ubuntu/data/uclust_openref99_F04_10Jan -r /home/ubuntu/data/silva_111/99_Silva_111_rep_set_euk.fasta --parallel -O 8 -s 0.1 -p /home/ubuntu/data/qiime_parameters_18Sopenref99_GOMamazon_16sept.txt --prefilter_percent_id 0.0
pick_open_reference_otus.py -i /home/ubuntu/data/GOM_concat1.7_allR22combo_10Jan14.fna -o /home/ubuntu/data/uclust_openref99_R22_10Jan -r /home/ubuntu/data/silva_111/99_Silva_111_rep_set_euk.fasta --parallel -O 8 -s 0.1 -p /home/ubuntu/data/qiime_parameters_18Sopenref99_GOMamazon_16sept.txt --prefilter_percent_id 0.0

Got a weird error on AWS when I started running the above scripts – “OpenBLAS : Your OS does not support AVX instructions. OpenBLAS is using Nehalem kernels as a fallback, which may give poorer performance.” Not sure if this will affect anything, but the open ref OTU picking seems to be progressing OK regardless (for now…).
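One quick way to see whether the instance exposes AVX at all is to look at the CPU flags the kernel reports. This is only a minimal sketch, assuming a Linux instance with /proc/cpuinfo; has_avx is a made-up helper name, not part of QIIME or OpenBLAS, and a missing flag would only be consistent with the warning rather than a full explanation of it.

```shell
# Hypothetical helper: succeed if the given cpuinfo file lists the "avx" flag.
# Defaults to /proc/cpuinfo (Linux only); -w keeps "avx2" from matching on its own.
has_avx() {
    cpuinfo=${1:-/proc/cpuinfo}
    grep -qw 'avx' "$cpuinfo" 2>/dev/null
}

if has_avx; then
    echo "AVX flag present"
else
    echo "AVX flag not found"
fi
```

If the flag is absent, the Nehalem fallback mentioned in the message is what OpenBLAS will keep using on that instance.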

Filtering Fasta files in QIIME

Just discovered this script to filter my input fasta sequences in QIIME (e.g. post-demultiplexed samples with SampleId_SeqID header format). Needed to do some filtering on the GOM Illumina data, as follows:

filter_fasta.py -f /Users/hollybik/Desktop/Data/Illumina_GOM/demultiplexed_qiime1.7/GOM_concat1.7_rev_demulti_1to12_2.fna -o /Users/hollybik/Desktop/Data/Illumina_GOM/demultiplexed_qiime1.7/GOM_concat1.7_rev_demulti_1to12_2_F04only.fna --sample_id_fp /Users/hollybik/Dropbox/Projects/GOM_Illumina_Dauphin/QIIME_files/Dec_2013/QIIMEmappingfile_GOM_Illumina_fakebarcodes_F04only.txt
filter_fasta.py -f /Users/hollybik/Desktop/Data/Illumina_GOM/demultiplexed_qiime1.7/GOM_concat1.7_rev_demulti_1to12_2.fna -o /Users/hollybik/Desktop/Data/Illumina_GOM/demultiplexed_qiime1.7/split_by_sample_Dec2013/GOM_concat1.7_rev_demulti_1to12_2_R22only.fna --sample_id_fp /Users/hollybik/Dropbox/Projects/GOM_Illumina_Dauphin/QIIME_files/Dec_2013/QIIMEmappingfile_GOM_Illumina_fakebarcodes_R22only.txt
filter_fasta.py -f /Users/hollybik/Desktop/Data/Illumina_GOM/demultiplexed_qiime1.7/GOM_concat1.7_fwd_demulti_1to12_1.fna -o /Users/hollybik/Desktop/Data/Illumina_GOM/demultiplexed_qiime1.7/split_by_sample_Dec2013/GOM_concat1.7_fwd_demulti_1to12_1_F04only.fna --sample_id_fp /Users/hollybik/Dropbox/Projects/GOM_Illumina_Dauphin/QIIME_files/Dec_2013/QIIMEmappingfile_GOM_Illumina_fakebarcodes_F04only.txt
filter_fasta.py -f /Users/hollybik/Desktop/Data/Illumina_GOM/demultiplexed_qiime1.7/GOM_concat1.7_fwd_demulti_1to12_1.fna -o /Users/hollybik/Desktop/Data/Illumina_GOM/demultiplexed_qiime1.7/split_by_sample_Dec2013/GOM_concat1.7_fwd_demulti_1to12_1_R22only.fna --sample_id_fp /Users/hollybik/Dropbox/Projects/GOM_Illumina_Dauphin/QIIME_files/Dec_2013/QIIMEmappingfile_GOM_Illumina_fakebarcodes_R22only.txt
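To sanity-check what the filtering above should keep, here is a rough awk approximation of the same idea. It is a sketch, not QIIME's implementation: filter_fasta_by_sample is a hypothetical helper, and it assumes headers of the form >SampleID_SeqID (with an optional description after a space) and a tab-delimited mapping file with sample IDs in the first column.

```shell
# Hypothetical helper: keep FASTA records whose sample ID (the header text
# before the last underscore) appears in column 1 of a mapping file.
# Usage: filter_fasta_by_sample mapping.txt input.fna > filtered.fna
filter_fasta_by_sample() {
    mapping_file=$1
    fasta_file=$2
    awk -F'\t' '
        NR == FNR {                          # first file: the mapping file
            if ($1 !~ /^#/ && $1 != "") keep[$1] = 1
            next
        }
        /^>/ {                               # FASTA header line
            split($0, parts, " ")            # drop any description after the ID
            label = substr(parts[1], 2)      # strip the leading ">"
            sub(/_[^_]*$/, "", label)        # strip the trailing _SeqID
            printing = (label in keep)
        }
        printing                             # print header and sequence lines
    ' "$mapping_file" "$fasta_file"
}
```

Counting headers in the result (grep -c '>' filtered.fna) and comparing against the filter_fasta.py output is a cheap cross-check that each primer set ended up in the right file.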

Parallel Aligning Sequences, yet again

Reattempting to align sequences so I can move forward with some cursory core diversity analyses. This time I opted for an AWS m2.2xlarge (only 32GB memory), since the larger instance was just not wanting to align anything. Same errors with this script, even after changing the qiime_config file.

parallel_align_seqs_pynast.py -i /home/ubuntu/data/18S_chimera_openref96_alldenovo_18Oct13/rep_set.fna -o /home/ubuntu/data/18S_chimera_openref96_alldenovo_18Oct13/pynast_aligned_seqs -T --jobs_to_start 4 --template_fp /home/ubuntu/data/silva_111/99_Silva_111_rep_set_euk_aligned.fasta --pairwise_alignment_method uclust --min_percent_id 70.0 --min_length 150

Finally posted a help message on the QIIME Google Group, because this issue has been frustrating me long enough: https://groups.google.com/forum/#!topic/qiime-forum/f2tA2a97OxE

Generating .biom tables with taxonomy

The add_metadata.py scripts just don’t seem to be working for most of the 18S_chimera and GOM_illumina runs. Did a bit of poking on the QIIME forums and it seems like the easier way to do this (confirmed via my fiddling) is just re-generating the OTU tables using make_otu_table.py and passing in the taxonomy mapping file using the -t flag. This is quick and easy to do on my MacBook Retina:

Commands run:

make_otu_table.py -i /Users/hollybik/Desktop/Alaska\ Analyses/GOM_Illumina/uclust_openref99_rev_16Sept/final_otu_map_mc2.txt -o /Users/hollybik/Desktop/Alaska\ Analyses/GOM_Illumina/uclust_openref99_rev_16Sept/otu_table_mc2_w_tax.biom -t /Users/hollybik/Desktop/Alaska\ Analyses/GOM_Illumina/uclust_openref99_rev_16Sept/rdp_assigned_taxonomy/rep_set_tax_assignments.txt
make_otu_table.py -i /Users/hollybik/Desktop/Alaska\ Analyses/GOM_Illumina/uclust_openref96_fwd_16Sept/final_otu_map_mc2.txt -o /Users/hollybik/Desktop/Alaska\ Analyses/GOM_Illumina/uclust_openref96_fwd_16Sept/otu_table_mc2_w_tax.biom -t /Users/hollybik/Desktop/Alaska\ Analyses/GOM_Illumina/uclust_openref96_fwd_16Sept/rdp_assigned_taxonomy/rep_set_tax_assignments.txt
make_otu_table.py -i /Users/hollybik/Desktop/Alaska\ Analyses/GOM_Illumina/uclust_openref99_fwd_20Sept/final_otu_map_mc2.txt -o /Users/hollybik/Desktop/Alaska\ Analyses/GOM_Illumina/uclust_openref99_fwd_20Sept/otu_table_mc2_w_tax.biom -t /Users/hollybik/Desktop/Alaska\ Analyses/GOM_Illumina/uclust_openref99_fwd_20Sept/rdp_assigned_taxonomy/rep_set_tax_assignments.txt
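Since the three commands above differ only in the run directory, they could be generated from a loop instead of typed out. A minimal sketch, assuming every run folder has the same layout as the ones above; print_make_otu_table_cmds is a hypothetical helper that only prints the commands (a dry run), so nothing gets overwritten by accident.

```shell
# Hypothetical helper: print one make_otu_table.py command per run directory.
# Paths are wrapped in double quotes so the space in "Alaska Analyses" survives.
print_make_otu_table_cmds() {
    base_dir=$1
    shift
    for run in "$@"; do
        run_dir="$base_dir/$run"
        printf 'make_otu_table.py -i "%s" -o "%s" -t "%s"\n' \
            "$run_dir/final_otu_map_mc2.txt" \
            "$run_dir/otu_table_mc2_w_tax.biom" \
            "$run_dir/rdp_assigned_taxonomy/rep_set_tax_assignments.txt"
    done
}

print_make_otu_table_cmds "/Users/hollybik/Desktop/Alaska Analyses/GOM_Illumina" \
    uclust_openref99_rev_16Sept uclust_openref96_fwd_16Sept uclust_openref99_fwd_20Sept
```

Once the printed commands look right, piping the output to sh runs them.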

18S_chimera – 100% subsampled OTUs

Been thinking about OTU picking, and if we really want to figure out how chimeric sequences are being incorporated into OTUs, I have to cluster 100% of the sequences that fail to hit the reference (-s 1.0), rather than the 10% subsample (-s 0.1) used in the earlier runs. So running another round of analyses:

96% clustering:

pick_open_reference_otus.py -i /home/ubuntu/data/chim_demux.extendedFrags_primersremoved_fastxtrimmed_chimeraslabelled.fasta -o /home/ubuntu/data/18S_chimera_openref96_alldenovo_18Oct13 -r /home/ubuntu/data/silva_111/99_Silva_111_rep_set_euk.fasta --parallel -O 8 -s 1.0 --prefilter_percent_id 0.0 -p /home/ubuntu/data/qiime_parameters_18Schimera_96_amazon.txt

99% clustering:

pick_open_reference_otus.py -i /home/ubuntu/data/chim_demux.extendedFrags_primersremoved_fastxtrimmed_chimeraslabelled.fasta -o /home/ubuntu/data/18S_chimera_openref99_alldenovo_20Oct13 -r /home/ubuntu/data/silva_111/99_Silva_111_rep_set_euk.fasta --parallel -O 8 -s 1.0 --prefilter_percent_id 0.0 -p /home/ubuntu/data/qiime_parameters_18Schimera_99_amazon.txt

18S Chimera – rerunning relabeled files

Wrote a script last night to label chimeric sequences with >chimera_ – now rerunning QIIME analyses locally on my iMac

pick_open_reference_otus.py -i /Users/hollybik/Desktop/Data/18S_chimera/chim_demux.extendedFrags_primersremoved_fastxtrimmed_chimeraslabelled.fasta -o /Users/hollybik/Desktop/Data/18S_chimera/chimera_openref96_18Sept -r /macqiime/silva_111/eukaryotes_only/rep_set_euks/99_Silva_111_rep_set_euk.fasta --parallel -O 2 -s 0.1 --prefilter_percent_id 0.0 -p /Users/hollybik/Dropbox/QIIME/qiime_parameters_18Schimera_96_iMac.txt 
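For reference, the relabeling step could look something like the sketch below. This is not the actual script: label_chimeras is a hypothetical helper, and it assumes a plain text file with one chimeric sequence ID per line (for example, exported from whatever chimera checker was used), matched against the first word of each FASTA header.

```shell
# Hypothetical helper: prefix the headers of known chimeric sequences with
# "chimera_" and pass every other line through unchanged.
# Usage: label_chimeras chimeric_ids.txt input.fasta > labelled.fasta
label_chimeras() {
    chimera_ids=$1
    fasta_file=$2
    awk '
        NR == FNR {                          # first file: list of chimeric IDs
            if ($1 != "") chimeric[$1] = 1
            next
        }
        /^>/ {                               # FASTA header line
            split($0, parts, " ")
            id = substr(parts[1], 2)         # ID without the leading ">"
            if (id in chimeric) {
                print ">chimera_" substr($0, 2)
                next
            }
        }
        { print }                            # everything else is unchanged
    ' "$chimera_ids" "$fasta_file"
}
```

After OTU picking, grepping the OTU map for chimera_ then shows which OTUs absorbed labelled sequences.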

Update (10/3/13) – iMac taking way too long for OTU picking, so moved over to Amazon AWS. Command for 96% open ref:

pick_open_reference_otus.py -i /home/ubuntu/data/chim_demux.extendedFrags_primersremoved_fastxtrimmed_chimeraslabelled.fasta -o /home/ubuntu/data/18S_chimera_openref96_3oct13 -r /home/ubuntu/data/silva_111/99_Silva_111_rep_set_euk.fasta --parallel -O 8 -s 0.1 --prefilter_percent_id 0.0 -p /home/ubuntu/data/qiime_parameters_18Schimera_96_amazon.txt

Command for 99% open ref:

pick_open_reference_otus.py -i /home/ubuntu/data/chim_demux.extendedFrags_primersremoved_fastxtrimmed_chimeraslabelled.fasta -o /home/ubuntu/data/18S_chimera_openref99_5oct13 -r /home/ubuntu/data/silva_111/99_Silva_111_rep_set_euk.fasta --parallel -O 8 -s 0.1 --prefilter_percent_id 0.0 -p /home/ubuntu/data/qiime_parameters_18Schimera_99_amazon.txt