Final Aquarium Project data now ready!

Got the demultiplexed data back from all 4 Aquarium project runs, and now we’re going forward with OTU picking. Starting off with closed ref, to see what we get:

pick_closed_reference_otus.py -i /Users/hollybik/Desktop/Data/aquarium_project/All4MiseqRuns_demult_30Apr14/AQUEXP_all_MPF_2.fasta -o /Users/hollybik/Desktop/Data/aquarium_project/All4MiseqRuns_demult_30Apr14/uclust_closedref_30Apr14 -r /macqiime/greengenes/gg_13_8_otus/rep_set/97_otus.fasta -t /macqiime/greengenes/gg_13_8_otus/taxonomy/97_otu_taxonomy.txt --parallel -O 2

Summarizing .biom table by sample name

This command collapsed replicates in the 18S_chimera .biom table, pooling all reads from the same sample site together:

summarize_otu_by_cat.py --otu_table_fp 18Schimera_openref99_20Oct13_otu_table_mc2_w_tax_metadata.biom --mapping_fp /Users/hollybik/Dropbox/Projects/18S_chimera/18S_chimera_mapping_qiime_format_remGOM_fungi15xR2.txt --mapping_category SampleName --output_fp 18Schimera_openref99_20Oct13_otu_table_mc2_w_tax_SampleName.biom

Note: I had to remove any samples that had zero representation in the .biom table (this included a couple of the GOM fungi replicates).

Rerunning GOM Illumina on QIIME 1.8

I’ve spent the last few days re-parsing data to get around some issues related to the way the sequencing facility handed me the GOM Illumina data (ended up needing to separate out primer sets and renumber demultiplexed sequences using the renumber_rev_reads_v2.pl script in the GOM_Illumina/QIIME_files/Dec_2013 folder on Dropbox). In any case, continuing with analysis on QIIME 1.8.

pick_open_reference_otus.py -i /home/ubuntu/data/GOM_concat1.7_allF04combo_10Jan14.fna -o /home/ubuntu/data/uclust_openref99_F04_10Jan -r /home/ubuntu/data/silva_111/99_Silva_111_rep_set_euk.fasta --parallel -O 8 -s 0.1 -p /home/ubuntu/data/qiime_parameters_18Sopenref99_GOMamazon_16sept.txt --prefilter_percent_id 0.0
pick_open_reference_otus.py -i /home/ubuntu/data/GOM_concat1.7_allR22combo_10Jan14.fna -o /home/ubuntu/data/uclust_openref99_R22_10Jan -r /home/ubuntu/data/silva_111/99_Silva_111_rep_set_euk.fasta --parallel -O 8 -s 0.1 -p /home/ubuntu/data/qiime_parameters_18Sopenref99_GOMamazon_16sept.txt --prefilter_percent_id 0.0

Got a weird error on AWS when I started running the above scripts: “OpenBLAS : Your OS does not support AVX instructions. OpenBLAS is using Nehalem kernels as a fallback, which may give poorer performance.” Not sure if this will affect anything, but the open ref OTU picking seems to be progressing OK regardless (for now…).
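For anyone hitting the same message, here is a rough sketch of how one might check what the Linux kernel reports about AVX on an instance. The has_avx helper is a made-up name for illustration, and it assumes an x86 Linux machine where /proc/cpuinfo has a "flags" line; it only shows what the kernel advertises, not what OpenBLAS itself decides at runtime.

```shell
# Report whether a Linux cpuinfo-style file advertises the AVX flag.
# has_avx is a hypothetical helper, not part of QIIME or OpenBLAS.
has_avx() {
    # $1: path to a cpuinfo-style file (defaults to /proc/cpuinfo)
    # -w matches "avx" as a whole word, so "avx2" alone does not count
    grep '^flags' "${1:-/proc/cpuinfo}" 2>/dev/null | grep -qw 'avx'
}

if has_avx; then
    echo "AVX is reported by the kernel"
else
    echo "AVX is not reported; OpenBLAS may fall back to older kernels"
fi
```

If the flag is missing, the fallback warning is expected on that instance type rather than a sign that something is misconfigured in QIIME.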

Filtering Fasta files in QIIME

Just discovered this script to filter my input fasta sequences in QIIME (e.g. post-demultiplexed samples with SampleId_SeqID header format). Needed to do some filtering on the GOM Illumina data, as follows:

filter_fasta.py -f /Users/hollybik/Desktop/Data/Illumina_GOM/demultiplexed_qiime1.7/GOM_concat1.7_rev_demulti_1to12_2.fna -o /Users/hollybik/Desktop/Data/Illumina_GOM/demultiplexed_qiime1.7/GOM_concat1.7_rev_demulti_1to12_2_F04only.fna --sample_id_fp /Users/hollybik/Dropbox/Projects/GOM_Illumina_Dauphin/QIIME_files/Dec_2013/QIIMEmappingfile_GOM_Illumina_fakebarcodes_F04only.txt
filter_fasta.py -f /Users/hollybik/Desktop/Data/Illumina_GOM/demultiplexed_qiime1.7/GOM_concat1.7_rev_demulti_1to12_2.fna -o /Users/hollybik/Desktop/Data/Illumina_GOM/demultiplexed_qiime1.7/split_by_sample_Dec2013/GOM_concat1.7_rev_demulti_1to12_2_R22only.fna --sample_id_fp /Users/hollybik/Dropbox/Projects/GOM_Illumina_Dauphin/QIIME_files/Dec_2013/QIIMEmappingfile_GOM_Illumina_fakebarcodes_R22only.txt
filter_fasta.py -f /Users/hollybik/Desktop/Data/Illumina_GOM/demultiplexed_qiime1.7/GOM_concat1.7_fwd_demulti_1to12_1.fna -o /Users/hollybik/Desktop/Data/Illumina_GOM/demultiplexed_qiime1.7/split_by_sample_Dec2013/GOM_concat1.7_fwd_demulti_1to12_1_F04only.fna --sample_id_fp /Users/hollybik/Dropbox/Projects/GOM_Illumina_Dauphin/QIIME_files/Dec_2013/QIIMEmappingfile_GOM_Illumina_fakebarcodes_F04only.txt
filter_fasta.py -f /Users/hollybik/Desktop/Data/Illumina_GOM/demultiplexed_qiime1.7/GOM_concat1.7_fwd_demulti_1to12_1.fna -o /Users/hollybik/Desktop/Data/Illumina_GOM/demultiplexed_qiime1.7/split_by_sample_Dec2013/GOM_concat1.7_fwd_demulti_1to12_1_R22only.fna --sample_id_fp /Users/hollybik/Dropbox/Projects/GOM_Illumina_Dauphin/QIIME_files/Dec_2013/QIIMEmappingfile_GOM_Illumina_fakebarcodes_R22only.txt
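A quick sanity check after filtering is to count how many sequences each sample ended up with. The sketch below assumes the SampleId_SeqID header format mentioned above, with no underscores inside the sample ID itself; the count_seqs_per_sample function and the tiny demo file (including its sample names) are made up for illustration.

```shell
# Count sequences per sample in a demultiplexed FASTA whose headers
# look like ">SampleID_SeqNumber optional description".
# count_seqs_per_sample is a hypothetical helper name.
count_seqs_per_sample() {
    # $1: path to the FASTA file
    grep '^>' "$1" \
        | sed -e 's/^>//' -e 's/[[:space:]].*$//' -e 's/_[^_]*$//' \
        | sort | uniq -c | awk '{print $2 "\t" $1}'
}

# Tiny made-up example file; the sample names are placeholders.
demo_fasta="$(mktemp)"
cat > "$demo_fasta" <<'EOF'
>F04.site1_0 read1 orig_bc=AAAA
ACGT
>F04.site1_1 read2 orig_bc=AAAA
ACGT
>R22.site1_2 read3 orig_bc=CCCC
ACGT
EOF

# Prints one line per sample: sample ID, a tab, then the sequence count.
count_seqs_per_sample "$demo_fasta"
```

Pointing the function at one of the filtered .fna files should list only the sample IDs present in the corresponding mapping file.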

Aligning seqs on iMac

While waiting for cloud support, running the 18S_chimera workflow on my iMac:

Parallel align sequences:

parallel_align_seqs_pynast.py -i /Users/hollybik/Desktop/Data/18S_chimera/AWS_analyses/18S_chimera_openref99_alldenovo_20Oct13/rep_set.fna -o /Users/hollybik/Desktop/Data/18S_chimera/AWS_analyses/18S_chimera_openref99_alldenovo_20Oct13/pynast_aligned_seqs -T --jobs_to_start 2 --template_fp /macqiime/silva_111/eukaryotes_only/rep_set_aligned_euks/99_Silva_111_rep_set_euk_aligned.fasta --pairwise_alignment_method uclust --min_percent_id 70.0 --min_length 150

Filter alignment:

filter_alignment.py -o /Users/hollybik/Desktop/18S_fallanalyses/GOM_Illumina/uclust_openref96_rev_27Aug/pynast_aligned_seqs -i /Users/hollybik/Desktop/18S_fallanalyses/GOM_Illumina/uclust_openref96_rev_27Aug/pynast_aligned_seqs/rep_set_aligned.fasta --allowed_gap_frac 0.999999 --threshold 3.0 --suppress_lane_mask_filter

Make Tree:

make_phylogeny.py -i /Users/hollybik/Desktop/18S_fallanalyses/18S_chimera/18S_chimera_openref96_alldenovo_18Oct13/pynast_aligned_seqs/rep_set_aligned_pfiltered.fasta -o /Users/hollybik/Desktop/18S_fallanalyses/18S_chimera/18S_chimera_openref96_alldenovo_18Oct13/rep_set.tre --root_method tree_method_default --tree_method fasttree

Also making progress with GOM_Illumina data that my desktop/laptop machines can handle:

parallel_align_seqs_pynast.py -i /Users/hollybik/Desktop/18S_fallanalyses/GOM_Illumina/uclust_openref96_rev_27Aug/rep_set.fna -o /Users/hollybik/Desktop/18S_fallanalyses/GOM_Illumina/uclust_openref96_rev_27Aug/pynast_aligned_seqs -T --jobs_to_start 2 --template_fp /macqiime/silva_111/eukaryotes_only/rep_set_aligned_euks/99_Silva_111_rep_set_euk_aligned.fasta --pairwise_alignment_method uclust --min_percent_id 70.0 --min_length 50

Parallel Aligning Sequences, yet again

Reattempting to align sequences so I can move forward with some cursory core diversity analyses. This time I opted for an AWS m2.2xlarge (only 32GB memory), since the larger instance was just not wanting to align anything. Same errors with this script, even after changing the qiime_config file.

parallel_align_seqs_pynast.py -i /home/ubuntu/data/18S_chimera_openref96_alldenovo_18Oct13/rep_set.fna -o /home/ubuntu/data/18S_chimera_openref96_alldenovo_18Oct13/pynast_aligned_seqs -T --jobs_to_start 4 --template_fp /home/ubuntu/data/silva_111/99_Silva_111_rep_set_euk_aligned.fasta --pairwise_alignment_method uclust --min_percent_id 70.0 --min_length 150

Finally posted a help message on the QIIME Google Group, because this issue has been frustrating me long enough: https://groups.google.com/forum/#!topic/qiime-forum/f2tA2a97OxE

Generating .biom tables with taxonomy

The add_metadata.py scripts just don’t seem to be working for most of the 18S_chimera and GOM_illumina runs. Did a bit of poking on the QIIME forums and it seems like the easier way to do this (confirmed via my fiddling) is just re-generating the OTU tables using make_otu_table.py and passing in the taxonomy mapping file using the -t flag. This is quick and easy to do on my MacBook Retina:

Commands run:

make_otu_table.py -i /Users/hollybik/Desktop/Alaska\ Analyses/GOM_Illumina/uclust_openref99_rev_16Sept/final_otu_map_mc2.txt -o /Users/hollybik/Desktop/Alaska\ Analyses/GOM_Illumina/uclust_openref99_rev_16Sept/otu_table_mc2_w_tax.biom -t /Users/hollybik/Desktop/Alaska\ Analyses/GOM_Illumina/uclust_openref99_rev_16Sept/rdp_assigned_taxonomy/rep_set_tax_assignments.txt
make_otu_table.py -i /Users/hollybik/Desktop/Alaska\ Analyses/GOM_Illumina/uclust_openref96_fwd_16Sept/final_otu_map_mc2.txt -o /Users/hollybik/Desktop/Alaska\ Analyses/GOM_Illumina/uclust_openref96_fwd_16Sept/otu_table_mc2_w_tax.biom -t /Users/hollybik/Desktop/Alaska\ Analyses/GOM_Illumina/uclust_openref96_fwd_16Sept/rdp_assigned_taxonomy/rep_set_tax_assignments.txt
make_otu_table.py -i /Users/hollybik/Desktop/Alaska\ Analyses/GOM_Illumina/uclust_openref99_fwd_20Sept/final_otu_map_mc2.txt -o /Users/hollybik/Desktop/Alaska\ Analyses/GOM_Illumina/uclust_openref99_fwd_20Sept/otu_table_mc2_w_tax.biom -t /Users/hollybik/Desktop/Alaska\ Analyses/GOM_Illumina/uclust_openref99_fwd_20Sept/rdp_assigned_taxonomy/rep_set_tax_assignments.txt
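
Since the three commands differ only in the run directory, the same thing could be written as a loop. This is just a sketch: build_cmd is a made-up helper that prints each make_otu_table.py command (with quoted paths, because of the space in "Alaska Analyses") rather than running it, so the output can be checked before anything is executed.

```shell
# Print (not run) one make_otu_table.py command per OTU-picking run.
# build_cmd is a hypothetical helper; the file names inside each run
# directory are the ones used in the commands above.
base="/Users/hollybik/Desktop/Alaska Analyses/GOM_Illumina"

build_cmd() {
    # $1: full path to one run directory
    printf 'make_otu_table.py -i "%s" -o "%s" -t "%s"\n' \
        "$1/final_otu_map_mc2.txt" \
        "$1/otu_table_mc2_w_tax.biom" \
        "$1/rdp_assigned_taxonomy/rep_set_tax_assignments.txt"
}

for run in uclust_openref99_rev_16Sept uclust_openref96_fwd_16Sept uclust_openref99_fwd_20Sept; do
    build_cmd "$base/$run"
done
```

Once the printed commands look right, they can be pasted into the terminal or piped to a shell.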