QIIME open ref OTU picking

Been trying to use the new QIIME 1.7 scripts on the GOM Illumina data (running out of memory, so this is still a work in progress; heading over to Amazon Cloud soon).

Standard workflow:

pick_open_reference_otus.py -i /Users/hollybik/Desktop/Data/Illumina_GOM/demultiplexed_qiime1.7/GOM_concat1.7_fwd_demulti_1to12_1.fna -o /Users/hollybik/Desktop/Data/Illumina_GOM/uclust_99_fwd -r /macqiime/silva_111/rep_set/Silva_111_full_unique.fasta --parallel -O 2 -s 0.1 --suppress_taxonomy_assignment --suppress_align_and_tree

Skipping prefiltering (thought this would speed things up but no…)

pick_open_reference_otus.py -i /Users/hollybik/Desktop/Data/Illumina_GOM/demultiplexed_qiime1.7/GOM_concat1.7_fwd_demulti_1to12_1.fna -o /Users/hollybik/Desktop/Data/Illumina_GOM/uclust_99_fwd -r /macqiime/silva_111/rep_set/Silva_111_full_unique.fasta --parallel -O 2 -s 0.1 --suppress_taxonomy_assignment --suppress_align_and_tree --prefilter_percent_id 0.0
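
The two runs above differ only in the --prefilter_percent_id flag, so the shared arguments could be kept in one place. A minimal sketch, assuming bash: it only prints the command lines (dry run) rather than executing anything, and the variable names and the open_ref_cmd helper are my own, not part of QIIME.

```shell
#!/usr/bin/env bash
# Shared arguments for both pick_open_reference_otus.py runs (paths as in the notes above).
INPUT=/Users/hollybik/Desktop/Data/Illumina_GOM/demultiplexed_qiime1.7/GOM_concat1.7_fwd_demulti_1to12_1.fna
OUTDIR=/Users/hollybik/Desktop/Data/Illumina_GOM/uclust_99_fwd
REF=/macqiime/silva_111/rep_set/Silva_111_full_unique.fasta

# open_ref_cmd [extra flags...]: hypothetical helper that prints the full command line.
open_ref_cmd() {
    echo pick_open_reference_otus.py -i "$INPUT" -o "$OUTDIR" -r "$REF" \
        --parallel -O 2 -s 0.1 \
        --suppress_taxonomy_assignment --suppress_align_and_tree "$@"
}

open_ref_cmd                               # standard workflow
open_ref_cmd --prefilter_percent_id 0.0    # skipping prefiltering
```

Piping the output to bash would run the commands for real; as recorded, both variants write to the same output directory.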

QIIME GOM data

For some reason I couldn’t get the parallel_assign_taxonomy_rdp.py script to work, so I had to revert to the normal assign_taxonomy.py script. I was getting an error unless I increased the max memory flag (--rdp_max_memory), so remember to do this again next time.

assign_taxonomy.py -m rdp -i /home/ubuntu/GOM_demulti2_fwd_OpenRef_uclust99_12Oct/GOM_concat_fwd_demultirepeat_1to12_1_otus_rep_set.fasta -r /home/ubuntu/Silva_108/rep_set/Silva_108_rep_set_Eukarya_only.fna -t /home/ubuntu/Silva_108/taxa_mapping/Silva_RDP_taxa_mapping_Eukarya_only_genus.txt -o /home/ubuntu/GOM_demulti2_fwd_OpenRef_uclust99_12Oct/rdp_genus_assigntaxon/ --rdp_max_memory 50000

assign_taxonomy.py -m rdp -i /home/ubuntu/GOM_demulti2_rev_OpenRef_uclust99_12Oct/GOM_concat_rev_demultirepeat_1to12_2_otus_rep_set.fasta -r /home/ubuntu/Silva_108/rep_set/Silva_108_rep_set_Eukarya_only.fna -t /home/ubuntu/Silva_108/taxa_mapping/Silva_RDP_taxa_mapping_Eukarya_only_genus.txt -o /home/ubuntu/GOM_demulti2_fwd_OpenRef_uclust99_12Oct/rdp_genus_assigntaxon/ --rdp_max_memory 50000
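
The fwd and rev taxonomy commands share everything except the run directory and file name, so they could be generated from one template. A rough sketch, assuming bash; it only prints the commands, and assign_tax_cmd is a hypothetical helper of mine. One loud assumption: here each direction writes its results under its own run directory, whereas the rev command as recorded above points -o at the fwd directory.

```shell
#!/usr/bin/env bash
SILVA=/home/ubuntu/Silva_108

# assign_tax_cmd DIRECTION READ: prints the assign_taxonomy.py command line.
#   DIRECTION is fwd or rev, READ is the matching file suffix (1 or 2).
assign_tax_cmd() {
    local dir=$1 read_no=$2
    local run=/home/ubuntu/GOM_demulti2_${dir}_OpenRef_uclust99_12Oct
    # ASSUMPTION: output goes under the same run directory as the input.
    echo assign_taxonomy.py -m rdp \
        -i "$run/GOM_concat_${dir}_demultirepeat_1to12_${read_no}_otus_rep_set.fasta" \
        -r "$SILVA/rep_set/Silva_108_rep_set_Eukarya_only.fna" \
        -t "$SILVA/taxa_mapping/Silva_RDP_taxa_mapping_Eukarya_only_genus.txt" \
        -o "$run/rdp_genus_assigntaxon/" \
        --rdp_max_memory 50000
}

assign_tax_cmd fwd 1
assign_tax_cmd rev 2
```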

Making OTU tables of all seqs:

make_otu_table.py -i GOM_concat_fwd_demultirepeat_1to12_1_otus.txt -t rdp_genus_assigntaxon/GOM_concat_fwd_demultirepeat_1to12_1_otus_rep_set_tax_assignments.txt -o GOM_concat_fwd_demultirepeat_1to12_1_otu_table_allotus.biom

Chimera checking (used the parallel script for the actual data analysis, but listing the non-parallel script here for reference):

identify_chimeric_seqs.py -i GOM_concat_fwd_demultirepeat_1to12_1_otus_rep_set.fasta -t /home/ubuntu/Silva_108/taxa_mapping/Silva_RDP_taxa_mapping_Eukarya_only_genus.txt -r /home/ubuntu/Silva_108/rep_set/Silva_108_rep_set_Eukarya_only.fna -o GOM_concat_fwd_demultirepeat_1to12_1_chimeric_seqs.txt -m blast_fragments

parallel_identify_chimeric_seqs.py -m blast_fragments -i GOM_concat_fwd_demultirepeat_1to12_1_otus_rep_set.fasta -t /home/ubuntu/Silva_108/taxa_mapping/Silva_RDP_taxa_mapping_Eukarya_only_genus.txt -r /home/ubuntu/Silva_108/rep_set/Silva_108_rep_set_Eukarya_only.fna -o GOM_concat_fwd_demultirepeat_1to12_1_chimeric_seqs.txt -O 6

 

GOM data analysis – yet more QIIME

Continuing next steps with QIIME. Filtering alignments next:

filter_alignment.py -i /home/ubuntu/GOM_demulti2_rev_OpenRef_uclust99_12Oct/aligned_seqs/GOM_concat_rev_demultirepeat_1to12_2_otus_rep_set_aligned.fasta -o /home/ubuntu/GOM_demulti2_rev_OpenRef_uclust99_12Oct/aligned_seqs/GOM_concat_rev_demultirepeat_1to12_2_otus_rep_set_filtered_aligned.fasta -s -e 0.10 -g 0.90

filter_alignment.py -i /home/ubuntu/GOM_demulti2_fwd_OpenRef_uclust99_12Oct/aligned_seqs/GOM_concat_fwd_demultirepeat_1to12_1_otus_rep_set_aligned.fasta -o /home/ubuntu/GOM_demulti2_fwd_OpenRef_uclust99_12Oct/aligned_seqs/GOM_concat_fwd_demultirepeat_1to12_1_otus_rep_set_filtered_aligned.fasta -s -e 0.10 -g 0.90

Now assigning taxonomy:

parallel_assign_taxonomy_rdp.py -i /home/ubuntu/GOM_demulti2_fwd_OpenRef_uclust99_12Oct/GOM_concat_fwd_demultirepeat_1to12_1_otus_rep_set.fasta -r /home/ubuntu/Silva_108/rep_set/Silva_108_rep_set_Eukarya_only.fna -t /home/ubuntu/Silva_108/taxa_mapping/Silva_RDP_taxa_mapping_Eukarya_only_species.txt -o /home/ubuntu/GOM_demulti2_fwd_OpenRef_uclust99_12Oct/rdp_taxonomy/ -c 0.7 -O 6

GOM Illumina – next steps with QIIME

Getting back to GOM data analysis on the Amazon cloud. I had run the fwd OTU picking too but forgot to note down the command (finished on Oct 16th):

pick_otus.py -i ~/GOM_demultiplexed/GOM_concat_fwd_demultirepeat_1to12_1.fna -m uclust_ref -o /home/ubuntu/GOM_demulti2_fwd_OpenRef_uclust99_12Oct -r /home/ubuntu/Silva_108/rep_set/Silva_108_rep_set_Eukarya_only.fna -s 0.99 --enable_rev_strand_match

Proceeding with picking rep set of sequences:

pick_rep_set.py -i GOM_concat_fwd_demultirepeat_1to12_1_otus.txt -f ~/GOM_demultiplexed/GOM_concat_fwd_demultirepeat_1to12_1.fna -m first -l pick_rep_set.log -o GOM_concat_fwd_demultirepeat_1to12_1_otus_rep_set.fasta

pick_rep_set.py -i GOM_concat_rev_demultirepeat_1to12_2_otus.txt -f ~/GOM_demultiplexed/GOM_concat_rev_demultirepeat_1to12_2.fna -m first -l pick_rep_set.log -o GOM_concat_rev_demultirepeat_1to12_2_otus_rep_set.fasta

Next need to align sequences:

parallel_align_seqs_pynast.py -i /home/ubuntu/GOM_demulti2_fwd_OpenRef_uclust99_12Oct/GOM_concat_fwd_demultirepeat_1to12_1_otus_rep_set.fasta -t /home/ubuntu/Silva_108/core_aligned/Silva_108_core_aligned_seqs.fasta -a uclust -o /home/ubuntu/GOM_demulti2_fwd_OpenRef_uclust99_12Oct/aligned_seqs/ -e 70 -O 6

parallel_align_seqs_pynast.py -i /home/ubuntu/GOM_demulti2_rev_OpenRef_uclust99_12Oct/GOM_concat_rev_demultirepeat_1to12_2_otus_rep_set.fasta -t /home/ubuntu/Silva_108/core_aligned/Silva_108_core_aligned_seqs.fasta -a uclust -o /home/ubuntu/GOM_demulti2_rev_OpenRef_uclust99_12Oct/aligned_seqs/ -e 70 -O 6

GOM Illumina – old Virtual Box analyses

Transferring info over from Google Docs. Here is some of the older work I was doing with the GOM Illumina data (note that these commands precede the most recent analysis, and I am not using any of this processed data for the final publication):

De-multiplexed individual samples in QIIME

QIIME 1.4 VirtualBox on iMac. Seems like these commands allow no errors in the barcode, though.

split_libraries_fastq.py -i ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-1_1_trimmed.txt -o sl_out/KO_1_1/ -b ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-1_1_barcode.txt -m ~/Desktop/Shared_Folder/QIIME_mapping_files/1926_KO_1.txt --barcode_type 5

split_libraries_fastq.py -i ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-1_2_trimmed.txt -o sl_out/KO_1_2/ -b ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-1_2_barcode.txt -m ~/Desktop/Shared_Folder/QIIME_mapping_files/1926_KO_1.txt --barcode_type 5

split_libraries_fastq.py -i ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-2_1_trimmed.txt -o sl_out/KO_2_1/ -b ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-2_1_barcode.txt -m ~/Desktop/Shared_Folder/QIIME_mapping_files/1926_KO_2.txt --barcode_type 5

split_libraries_fastq.py -i ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-2_2_trimmed.txt -o sl_out/KO_2_2/ -b ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-2_2_barcode.txt -m ~/Desktop/Shared_Folder/QIIME_mapping_files/1926_KO_2.txt --barcode_type 5

split_libraries_fastq.py -i ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-3_1_trimmed.txt -o sl_out/KO_3_1/ -b ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-3_1_barcode.txt -m ~/Desktop/Shared_Folder/QIIME_mapping_files/1926_KO_3.txt --barcode_type 5

split_libraries_fastq.py -i ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-3_2_trimmed.txt -o sl_out/KO_3_2/ -b ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-3_2_barcode.txt -m ~/Desktop/Shared_Folder/QIIME_mapping_files/1926_KO_3.txt --barcode_type 5

split_libraries_fastq.py -i ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-4_1_trimmed.txt -o sl_out/KO_4_1/ -b ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-4_1_barcode.txt -m ~/Desktop/Shared_Folder/QIIME_mapping_files/1926_KO_4.txt --barcode_type 5

split_libraries_fastq.py -i ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-4_2_trimmed.txt -o sl_out/KO_4_2/ -b ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-4_2_barcode.txt -m ~/Desktop/Shared_Folder/QIIME_mapping_files/1926_KO_4.txt --barcode_type 5

split_libraries_fastq.py -i ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-5_1_trimmed.txt -o sl_out/KO_5_1/ -b ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-5_1_barcode.txt -m ~/Desktop/Shared_Folder/QIIME_mapping_files/1926_KO_5.txt --barcode_type 5

split_libraries_fastq.py -i ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-5_2_trimmed.txt -o sl_out/KO_5_2/ -b ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-5_2_barcode.txt -m ~/Desktop/Shared_Folder/QIIME_mapping_files/1926_KO_5.txt --barcode_type 5

split_libraries_fastq.py -i ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-6_1_trimmed.txt -o sl_out/KO_6_1/ -b ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-6_1_barcode.txt -m ~/Desktop/Shared_Folder/QIIME_mapping_files/1926_KO_6.txt --barcode_type 5

split_libraries_fastq.py -i ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-6_2_trimmed.txt -o sl_out/KO_6_2/ -b ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-6_2_barcode.txt -m ~/Desktop/Shared_Folder/QIIME_mapping_files/1926_KO_6.txt --barcode_type 5

split_libraries_fastq.py -i ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-7_1_trimmed.txt -o sl_out/KO_7_1/ -b ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-7_1_barcode.txt -m ~/Desktop/Shared_Folder/QIIME_mapping_files/1926_KO_7.txt --barcode_type 5

split_libraries_fastq.py -i ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-7_2_trimmed.txt -o sl_out/KO_7_2/ -b ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-7_2_barcode.txt -m ~/Desktop/Shared_Folder/QIIME_mapping_files/1926_KO_7.txt --barcode_type 5

split_libraries_fastq.py -i ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-8_1_trimmed.txt -o sl_out/KO_8_1/ -b ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-8_1_barcode.txt -m ~/Desktop/Shared_Folder/QIIME_mapping_files/1926_KO_8.txt --barcode_type 5

split_libraries_fastq.py -i ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-8_2_trimmed.txt -o sl_out/KO_8_2/ -b ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-8_2_barcode.txt -m ~/Desktop/Shared_Folder/QIIME_mapping_files/1926_KO_8.txt --barcode_type 5

split_libraries_fastq.py -i ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-9_1_trimmed.txt -o sl_out/KO_9_1/ -b ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-9_1_barcode.txt -m ~/Desktop/Shared_Folder/QIIME_mapping_files/1926_KO_9.txt --barcode_type 5

split_libraries_fastq.py -i ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-9_2_trimmed.txt -o sl_out/KO_9_2/ -b ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-9_2_barcode.txt -m ~/Desktop/Shared_Folder/QIIME_mapping_files/1926_KO_9.txt --barcode_type 5

split_libraries_fastq.py -i ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-10_1_trimmed.txt -o sl_out/KO_10_1/ -b ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-10_1_barcode.txt -m ~/Desktop/Shared_Folder/QIIME_mapping_files/1926_KO_10.txt --barcode_type 5

split_libraries_fastq.py -i ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-10_2_trimmed.txt -o sl_out/KO_10_2/ -b ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-10_2_barcode.txt -m ~/Desktop/Shared_Folder/QIIME_mapping_files/1926_KO_10.txt --barcode_type 5

split_libraries_fastq.py -i ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-11_1_trimmed.txt -o sl_out/KO_11_1/ -b ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-11_1_barcode.txt -m ~/Desktop/Shared_Folder/QIIME_mapping_files/1926_KO_11.txt --barcode_type 5

split_libraries_fastq.py -i ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-11_2_trimmed.txt -o sl_out/KO_11_2/ -b ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-11_2_barcode.txt -m ~/Desktop/Shared_Folder/QIIME_mapping_files/1926_KO_11.txt --barcode_type 5

split_libraries_fastq.py -i ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-12_1_trimmed.txt -o sl_out/KO_12_1/ -b ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-12_1_barcode.txt -m ~/Desktop/Shared_Folder/QIIME_mapping_files/1926_KO_12.txt --barcode_type 5

split_libraries_fastq.py -i ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-12_2_trimmed.txt -o sl_out/KO_12_2/ -b ~/Desktop/Shared_Folder/fastx_trimmed_files/1926-KO-12_2_barcode.txt -m ~/Desktop/Shared_Folder/QIIME_mapping_files/1926_KO_12.txt --barcode_type 5
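
The 24 demultiplexing commands above follow one pattern (samples 1 to 12, reads 1 and 2), so they could be generated with a loop instead of being typed out. A minimal sketch, assuming bash; it prints the commands rather than running them, and the SHARED variable and demux_cmd helper are names I made up for illustration.

```shell
#!/usr/bin/env bash
# Base folder; defaults to the shared folder used in the commands above.
SHARED="${SHARED:-$HOME/Desktop/Shared_Folder}"

# demux_cmd SAMPLE READ: prints one split_libraries_fastq.py command line.
demux_cmd() {
    local sample=$1 read_no=$2
    echo split_libraries_fastq.py \
        -i "$SHARED/fastx_trimmed_files/1926-KO-${sample}_${read_no}_trimmed.txt" \
        -o "sl_out/KO_${sample}_${read_no}/" \
        -b "$SHARED/fastx_trimmed_files/1926-KO-${sample}_${read_no}_barcode.txt" \
        -m "$SHARED/QIIME_mapping_files/1926_KO_${sample}.txt" \
        --barcode_type 5
}

# Samples 1..12, forward (1) and reverse (2) reads: 24 commands in total.
for sample in 1 2 3 4 5 6 7 8 9 10 11 12; do
    for read_no in 1 2; do
        demux_cmd "$sample" "$read_no"
    done
done
```

Checking the printed list before piping it to bash is a cheap way to catch a mistyped sample number.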

Then reran demultiplexing, this time using QIIME 1.5 VirtualBox on iMac. Seems like these commands allow no errors in the barcode, though (I think these outputs are the ones listed in sl_out_nobarcodemismatch files).

Picking OTUs

Using UCLUST, 99% similarity with trie prefiltering on QIIME 1.5 VirtualBox on iMac

pick_otus.py -m uclust -s 0.99 -t -z -i /home/qiime/Desktop/Shared_Folder/sl_out/GOM_illumina_demultiplexed_fwd_concat_1to12_1 -o /home/qiime/Desktop/Shared_Folder/GOM_uclust99_otus/

Next Steps

  1. Concat de-multiplexed samples (two separate combined files, PE-Fwd and PE-Rev)
    • Looks like there were a lot of barcodes in the reverse direction with one N character within the barcode; should probably re-run demultiplexing allowing for one mismatch (at least in reverse direction, anyway).
  2. Run OTU clustering – UCLUST at 99%
  3. Assign Taxonomy – PYNAST and SILVA reference database
  4. Diversity analyses in QIIME
    • Alpha Diversity – Rarefaction
    • Beta Diversity – PCoA, UPGMA clustering
    • OTU Network analyses, view in Cytoscape
    • New analyses released in QIIME 1.5.0
  5. Run OTUs through PhyloSift
    • 18S taxonomy assignments using SSU-align and tree-placement method (run on individual samples)
    • Edge PCA and Squash Clustering in guppy, for multisample comparison
  6. Run Metatranscriptome data through PhyloSift
    • Does protein-coding taxonomy agree with rRNA data?
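
For the note under step 1 about reverse barcodes containing a single N, a quick count per barcode file would show how many reads are affected before deciding whether to re-run demultiplexing. A rough sketch, assuming the barcode files are standard four-line FASTQ with uppercase bases; count_n_barcodes is a hypothetical helper and the demo records are made up.

```shell
#!/usr/bin/env bash
# count_n_barcodes FILE: prints "<total barcodes> <barcodes with exactly one N>".
# ASSUMPTION: FILE is four-line FASTQ, so the sequence is line 2 of each record.
count_n_barcodes() {
    awk 'NR % 4 == 2 {
             total++
             n = gsub(/N/, "N")     # gsub returns how many N characters it matched
             if (n == 1) one_n++
         }
         END { printf "%d %d\n", total, one_n }' "$1"
}

# Made-up demo file: three barcodes with zero, one, and two N characters.
demo=$(mktemp)
printf '@r1\nACGTA\n+\nIIIII\n@r2\nACNTA\n+\nIIIII\n@r3\nNNGTA\n+\nIIIII\n' > "$demo"
count_n_barcodes "$demo"    # prints: 3 1
rm -f "$demo"
```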