18S_chimera – 100% subsampled OTUs

Been thinking about OTU picking: if we really want to figure out how chimeric sequences are being incorporated into OTUs, I have to cluster 100% of the failure sequences (-s 1.0, rather than the -s 0.1 subsample used in the earlier runs). So running another round of analyses:

96% clustering:

pick_open_reference_otus.py -i /home/ubuntu/data/chim_demux.extendedFrags_primersremoved_fastxtrimmed_chimeraslabelled.fasta -o /home/ubuntu/data/18S_chimera_openref96_alldenovo_18Oct13 -r /home/ubuntu/data/silva_111/99_Silva_111_rep_set_euk.fasta --parallel -O 8 -s 1.0 --prefilter_percent_id 0.0 -p /home/ubuntu/data/qiime_parameters_18Schimera_96_amazon.txt

99% clustering:

pick_open_reference_otus.py -i /home/ubuntu/data/chim_demux.extendedFrags_primersremoved_fastxtrimmed_chimeraslabelled.fasta -o /home/ubuntu/data/18S_chimera_openref99_alldenovo_20Oct13 -r /home/ubuntu/data/silva_111/99_Silva_111_rep_set_euk.fasta --parallel -O 8 -s 1.0 --prefilter_percent_id 0.0 -p /home/ubuntu/data/qiime_parameters_18Schimera_99_amazon.txt
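
The two commands above differ only in the output directory and the parameter file, so they can be assembled from one template. A minimal Python sketch, assuming nothing beyond the paths and flags already shown; the function name open_ref_command and the DATA constant are made up for illustration:

```python
import shlex

# Data directory on the AWS instance, as used in the commands above.
DATA = "/home/ubuntu/data"


def open_ref_command(percent_id, out_dir_name, subsample=1.0, jobs=8):
    """Build the pick_open_reference_otus.py argument list for one run.

    open_ref_command is a hypothetical helper; every flag and path is
    copied from the commands in this post.
    """
    return [
        "pick_open_reference_otus.py",
        "-i", DATA + "/chim_demux.extendedFrags_primersremoved_fastxtrimmed_chimeraslabelled.fasta",
        "-o", DATA + "/" + out_dir_name,
        "-r", DATA + "/silva_111/99_Silva_111_rep_set_euk.fasta",
        "--parallel", "-O", str(jobs),
        "-s", str(subsample),
        "--prefilter_percent_id", "0.0",
        "-p", "{0}/qiime_parameters_18Schimera_{1}_amazon.txt".format(DATA, percent_id),
    ]


# Output directory names for the two runs in this post.
runs = {
    96: "18S_chimera_openref96_alldenovo_18Oct13",
    99: "18S_chimera_openref99_alldenovo_20Oct13",
}

# Shell-quoted command strings, ready to paste or to log next to the results.
commands = {
    pid: " ".join(shlex.quote(arg) for arg in open_ref_command(pid, name))
    for pid, name in runs.items()
}
```

Printing commands[96] and commands[99] gives strings that can be pasted into the terminal or saved alongside the parameter files.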

18S Chimera – rerunning relabeled files

Wrote a script last night to label chimeric sequences with >chimera_ – now rerunning QIIME analyses locally on my iMac

pick_open_reference_otus.py -i /Users/hollybik/Desktop/Data/18S_chimera/chim_demux.extendedFrags_primersremoved_fastxtrimmed_chimeraslabelled.fasta -o /Users/hollybik/Desktop/Data/18S_chimera/chimera_openref96_18Sept -r /macqiime/silva_111/eukaryotes_only/rep_set_euks/99_Silva_111_rep_set_euk.fasta --parallel -O 2 -s 0.1 --prefilter_percent_id 0.0 -p /Users/hollybik/Dropbox/QIIME/qiime_parameters_18Schimera_96_iMac.txt 
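
For reference, the relabeling step could look roughly like the Python sketch below. This is not the script I actually wrote, just an illustration of the idea: it assumes the chimeric sequence IDs are already available as a set, and the function name label_chimeras and the example reads are made up.

```python
def label_chimeras(fasta_lines, chimeric_ids, prefix="chimera_"):
    """Yield FASTA lines, adding a prefix to the header of each chimeric read.

    fasta_lines  : iterable of lines from a FASTA file
    chimeric_ids : set of sequence IDs flagged as chimeric (assumed input)
    """
    for line in fasta_lines:
        if line.startswith(">"):
            # The sequence ID is the first whitespace-delimited token.
            fields = line[1:].split(None, 1)
            seq_id = fields[0] if fields else ""
            if seq_id in chimeric_ids:
                line = ">" + prefix + line[1:]
        yield line


# Tiny made-up example: only the second read is flagged as chimeric.
example = [
    ">sampleA_1 orig_bc=ACGT\n",
    "ACGTACGT\n",
    ">sampleA_2 orig_bc=ACGT\n",
    "TTGACCAA\n",
]
labelled = list(label_chimeras(example, {"sampleA_2"}))
```

Sequence lines pass through untouched; only the headers of flagged reads change, so the output is still a valid FASTA file for OTU picking.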

Update (10/3/13) – iMac taking way too long for OTU picking, so moved over to Amazon AWS. Command for 96% open ref:

pick_open_reference_otus.py -i /home/ubuntu/data/chim_demux.extendedFrags_primersremoved_fastxtrimmed_chimeraslabelled.fasta -o /home/ubuntu/data/18S_chimera_openref96_3oct13 -r /home/ubuntu/data/silva_111/99_Silva_111_rep_set_euk.fasta --parallel -O 8 -s 0.1 --prefilter_percent_id 0.0 -p /home/ubuntu/data/qiime_parameters_18Schimera_96_amazon.txt

Command for 99% open ref:

pick_open_reference_otus.py -i /home/ubuntu/data/chim_demux.extendedFrags_primersremoved_fastxtrimmed_chimeraslabelled.fasta -o /home/ubuntu/data/18S_chimera_openref99_5oct13 -r /home/ubuntu/data/silva_111/99_Silva_111_rep_set_euk.fasta --parallel -O 8 -s 0.1 --prefilter_percent_id 0.0 -p /home/ubuntu/data/qiime_parameters_18Schimera_99_amazon.txt

Converting .biom files and replacing metadata

There’s no “remove metadata” command for the .biom files, so what I’ve been doing is converting to Classic OTU tables (and pulling down taxonomy information in this conversion), and then converting *back* to .biom to add the new set of metadata:

convert_biom.py -i /Users/hollybik/Dropbox/Projects/Visualization/Test\ data\ from\ Greg/biom_for_holly/global-gut/study_850_closed_reference_otu_table_w_sample_md.biom -o /Users/hollybik/Dropbox/Projects/Visualization/Test\ data\ from\ Greg/biom_for_holly/global-gut/study_850_closed_reference_otu_table_classic.txt --biom_to_classic_table --header_key taxonomy

Then I went in and edited the sample mapping file (to condense the metadata to only relevant columns for visualization). Then re-convert the classic OTU table to .biom using the new condensed metadata mapping file:

convert_biom.py -i /Users/hollybik/Dropbox/Projects/Visualization/Test\ data\ from\ Greg/biom_for_holly/global-gut/study_850_closed_reference_otu_table_classic.txt -o /Users/hollybik/Dropbox/Projects/Visualization/Test\ data\ from\ Greg/biom_for_holly/global-gut/study_850_closed_reference_otu_table_condensed_metadata.biom -m /Users/hollybik/Dropbox/Projects/Visualization/Test\ data\ from\ Greg/biom_for_holly/global-gut/study_850_mapping_file_condensed_metadata.txt --biom_table_type="otu table"
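
I condensed the mapping file by hand, but the same thing could be scripted. A rough Python sketch, assuming a tab-delimited mapping file whose first line is the header; the column names in the example (COUNTRY, AGE) are placeholders, not necessarily the real columns in the global-gut mapping file:

```python
import csv
import io


def condense_mapping(lines, keep):
    """Return mapping-file text containing only the columns named in keep.

    lines : iterable of lines from a tab-delimited mapping file
    keep  : column names to retain, in the desired output order
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    # Raises ValueError if a requested column is not in the header.
    indices = [header.index(name) for name in keep]

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([header[i] for i in indices])
    for row in reader:
        if row:  # skip blank lines
            writer.writerow([row[i] for i in indices])
    return out.getvalue()


# Made-up mapping file with placeholder metadata columns.
example = [
    "#SampleID\tBarcodeSequence\tCOUNTRY\tAGE\tDescription\n",
    "s1\tACGT\tUSA\t30\tgut\n",
    "s2\tTTGA\tMalawi\t2\tgut\n",
]
condensed = condense_mapping(example, ["#SampleID", "COUNTRY", "AGE"])
```

The sketch does no validation of the result, so the condensed file still needs whatever columns the downstream tools expect.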

Also converting my published Deepsea data and GOM data into new .biom files

First, rename the final column from “Consensus Lineage” to “taxonomy”. Next, run this command:

convert_biom.py -i /Users/hollybik/Dropbox/Projects/Visualization/my_test_data/Deepsea_OTUtable_uclust99_F04NF1.txt -o /Users/hollybik/Dropbox/Projects/Visualization/my_test_data/Deepsea_OTUtable_uclust99_F04NF1.biom --process_obs_metadata taxonomy -m /Users/hollybik/Dropbox/Projects/Visualization/my_test_data/QIIMEmappingfile_Deepsea_MASTER.txt --biom_table_type="otu table"
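
The column rename before the conversion can also be done without opening the table in an editor. A small Python sketch, assuming a classic tab-delimited OTU table whose header line starts with "#OTU ID"; the function name and the example table are made up:

```python
def rename_lineage_header(lines, old="Consensus Lineage", new="taxonomy"):
    """Yield OTU-table lines with the last header column renamed.

    Only the line starting with "#OTU ID" is inspected; all other
    lines are passed through unchanged.
    """
    for line in lines:
        if line.startswith("#OTU ID"):
            columns = line.rstrip("\n").split("\t")
            if columns[-1] == old:
                columns[-1] = new
            line = "\t".join(columns) + "\n"
        yield line


# Made-up two-sample table in the classic format.
example = [
    "# QIIME OTU table\n",
    "#OTU ID\tsample_1\tsample_2\tConsensus Lineage\n",
    "0\t5\t0\tEukaryota,Metazoa,Nematoda\n",
]
renamed = list(rename_lineage_header(example))
```

This touches only the header, so the old comma-delimited taxonomy strings in the data rows stay exactly as they were.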

However, there are a few problems with this conversion:

  1. Taxonomy strings are the old comma-delimited version. They aren’t separated by the new k__;p__;c__;o__;f__;g__;s__ delimiters
  2. I can’t figure out if the sample names (OTU table column headers) got pulled through. It doesn’t look like they did; only the QIIME mapping file info got pasted in at the bottom. Maybe I’m wrong about this, since it was just based on a quick glance through the .biom table
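
A quicker way to settle the second point than eyeballing the file: if the converted table is in the JSON-based BIOM 1.0 format (an assumption about what convert_biom.py wrote here), the sample names live in the "columns" list, each entry with an "id" and a "metadata" field. A minimal Python sketch; the function name and the example table are made up:

```python
import json


def summarize_biom_samples(biom_text):
    """Return (sample_ids, ids_without_metadata) from BIOM 1.0 JSON text."""
    table = json.loads(biom_text)
    sample_ids = [column["id"] for column in table["columns"]]
    # Metadata is null (None) when no mapping-file info was attached.
    missing = [
        column["id"] for column in table["columns"]
        if not column.get("metadata")
    ]
    return sample_ids, missing


# Made-up, stripped-down table: only the fields used above are included.
example = json.dumps({
    "rows": [{"id": "OTU_1", "metadata": {"taxonomy": ["Eukaryota"]}}],
    "columns": [
        {"id": "sample_1", "metadata": None},
        {"id": "sample_2", "metadata": {"Depth": "deep"}},
    ],
    "data": [[0, 1, 3]],
})
sample_ids, missing = summarize_biom_samples(example)
```

On a real file, read the .biom file's contents into a string first and pass that in; if the sample IDs come back as expected, the column headers did get pulled through.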

So what I think I’ll do for these data:

  1. Re-run taxonomy assignment using new SILVA database and RDP classifier. This will put the taxonomy strings in the correct format
  2. Just construct a new .biom table from the OTU mapping file and taxonomy assignments, instead of trying to convert old classic OTU tables into .biom files.

Organizing GOM Illumina data

Organizing the GOM analyses run to date: downloaded completed runs onto a 1TB external hard drive, along with the parameter files (and copied the command I ran into a comment line at the top of each parameter file). Proceeding with more AWS analysis.

Forward reads at 96% (m2.4xlarge was running out of memory, so dropped down to 6 parallel jobs):

pick_open_reference_otus.py -i /home/ubuntu/data/GOM_concat1.7_fwd_demulti_1to12_1.fna -o /home/ubuntu/data/uclust_openref96_fwd_16Sept -r /home/ubuntu/data/silva_111/99_Silva_111_rep_set_euk.fasta --parallel -O 6 -s 0.1 -p /home/ubuntu/data/qiime_parameters_18Sopenref96_GOMamazon_16sept.txt --prefilter_percent_id 0.0

(9/20/13) Forward reads at 99% – kept at 6 parallel jobs

pick_open_reference_otus.py -i /home/ubuntu/data/GOM_concat1.7_fwd_demulti_1to12_1.fna -o /home/ubuntu/data/uclust_openref99_fwd_20Sept -r /home/ubuntu/data/silva_111/99_Silva_111_rep_set_euk.fasta --parallel -O 6 -s 0.1 -p /home/ubuntu/data/qiime_parameters_18Sopenref99_GOMamazon_16sept.txt --prefilter_percent_id 0.0

Reverse reads at 99%:

pick_open_reference_otus.py -i /home/ubuntu/data/GOM_concat1.7_rev_demulti_1to12_2.fna -o /home/ubuntu/data/uclust_openref99_rev_16Sept -r /home/ubuntu/data/silva_111/99_Silva_111_rep_set_euk.fasta --parallel -O 8 -s 0.1 -p /home/ubuntu/data/qiime_parameters_18Sopenref99_GOMamazon_16sept.txt --prefilter_percent_id 0.0

Playing around with R for #indoorevol

James took us through a tutorial in R today – see Dropbox folder for the .md, .html, and .rmd files to view/execute in R.

I had to install these additional packages to use his scripts:


#indoorevol meta-analyses – fungi

Getting down to the meta-analysis for the indoorevol project. Running closed-reference OTU picking on the Amazon cloud, with a new fungi parameter file I compiled. Command run:

pick_closed_reference_otus.py -i /home/ubuntu/data/fungi/Fungal_long_seqs.fasta -o /home/ubuntu/data/fungi/uclust_closedref_97 -r /home/ubuntu/data/fungi/its_12_11_otus/rep_set/99_otus.fasta -t /home/ubuntu/data/fungi/its_12_11_otus/taxonomy/99_otu_taxonomy.txt --parallel -O 8 -p /home/ubuntu/data/fungi/qiime_parameters_fungi.txt