Converting .biom files and replacing metadata

There’s no “remove metadata” command for the .biom files, so what I’ve been doing is converting to Classic OTU tables (and pulling down taxonomy information in this conversion), and then converting *back* to .biom to add the new set of metadata:

convert_biom.py -i /Users/hollybik/Dropbox/Projects/Visualization/Test\ data\ from\ Greg/biom_for_holly/global-gut/study_850_closed_reference_otu_table_w_sample_md.biom -o /Users/hollybik/Dropbox/Projects/Visualization/Test\ data\ from\ Greg/biom_for_holly/global-gut/study_850_closed_reference_otu_table_classic.txt --biom_to_classic_table --header_key taxonomy

Then I edited the sample mapping file to condense the metadata to only the columns relevant for visualization, and re-converted the classic OTU table to .biom using the new condensed mapping file:

convert_biom.py -i /Users/hollybik/Dropbox/Projects/Visualization/Test\ data\ from\ Greg/biom_for_holly/global-gut/study_850_closed_reference_otu_table_classic.txt -o /Users/hollybik/Dropbox/Projects/Visualization/Test\ data\ from\ Greg/biom_for_holly/global-gut/study_850_closed_reference_otu_table_condensed_metadata.biom -m /Users/hollybik/Dropbox/Projects/Visualization/Test\ data\ from\ Greg/biom_for_holly/global-gut/study_850_mapping_file_condensed_metadata.txt --biom_table_type="otu table"
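The two conversion steps above differ only in a few file names, so the long paths can be assembled once. This is a minimal Python sketch, not part of the original workflow: it only builds and prints the command lines, using the flags and file names shown above. The helper names (`biom_to_classic_cmd`, `classic_to_biom_cmd`, `as_shell`) are made up for illustration.

```python
import shlex
from pathlib import PurePosixPath

# Directory taken from the commands above; note the spaces in "Test data from Greg".
BASE = PurePosixPath(
    "/Users/hollybik/Dropbox/Projects/Visualization/"
    "Test data from Greg/biom_for_holly/global-gut"
)
STEM = "study_850_closed_reference_otu_table"


def biom_to_classic_cmd(base=BASE, stem=STEM):
    """Step 1: .biom -> classic OTU table, keeping the taxonomy column."""
    return [
        "convert_biom.py",
        "-i", str(base / f"{stem}_w_sample_md.biom"),
        "-o", str(base / f"{stem}_classic.txt"),
        "--biom_to_classic_table",
        "--header_key", "taxonomy",
    ]


def classic_to_biom_cmd(base=BASE, stem=STEM,
                        mapping="study_850_mapping_file_condensed_metadata.txt"):
    """Step 2: classic OTU table -> .biom, attaching the condensed mapping file."""
    return [
        "convert_biom.py",
        "-i", str(base / f"{stem}_classic.txt"),
        "-o", str(base / f"{stem}_condensed_metadata.biom"),
        "-m", str(base / mapping),
        "--biom_table_type=otu table",
    ]


def as_shell(cmd):
    """Render an argument list as a copy-pasteable shell line (spaces get quoted)."""
    return " ".join(shlex.quote(arg) for arg in cmd)


if __name__ == "__main__":
    print(as_shell(biom_to_classic_cmd()))
    print(as_shell(classic_to_biom_cmd()))
```

Keeping the commands as argument lists means the space-containing directory never needs hand-escaping; the lists could also be passed straight to `subprocess.run` if the script is on the PATH.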

Also converting my published Deepsea data and GOM data into new .biom files

First, rename the final “Consensus Lineage” column to “taxonomy”. Next, run the command:

convert_biom.py -i /Users/hollybik/Dropbox/Projects/Visualization/my_test_data/Deepsea_OTUtable_uclust99_F04NF1.txt -o /Users/hollybik/Dropbox/Projects/Visualization/my_test_data/Deepsea_OTUtable_uclust99_F04NF1.biom --process_obs_metadata taxonomy -m /Users/hollybik/Dropbox/Projects/Visualization/my_test_data/QIIMEmappingfile_Deepsea_MASTER.txt --biom_table_type="otu table"

However, there are a few problems with this conversion:

  1. Taxonomy strings are the old comma-delimited version. They aren’t separated by the new k__;p__;c__;o__;f__;g__;s__ delimiters
  2. I can’t figure out if the sample names (OTU table column headers) got pulled through. It doesn’t look like they did; only the QIIME mapping file info got pasted in at the bottom. Maybe I’m wrong about this, since it was just based on a quick glance through the .biom table.
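Both points in the list above can be checked directly rather than by eye. The following is a minimal Python sketch, assuming the file is a JSON-based BIOM 1.0 table, where sample entries live under "columns" and OTU entries under "rows", each with an "id" and optional "metadata". The function names and the tiny example table are made up for illustration.

```python
import json


def biom_sample_ids(biom_text):
    """Return the sample IDs stored in a BIOM 1.0 (JSON) table."""
    table = json.loads(biom_text)
    return [col["id"] for col in table["columns"]]


def biom_taxonomy(biom_text):
    """Map each observation (OTU) ID to its 'taxonomy' metadata, if any."""
    table = json.loads(biom_text)
    result = {}
    for row in table["rows"]:
        metadata = row.get("metadata") or {}
        result[row["id"]] = metadata.get("taxonomy")
    return result


# Tiny made-up table for illustration only; not real data.
EXAMPLE = json.dumps({
    "rows": [{"id": "OTU_1", "metadata": {"taxonomy": ["Eukaryota", "Metazoa"]}},
             {"id": "OTU_2", "metadata": None}],
    "columns": [{"id": "SampleA", "metadata": {"Depth": "shallow"}},
                {"id": "SampleB", "metadata": None}],
})

if __name__ == "__main__":
    print(biom_sample_ids(EXAMPLE))
    print(biom_taxonomy(EXAMPLE))
```

For a real file, read it with `open(path).read()` and pass the text in. If the sample IDs from the OTU table headers appear in the returned list, they were carried through the conversion.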

So what I think I’ll do for these data:

  1. Re-run taxonomy assignment using the new SILVA database and the RDP classifier. This will put the taxonomy strings in the correct format.
  2. Just construct a new .biom table from the OTU mapping file and taxonomy assignments, instead of trying to convert old classic OTU tables into .biom files.
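To make the second step concrete, here is a rough Python sketch of how counts per OTU and sample can be derived from an OTU mapping file and paired with taxonomy assignments. It rests on assumptions not stated above: that the OTU map is tab-delimited with the OTU ID first and member sequence IDs after it, that sequence IDs look like `<SampleID>_<number>`, and that the taxonomy file has the OTU ID and taxonomy string in its first two tab-delimited columns. All names and the example lines are hypothetical; in practice QIIME's own make_otu_table.py is likely the tool for this job.

```python
from collections import Counter, defaultdict


def sample_of(seq_id):
    """Assumes QIIME-style sequence IDs of the form <SampleID>_<number>."""
    return seq_id.rsplit("_", 1)[0]


def count_table(otu_map_lines):
    """Build {otu_id: Counter({sample_id: count})} from OTU map lines.

    Each line is assumed to be tab-delimited: OTU ID, then member sequence IDs.
    """
    counts = defaultdict(Counter)
    for line in otu_map_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        otu_id, seq_ids = fields[0], fields[1:]
        for seq_id in seq_ids:
            counts[otu_id][sample_of(seq_id)] += 1
    return counts


def read_taxonomy(tax_lines):
    """Parse lines of: OTU ID <tab> taxonomy string [<tab> confidence]."""
    taxonomy = {}
    for line in tax_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            taxonomy[fields[0]] = fields[1]
    return taxonomy


# Made-up example lines for illustration only; not real data.
OTU_MAP = ["0\tAtlantic29_1\tAtlantic29_7\tPacific128_3\n", "1\tPacific128_9\n"]
TAXONOMY = ["0\tEukaryota;Metazoa;Nematoda\t0.98\n", "1\tEukaryota;Fungi\t0.80\n"]

if __name__ == "__main__":
    counts = count_table(OTU_MAP)
    taxonomy = read_taxonomy(TAXONOMY)
    for otu_id in sorted(counts):
        print(otu_id, dict(counts[otu_id]), taxonomy.get(otu_id))
```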

PhyloSift analysis of Deepsea OTUs

Prepping for lab meeting tomorrow, so looking at the results of the PhyloSift runs for the Deepsea OTU data.

Edge PCA (produces an .xml tree file):

./guppy pca --out-dir ~/phylosift_v1.0.0_01/ ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_ShallowCalif.fna/treeDir/18s_reps.1.jplace ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_ShallowGulf.fna/treeDir/18s_reps.1.jplace ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_Atlantic22.1.fna/treeDir/18s_reps.1.jplace ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_Atlantic25.2.fna/treeDir/18s_reps.1.jplace ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_Atlantic29.fna/treeDir/18s_reps.1.jplace ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_Atlantic43.fna/treeDir/18s_reps.1.jplace ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_Atlantic45.fna/treeDir/18s_reps.1.jplace ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_Pacific128.fna/treeDir/18s_reps.1.jplace ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_Pacific237.fna/treeDir/18s_reps.1.jplace ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_Pacific321.fna/treeDir/18s_reps.1.jplace ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_Pacific422.fna/treeDir/18s_reps.1.jplace ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_Pacific528.fna/treeDir/18s_reps.1.jplace --prefix guppyDS

Squash clustering:

./guppy squash --out-dir ~/phylosift_v1.0.0_01/ ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_ShallowCalif.fna/treeDir/ShallowCalif_18Sreps.1.jplace ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_ShallowGulf.fna/treeDir/ShallowGulf_18Sreps.1.jplace ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_Atlantic22.1.fna/treeDir/Atlantic22.1_18Sreps.1.jplace ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_Atlantic25.2.fna/treeDir/Atlantic25.2_18Sreps.1.jplace ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_Atlantic29.fna/treeDir/Atlantic29_18Sreps.1.jplace ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_Atlantic43.fna/treeDir/Atlantic43_18Sreps.1.jplace ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_Atlantic45.fna/treeDir/Atlantic45_18Sreps.1.jplace ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_Pacific128.fna/treeDir/Pacific128_18Sreps.1.jplace ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_Pacific237.fna/treeDir/Pacific237_18Sreps.1.jplace ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_Pacific321.fna/treeDir/Pacific321_18Sreps.1.jplace ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_Pacific422.fna/treeDir/Pacific422_18Sreps.1.jplace ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_Pacific528.fna/treeDir/Pacific528_18Sreps.1.jplace --prefix guppyDeepsea_squash

Kantorovich-Rubinstein Distance:

~/phylosift_v1.0.0_01/bin$ ./guppy kr --out-dir ~/phylosift_v1.0.0_01/ ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_ShallowCalif.fna/treeDir/18s_reps.1.jplace ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_ShallowGulf.fna/treeDir/18s_reps.1.jplace ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_Atlantic22.1.fna/treeDir/18s_reps.1.jplace ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_Atlantic25.2.fna/treeDir/18s_reps.1.jplace ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_Atlantic29.fna/treeDir/18s_reps.1.jplace ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_Atlantic43.fna/treeDir/18s_reps.1.jplace ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_Atlantic45.fna/treeDir/18s_reps.1.jplace ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_Pacific128.fna/treeDir/18s_reps.1.jplace ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_Pacific237.fna/treeDir/18s_reps.1.jplace ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_Pacific321.fna/treeDir/18s_reps.1.jplace ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_Pacific422.fna/treeDir/18s_reps.1.jplace ~/phylosift_v1.0.0_01/PS_temp/QiimeSplit_F04_Pacific528.fna/treeDir/18s_reps.1.jplace -o guppy_Deepsea_KRdistance
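The three guppy calls above all take the same twelve per-sample placement files, so the argument list can be generated from the sample names instead of pasted by hand. This is a minimal Python sketch, not the original method: it only builds and prints the command lines, using the sample names, paths, and flags that appear above. The helper names `jplace_paths` and `guppy_cmd` are hypothetical.

```python
# Sample names and install directory as they appear in the commands above.
SAMPLES = [
    "ShallowCalif", "ShallowGulf", "Atlantic22.1", "Atlantic25.2",
    "Atlantic29", "Atlantic43", "Atlantic45", "Pacific128",
    "Pacific237", "Pacific321", "Pacific422", "Pacific528",
]
PS_ROOT = "~/phylosift_v1.0.0_01"


def jplace_paths(samples=SAMPLES, filename="18s_reps.1.jplace", root=PS_ROOT):
    """One .jplace path per sample; `filename` may contain a {sample} placeholder."""
    return [
        f"{root}/PS_temp/QiimeSplit_F04_{s}.fna/treeDir/{filename.format(sample=s)}"
        for s in samples
    ]


def guppy_cmd(subcommand, prefix=None, out_file=None,
              filename="18s_reps.1.jplace", root=PS_ROOT):
    """Assemble a guppy command line as a list of arguments (nothing is executed)."""
    cmd = ["./guppy", subcommand, "--out-dir", root + "/"]
    cmd += jplace_paths(filename=filename, root=root)
    if prefix is not None:
        cmd += ["--prefix", prefix]
    if out_file is not None:
        cmd += ["-o", out_file]
    return cmd


if __name__ == "__main__":
    # Joined with plain spaces so that ~ is still expanded by the shell.
    print(" ".join(guppy_cmd("pca", prefix="guppyDS")))
    print(" ".join(guppy_cmd("squash", prefix="guppyDeepsea_squash",
                             filename="{sample}_18Sreps.1.jplace")))
    print(" ".join(guppy_cmd("kr", out_file="guppy_Deepsea_KRdistance")))
```

The `filename` template is there because the squash run above uses per-sample file names (for example `ShallowCalif_18Sreps.1.jplace`), while the PCA and KR runs use `18s_reps.1.jplace` in every directory.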

Cryptomycota BLAST against Deepsea MolEcology data

Figured I’d get a move on and BLAST some of the Cryptomycota data to see if there was anything interesting in my deep sea OTU data.

I was running into a bit of trouble and kept hitting errors like this:

[blastall] ERROR: SeqPortNew: lcl|OCTU5627 stop(321) >= len(264)
[blastall] ERROR: SeqPortNew: lcl|OCTU5627 stop(321) >= len(264)
[blastall] ERROR: SeqPortNew: lcl|OCTU33393 start(262) >= len(232)
[blastall] ERROR: SeqPortNew: lcl|OCTU33393 start(323) >= len(232)

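But then I talked to Aaron and he said I was formatting the DB with -p T, which was giving me that problem. (Note that in formatdb, -p sets the sequence type, T for protein and F for nucleotide; parsing SeqIds and building an index is the -o option.) So in the end I formatted the DB as nucleotide, with no index, using the command: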

formatdb -i OTUrepset_uclust99_bothloci -p F -o F

The error went away and I was able to run the Cryptomycota comparison:

megablast -d /home/hollybik/BLAST_db/99_QIIME_OTU_deepsea/OTUrepset_uclust99_bothloci -i /home/hollybik/BLAST_db/99_QIIME_OTU_deepsea/Cryptomycota_rRNAseq_Jones_2011_mod.txt -o cryptomycota_Deepsea_BLAST_3.txt -v 5 -b 5 -D 2

Where -v is the number of database sequences to show one-line descriptions for, -b is the number of alignments to show, and -D 2 selects the output format (traditional BLAST output).
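The traditional output is easy to read but awkward to filter. As a sketch only: assuming the same search were re-run with tab-delimited output (which, as far as I know, is -D 3 in megablast) and the usual 12 columns (query, subject, % identity, alignment length, mismatches, gap openings, query start/end, subject start/end, e-value, bit score), the best hit per Cryptomycota query could be pulled out in Python like this. The function names and example lines are made up for illustration.

```python
from collections import namedtuple

Hit = namedtuple(
    "Hit",
    "query subject identity length mismatches gaps "
    "q_start q_end s_start s_end evalue bitscore",
)


def parse_tabular(lines):
    """Parse 12-column tab-delimited BLAST output, skipping comments and blanks."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        if len(f) != 12:
            continue  # not a standard 12-column hit line
        hits.append(Hit(
            f[0], f[1], float(f[2]), int(f[3]), int(f[4]), int(f[5]),
            int(f[6]), int(f[7]), int(f[8]), int(f[9]),
            float(f[10]), float(f[11]),
        ))
    return hits


def best_hits(hits):
    """Keep the highest bit-score hit for each query sequence."""
    best = {}
    for hit in hits:
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return best


# Made-up example lines for illustration only; not real results.
EXAMPLE_LINES = [
    "# Fields: query, subject, % identity, ...",
    "crypto_1\tOCTU5627\t97.50\t200\t5\t0\t1\t200\t10\t209\t1e-95\t350.0",
    "crypto_1\tOCTU33393\t91.00\t180\t16\t0\t5\t184\t20\t199\t2e-60\t230.0",
    "crypto_2\tOCTU33393\t99.00\t100\t1\t0\t1\t100\t1\t100\t3e-48\t190.0",
]

if __name__ == "__main__":
    for query, hit in sorted(best_hits(parse_tabular(EXAMPLE_LINES)).items()):
        print(query, hit.subject, hit.identity, hit.bitscore)
```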