Objective 1: Phylogenetics

After going over the data yesterday, Kelley set out three main objectives that I should be aiming for with the metagenomics data:

  1. Phylogenetics: Use the 95% OCTU dataset to compile a phylogeny that will hopefully be informative for higher clade relationships (deep splits). For this purpose we have to assume that each 95% OCTU is a monophyletic group (although I need to look into this more!).
  2. Phylogeography: For each 95% OCTU, compile a phylogeny of the 99% OCTUs that make up this grouping. Look at separation (or lack thereof) of Pacific/Atlantic and Deep/Shallow reads. Are any patterns taxon-specific?
  3. Phylogenetic species concepts: Find an algorithm that can be implemented on individual OCTU phylogenies that will define species using tree branch lengths and/or Newick tree files.

Today I was working out the kinks for Objective 1. Mainly these were related to ARB: sorting the data, exporting it with correct labels, and formatting the FASTA files. I'm still working in the MasterDB_95A ARB database. I've separated the OCTUs according to primer set (labelled ‘Dorota’ [2019 spp] and ‘Simon’ [1923 spp], respectively, in the tax_slv field of ARB). I also set up the PT server for this new, aligned database of 95% OCTUs under User4.arb. A few OCTUs in the database had to be re-aligned by hand, and two OCTUs (653 and 3826) were completely nonsensical; I think they are error/chimera reads. These two OCTUs have been excluded from tree building and are labelled ‘unaligned error’ in the tax_slv field of ARB.

For future reference (in case I need to build a filter at some point), Simon’s primer set spans alignment positions 0-10221 and Dorota’s primer set runs from position 10222 to the end of the alignment.
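If a filter does turn out to be needed outside ARB, the split could be sketched in Python roughly as follows. This is only a sketch: it assumes the positions above are 0-based and inclusive (worth checking against how ARB numbers columns), and the name split_by_primer is made up for illustration.

```python
# Column boundary taken from the note above. Treating it as 0-based and
# inclusive is an ASSUMPTION that should be checked against ARB.
SIMON_END = 10221  # last alignment column of Simon's primer region


def split_by_primer(aligned_seq):
    """Return (simon_region, dorota_region) for one aligned sequence string."""
    simon = aligned_seq[:SIMON_END + 1]   # columns 0..10221
    dorota = aligned_seq[SIMON_END + 1:]  # columns 10222..end
    return simon, dorota
```

Every sequence in the alignment has the same length, so the same two slices can be applied to each row to produce one alignment per primer set.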

I've also been working out how to export ARB alignments and retain the taxonomic information (and avoid time-consuming tree annotations). The best way to do it is to export the files in fasta_acc.eft format and then convert this to Phylip using the Readseq tool on CIPRES. Readseq will cut off the names at the first space in the FASTA header, so you need to put in underscores (e.g. between binomial species names) to stop the relevant information from being truncated. Readseq is awesome because it doesn’t truncate at 10 characters like Dataconvert. After converting the file to Phylip, you need to download this file, replace all . with ? in the alignment, and also delete any prohibited characters from the species names, including . – / ( ). Then, voila! Upload the modified Phylip file back to CIPRES and it's ready to run on RAxML.
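The two manual clean-up steps (underscores in the FASTA headers before Readseq, then fixing the Phylip file afterwards) could be scripted. Here is a rough Python sketch, not something I've run on the real files. It assumes the Phylip file has one "name sequence" record per line after the header line (an interleaved file would need extra handling), and it strips both the plain hyphen and the en dash from names, since I'm not sure which one the list above refers to. The function names are made up for illustration.

```python
# Characters to delete from taxon names. Including both "-" and the en dash
# is an ASSUMPTION about which character the list above refers to.
PROHIBITED = ".-–/()"


def underscore_fasta_headers(fasta_text):
    """Replace whitespace in FASTA header lines with underscores."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            line = ">" + "_".join(line[1:].split())
        out.append(line)
    return "\n".join(out) + "\n"


def clean_phylip(phylip_text):
    """Strip prohibited characters from names and turn '.' into '?' in sequences.

    ASSUMES one 'name sequence' record per line after the header line.
    """
    lines = phylip_text.splitlines()
    cleaned = [lines[0]]  # header line (taxon count and alignment length) is kept as-is
    for line in lines[1:]:
        if not line.strip():
            cleaned.append(line)  # keep blank lines untouched
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError("expected 'name sequence' on line: " + line)
        name, seq = parts
        name = "".join(ch for ch in name if ch not in PROHIBITED)
        seq = seq.replace(".", "?")  # gap characters such as '-' are left alone
        cleaned.append(name + " " + seq)
    return "\n".join(cleaned) + "\n"
```

Only the name column has characters deleted, so hyphen gaps inside the alignment itself are not touched.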

