Lots more alignment edits

Busy few days of alignment edits:

On June 10th I got the first full Nematode RAxML tree back.  It was a complete mess, so I’ve gone back to the alignment and made some major edits.

Firstly, I removed taxa from the ARB tree that had extremely long branches on the best ML tree from RAxML job #418200.  These removed sequences were as follows:

  • MlgInc16, MLgHisp3, PyncZea2, UncNe311, UncNe482, UncNe469, HtaGly20, WucBanc6, Pd6Strong, Pd6Punct, Pd6Cylin

Next I edited alignments for the following groups:

  • Bunonema, Trichuris, Daptonema, Theristus, Pelodera, Rhabditis, Ditylenchus

The tree ‘SSU_MajorReduc_11June’ reflects the following changes:

  1. I removed all the duplicate sequences that RAxML job #418200 indicated were in the tree.  See file ‘Arb_ident_Seq_7June’ [or something similar] for the list of duplicate sequences with ARB ID/taxonomy information.  After this step there were 2969 sequences in the NJ tree. [I believe this step is also reflected in ARB database ‘SSU_10June’]
  2. The Uncultured nematodes were really clogging up the tree and didn’t seem to be adding anything informative.  They added long branches to clades, and a lot of them were just short barcode sequences anyway.  I went back to the tree and removed all the ‘uncultured nematodes’ and ‘uncultured nematodes sourhope farm’. After this step there were 2836 sequences in the NJ tree.
  3. Removed lots of Acrobeloides sequences; the sequences I kept in the tree are highlighted in yellow on the printout of the ARB taxa in the database.  After this step there were 2510 sequences in the NJ tree.
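
A minimal sketch of how a duplicate check like the one in step 1 could be reproduced locally.  It assumes "duplicate" means two rows of the alignment are exactly identical, and that the alignment is in sequential relaxed Phylip (one name and one sequence per line).  The function names and the toy alignment are made up for illustration and are not part of ARB or RAxML.

```python
from collections import defaultdict


def read_relaxed_phylip(text):
    """Parse sequential relaxed Phylip text: a header line, then 'name sequence' lines."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_taxa = int(lines[0].split()[0])  # header is '<n_taxa> <n_sites>'
    records = {}
    for ln in lines[1:1 + n_taxa]:
        name, seq = ln.split(None, 1)
        records[name] = seq.replace(" ", "").upper()
    return records


def duplicate_groups(records):
    """Return lists of taxon names whose aligned rows are exactly identical."""
    by_row = defaultdict(list)
    for name, seq in records.items():
        by_row[seq].append(name)
    return [sorted(names) for names in by_row.values() if len(names) > 1]


# Toy alignment (hypothetical taxa): TaxonA and TaxonD share the same row.
example = """4 8
TaxonA ACGT-ACG
TaxonB ACGTTACG
TaxonC ACGTTTTT
TaxonD ACGT-ACG
"""

groups = duplicate_groups(read_relaxed_phylip(example))
print(groups)
```

One name per group can then be kept and the rest marked for removal in ARB.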

I exported the resulting 2510 taxa into a Phylip file named ‘NewSSU_2510taxa.phy’.  I ran the following jobs in RAxML:

  • Job #514028  was run with GTR +G
  • Job #514044 was run with GTR + I + G
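
The two jobs above went through the web server, but roughly equivalent local runs could be set up as sketched below.  This assumes a standalone RAxML binary called `raxmlHPC` is on the path; the run names and the random seed are placeholders, not values from the submitted jobs.  In standalone RAxML, GTR + G corresponds to the model string `GTRGAMMA` and GTR + I + G to `GTRGAMMAI`.

```python
import shlex


def raxml_command(alignment, run_name, invariant_sites=False, seed=12345):
    """Build a RAxML argument list for a single ML search.

    The binary name, run name and seed are assumptions for illustration.
    """
    model = "GTRGAMMAI" if invariant_sites else "GTRGAMMA"
    return [
        "raxmlHPC",
        "-s", alignment,   # input alignment in Phylip format
        "-n", run_name,    # suffix used for the output files
        "-m", model,       # substitution model
        "-p", str(seed),   # seed for the parsimony starting tree
    ]


for name, inv in (("ssu_gtr_g", False), ("ssu_gtr_i_g", True)):
    print(shlex.join(raxml_command("NewSSU_2510taxa.phy", name, inv)))
```

The printed lines can be pasted into a shell, or the lists passed to `subprocess.run` directly.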


I kept going with the reductionist edits, next removing my own list of duplicate SSU sequences.  Only the sequences with duplicate SSU AND D2D3 were removed from the SSU database (so we can see the topology differences if seqs are identical for SSU but different for D2D3).  The following sequences were removed:

  • AUK 10, 35
  • HUK 1
  • LUK 3
  • OUS 1, 14, 21, 9, 3, 5, 7, 8
  • HCL 10, 11, 12, 2, 27, 5, 7, 9, 20, 32, 23
  • BUS 1, 2, 3, 4, 5, 7
  • NUS 2, 4, 5, 6, 14, 40
  • DBA 4, 2, 3, 5, 6, 7
  • SBA 3, 5, 1, 12
  • NAR 1, 5, 15, 16, 7, 2, 8
  • SUS 2, 21, 10, 15, 6
  • WUS 5, 2, 4
  • LCL 2, 3, 4, 7, 8, 5, 9, 19
  • BCA 10, 1, 31, 47, 5, 6, 21, 23
  • SBN 2
  • Cr 55, 73a, 83a, 84b
  • TCR 1, 3, 12, 130, 139, 188, 202, 158

I also went through the following genera and removed sequences shorter than 1000 bp.  Generally for these groups, the short sequences were outliers that attached to other clades (presumably because of long-branch attraction, LBA, due to their shortness), and there were plenty of full SSU sequences available in the database to represent these genera.  I’m now aiming to limit representative sequences to 15 or fewer per genus (excluding Enoplids).  Groups edited were as follows:

  • Strongyloides
  • Rhabdolaimus
  • Bursaphelenchus
  • Longidorus
  • Pratylenchus
  • Trichodorus
  • Paratrichodorus
  • Rotylenchus/Rotylenchulus
  • Ditylenchus
  • Globodera
  • Heterodera
  • Pristionchus
  • Pellioditis
  • Aphelenchoides
  • Steinernema
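
The trimming above was done by hand in ARB, but the two rules (drop sequences under 1000 bp, keep at most 15 per genus) can be sketched in Python as a sanity check.  Keeping the longest sequences first is an assumption made here for illustration; the sequences actually kept were chosen manually.  The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def ungapped_length(aligned_seq):
    """Count characters in an aligned row that are not gap or missing symbols."""
    return sum(1 for ch in aligned_seq if ch not in "-.?")


def thin_genera(records, genus_of, min_len=1000, max_per_genus=15):
    """Drop short sequences, then keep the longest few per genus.

    records:  dict of sequence name -> aligned sequence
    genus_of: dict of sequence name -> genus name
    """
    by_genus = defaultdict(list)
    for name, seq in records.items():
        length = ungapped_length(seq)
        if length >= min_len:
            by_genus[genus_of[name]].append((length, name))
    kept = []
    for members in by_genus.values():
        # Longest first; ties broken by name so the result is reproducible.
        members.sort(key=lambda item: (-item[0], item[1]))
        kept.extend(name for _, name in members[:max_per_genus])
    return sorted(kept)


# Toy data with small thresholds so the effect is visible.
records = {
    "Glob1": "ACGTACGTAC",
    "Glob2": "ACGT------",
    "Glob3": "ACGTACGT--",
    "Het1": "ACGTACG---",
}
genus_of = {"Glob1": "Globodera", "Glob2": "Globodera",
            "Glob3": "Globodera", "Het1": "Heterodera"}

print(thin_genera(records, genus_of, min_len=6, max_per_genus=1))
```

With the real alignment the defaults (`min_len=1000`, `max_per_genus=15`) would apply, and Enoplid sequences would be left out of `records` since they are excluded from the cap.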

After all these edits, I rebuilt the NJ tree early morning on June 12.  This NJ tree had 1876 sequences.

I also built the initial LSU database in ARB.  I downloaded the alignment for Phylum Nematoda from SILVA (containing 1544 taxa).  I imported this into ARB and edited the following groups:

  • Caenorhabditis
  • Strongyloides
  • Bursaphelenchus
  • Pratylenchus

The LSU alignments are a MESS.  The conserved regions are OK, but the less conserved regions are not aligned at all.  I’ve been using fast aligner a lot to rectify things within genera, as oftentimes it seems that things align OK if you use this method, although it doesn’t work for everything and some regions are impossible to work with.



RAxML run

Ran the first RAxML run over the weekend (Job 418163).  The dataset consisted of all of my own Enoplid sequences (256) and all the ‘Enoplid*’ search results by taxonomy in ARB, as well as Sabatieria sequences (6) chosen as the outgroup.  The file was submitted in Phylip format (filename Enoplids_2_28May.phy, total taxa=444).

  • GTR +G, ML estimate of alpha-parameter, 100 bootstrap replicates performed
  • Final ML Optimisation Likelihood  = -57408.599186
  • Annotated the best-scoring ML tree with bootstrap values

RAxML Job 418200 –>  Cox1_Ts_replaced.phy (this file has since been renamed ‘SSU_Ts_replaced.phy’).  Job was submitted on May 28th, and has not finished as of this morning. Estimated to take 145 hours.  URL: http://phylobench.vital-it.ch/raxml-bb/index.php?jid=418200