BLAST Dongying’s markers and RAxML run on thesis nems

Making progress on BLASTing Dongying vs. Parfrey’s eukaryotic markers. Ran command on Edhar:

blastall -p blastp -d /home/hollybik/euks_vs_dongying_markers/DW_BacArch_ComboMarkers.faa -i /home/hollybik/Euks_ParfreyMarkers/Euk_ParfeyMarkers_allgenes_unaligned.fasta -o Parfrey_vs_Dongying_blastp.txt -v 3 -b 3

Having major problems with file conversions, so trying to run RAxML locally now on Edhar:

raxmlHPC -s SSUalign_BikThesisNems_Phylip_21Apr.txt -n SSUalign_RAxML_GTRCAT -m GTRCAT -f a -T 4

That didn’t work so Guillaume fixed the Phylip file and we ran the following command:

raxmlHPC -s gjospin_23Apr_out.phylip -n SSUalign_RAxML_GTRCAT -m GTRCAT -f a -T 4 -x 12345 -# 100 -p 12345
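The notes don't record what was actually wrong with the original Phylip file. As a rough pre-flight check before handing a file to RAxML, a sketch like the one below catches the most common conversion problems: a header that disagrees with the data, ragged sequence lengths, and duplicate names. It is written in Python and assumes a sequential relaxed-Phylip layout (one "name sequence" pair per line); the function name check_phylip is hypothetical.

```python
def check_phylip(text):
    """Return a list of problems found in sequential relaxed-Phylip text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return ["file is empty"]
    header = lines[0].split()
    if len(header) != 2 or not all(tok.isdigit() for tok in header):
        return ["first line must be '<ntaxa> <nsites>'"]
    ntaxa, nsites = int(header[0]), int(header[1])
    problems = []
    seen = set()
    records = lines[1:]
    if len(records) != ntaxa:
        problems.append(f"header says {ntaxa} taxa, found {len(records)}")
    for ln in records:
        parts = ln.split(None, 1)
        if len(parts) != 2:
            problems.append(f"no sequence after name: {ln[:20]!r}")
            continue
        # Remove any internal whitespace so blocked sequences are measured correctly.
        name, seq = parts[0], "".join(parts[1].split())
        if name in seen:
            problems.append(f"duplicate name: {name}")
        seen.add(name)
        if len(seq) != nsites:
            problems.append(f"{name}: length {len(seq)}, expected {nsites}")
    return problems
```

An empty list means the file passed these checks. It does not guarantee that RAxML will accept the file, and interleaved Phylip files would need a different parser.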

Also re-downloaded the newest version of PhyloSift (including all new markers) and re-ran it on xeno_assembly_low_cov.fa:

./phylosift all /home/hollybik/TestData/xeno_assembly_low_cov.fa


Phylogenetic placement of short reads

Today I’ve been exploring how to place our short pyrosequencing reads into an evolutionary context.

RAxML EPA webserver

  • Downloaded the SILVA 102 release non-redundant reference alignment, but ran into problems uploading the guide tree.  Apparently I need a tree with branch information AND alignment info, which I’m not sure how to produce in ARB.  Have e-mailed Alexis and Simon Berger to see if they can help me prepare this file.

Placing sequences onto a tree in ARB

  • I’ve exported my ‘guide tree’ of nematode sequences from my thesis; the file is saved as SSU_NematodeThesisRef_w99As_18Aug.arb on the IOMEGA data hard drive.  Full-length reference sequences are labelled as ‘Thesis Ref’ under the import field.  Nematode sequences from the 99A dataset are labelled as ‘Nema99A’ under the import field.

Re-running jobs for BMC Enoplid Paper

I’ve been re-running all the jobs for the BMC paper, now including the Rhabdolaimidae for the Enoplid-only tree.  However, inclusion of these sequences has caused some (previously resolved) nodes to become unresolved.  So I’ve set up two new RAxML jobs: one with 1000 bootstrap replicates and another with an extended majority rule consensus tree.  I’m also re-running the Enoplid Bayes tree for 4 million generations.

Did something similar with the Nematomorpha tree (Bayesian), using the nexus file for the thesis job Holly_longjob2 (CIPRES Bayes folder on old Dell).  Running this for 3 million and 4 million generations.

Getting started at UNH

Resurrecting my online lab book so that I remember what I do!

David Lunt sent back his comments regarding the (now combined) BMC papers detailing the Enoplid phylogeny.  He pointed out problems with some of my outgroup choices, so I’ve had to go back and re-run a couple of the trees.

Just FYI in ARB, to export secondary structure features:

  • Helix, box reads  . 0 – =
  • Loops, box reads [<>]

I removed the Chromadorid outgroups and many Dorylaimid species from the Enoplid-only ML trees and ran the following jobs:

  • Raxml Job #891091  11Mar10_EnopReducedDS.phylip  (Normal ML run with Dorylaimia OGs)
  • Raxml Job #894968  DO NOT USE!  Forgot to add data partition file
  • Raxml Job #896361 11MarEnop_LSU_ML (Normal ML run with Dorylaimia OGs)
  • Raxml Job #898456 11MarEnop_StemLoopCombo, partition file Raxml_SL_11Mar.txt (Stem/Loop ML run with Dorylaimia OGs)

CIPRES portal was rejecting my Bayesian runs, so will try again today.

Big Updates

Been doing a lot of updates over the past week or so. 

There are now separate ARB databases for the Enoplid-only tree (12June_EnopTree.arb) and the Full nematode tree (14June_FullNemTree.arb).  Firstly, I severely cut down the Full Nem SSU tree to 1798 taxa, and further to 1796 taxa on June 18th.  The Enoplids-only tree now contains 784 taxa.

On the 12th June I edited the following genera in the Full Nem tree, during the process of cutting down taxa:

  • Oscheius
  • Phasmarhabditis
  • Plectus
  • Ancylostoma
  • Protorhabditis
  • Mesorhabditis
  • Schistonchus
  • Aphelenchoides
  • Camallanus/Procamallanus/Spirocamallanus
  • Brugia
  • Toxocara
  • Philonema
  • Poikilolaimus
  • Howardula
  • Pratylenchus
  • Hirschmanniella
  • Basiria
  • Helicotylenchus
  • Cephalob*
  • Zeldia
  • Philometra/Philometroides
  • Nacobbus
  • Eumonhystera
  • Oxydirus
  • Labronema
  • Clavicaudoides
  • Robbea
  • Laxus
  • Caenorhabditis
  • Rhabditis
  • Panagrolaimus
  • Anguina
  • Aphelenchus
  • Adoncholaimus
  • Daptonema
  • Theristus

The Enoplids Only tree contains the following species:

  1. All Enoplea (I searched in tax_embl field for ‘Enoplea’ and selected all results)–includes Enoplida, Dorylaimia, etc.
  2. Chromadoridae (searched in tax_embl field for ‘Chromadoridae’)–represents outgroup
  3. Holly Enoplids (All sequences–no duplicates removed)

The total of the above Enoplids was 989 taxa.  This was then cut down to 784 by removing excess sequences in some taxa (but none of the Holly Enoplids were removed).  Some of the removed taxa are listed in the file 12June_removedEnop.alpha, but more were removed that aren’t listed in that file.

The alignment was checked for ALL Enoplid sequences by systematically going down the tree.  On June 13th (after all edits were finished), all Enoplid taxa left in the tree (784 taxa) were submitted to RAxML:

  • Job 533582 –> 13June_EnopFulledit, GTR + G
  • Job 533583 –> 13June_EnopFulledit, GTR +G +I

On June 14th, the same Enoplid taxa were submitted for Bayesian inference analysis via the CIPRES web server.  The taxa need to be exported from ARB in PAUP format, and then you need to a) add a MrBayes command block and b) remove some info from the header (change RNA to DNA, delete some characters for missing data):

  • File 14_June_EnopBayes.nxs, GTR + G
  • File 14_June_EnopBayes.nxs, GTR + G +I
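The two manual edits to the exported file could be scripted along these lines. This is only a sketch in Python: the function name prepare_for_mrbayes is hypothetical, the ngen and samplefreq values are placeholders rather than the settings used for these jobs, and the "missing data" characters that were deleted by hand are not handled because the notes don't say which ones they were. The lset settings are the standard MrBayes way of requesting GTR + G (nst=6 rates=gamma) or GTR + G + I (nst=6 rates=invgamma).

```python
import re

# MrBayes 'rates' setting for each model label used in these notes.
MODELS = {"GTR+G": "gamma", "GTR+G+I": "invgamma"}

def prepare_for_mrbayes(nexus_text, model="GTR+G", ngen=1000000):
    """Switch datatype RNA -> DNA and append a MrBayes command block."""
    rates = MODELS[model]
    # Change only the datatype declaration in the FORMAT line; sequences are untouched.
    fixed = re.sub(r"(datatype\s*=\s*)rna", r"\1DNA", nexus_text,
                   flags=re.IGNORECASE)
    block = (
        "\nbegin mrbayes;\n"
        "  set autoclose=yes nowarn=yes;\n"
        f"  lset nst=6 rates={rates};\n"
        f"  mcmc ngen={ngen} samplefreq=1000;\n"  # placeholder run settings
        "end;\n"
    )
    return fixed.rstrip() + "\n" + block
```

Calling it once with model="GTR+G" and once with model="GTR+G+I" on the same export would give the two input files, which would then need distinct names before upload.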

The next few days have been spent editing ALL taxa alignments in the Full Nem SSU tree.  These edits were finished on June 18th, and the Full Nem tree (no outgroups added yet) was submitted to RAxML:

  • Job 569141 –> 18June_FullNem.phy, GTR + G
  • Job 569160 –> 18June_FullNem.phy, GTR + G + I

I tried submitting the Full Nem tree to the CIPRES server for MrBayes, but it appears the file size is too big for the server to accept–I have to contact the administrators if I want them to run the job.

Lots more alignment edits

Busy few days of alignment edits:

On June 10th I got the first full Nematode RAxML tree back.  It was a complete mess and I’ve now gone back to the alignment and made some major edits. 

Firstly, I removed taxa from the ARB tree that had extremely long branches on the best ML tree from RAxML job #418200.  These removed sequences were as follows:

  • MlgInc16, MLgHisp3, PyncZea2, UncNe311, UncNe482, UncNe469, HtaGly20, WucBanc6, Pd6Strong, Pd6Punct, Pd6Cylin
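The notes don't say how the long branches were picked out (most likely by eye in the tree viewer). For a first pass on a large tree, a sketch like the following could list candidate taxa straight from the Newick file. It is written in Python; the function name and the cutoff (ten times the median terminal branch length) are arbitrary assumptions, not a published criterion.

```python
import re
from statistics import median

# A leaf is a label that directly follows "(" or "," and carries ":length".
LEAF = re.compile(r"[(,]\s*([^(),:;\s]+)\s*:\s*([0-9.]+(?:[eE][-+]?[0-9]+)?)")

def long_terminal_branches(newick, factor=10.0):
    """Return (name, length) for leaves longer than factor x the median leaf length."""
    leaves = [(name, float(length)) for name, length in LEAF.findall(newick)]
    if not leaves:
        return []
    cutoff = factor * median(length for _, length in leaves)
    flagged = [(name, length) for name, length in leaves if length > cutoff]
    # Longest branches first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

Anything it flags still needs checking against the alignment before removal, since a long branch can reflect a genuinely divergent taxon rather than a bad sequence.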

Next I edited alignments for the following groups:

  • Bunonema, Trichuris, Daptonema, Theristus, Pelodera, Rhabditis, Ditylenchus

The tree ‘SSU_MajorReduc_11June’ reflects the following changes:

  1. I removed all the duplicate sequences that RAxML job #418200 indicated were in the tree.  See file ‘Arb_ident_Seq_7June’ [or something similar] for the list of duplicate sequences with ARB ID/taxonomy information.  After this step there were 2969 sequences in the NJ tree. [I believe this step is also reflected in ARB database ‘SSU_10June’]
  2. The Uncultured nematodes were really clogging up the tree and didn’t seem to be adding anything informative.  They added long branches to clades, and a lot of them were just short barcode sequences anyway.  I went back to the tree and removed all the ‘uncultured nematodes’ and ‘uncultured nematodes sourhope farm’. After this step there were 2836 sequences in the NJ tree.
  3. Removed lots of Acrobeloides sequences–sequences I kept in the tree are highlighted in yellow on the printout of ARB taxa in database.  After this step there were 2510 sequences in the NJ tree.
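Here the duplicates came from RAxML's own report, but the same check can be run before submitting a job. The sketch below is Python with a hypothetical function name. It groups sequence names whose sequences are identical once case, gap characters and U/T differences are ignored, which is an assumption on my part and may be looser or stricter than what RAxML reports.

```python
from collections import defaultdict

def duplicate_groups(seqs):
    """Group names whose sequences are identical once gaps and case are ignored.

    seqs maps sequence name -> aligned or unaligned sequence string.
    """
    by_seq = defaultdict(list)
    for name, seq in seqs.items():
        # Normalise: upper case, drop "-" and "." gap characters, treat U as T.
        key = seq.upper().replace("-", "").replace(".", "").replace("U", "T")
        by_seq[key].append(name)
    return [sorted(names) for names in by_seq.values() if len(names) > 1]
```

Each returned group lists names that share one sequence, so keeping the first name in a group and dropping the rest removes the redundancy.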

I exported the resulting 2510 taxa into a Phylip file named ‘NewSSU_2510taxa.phy’.  I ran the following jobs in RAxML:

  • Job #514028  was run with GTR +G
  • Job #514044 was run with GTR + I + G


I kept going with the reductionist edits, next removing my own list of duplicate SSU sequences–only the sequences with duplicate SSU AND D2D3 were removed from the SSU database (so we can see the topology differences if seqs are identical for SSU but different for D2D3).  Following sequences were removed:

  • AUK 10, 35
  • HUK 1
  • LUK 3
  • OUS 1, 14, 21, 9, 3, 5, 7, 8
  • HCL 10, 11, 12, 2, 27, 5, 7, 9, 20, 32, 23
  • BUS 1, 2, 3, 4, 5, 7
  • NUS 2, 4, 5, 6, 14, 40
  • DBA 4, 2, 3, 5, 6, 7
  • SBA 3, 5, 1, 12
  • NAR 1, 5, 15, 16, 7, 2, 8
  • SUS 2, 21, 10, 15, 6
  • WUS 5, 2, 4
  • LCL 2, 3, 4, 7, 8, 5, 9, 19
  • BCA 10, 1, 31, 47, 5, 6, 21, 23
  • SBN 2
  • Cr 55, 73a, 83a, 84b
  • TCR 1, 3, 12, 130, 139, 188, 202, 158

I also went through the following genera and removed sequences shorter than 1000 bp.  Generally for these groups, the short sequences were outliers and attached to other clades (presumably because of LBA due to their shortness), and there were plenty of full SSU sequences available in the database which could be used to represent these genera.  I’m now aiming to limit representative sequences to 15 or fewer per genus (excluding Enoplids).  Groups edited were as follows:

  • Strongyloides
  • Rhabdolaimus
  • Bursaphelenchus
  • Longidorus
  • Pratylenchus
  • Trichodorus
  • Paratrichodorus
  • Rotylenchus/Rotylenchulus
  • Ditylenchus
  • Globodera
  • Heterodera
  • Pristionchus
  • Pellioditis
  • Aphelenchoides
  • Steinernema
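The two rules used for these genera (drop anything under 1000 bp, then aim for at most 15 representatives per genus) could be applied automatically as a first pass. This is a rough Python sketch with a hypothetical function name. Keeping the longest sequences within each genus is my assumption; in practice the representatives were picked by hand in ARB.

```python
from collections import defaultdict

def thin_genus(records, min_len=1000, max_per_genus=15):
    """Drop sequences shorter than min_len, then keep the longest max_per_genus per genus.

    records is a list of (name, genus, sequence) tuples; length ignores gap characters.
    """
    def ungapped_len(seq):
        return sum(1 for ch in seq if ch not in "-.")

    by_genus = defaultdict(list)
    for name, genus, seq in records:
        length = ungapped_len(seq)
        if length >= min_len:
            by_genus[genus].append((length, name))
    kept = []
    for items in by_genus.values():
        items.sort(key=lambda item: (-item[0], item[1]))  # longest first, ties by name
        kept.extend(name for _, name in items[:max_per_genus])
    return sorted(kept)
```

Enoplid sequences would be left out of the input, since the cap does not apply to them.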

After all these edits, I rebuilt the NJ tree early morning on June 12.  This NJ tree had 1876 sequences.

I also built the initial LSU database in ARB.  I downloaded the alignment for Phylum Nematoda from SILVA (containing 1544 taxa).  I imported this into ARB and edited the following groups:

  • Caenorhabditis
  • Strongyloides
  • Bursaphelenchus
  • Pratylenchus

The LSU alignments are a MESS.  The conserved regions are OK, but the less conserved regions are not aligned at all.  I’ve been using fast aligner a lot to rectify things within genera, as oftentimes it seems that things align OK if you use this method, although it doesn’t work for everything and some regions are impossible to work with.


RAxML run

Ran the first RAxML run over the weekend (Job 418163).  The dataset consisted of all of my own Enoplid sequences (#256) and all the ‘Enoplid*’ search results by taxonomy in ARB, as well as Sabatieria sequences (6) chosen as the outgroup.  The file was submitted in Phylip format (filename Enoplids_2_28May.phy, total taxa=444).

  • GTR +G, ML estimate of alpha-parameter, 100 bootstrap replicates performed
  • Final ML Optimisation Likelihood  = -57408.599186
  • Annotated the best-scoring ML tree with bootstrap values

RAxML Job 418200 –>  Cox1_Ts_replaced.phy (this file has since been renamed ‘SSU_Ts_replaced.phy’).  Job was submitted on May 28th, and has not finished as of this morning. Estimated to take 145 hours.