Big Updates

Been doing a lot of updates over the past week or so. 

There are now separate ARB databases for the Enoplid-only tree (12June_EnopTree.arb) and the Full nematode tree (14June_FullNemTree.arb).  Firstly, I severely cut down the Full Nem SSU tree to 1798 taxa, and further to 1796 taxa on June 18th.  The Enoplids-only tree now contains 784 taxa.

On the 12th June I edited the following genera in the Full Nem tree, during the process of cutting down taxa:

  • Oscheius
  • Phasmarhabditis
  • Plectus
  • Ancylostoma
  • Protorhabditis
  • Mesorhabditis
  • Schistonchus
  • Aphelenchoides
  • Camallanus/Procamallanus/Spirocamallanus
  • Brugia
  • Toxocara
  • Philonema
  • Poikilolaimus
  • Howardula
  • Pratylenchus
  • Hirschmanniella
  • Basiria
  • Helicotylenchus
  • Cephalob*
  • Zeldia
  • Philometra/Philometroides
  • Nacobbus
  • Eumonhystera
  • Oxydirus
  • Labronema
  • Clavicaudoides
  • Robbea
  • Laxus
  • Caenorhabditis
  • Rhabditis
  • Panagrolaimus
  • Anguina
  • Aphelenchus
  • Adoncholaimus
  • Daptonema
  • Theristus

The Enoplids Only tree contains the following species:

  1. All Enoplea (I searched in tax_embl field for ‘Enoplea’ and selected all results)–includes Enoplida, Dorylaimia, etc.
  2. Chromadoridae (searched in tax_embl field for ‘Chromadoridae’)–represents outgroup
  3. Holly Enoplids (All sequences–no duplicates removed)

The total of the above Enoplids was 989 taxa.  This was then cut down to 784 by removing excess sequences in some taxa (but none of the Holly Enoplids were removed).  Some of the removed taxa are listed in the file 12June_removedEnop.alpha, but more were removed that aren’t listed in that file.

The alignment was checked for ALL Enoplid sequences by systematically going down the tree.  On June 13th (after all edits were finished), all Enoplid taxa left in the tree (784 taxa) were submitted to RAxML:

  • Job 533582 –> 13June_EnopFulledit, GTR + G
  • Job 533583 –> 13June_EnopFulledit, GTR + G + I

On June 14th, the same Enoplid taxa were submitted for Bayesian Inference analysis via the CIPRES web server.  The taxa need to be exported from ARB in PAUP format, and then you need to a) add a MrBayes command block and b) remove some info from the header (change RNA to DNA, delete some characters for missing data):

  • File 14_June_EnopBayes.nxs, GTR + G
  • File 14_June_EnopBayes.nxs, GTR + G + I
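
The header edit and command block described above could be scripted rather than done by hand. The following is only a minimal sketch: the function name `prepare_for_mrbayes` and the contents of `MRBAYES_BLOCK` are illustrative assumptions (the actual settings used for these runs are not recorded here), and the "delete some characters for missing data" step is left out because the notes do not say which characters were removed.

```python
import re

# Hypothetical MrBayes block. The settings are placeholders, not the values
# used for the runs described in this post. nst=6 with rates=gamma requests
# GTR + G; rates=invgamma would request GTR + G + I.
MRBAYES_BLOCK = """begin mrbayes;
  set autoclose=yes nowarn=yes;
  lset nst=6 rates=gamma;
  mcmc ngen=1000000 samplefreq=100;
end;
"""


def prepare_for_mrbayes(nexus_text, block=MRBAYES_BLOCK):
    """Switch datatype RNA -> DNA in the header and append a MrBayes block."""
    fixed = re.sub(
        r"(datatype\s*=\s*)rna",
        r"\g<1>DNA",
        nexus_text,
        flags=re.IGNORECASE,
    )
    return fixed.rstrip() + "\n\n" + block
```

Reading the exported file, passing its text through the function, and writing the result to a new .nxs file would leave the original ARB export untouched.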

The next few days have been spent editing ALL taxa alignments in the Full Nem SSU tree.  These edits were finished on June 18th, and the Full Nem tree (no outgroups added yet) was submitted to RAxML:

  • Job 569141 –> 18June_FullNem.phy, GTR + G
  • Job 569160 –> 18June_FullNem.phy, GTR + G + I

I tried submitting the Full Nem tree to the CIPRES server for MrBayes, but it appears the file size is too big for the server to accept–I have to contact the administrators if I want them to run the job.

Lots more alignment edits

Busy few days of alignment edits:

On June 10th I got the first full Nematode RAxML tree back.  It was a complete mess and I’ve now gone back to the alignment and made some major edits. 

Firstly, I removed taxa from the ARB tree that had extremely long branches on the best ML tree from RAxML job #418200.  These removed sequences were as follows:

  • MlgInc16, MLgHisp3, PyncZea2, UncNe311, UncNe482, UncNe469, HtaGly20, WucBanc6, Pd6Strong, Pd6Punct, Pd6Cylin
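
Long branches like these were picked out by looking at the tree, but a rough screen can also be run on the Newick file. This is a sketch under stated assumptions: the tree is plain Newick with branch lengths and unquoted leaf names, and "extremely long" is arbitrarily taken to mean more than `factor` times the median terminal branch length. Neither the function name nor the threshold comes from the original workflow.

```python
import re

# A leaf is a label that directly follows "(" or "," and carries a branch
# length. Internal node labels follow ")" and so are not matched.
LEAF = re.compile(r"(?<=[(,])\s*([^(),:;\s]+):([0-9.eE+-]+)")


def long_terminal_branches(newick, factor=10.0):
    """Return leaf names whose terminal branch exceeds factor x the median."""
    leaves = [(name, float(length)) for name, length in LEAF.findall(newick)]
    if not leaves:
        return []
    lengths = sorted(length for _, length in leaves)
    median = lengths[len(lengths) // 2]
    return [name for name, length in leaves if length > factor * median]
```

Anything flagged this way would still need checking by eye in ARB before removal, since a long branch can be real divergence rather than a bad sequence.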

Next I edited alignments for the following groups:

  • Bunonema, Trichuris, Daptonema, Theristus, Pelodera, Rhabditis, Ditylenchus

The tree ‘SSU_MajorReduc_11June’ reflects the following changes:

  1. I removed all the duplicate sequences that RAxML job #418200 indicated were in the tree.  See file ‘Arb_ident_Seq_7June’ [or something similar] for the list of duplicate sequences with ARB ID/taxonomy information.  After this step there were 2969 sequences in the NJ tree. [I believe this step is also reflected in ARB database ‘SSU_10June’]
  2. The Uncultured nematodes were really clogging up the tree and didn’t seem to be adding anything informative.  They added long branches to clades, and a lot of them were just short barcode sequences anyway.  I went back to the tree and removed all the ‘uncultured nematodes’ and ‘uncultured nematodes sourhope farm’. After this step there were 2836 sequences in the NJ tree.
  3. Removed lots of Acrobeloides sequences–sequences I kept in the tree are highlighted in yellow on the printout of ARB taxa in database.  After this step there were 2510 sequences in the NJ tree.

I exported the resulting 2510 taxa into a Phylip file named ‘NewSSU_2510taxa.phy’.  I ran the following jobs in RAxML:

  • Job #514028 was run with GTR + G
  • Job #514044 was run with GTR + I + G

 

I kept going with the reductionist edits, next removing my own list of duplicate SSU sequences–only the sequences with duplicate SSU AND D2D3 were removed from the SSU database (so we can see the topology differences if seqs are identical for SSU but different for D2D3).  The following sequences were removed:

  • AUK 10, 35
  • HUK 1
  • LUK 3
  • OUS 1, 14, 21, 9, 3, 5, 7, 8
  • HCL 10, 11, 12, 2, 27, 5, 7, 9, 20, 32, 23
  • BUS 1, 2, 3, 4, 5, 7
  • NUS 2, 4, 5, 6, 14, 40
  • DBA 4, 2, 3, 5, 6, 7
  • SBA 3, 5, 1, 12
  • NAR 1, 5, 15, 16, 7, 2, 8
  • SUS 2, 21, 10, 15, 6
  • WUS 5, 2, 4
  • LCL 2, 3, 4, 7, 8, 5, 9, 19
  • BCA 10, 1, 31, 47, 5, 6, 21, 23
  • SBN 2
  • Cr 55, 73a, 83a, 84b
  • TCR 1, 3, 12, 130, 139, 188, 202, 158
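
The rule used here (drop a specimen only when both its SSU and its D2D3 sequence duplicate another specimen) can be written down as a short check. This is a sketch, not the procedure actually used: it assumes two dictionaries keyed by specimen ID, treats "duplicate" as exact string identity, and keeps the alphabetically first ID of each identical group. The function name is made up for illustration.

```python
from collections import defaultdict


def redundant_in_both(ssu, d2d3):
    """Return IDs that duplicate another specimen in BOTH SSU and D2D3.

    ssu, d2d3: dicts mapping specimen ID -> sequence string.
    Specimens missing from either dict are never reported.
    """
    groups = defaultdict(list)
    for seq_id in sorted(ssu):
        if seq_id in d2d3:
            groups[(ssu[seq_id], d2d3[seq_id])].append(seq_id)
    # Keep the first ID of each identical group and report the rest.
    return sorted(i for ids in groups.values() for i in ids[1:])
```

A specimen whose SSU matches another but whose D2D3 differs forms its own group and so stays in the database, which is the behaviour wanted for the topology comparison.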

I also went through the following genera and removed sequences shorter than 1000 bp.  Generally for these groups, the short sequences were outliers and attached to other clades (presumably because of LBA due to their shortness), and there were plenty of full SSU sequences available in the database which could be used to represent these genera.  I’m now aiming to limit representative sequences to 15 or fewer per genus (excluding Enoplids).  Groups edited were as follows:

  • Strongyloides
  • Rhabdolaimus
  • Bursaphelenchus
  • Longidorus
  • Pratylenchus
  • Trichodorus
  • Paratrichodorus
  • Rotylenchus/Rotylenchulus
  • Ditylenchus
  • Globodera
  • Heterodera
  • Pristionchus
  • Pellioditis
  • Aphelenchoides
  • Steinernema
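
The two rules above (drop sequences under 1000 bp, then keep at most 15 per genus, with some groups exempt) can be sketched as a filter. Several things here are assumptions rather than what was done in ARB: records are taken to be (ID, genus, sequence) tuples, length is counted without gap characters, and when a genus has too many sequences the longest ones are kept. The real selection was made by hand in the tree.

```python
from collections import defaultdict


def select_representatives(records, min_len=1000, max_per_genus=15,
                           exempt=frozenset()):
    """Return the sorted IDs kept after length and per-genus filtering.

    records: iterable of (seq_id, genus, sequence) tuples.
    exempt: genera that skip both filters (e.g. the Enoplid genera).
    """
    by_genus = defaultdict(list)
    for seq_id, genus, seq in records:
        ungapped = sum(1 for c in seq if c not in "-.")
        by_genus[genus].append((ungapped, seq_id))

    kept = []
    for genus, items in by_genus.items():
        if genus in exempt:
            kept.extend(seq_id for _, seq_id in items)
            continue
        # Longest first, so the cap discards the shortest sequences.
        long_enough = sorted((i for i in items if i[0] >= min_len),
                             reverse=True)
        kept.extend(seq_id for _, seq_id in long_enough[:max_per_genus])
    return sorted(kept)
```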

After all these edits, I rebuilt the NJ tree early morning on June 12.  This NJ tree had 1876 sequences.

I also built the initial LSU database in ARB.  I downloaded the alignment for Phylum Nematoda from SILVA (containing 1544 taxa).  I imported this into ARB and edited the following groups:

  • Caenorhabditis
  • Strongyloides
  • Bursaphelenchus
  • Pratylenchus

The LSU alignments are a MESS.  The conserved regions are OK, but the less conserved regions are not aligned at all.  I’ve been using fast aligner a lot to rectify things within genera, as oftentimes it seems that things align OK if you use this method–although it doesn’t work for everything and some regions are impossible to work with.

 

More SSU edits

The following groups were edited on May 27/28th:

  • Pellioditis
  • Meloidogyne
  • Neochromadora
  • Trichodorus

The following species look dodgy, and were labelled with ‘sourhope removed’ in the import field before being removed from the tree:

  • MlgInc34
  • MlgInc13
  • MlgInc21
  • MlgInc12
  • MlgHap10
  • MlgInc31
  • MlgInc20
  • MlgInc33
  • Sgd Vene2
  • SgdSter6
  • BrgMal14
  • BrgMala9
  • P25Paci5
  • Cb3Rema3
  • P25Paci4
  • UncRhab2
  • UncNem74
  • UncNem68
  • UnnN1414

The following sequences looked like they could be included in the final tree, but they need to be re-aligned using the PT server when that is set up.  They were removed for the time being, labelled with PT search (in the import field I think):

  • S94Batur (Soboliphyme)
  • Eu6Dispa (Eucoleus)
  • M65Tenui (Capillaria)

Today I also ran four trees on the RAxML web server.  I realised (after the fact) that two datasets were incomplete, so the correct job numbers are:  Job _____, from the file ‘Enoplids_2_May28’ (Enoplid-only tree, with Sabatieria as the outgroup), and Job 416526, from the file ‘Cox1_Ts_replaced’.

Continuing SSU edits

41 more nematodes were removed from the tree.  These included 38 unidentified Sourhope nematodes (labelled before removal), and 3 Caenorhabditis specimens: Cb3Rema2, Cb3Bren2, Cb3Reman.  These last 3 specimens had awfully messy alignments that couldn’t be fixed by fast aligner.  The info field revealed that these sequences came from a whole genome shotgun project, and so I suspect they are of dubious quality (or aren’t actually 18S sequences).  Pristionchus sequence P25Paci5 also looks dodgy but I left it in the tree for now.

The following groups were edited on May 25/26:

  • Caenorhabditis
  • Oscheius
  • Sphaerolaimus
  • Terschellingia
  • Prismatolaimus
  • Acrobeloides
  • Diplolaimelloides
  • Pristionchus
  • Spirinia/Robbea/Laxus (neighbours in a group)
  • Toxocara
  • Philonema
  • Plectus
  • Poikilolaimus
  • Anguillicola
  • Anoplostoma
  • Rotylenchus/Rotylenchulus
  • Diphtherophora
  • Campydora
  • Eucephalobus
  • Mesocriconema
  • Eudorylaimus
  • Nacobbus

Cleanup of SSU tree in ARB

Today I started by cleaning up the tree a bit.  The Sourhope samples mostly didn’t add any information and just clogged up the tree, so I removed a total of 1922 unidentified Sourhope sequences from the tree–they are still accessible in the ARB database, however.  They were marked with ‘sourhope removed’ in the import field.

The following groups were edited today:

  • Mesomermis
  • Monoposthia
  • Xiphinema
  • Pratylenchus
  • Phasmarhabditis
  • Strongyloides–the outlier sequences are a mess, although the main group is fine.  I removed species named ‘SgdRatt5’ and ‘RtdSpeci’ because they looked to be of bad quality.  They didn’t align coherently with fast aligner, and the sequences didn’t agree even in the super-conserved regions.

I re-built the NJ tree at 9pm after today’s major edits–hopefully the groups will start falling into place.

Cox1 Treebuilding and SSU Alignment Edits

Cox1 Tree building—23 May 09

Note: in previous Cox1 treebuilding (files from 4th-11th May), the correct Cox1 alignment to be used is named ‘Align_4May_nts_truncnames’.

Today I imported all the cox1 sequences I could find from GenBank.  The full list of species and accession numbers is in the file ‘Cox1_Genbank_Acc_nos’.  The files I worked on are as follows:

  • Genbank_cox1nems.mas (protein sequence alignments—you CANNOT back translate this file to nts in MEGA for some reason)
  • Genbank_cox1nems_ntseq.mas (nucleotide sequence alignment—all sequences had to be re-downloaded from Genbank)
  • Cox1_GBplusEnop_23May.mas (combines all my Enoplid sequences with all the Genbank files I downloaded)
  • Cox1_GBplusEnop_23Maya.mas (removed the following bad sequences: BUS 4, JCC 59, LCL 3, OUS 14, TCR 69, and WUS 3)
  • Cox1_GBplusEnop_23Mayc.mas (added more Pellioditis sequences from Genbank, and also deleted more sequences as follows: B. ruftipennis, S. lupi, H. muscae, S. sp, R sp., P. sp., C. nassatus, T. skrjabini, G. binucleatum [2 seqs], T. native, C. briggsae, R. iyengari.  These sequences were either too short or looked too divergent to be fitted into the alignment.)
  • Cox1_GBplusEnop_23Mayd.mas (additionally removed the following taxa, which were a bit divergent: O. volvulus, D. immitis, B. malayi, R. lichstensfelsi, B. debrae)

I ran likelihood analyses in PhyML using the divergent and tight alignments (23Mayc and 23Mayd, respectively). I first aligned the translated protein sequences in each .mas file using CLUSTAL in MEGA (default parameters).  The output nucleotide alignments are saved as 23Mayc_DNAaln and 23Mayd_DNAaln.
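
Going from an aligned protein back to a nucleotide alignment amounts to threading each sequence's codons onto its aligned translation. The sketch below only illustrates that idea and is not a description of what MEGA does internally. It assumes the coding sequence is in frame from its first base, that gaps are written as "-", and that the function name `thread_codons` is purely illustrative.

```python
def thread_codons(aligned_protein, nt_seq, gap="-"):
    """Map an unaligned coding sequence onto its aligned protein translation.

    Each residue consumes the next three nucleotides and each protein gap
    becomes three gap characters, so codons stay intact in the output.
    """
    residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(nt_seq) < 3 * residues:
        raise ValueError("nucleotide sequence too short for this protein")
    out = []
    pos = 0
    for aa in aligned_protein:
        if aa == gap:
            out.append(gap * 3)
        else:
            out.append(nt_seq[pos:pos + 3])
            pos += 3
    return "".join(out)
```

For example, threading `ATGAAACTG` onto the aligned protein `M-KL` gives `ATG---AAACTG`.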

Alignments had to be saved as sequential PHYLIP files to be run through PhyML.  Altered parameters were as follows: HKY model, Ts/Tv ratio estimated, invariable sites estimated, 4 categories of nt substitution, sequential input sequences.
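
For reference, a sequential PHYLIP file is a header line with the number of taxa and the alignment length, followed by one line per taxon with the name and then the whole sequence. Here is a minimal sketch of a writer, assuming names are truncated to 10 characters (the classic PHYLIP limit) and separated from the sequence by a space. The function name is hypothetical and this is not the export routine MEGA or ARB uses.

```python
def write_sequential_phylip(seqs):
    """Return sequential PHYLIP text for a dict of name -> aligned sequence."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    short_names = [name[:10] for name in seqs]
    if len(set(short_names)) != len(short_names):
        raise ValueError("names are not unique after truncation to 10 chars")
    lines = [f" {len(seqs)} {lengths.pop()}"]
    for short, seq in zip(short_names, seqs.values()):
        # Name padded to 10 columns, a separating space, then the sequence.
        lines.append(f"{short:<10} {seq}")
    return "\n".join(lines) + "\n"
```

The uniqueness check matters because truncated names that collide would make the resulting tree ambiguous.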

 

Nematode groups edited in the ARB alignment on 22 May 08:

  • Trichura
  • Trichinella
  • Bathyeurystomina
  • Enchelidiidae
  • Pareurystomina
  • Thoracostomopsidae
  • Enoploides
  • Calyptronema
  • Oxystomina
  • Thalassoalaimus
  • Litinium

The species Birgit used in her paper (Meldal 2007, A revised phylogeny of the phylum Nematoda) were annotated with the phrase ‘Meldal’ under the field ‘publication_doi’; I searched for all the accession numbers listed in the appendix (saved as ‘Meldal_appendix_acc_nos.pdf’).  Two species weren’t in my ARB database: Trigulla aluta and Belondira apitica.  Somehow, though, although Birgit listed 214 species, I only have 211 marked in ARB; even allowing for the two not found, I seem to have missed one somewhere.
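
One way to track down the missing entry would be to compare the appendix accession list against the accessions marked in ARB. This is a sketch only: it assumes both lists have been exported as plain lists of accession strings, and the function name and normalisation (upper-casing and dropping any ".1"-style version suffix) are illustrative choices.

```python
def _norm(acc):
    """Normalise an accession: trim, upper-case, drop any version suffix."""
    return acc.strip().upper().split(".")[0]


def compare_accessions(published, marked):
    """Return (missing_from_db, unexpected_in_db, repeated_in_list).

    published: accessions from the paper's appendix.
    marked: accessions of the entries annotated in the database.
    """
    pub = [_norm(a) for a in published if a.strip()]
    db = {_norm(a) for a in marked if a.strip()}
    repeated = sorted({a for a in pub if pub.count(a) > 1})
    missing = sorted(set(pub) - db)
    unexpected = sorted(db - set(pub))
    return missing, unexpected, repeated
```

A repeated accession in the appendix list would also explain a count that is off by one, which is why the third list is returned.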

SSU Alignment Edits

The following genera were edited on May 20th and 21st:

  • Tripyloides
  • Bathylaimus
  • Enoplus
  • Chaetonema
  • Metachromadora
  • Oncholaimus
  • Adoncholaimus
  • Spirinia
  • Viscosia
  • Enoplus
  • Enoplolaimus
  • Mesacanthion
  • Paramesacanthion
  • Sabatieria
  • Daptonema
  • [Mesodorylaimus]
  • [Leptonchus (4:0)]
  • [Bathyeurystomina]

After two days of edits, I rebuilt the entire NJ tree at 5pm on May 21st.

(Bold genus names mean I searched for and queried all sequences with that genus name. [Bracketed] genus names mean these genera were edited alongside the bold genera as reference sequences–thus, not all sequences with this genus assignment have been searched and edited):