BMC paper work
March 29, 2010
Submitted the reduced Enoplid (stem/loops) tree to RAxML for the BMC paper (file 29Mar_EnopDaimiStemLoopsCombo.txt on Mac).
My continuing adventures as a research scientist
March 27, 2010
Typing up some old pages, and here are some stats I calculated for the OCTUs.
99A Dataset:
95A Dataset:
95B Dataset:
19 March: Daimi StemLoops combo submitted to CIPRES; the partitions are Helix (alignment sites 1-1175) and Loops (alignment sites 1176-2154).
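For reference, a two-block scheme like this can be written as a RAxML partition file (the plain-text file passed with `-q`, one `DNA, name = start-end` line per block). The sketch below is only an illustration, assuming the site ranges above are 1-based and inclusive and that the alignment is 2154 sites long; `partition_lines` is a hypothetical helper, not the file that was actually submitted to CIPRES.

```python
"""Sketch: build RAxML partition lines for the Helix/Loops split (hypothetical helper)."""

# (name, first_site, last_site): 1-based and inclusive, as in the note above
PARTITIONS = [("Helix", 1, 1175), ("Loops", 1176, 2154)]


def partition_lines(partitions, alignment_length):
    """Return 'DNA, name = start-end' lines, checking the blocks tile the alignment."""
    expected_start = 1
    lines = []
    for name, start, end in partitions:
        # Each block must begin right after the previous one, with no gap or overlap
        if start != expected_start or end < start:
            raise ValueError(f"partition {name} should start at site {expected_start}")
        lines.append(f"DNA, {name} = {start}-{end}")
        expected_start = end + 1
    if expected_start - 1 != alignment_length:
        raise ValueError("partitions do not cover the whole alignment")
    return lines


if __name__ == "__main__":
    # Redirect to a file, e.g. `python partitions.py > stemloops_partitions.txt`
    print("\n".join(partition_lines(PARTITIONS, alignment_length=2154)))
```

The checks simply catch gaps or overlaps between the blocks before a job is queued on the portal.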
March 23, 2010
So I’ve been thinking about how we’re going to proceed towards publication with the data analysis we’ve done so far.
Phylogenetics:
Phylogeography
Phylogenetic species concept
March 19, 2010
For the Metagenomics 95A OCTU phylogeny, the final files for phylogenetic analysis are as follows (stored in folder ‘Aligned_OCTUs_95A’ and derived from MasterDB_95A.arb):
For the BMC paper, the Enoplid-only tree file
March 19, 2010
David Lunt Correspondence:
http://treethinkers.blogspot.com/2009/05/for-this-unauthorized-installment-of.html
http://treethinkers.blogspot.com/2009/04/when-we-fail-mrbayes.html
http://www.microbesonline.org/fasttree/
There are several ways to concatenate sequence alignments.
There is a utility to concatenate sequences on this web page:
http://phylemon.bioinfo.cipf.es/cgi-bin/utilities.cgi
Mesquite will concatenate alignments
http://www.mesquiteproject.org
Check out this site. You can do lots of useful things, including joining two alignments. The sequences have to be in FASTA format, though.
http://www.daimi.au.dk/~biopv/php/fabox/index.php
Also the perl script seqCat.pl on this page
http://www.molekularesystematik.uni-oldenburg.de/en/34011.html
It can take several formats as input.
I think that the program DnaSP can also concatenate.
Something to convert alignment formats might be useful to you
http://www.ii.uib.no/~matthewb/tools/align_convert_in.cgi
My Public Folder
http://public.me.com/dhlunt
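The tools suggested above all do the same basic job: join alignments end to end, matching rows by taxon name. As a minimal sketch of that idea (not a reimplementation of seqCat.pl, FaBox or Mesquite), assuming FASTA input in which the first word of each header is the taxon name, and padding any taxon missing from one alignment with `?`; the function names are hypothetical:

```python
def read_fasta(text):
    """Parse FASTA text into {name: sequence}; the name is the header's first word."""
    parts, name = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("empty FASTA header")
            name = fields[0]
            if name in parts:
                raise ValueError(f"duplicate taxon name: {name}")
            parts[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def concatenate(alignments, missing="?"):
    """Join alignments end to end by taxon name, padding absent taxa with `missing`."""
    widths = []
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("each alignment must be non-empty with equal-length rows")
        widths.append(lengths.pop())
    # Every taxon seen in any alignment, in order of first appearance
    taxa = dict.fromkeys(name for aln in alignments for name in aln)
    return {
        taxon: "".join(aln.get(taxon, missing * width) for aln, width in zip(alignments, widths))
        for taxon in taxa
    }


def to_fasta(records):
    """Serialise {name: sequence} back to FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())


if __name__ == "__main__":
    helix = read_fasta(">t1\nAC-\n>t2\nAGT\n")  # toy alignments, not real data
    loops = read_fasta(">t1\nGG\n>t3\nTT\n")
    print(to_fasta(concatenate([helix, loops])), end="")
    # >t1 AC-GG, >t2 AGT??, >t3 ???TT
```

Padding with `?` rather than `-` keeps taxa that are absent from one marker coded as missing data rather than as gaps.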
March 19, 2010
After going over the data yesterday, Kelley set out 3 main objectives which I should be aiming for with the metagenomics data:
Today I was working out the kinks for Objective 1, mainly related to ARB: sorting the data, exporting it with correct labels and formatting the FASTA files. I'm still working in the MasterDB_95A ARB database. I've separated the OCTUs according to primer set (labelled ‘Dorota’ [2019 spp] and ‘Simon’ [1923 spp], respectively, in the tax_slv field of ARB). I also set up the PT server for this new, aligned database of 95 OCTUs under User4.arb. A few OCTUs in the database had to be re-aligned by hand, and two OCTUs (653 and 3826) were completely nonsensical; I think they are error/chimera reads. These two OCTUs have been excluded from tree building and are labelled as ‘unaligned error’ in the tax_slv field of ARB.
For future reference (in case I need to build a filter at some point), Simon’s primer set spans alignment positions 0-10221 and Dorota’s primer set spans positions 10222 to the end of the alignment.
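If that filter is ever needed outside ARB, the split is just a column slice. A minimal sketch, assuming ‘position 0-10221’ means 0-based, inclusive alignment columns (so Simon’s region is the Python slice `[0:10222]`) and that the aligned sequences are already held as `{name: aligned_sequence}` with equal lengths; the helper names are hypothetical. Because reads from one primer set should be mostly gaps in the other set’s region, it also drops columns made up entirely of gap or missing-data characters.

```python
# Boundary from the note above, read as 0-based: Simon = columns 0..10221
SIMON_END = 10222  # first column of Dorota's region (assumed 0-based indexing)


def split_by_primer_region(records, boundary=SIMON_END):
    """Slice every aligned sequence into (Simon region, Dorota region) dicts."""
    simon = {name: seq[:boundary] for name, seq in records.items()}
    dorota = {name: seq[boundary:] for name, seq in records.items()}
    return simon, dorota


def drop_empty_columns(records, gap_chars="-.?"):
    """Remove columns containing only gap or missing-data characters.

    Assumes all sequences have the same aligned length.
    """
    if not records:
        return {}
    seqs = list(records.values())
    keep = [i for i in range(len(seqs[0])) if any(s[i] not in gap_chars for s in seqs)]
    return {name: "".join(seq[i] for i in keep) for name, seq in records.items()}


if __name__ == "__main__":
    toy = {"x": "ACG-", "y": "A-.-"}  # toy 4-column alignment, not real data
    simon, dorota = split_by_primer_region(toy, boundary=2)
    print(simon, drop_empty_columns(dorota))
```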
I've also been working out how to export ARB alignments while retaining the taxonomic information (and avoiding time-consuming tree annotations). The best way to do it is to export the files in fasta_acc.eft format and then convert them to Phylip using the Readseq tool on CIPRES. Readseq cuts off names at the first space in the FASTA header, so you need to put in underscores (e.g. between the two words of a binomial species name) to stop the relevant information from being truncated. Readseq is awesome because it doesn't truncate names at 10 characters like Dataconvert does. After converting the file to Phylip, download it, replace all . with ? in the alignment, and delete any prohibited characters from the species names, including . - / ( ). Then, voila! Upload the modified Phylip file back to CIPRES and it's ready to run on RAxML.
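The clean-up steps above could also be scripted locally. The sketch below is not what Readseq does internally; it is a hypothetical stand-in that goes straight from an ARB FASTA export to sequential Phylip, assuming RAxML will accept names longer than 10 characters as long as a space separates them from the sequence (relaxed Phylip). It applies the same rules as above: spaces in headers become underscores, ‘.’ in the alignment becomes ‘?’, and the prohibited name characters are deleted. The Newick-reserved characters `: ; , [ ] '` are also stripped, which is an extra assumption on top of the list above.

```python
import re

# The notes list . - / ( ); the Newick-reserved : ; , [ ] ' are an added assumption
PROHIBITED = re.compile(r"[.\-/()\[\]:;,']")


def read_fasta_full_headers(text):
    """Parse FASTA into {full header line: sequence}, keeping spaces in headers."""
    records, header = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            header = line[1:].strip()
            if header in records:
                raise ValueError(f"duplicate header: {header}")
            records[header] = []
        elif line:
            if header is None:
                raise ValueError("sequence data found before the first header")
            records[header].append(line)
    return {h: "".join(chunks) for h, chunks in records.items()}


def clean_name(header):
    """Join header words with underscores, then delete prohibited characters."""
    return PROHIBITED.sub("", "_".join(header.split()))


def to_relaxed_phylip(records):
    """Return sequential Phylip text: 'ntax nchar', then one 'name sequence' line per taxon."""
    cleaned = {}
    for header, seq in records.items():
        name = clean_name(header)
        if not name or name in cleaned:
            raise ValueError(f"empty or duplicate name after cleaning: {header!r}")
        cleaned[name] = seq.replace(".", "?")  # ARB '.' (missing data) -> '?'
    lengths = {len(seq) for seq in cleaned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must all have the same aligned length")
    lines = [f"{len(cleaned)} {lengths.pop()}"]
    lines += [f"{name} {seq}" for name, seq in cleaned.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = ">Species one (seq.1)\n..AC-G\n>Species two/b\nTTAC-.\n"  # made-up records
    print(to_relaxed_phylip(read_fasta_full_headers(demo)), end="")
    # 2 6
    # Species_one_seq1 ??AC-G
    # Species_twob TTAC-?
```

The duplicate-name check matters because stripping characters can make two distinct headers collapse to the same taxon name.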
March 17, 2010
So I am currently trying to organize a reasonable (and accurate) way of constructing these metagenetics phylogenies. MUSCLE alignments, although fast and easy to run through the CIPRES portal, are not at all accurate, and the resulting alignments look horrendous. Plus, CIPRES’s 3-day limit on all jobs meant that even the alignment of the (few) OCTUs from the 95A dataset was terminated before it completed.
Next, on to my original idea for using SILVA’s SINA aligner and the ARB program as a database manager. I had aligned the 95A and 99A OCTUs back in December when we were preparing the NSF grant application; however, I just looked over the 95A ARB database (MasterDB_95A.arb) and there were four OCTUs that somehow got dropped during the alignment process. These four missing 95 OCTUs are: 2661, 2085, 3062, and 3287.
In the 95A dataset, OCTU 3287 (Oncholaimidae) is identical to OCTU 3464 (100% query coverage and sequence identity), and ARB will not incorporate identical sequences into the database. Of the remaining three OCTUs, 2085 (99 OCTU 21340, 1 read) and 2661 (99 OCTUs 33262 and 38572, 2 reads total) match Arabidopsis with 100% query coverage and 99% sequence identity and didn’t align with anything via SINA either, so we’ll class these as error reads. The last OCTU (3062) is an N/A match: it has no significant similarity in GenBank and contains only 1 read (with only one corresponding 99% OCTU (49011), also containing 1 read). All this suggests it is another error read!
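Since ARB skipped OCTU 3287 as a duplicate of 3464 without any obvious warning, it may be worth screening for identical sequences before import. A minimal sketch, assuming the sequences are held as `{OCTU_id: sequence}` and that ‘identical’ means the same bases once gap and missing-data characters (‘-’ and ‘.’) are removed and case is ignored; `identical_groups` is a hypothetical check, not how ARB itself decides.

```python
from collections import defaultdict


def identical_groups(records):
    """Return lists of IDs whose ungapped, upper-cased sequences are identical."""
    by_sequence = defaultdict(list)
    for octu_id, seq in records.items():
        key = seq.replace("-", "").replace(".", "").upper()
        by_sequence[key].append(octu_id)
    return [ids for ids in by_sequence.values() if len(ids) > 1]


if __name__ == "__main__":
    toy = {"3287": "AC-GT", "3464": "acgt", "2085": "ACGA"}  # toy sequences, not real data
    print(identical_groups(toy))
```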
So, the final exported alignment contains 3944 OCTUs and was saved as ‘95A_AlloctusAlign_17Mar.fasta’ in the Meta95A folder. However, I couldn’t submit this file to RAxML or CIPRES because the file is too big…
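The four dropped OCTUs were only noticed by looking over the database by eye. A set difference between the IDs going into SINA/ARB and those in the exported alignment would flag them automatically. A minimal sketch, assuming the OCTU ID is the first word of each FASTA header in both files (an assumption about the export format); `dropped_ids` is a hypothetical helper.

```python
def fasta_ids(text):
    """Collect the first word of every non-empty FASTA header line."""
    return {line[1:].split()[0] for line in text.splitlines()
            if line.startswith(">") and line[1:].split()}


def _id_key(octu_id):
    # Numeric IDs sort numerically; any other IDs sort after them alphabetically
    return (0, int(octu_id), "") if octu_id.isdigit() else (1, 0, octu_id)


def dropped_ids(input_fasta, exported_fasta):
    """IDs present in the input FASTA but missing from the exported alignment."""
    return sorted(fasta_ids(input_fasta) - fasta_ids(exported_fasta), key=_id_key)


if __name__ == "__main__":
    before = ">7\nACGT\n>45\nACGA\n>1200\nACGG\n"  # toy input, not real data
    after = ">45\nACGA\n"
    print(dropped_ids(before, after))
```

Sorting numerically keeps the report in OCTU order (7 before 1200), which plain string sorting would not.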
March 12, 2010
Resurrecting my online lab book so that I remember what I do!
David Lunt sent back his comments regarding the (now combined) BMC papers detailing the Enoplid phylogeny. He pointed out some of my outgroup choices, so I’ve had to go back and re-run a couple of the trees.
Just FYI in ARB, to export secondary structure features:
I removed the Chromadorid outgroups and many Dorylaimid species from the Enoplid-only ML trees and ran the following jobs:
The CIPRES portal was rejecting my Bayesian runs, so I will try again today.