Metagenetics Phylogeny work

So I am currently trying to work out a reasonable (and accurate) way of constructing these metagenetics phylogenies.  MUSCLE alignments, although fast and easy to run through the CIPRES portal, are not at all accurate, and the resulting alignments look horrendous.  Plus, CIPRES’s 3-day limit on all jobs meant that even the job for the (few) OCTUs in the 95A dataset was terminated before the alignment was completed.

Next, on to my original idea of using SILVA’s SINA aligner, with the ARB program as a database manager.  I had aligned the 95A and 99A OCTUs back in December when we were preparing the NSF grant application; however, I just looked over the 95A ARB database (MasterDB_95A.arb) and found that four OCTUs somehow got dropped during the alignment process.  The four missing 95% OCTUs are 2661, 2085, 3062, and 3287.
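One way to catch dropped OCTUs like these is to compare the sequence IDs in the FASTA file submitted for alignment against the IDs in the alignment exported from the database. The sketch below is a minimal illustration, not the procedure actually used here: it assumes the OCTU ID is the first whitespace-delimited token of each FASTA header, and the function names and demo records are made up.

```python
def fasta_ids(lines):
    """Return the set of sequence IDs (first token after '>') in FASTA lines."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                ids.add(header.split()[0])
    return ids


def dropped_ids(input_lines, aligned_lines):
    """IDs present in the input FASTA but absent from the exported alignment."""
    return sorted(fasta_ids(input_lines) - fasta_ids(aligned_lines))


# Tiny in-memory demo with made-up records (not the real 95A data).
submitted = [">otu1 sample", "ACGT", ">otu2", "GGCC", ">otu3", "TTAA"]
exported = [">otu1", "AC-GT", ">otu3", "TT-AA"]
print(dropped_ids(submitted, exported))  # ['otu2']
```

For real files, the same functions accept open file handles, since they only iterate over lines.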

In the 95A dataset, OCTU 3287 (Oncholaimidae) is identical to OCTU 3464 (100% query coverage and sequence identity), and ARB will not incorporate identical sequences into the database.  Of the remaining three OCTUs, 2085 (99% OCTU 21340, 1 read) and 2661 (99% OCTUs 33262 and 38572, 2 reads total) match Arabidopsis with 100% query coverage and 99% sequence identity, and didn’t align with anything via SINA either, so we’ll class these as error reads.  The last OCTU (3062) is an N/A match, has no significant similarity in GenBank, and contains only 1 read (with only one corresponding 99% OCTU (49011), also containing 1 read); all this suggests it is another error read!
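Identical pairs such as 3287 and 3464 could be flagged before import by grouping sequences on their exact residue string. The sketch below assumes that "identical" means the same string after upper-casing and removing gap characters, which is stricter than a 100% identity BLAST hit; the function names and demo records are hypothetical.

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (id, sequence) pairs from FASTA lines; sequences may span lines."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            parts = line[1:].split()
            seq_id, chunks = (parts[0] if parts else ""), []
        elif line and seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def duplicate_groups(lines):
    """Return lists of IDs whose sequences are identical (case and gaps ignored)."""
    groups = defaultdict(list)
    for seq_id, seq in read_fasta(lines):
        key = seq.upper().replace("-", "").replace(".", "")
        groups[key].append(seq_id)
    return [ids for ids in groups.values() if len(ids) > 1]


# Made-up records: 'a' and 'c' carry the same sequence, 'c' split over two lines.
records = [">a", "ACGT", ">b", "GGCC", ">c", "ac", "gt"]
print(duplicate_groups(records))  # [['a', 'c']]
```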

So, the final exported alignment contains 3944 OCTUs and was saved as ‘95A_AlloctusAlign_17Mar.fasta’ in the Meta95A folder.  However, I couldn’t submit this file to RAxML or CIPRES because the file is too big…
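Before working out how to get a file like this into a tree-building program, the exported alignment can be sanity-checked by counting its sequences and confirming that every row has the same aligned length. This is only a sketch with a hypothetical function name and made-up demo records, and it does nothing about the file-size problem itself.

```python
def alignment_summary(lines):
    """Return (number of sequences, set of distinct aligned lengths)."""
    count = 0
    lengths = set()
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.add(current)
            count += 1
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.add(current)
    return count, lengths


# Made-up alignment: 'c' is one column short, so two distinct lengths appear.
demo = [">a", "AC-GT", ">b", "ACG", "GT", ">c", "ACGT"]
count, lengths = alignment_summary(demo)
print(count, sorted(lengths))  # 3 [4, 5]
```

A proper alignment should report exactly one length; for the exported 95A file the count would be expected to match the 3944 OCTUs noted above.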


