Playing around with GOM Illumina rRNA data

So I’ve been trying to come up with a useful workflow for 18S rRNA eukaryotic data, aiming to incorporate a number of different tools (PhyloSift and QIIME) to make biological comparisons across multiple samples. The test dataset has been the Illumina GOM data.

  1. The QIIME analysis broke at the alignment step – using PyNAST against the SILVA reference database, I ended up with ALL sequences failing and an empty alignment file.
  2. Transferred over to ssu-align on the iMac, which has been running for a couple of days and is slowly making its way through the rep_set OTUs from QIIME (99% de novo clustered OTUs).
  3. Now moving over to hmm-align within phylosift (align mode) to see if I can get a speedup on this end. Wasn’t working the other day, so I am liaising with Guillaume to push this forward.
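
Since I'm comparing three aligners on the same rep_set, a quick sanity check on how many OTUs actually make it into each alignment would help. Here is a minimal sketch in plain Python, assuming the input and output are ordinary FASTA files; the function names and the file names in the usage note are my own placeholders, not anything QIIME, ssu-align, or PhyloSift defines.

```python
def fasta_ids(lines):
    """Collect sequence IDs (first token after '>') from FASTA lines."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.add(parts[0])
    return ids


def alignment_summary(input_lines, aligned_lines):
    """Compare the IDs in the input rep_set with those in an alignment.

    Returns how many input sequences there were, how many of them
    appear in the alignment, and a sorted list of the ones missing.
    """
    input_ids = fasta_ids(input_lines)
    aligned_ids = fasta_ids(aligned_lines)
    return {
        "input": len(input_ids),
        "aligned": len(input_ids & aligned_ids),
        "missing": sorted(input_ids - aligned_ids),
    }
```

To use it, open the two files and pass the handles in, e.g. `alignment_summary(open("rep_set.fna"), open("rep_set_aligned.fasta"))` (both paths are hypothetical). For the PyNAST run described above, this should report zero aligned and every OTU as missing.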

GOM paper brainstorm

QIIME analyses

  • Another round of demultiplexing – using QIIME 1.5.0 and upping the number of barcode mismatches
  • OTU picking
    • de novo (99%)
    • Open-Reference (SILVA database and de novo 99%)
  • Alignments
    • PyNAST
    • ssu-align (iMac)
    • hmm-align (PhyloSift)
  • Taxonomy assignment
    • RDP
    • BLAST
    • Phylo Placement?
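
To make the "barcode mismatches" idea in the demultiplexing bullet concrete, here is a rough Python sketch of mismatch-tolerant barcode assignment. It is an illustration only and not QIIME's implementation: the function names, the `max_mismatches` parameter, and the rule of discarding ambiguous reads are all my own assumptions.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(observed, barcodes, max_mismatches=0):
    """Assign an observed barcode to a sample, tolerating mismatches.

    barcodes maps sample ID -> expected barcode sequence.
    Returns the sample ID of the closest barcode within max_mismatches,
    or None if nothing is close enough or the closest match is a tie.
    """
    best, best_dist, tie = None, None, False
    for sample, expected in barcodes.items():
        dist = hamming(observed, expected)
        if dist > max_mismatches:
            continue
        if best_dist is None or dist < best_dist:
            best, best_dist, tie = sample, dist, False
        elif dist == best_dist:
            tie = True  # two samples equally close: ambiguous
    return None if tie else best
```

Raising `max_mismatches` recovers more reads, but it also increases the chance that a read sits equally close to two barcodes, which is why this sketch returns `None` on ties rather than guessing.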

PhyloSift analyses

  • Single-end analysis (OTUs from QIIME)
    • 1_1 dataset (big file)
    • 1_2 dataset (much smaller, because of read quality issues?)
  • Paired-end analysis (raw Illumina data)
    • Should I trim off barcodes before running PE-analysis? I would lean toward “no”, because the hmm-align step is going to trim off non-matching parts of the reads.
