PhyloSifting updates

So I spent the weekend looking through the Yatsunenko data after my discovery about QIIME’s reference-based OTU picking protocol. It turns out that our 16S data is OK (this was a closed-reference process, where all the reads not matching Greengenes were discarded), but the 16S derived from the PhyloSift metagenome analysis was run through an open-reference process (where QIIME created new de novo clusters for sequences that didn’t match Greengenes, i.e. the opposite of what Yatsunenko did).

Because I wanted to keep things consistent in the manuscript (keeping amplicon and metagenome workflows the same, and directly comparable with the Yatsunenko methods), I re-ran the metagenome data, this time using a closed-reference OTU picking process. That means we have far fewer OTUs, and might see different patterns in the PCoAs.
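To make the closed- vs open-reference distinction concrete, here is a toy sketch in Python. It is not QIIME's implementation: matching is plain string equality standing in for a real similarity search, and the function name `pick_otus`, the read IDs, and the reference ID `gg1` are all made up for illustration.

```python
def pick_otus(reads, reference, closed=True):
    """Toy OTU picking (illustrative only, not QIIME's algorithm).

    reads:     dict of read_id -> sequence
    reference: dict of ref_id -> sequence
    Matching is exact string equality, a stand-in for similarity search.
    """
    seq_to_ref = {seq: ref_id for ref_id, seq in reference.items()}
    otus = {}        # OTU name -> list of read IDs
    denovo = {}      # unmatched sequence -> de novo cluster name
    discarded = []   # reads dropped in closed-reference mode
    for read_id, seq in reads.items():
        if seq in seq_to_ref:
            # Read hits the reference: assign it to that reference OTU.
            otus.setdefault(seq_to_ref[seq], []).append(read_id)
        elif closed:
            # Closed-reference: reads with no reference hit are thrown away.
            discarded.append(read_id)
        else:
            # Open-reference: unmatched reads form new de novo clusters.
            name = denovo.setdefault(seq, f"denovo{len(denovo)}")
            otus.setdefault(name, []).append(read_id)
    return otus, discarded


# Hypothetical example data.
reads = {"r1": "ACGT", "r2": "ACGT", "r3": "TTTT", "r4": "GGGG", "r5": "TTTT"}
reference = {"gg1": "ACGT"}

closed_otus, closed_discarded = pick_otus(reads, reference, closed=True)
open_otus, open_discarded = pick_otus(reads, reference, closed=False)
```

With this toy input the closed-reference run keeps a single OTU and discards three reads, while the open-reference run keeps every read and ends up with three OTUs, which is the same reason the re-run metagenome data has fewer OTUs.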

New data has been downloaded to Edhar: /home/hollybik/yatsunenko_QIIME/qiime_analyses_yatsunenko_metagenomes_7oct

I have also started playing around again with the PhyloSift devel branch, particularly for 18S rRNA data and Pam Brannock’s Euk mRNA contigs from the GOM project.

For 18S rRNA data, it seems like PhyloSift is not pulling down all the sequences: I used two paired-end (PE) input files, each about 1.4GB in size, and only got 6 chunks of 18S sequences, where the alignDir files were ~3MB each. These file sizes seem a bit small for such big input files of 18S amplicon data, no? The combined fasta file of demultiplexed GOM sequences didn’t seem to give any useful output (need to re-check this, though).
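File sizes are only a rough guide, so one way to check this would be to compare record counts between the input files and the output chunks. A minimal sketch in Python follows, assuming uncompressed files and strict 4-line FASTQ records; the function names are my own and nothing here relies on PhyloSift's actual output layout.

```python
def count_fasta_records(lines):
    """Count FASTA records: one per header line starting with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def count_fastq_records(lines):
    """Count FASTQ records, assuming exactly 4 lines per record."""
    n_lines = sum(1 for _ in lines)
    return n_lines // 4


def count_records_in_file(path, fmt):
    """Count records in an uncompressed file; fmt is 'fasta' or 'fastq'."""
    if fmt == "fasta":
        counter = count_fasta_records
    elif fmt == "fastq":
        counter = count_fastq_records
    else:
        raise ValueError(f"unknown format: {fmt}")
    with open(path) as handle:
        return counter(handle)


# Tiny in-memory examples (made-up reads) to show the counting logic.
example_fasta = [">read1\n", "ACGT\n", ">read2\n", "GGCC\n", "TTAA\n"]
example_fastq = ["@read1\n", "ACGT\n", "+\n", "IIII\n",
                 "@read2\n", "GGCC\n", "+\n", "IIII\n"]
fasta_count = count_fasta_records(example_fasta)
fastq_count = count_fastq_records(example_fastq)
```

Running `count_records_in_file` on each input file and on each output chunk would show what fraction of reads actually made it through, which is easier to interpret than comparing 1.4GB against ~3MB.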

Also updated the website with info about the fastq trimming feature.


