QIIME on StarCluster – configure the start_parallel_jobs_sc.py script

Some of my jobs weren’t finishing when run from an iPython notebook on StarCluster. After looking through the QIIME documentation, I realized I needed to change the main qiime_config file: apparently StarCluster uses a different method to distribute jobs in parallel, so the cluster_jobs_fp setting has to point to a different script.

The QIIME config file needs to look like this (pointing to start_parallel_jobs_sc.py; see the bottom of the QIIME AWS tutorial page here):

cluster_jobs_fp /home/ubuntu/qiime_software/qiime-1.7.0-release/bin/start_parallel_jobs_sc.py
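
A quick way to confirm the setting is in place is to read the config file back. The following is only a minimal Python sketch, not part of QIIME: it assumes the config is plain text with one whitespace-separated key/value pair per line and "#" comments, and the helper names read_qiime_config and uses_starcluster_script are made up for illustration.

```python
def read_qiime_config(text):
    """Parse qiime_config-style text into a dict (hypothetical helper)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        key = parts[0]
        # A key with no value is stored as None.
        value = parts[1].strip() if len(parts) > 1 else None
        settings[key] = value
    return settings


def uses_starcluster_script(settings):
    """True if cluster_jobs_fp points at start_parallel_jobs_sc.py."""
    path = settings.get("cluster_jobs_fp") or ""
    return path.endswith("/start_parallel_jobs_sc.py")


# The line from the config file shown above.
example = (
    "cluster_jobs_fp "
    "/home/ubuntu/qiime_software/qiime-1.7.0-release/bin/start_parallel_jobs_sc.py\n"
)
config = read_qiime_config(example)
print(uses_starcluster_script(config))
```

In practice you would pass in the contents of your own qiime_config file instead of the example string.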


Installing FASTX Toolkit

Not sure how I got away so long without FASTX toolkit on my iMac. Followed these install instructions to get it set up on my computer (need to quality trim the 18S chimera sequences):

First, libgtextutils:

curl -O http://hannonlab.cshl.edu/fastx_toolkit/libgtextutils-0.6.tar.bz2
tar xvjf libgtextutils-0.6.tar.bz2
cd libgtextutils-0.6
./configure
make
sudo make install

Then the FASTX-Toolkit – note the step to define PKG_CONFIG_PATH:

curl -O http://hannonlab.cshl.edu/fastx_toolkit/fastx_toolkit-0.0.13.tar.bz2
tar xjvf fastx_toolkit-0.0.13.tar.bz2
cd fastx_toolkit-0.0.13
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH
./configure
make
sudo make install

Then I used fastq_quality_filter to remove low quality reads:

fastq_quality_filter -i chim_demux.extendedFrags_primersremoved.fastq -o chim_demux.extendedFrags_primersremoved_fastxtrimmed.fastq -q 20 -p 80 -Q 33 -v

Quality cut-off: 20
Minimum percentage: 80
Input: 2829756 reads.
Output: 2746576 reads.
discarded 83180 (2%) low-quality reads.
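
To make the -q 20 -p 80 settings concrete, here is a rough Python sketch of the rule as I understand it: a read is kept when at least 80% of its bases have a quality score of 20 or higher, with scores decoded using the +33 offset. This is an illustration only, not the FASTX source code; the function name passes_filter is made up, and the real tool may round the percentage differently.

```python
def passes_filter(quality_line, min_quality=20, min_percent=80, offset=33):
    """Return True if enough bases meet the quality cut-off.

    Assumed rule: keep the read when at least `min_percent` percent of
    its bases have a Phred score >= `min_quality`.
    """
    if not quality_line:
        return False
    scores = [ord(ch) - offset for ch in quality_line]
    good = sum(1 for s in scores if s >= min_quality)
    # Compare as integers to avoid floating-point rounding.
    return good * 100 >= min_percent * len(scores)


# Under the +33 offset, 'I' is Phred 40 and '#' is Phred 2.
print(passes_filter("IIIIIIII##"))  # 8 of 10 bases reach Q20
print(passes_filter("IIIIIII###"))  # 7 of 10 bases reach Q20

# Sanity check on the read counts reported above.
reads_in, reads_out = 2829756, 2746576
discarded = reads_in - reads_out
percent_discarded = round(100.0 * discarded / reads_in, 2)
print(discarded, percent_discarded)
```

The arithmetic at the end matches the 83180 discarded reads in the output above; the exact fraction works out to about 2.94%, so the "2%" in the tool's report looks like a truncated value.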

This site has a good tutorial for using FASTX trim to quality filter reads. And this site is where I got the install help for the FASTX toolkit.

Next step was to convert FASTQ to FASTA:

fastq_to_fasta -i chim_demux.extendedFrags_primersremoved_fastxtrimmed.fastq -o chim_demux.extendedFrags_primersremoved_fastxtrimmed.fasta -n -v -Q 33

Input: 2746576 reads.
Output: 2746576 reads.

I was originally getting an “invalid quality score value” error, but upon further investigation it turns out you need the -Q 33 parameter to tell the tools that the quality values use the Phred+33 offset found in newer Illumina output, rather than the older +64 offset that the toolkit assumes by default (see here: http://seqanswers.com/forums/archive/index.php/t-7399.html).
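
If you are unsure which encoding a file uses, you can guess it from the quality characters themselves. The Python sketch below is a common heuristic rather than anything from the FASTX toolkit: characters below ';' (ASCII 59) cannot appear in the older +64 encodings, so seeing one implies +33. The function names are made up, the code assumes simple 4-line FASTQ records, and a file containing only high-quality +33 reads could be misreported as +64, so treat the answer as a hint.

```python
def iter_quality_lines(fastq_lines):
    """Yield the quality line (every fourth line) of 4-line FASTQ records."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 3:
            yield line


def guess_offset(quality_lines):
    """Guess the ASCII offset (33 or 64) from FASTQ quality strings.

    Returns None when there are no quality characters to inspect.
    """
    lowest = None
    for line in quality_lines:
        for ch in line.strip():
            code = ord(ch)
            if lowest is None or code < lowest:
                lowest = code
    if lowest is None:
        return None
    # Anything below ';' (59) is only possible with the +33 offset.
    return 33 if lowest < 59 else 64


# A single made-up record; '#' (ASCII 35) can only be a +33 character.
record = ["@read1", "ACGT", "+", "II#5"]
print(guess_offset(iter_quality_lines(record)))
```

For a real file you would pass an open file handle to iter_quality_lines and probably stop after the first few thousand reads.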

Installing MacQIIME and R packages

Finally took the leap and did a “native” install of MacQIIME on my MacAir and iMac (fed up with using VirtualBox…).

Followed all the install instructions for MacQIIME 1.7.0, available here. Some notes on the install:

  • Downloaded the full suite of Greengenes files (gg_12_10) and moved into the /macqiime/greengenes/ root folder
  • Installed legacy BLAST 2.2.22 using the instructions here
  • Downloaded the 32-bit versions of usearch 5.2.236 (renamed “usearch”) and the usearch 6.1 executable (renamed “usearch61”); I put both of these in the /macqiime/bin/ root folder since that directory is already in my path.
  • Installed a whole bunch of R packages (after installing R from one of the links on this page: http://cran.r-project.org/mirrors.html and then installing RStudio from http://www.rstudio.com/ide/download/):
    • install.packages("ggplot2")
    • install.packages("XML")
    • install.packages("tm")
    • install.packages("RCurl")
    • install.packages("plyr")
    • install.packages("randomForest")
    • install.packages("ape")
    • install.packages("vegan")
    • install.packages("optparse")
    • install.packages("gtools")
    • install.packages("klaR")
    • install.packages("RColorBrewer")
    • install.packages("labdsv")
  • Didn’t install AmpliconNoise – no more 454 data!!
  • Installed TopiaryExplorer (needed to install Java runtime environment on my MacAir)
  • Installed Cytoscape 3.0.1

Installing iPython Notebook and StarCluster

Finally taking the leap to get iPython Notebook and StarCluster up and running on my MacAir laptop. The QIIME tutorial for this process can be found here. The QIIME page didn’t have enough detail for me, so I mainly followed the instructions in this NYU install tutorial for Mac OS X, as follows:

  • Installed the Xcode command line tools (Preferences -> Downloads)
  • Checked my version of Python (yes, I have Python >2.7)
  • Installed SciPy superpack from this site (used instructions for OS X 10.8)
  • Installed PySide pyside-1.1.0-qt47-py27apple.pkg from this site
  • Installed readline, pyzmq, pygments, tornado using this command: sudo easy_install readline pyzmq pygments tornado
  • Installed iPython from Github using this command: sudo easy_install https://github.com/ipython/ipython/tarball/master
  • Tested iPython install success by typing this command: ipython notebook --pylab inline (it worked! HTML notebook launched!)
  • Followed the manual install instructions for StarCluster (see quick start install guide below) – had to download tarball, unzip and run script.
  • Followed the StarCluster config instructions (but not sure if I did this right…)
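
After a string of installs like this, it can help to check that everything actually imports. Below is a minimal Python sketch for that; the helper name check_modules is made up, and the module names are my assumption about how each package is imported (pyzmq imports as "zmq", iPython as "IPython", StarCluster as "starcluster"). Run it with the same Python interpreter you installed the packages into.

```python
import importlib


def check_modules(names):
    """Map each module name to True if it imports cleanly, else False."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = True
        except Exception:
            # Covers missing modules as well as modules that fail on import.
            results[name] = False
    return results


# Assumed import names for the packages installed in the steps above.
wanted = ["readline", "zmq", "pygments", "tornado", "IPython", "starcluster"]
for name, ok in sorted(check_modules(wanted).items()):
    print(name, "ok" if ok else "MISSING")
```

This only confirms that the modules load; it says nothing about whether the StarCluster config file itself is correct.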

Questions I still have:

  • Do I need to install QT? Seems like I don’t need it if I’m using iPython Notebook in a web browser, but can’t find much info about possible QT dependencies.
  • Need some advice on the StarCluster config file – AWS settings and iPython plugin options
  • Then I need to figure out how to actually link iPython to AWS and start doing my analyses…

Useful Websites: