GOM re-de-multiplexing and looking at PhyloSift sprint issues

GAH!!! My GOM demultiplexing attempt #2 failed horribly: I forgot to change the mapping file for each barcode, so I ended up with the same (incorrect) sample labels across all files. Re-de-multiplexing now with the corrected commands:

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-2_1_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-2_1_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_2_1/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_2.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-2_2_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-2_2_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_2_2/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_2.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20
split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-3_1_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-3_1_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_3_1/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_3.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-3_2_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-3_2_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_3_2/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_3.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20
split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-4_1_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-4_1_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_4_1/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_4.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-4_2_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-4_2_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_4_2/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_4.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20
split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-5_1_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-5_1_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_5_1/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_5.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-5_2_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-5_2_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_5_2/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_5.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20
split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-6_1_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-6_1_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_6_1/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_6.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-6_2_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-6_2_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_6_2/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_6.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20
split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-7_1_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-7_1_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_7_1/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_7.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-7_2_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-7_2_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_7_2/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_7.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20
split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-8_1_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-8_1_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_8_1/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_8.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-8_2_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-8_2_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_8_2/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_8.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20
split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-9_1_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-9_1_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_9_1/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_9.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-9_2_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-9_2_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_9_2/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_9.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20
split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-10_1_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-10_1_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_10_1/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_10.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-10_2_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-10_2_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_10_2/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_10.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20
split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-11_1_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-11_1_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_11_1/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_11.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-11_2_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-11_2_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_11_2/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_11.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20
split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-12_1_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-12_1_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_12_1/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_12.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-12_2_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-12_2_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_12_2/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_12.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20
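
Since the failure came from a mapping file that didn't follow its barcode, one way to rule that out is to generate the commands from a loop, so the mapping file name is always derived from the library number. A minimal dry-run sketch in bash, assuming the directory layout and file naming seen in the commands above; the function name print_split_commands is made up for illustration, and the long options are written with plain ASCII double hyphens:

```shell
#!/usr/bin/env bash
# Dry run: print one split_libraries_fastq.py command per library and read file.
# Paths and naming are copied from the commands in this post; nothing is executed.
in_dir=/home/ubuntu/fastx_trimmed_files
out_dir=/home/ubuntu/GOM_demultiplexed
map_dir=/home/ubuntu/QIIME_mapping_files

print_split_commands() {
  local lib read
  for lib in 2 3 4 5 6 7 8 9 10 11 12; do
    for read in 1 2; do
      # The mapping file is derived from $lib, so it cannot drift from the input.
      printf '%s\n' "split_libraries_fastq.py -i ${in_dir}/1926-KO-${lib}_${read}_trimmed.txt -b ${in_dir}/1926-KO-${lib}_${read}_barcode.txt -o ${out_dir}/KO_${lib}_${read}/ -m ${map_dir}/1926_KO_${lib}.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20"
    done
  done
}

print_split_commands
```

Once the printed commands look right, they could be run by piping the output to bash (print_split_commands | bash).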

PhyloSift stuff I did today:

  • Closed issue for updating website with dynamic FASTQ quality trimming info
  • Investigated build_marker some more. It seems the read conciler doesn’t like files with two or fewer sequences, so you have to generate a taxon map manually. I did that, and am now running a test of the new xeno marker set with the taxon map to see if I can get any results from raw Illumina data. Command run (on iMac): ./phylosift.pl all ~/Desktop/PhyloSift/test_data/xeno_unassembled/xeno_raw_s_2_1_sequence.txt. The build_marker fix may eventually need to be hard-coded, especially if users only have a few sequences and want to use it.

PhyloSifting updates

So I spent the weekend looking through the Yatsunenko data after my discovery about QIIME’s reference-based OTU picking protocol. It turns out that our 16S data is OK (this was a closed-reference process, where all the reads not matching Greengenes were discarded), but the 16S derived from the PhyloSift metagenome analysis went through an open-reference process (where QIIME created new de novo clusters for sequences that didn’t match Greengenes, i.e. the opposite of what Yatsunenko did).

Because I wanted to keep things consistent in the manuscript (keeping amplicon and metagenome workflows the same, and directly comparable with the Yatsunenko methods), I re-ran the metagenome data using a closed-reference OTU picking process. That means we have far fewer OTUs, and might see different patterns in the PCoAs.

New data has been downloaded to Edhar: /home/hollybik/yatsunenko_QIIME/qiime_analyses_yatsunenko_metagenomes_7oct

Also have started playing around again with PhyloSift devel branch, particularly for 18S rRNA data and Pam Brannock’s Euk mRNA contigs from the GOM project.

For 18S rRNA data, it seems like PhyloSift is not pulling down all the sequences. I used two PE input files, each about 1.4GB in size, and only got 6 chunks of 18S sequences where the alignDir files were ~3MB each. These file sizes seem a bit small for such big input files of 18S amplicon data, no? The combined fasta file of demultiplexed GOM sequences didn’t seem to give any useful output (need to re-check this, though).
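
A rough way to check this is to compare sequence counts rather than file sizes. The sketch below assumes the inputs are standard four-line FASTQ records and that the alignDir chunks are FASTA (the chunk format is an assumption here, not something confirmed above); both function names are made up for illustration:

```shell
#!/usr/bin/env bash
# Count reads in a FASTQ file, assuming exactly four lines per record.
count_fastq_reads() {
  awk 'END { printf "%d\n", NR / 4 }' "$1"
}

# Count sequences in a FASTA file by counting header lines.
count_fasta_seqs() {
  grep -c '^>' "$1"
}

# Example use (file names are placeholders):
#   count_fastq_reads input_1.fastq
#   count_fasta_seqs  alignDir/chunk1.fasta
```

If the output counts are a tiny fraction of the input reads, that points at sequences being dropped rather than at compact output files.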

Also updated the website with info about fastq trimming feature.

Viral Test Datasets

Running into errors on both master and devel branches this morning, so I’m moving forward with downloading Joe DeRisi’s viral datasets. I’m downloading the split files and will have to concatenate everything together. Instructions from Mark Stenglein:

The data is combined into a tar.gz file, which I had to split into 11 files (Google Docs has a max file size). To get the data, you should download the split files and recombine them by running:

cat pool5_all.fastq.tar.gz.split_* > pool5_all.fastq.tar.gz

Then:

gunzip pool5_all.fastq.tar.gz
tar xvf pool5_all.fastq.tar

The tarfile contains one file per barcode.
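
The recombine and unpack steps can also be streamed in one go, which avoids keeping a second full copy of the archive on disk. A minimal sketch, assuming the split suffixes sort lexically in the right order (true for split's default aa, ab, ... naming, but not for unpadded numbers like _1 ... _11); the function name recombine_and_extract is made up for illustration:

```shell
#!/usr/bin/env bash
# recombine_and_extract ARCHIVE_NAME DEST_DIR
# Concatenates ARCHIVE_NAME.split_* in glob (lexical) order and unpacks
# the resulting gzip-compressed tar stream into DEST_DIR.
recombine_and_extract() {
  local archive=$1 dest=$2
  mkdir -p "$dest"
  cat "${archive}".split_* | tar -xzf - -C "$dest"
}

# Example use with the file names from the instructions above:
#   recombine_and_extract pool5_all.fastq.tar.gz pool5_all
```

If the parts turn out to be numbered without zero padding, list them explicitly in the right order instead of relying on the glob.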

**Aaron says run PhyloSift tests in debug mode so we can get verbose info on what is actually going wrong.

PhyloSift – More test data and Devel continuation

Other Shotgun Metagenomes for HMP studies

We needed to get some PE Illumina HiSeq data, since it looks like the HMP mock communities are only GAIIx single-end data. File sizes for HiSeq data are waaaay bigger though, so I’m starting off with just one SRA dataset:

http://www.ncbi.nlm.nih.gov/sra/SRX025948

wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX025/SRX025948/SRR064531/SRR064531.lite.sra

To unpack SRA files, I’ve been running the following command for the toolkit on Edhar:

/home/koadman/software/sratoolkit.2.1.10-ubuntu32/bin/fastq-dump --split-spot SRR064531.lite.sra

Devel branch progress

After running into issues yesterday, I waited for devel to re-build overnight and downloaded the new version this morning. Need to specify the devel marker path by using the command as follows:

./phylosift.pl all -marker_url="http://edhar.genomecenter.ucdavis.edu/~koadman/phylosift_markers/devel" /home/hollybik/TestData/xeno_assembly_low_cov_NameEdit.fa

PhyloSift Test Datasets and Devel Branch

Human Microbiome Project Mock Pilot Data: http://www.ncbi.nlm.nih.gov/bioproject/48475

454 HMP Mock even sample
wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX030/SRX030841/SRR072233/SRR072233.lite.sra

454 HMP Mock staggered sample
wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX030/SRX030842/SRR072232/SRR072232.lite.sra

Illumina HMP Mock even sample
wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX055/SRX055380/SRR172902/SRR172902.lite.sra

Illumina HMP Mock staggered sample
wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX055/SRX055381/SRR172903/SRR172903.lite.sra

Venter GOS Data: http://www.ncbi.nlm.nih.gov/bioproject/PRJNA13694

wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX026/SRX026986/SRR066138/SRR066138.lite.sra

wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX026/SRX026987/SRR066139/SRR066139.lite.sra
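
All of these download URLs follow the same pattern, so they can be built from the experiment (SRX) and run (SRR) accessions instead of being pasted by hand. A minimal bash sketch; the path layout is inferred only from the URLs listed in this post, and the function name build_litesra_url is made up for illustration:

```shell
#!/usr/bin/env bash
# Build a .lite.sra FTP URL from an SRX experiment and SRR run accession.
# Layout inferred from the URLs in this post, not from NCBI documentation.
build_litesra_url() {
  local srx=$1 srr=$2
  local base="ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX"
  # ${srx:0:6} is the first six characters, e.g. SRX030 for SRX030841
  printf '%s/%s/%s/%s/%s.lite.sra\n' "$base" "${srx:0:6}" "$srx" "$srr" "$srr"
}

# Dry run: print the wget commands for the accession pairs listed above.
while read -r srx srr; do
  echo "wget $(build_litesra_url "$srx" "$srr")"
done <<'EOF'
SRX030841 SRR072233
SRX030842 SRR072232
SRX055380 SRR172902
SRX055381 SRR172903
SRX026986 SRR066138
SRX026987 SRR066139
EOF
```

Adding another dataset is then a matter of appending one more "SRX SRR" line to the list.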

To download latest devel release:

http://edhar.genomecenter.ucdavis.edu/~koadman/phylosift/devel/phylosift_latest.tar.bz2