Metagenomes (xeno, gom_fungi) and rerunning Open Ref OTUs

Xeno HiSeq data – Talked with David, still trying to figure out if we have xeno in there (David will run the raw reads through PhyloSift). What I’m doing:

  • Running Xeno data through MG-RAST. Trying to get an initial overview of the shotgun data
  • Running Xeno data through QIIME (prefiltering, ref-based picking only at 60%) to pull out any rRNA reads that might be in there. Hopefully we can get a better picture of the microbial community. Command run:
!pick_closed_reference_otus.py -i /Users/hollybik/Desktop/Data/metagenomes/HB_RN_March2013_XENO_unzip.fasta -r /macqiime/silva_111/rep_set/Silva_111_full_unique.fasta -o /Users/hollybik/Desktop/Data/metagenomes/xeno_qiime60prefilter -p /Users/hollybik/Dropbox/QIIME/qiime_parameters_filterMGforrRNA.txt --parallel -O 2

Also uploaded GOM_Fungi data to MG-RAST to get an idea of what’s in the sample – data is processing through the pipeline now.

Made some final tweaks to the open ref OTU picking protocol on StarCluster. This should hopefully be the final command that will run to completion after changing the SC script in qiime_config:

!pick_open_reference_otus.py -i /gom_data/GOM_concat1.7_rev_demulti_1to12_2.fna -o /gom_data/uclust_openref96_ref_22Aug -r /gom_data/99_Silva_111_rep_set_euk.fasta --parallel -O 8 -s 0.1 -p /gom_data/qiime_parameters_18Sopenref96_GOMamazon.txt --prefilter_percent_id 0.0 -f

Xeno data analysis progress

Transferring info over from Google Docs. Here is the recent progress with the xeno dataset:

7/25/12

Ran Illumina FASTA files through QIIME to mine 18S sequences from raw reads. (to see if we have any hits to Rhizaria). Using reference-based OTU picking against the SILVA 108 database.

pick_reference_otus_through_otu_table.py -i /home/qiime/Desktop/Shared_Folder/xeno_data/xeno_raw_s_2_1_sequence.fasta -r /home/qiime/Silva_108/rep_set/Silva_108_rep_set_Eukarya_only.fna -o /home/qiime/Desktop/Shared_Folder/xeno_data/QIIME_18S_Silva_scan/ -t /home/qiime/Silva_108/taxa_mapping/Silva_108_taxa_mapping.txt
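Once the reference-based picking finishes, checking for Rhizaria hits comes down to joining the OTU map against the taxonomy mapping file. Here is a minimal sketch of that check, assuming the OTU map is the usual QIIME tab-delimited layout (reference ID first, then the read IDs assigned to it) and the taxonomy mapping is tab-delimited (reference ID, then lineage string). The function names are made up for illustration, and whether the SILVA 108 lineage strings actually contain the word "Rhizaria" is an assumption that needs checking against the file.

```python
from collections import Counter


def load_taxonomy(path):
    """Read a tab-delimited taxonomy mapping: reference ID, then lineage string."""
    taxonomy = {}
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue
            ref_id, _, lineage = line.partition("\t")
            taxonomy[ref_id] = lineage
    return taxonomy


def reads_per_reference(otu_map_path):
    """Count reads per reference ID in an OTU map (ID, then read IDs, tab-separated)."""
    counts = Counter()
    with open(otu_map_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            counts[fields[0]] += len(fields) - 1
    return counts


def hits_matching(counts, taxonomy, keyword="Rhizaria"):
    """Return {reference ID: (read count, lineage)} where the lineage mentions keyword."""
    hits = {}
    for ref_id, n_reads in counts.items():
        lineage = taxonomy.get(ref_id, "")
        if keyword.lower() in lineage.lower():
            hits[ref_id] = (n_reads, lineage)
    return hits
```

Usage would be something like `hits_matching(reads_per_reference(otu_map), load_taxonomy(taxa_mapping))`, with both paths pointing at files in the output and SILVA directories. An empty result means no reads clustered to a reference whose lineage contains the keyword, which is not the same as no xeno DNA in the run.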


7/27/12

The QIIME virtual box kept crashing and not working, so I re-installed it (upgraded to QIIME 1.5 release though), and am trying the parallel script to pick reference OTUs through uclust now.

parallel_pick_otus_uclust_ref.py -i /home/qiime/Desktop/Shared_Folder/xeno_data/xeno_raw_s_2_1_sequence.fasta -o xeno_raw_s_2_1_qiime18S -r /home/qiime/Desktop/Shared_Folder/Silva_108/rep_set/Silva_108_rep_set_Eukarya_only.fna -s 0.99 --enable_rev_strand_match


7/30/12

QIIME still giving me trouble, particularly the parallel scripts. Now trying your bog standard ref-based OTU picking, hopefully this will work. If not, I’ll run this on Edhar next because the VB approach seems to be failing me…

pick_otus.py -i /home/qiime/Desktop/Shared_Folder/xeno_data/xeno_raw_s_2_1_sequence.fasta -o xeno_raw_s_2_1_qiime18S -m uclust_ref -r /home/qiime/Desktop/Shared_Folder/Silva_108/rep_set/Silva_108_rep_set_Eukarya_only.fna -s 0.99 -t -z


8/2/12

Still been having problems with running the 18S reference picking through QIIME. Have now uploaded the xeno FASTA file of raw Illumina reads to qiime@localhost on Edhar, and running closed-reference picking:

pick_reference_otus_through_otu_table.py -i /home/qiime/data/hbik/xeno_raw/xeno_raw_s_2_1_sequence.fasta -r /home/qiime/data/hbik/Silva_108/rep_set/Silva_108_rep_set_Eukarya_only.fna -o /home/qiime/data/hbik/xeno_18S/ -t /home/qiime/data/hbik/Silva_108/taxa_mapping/Silva_108_taxa_mapping.txt --parameter_fp /home/qiime/data/hbik/xeno_raw/qiime_parameters_xeno.txt -f


8/6/12

Also running the xeno contigs on the latest build of PhyloSift devel (phylosift_devel_20120806) and master (phylosift_master_20120806) using both the core markers and the devel markers. Yesterday I noticed that the two marker sets don’t give you overlapping results in terms of the contigs they pull out – even when you’re apparently using the same markers (e.g. Viral – although it might be different *versions* of the markers)

No hits to devel branch for some reason – xeno data not producing any outputs at all?!

Getting hits using devel branch and devel markers – but not all the files in the alignDir are giving me .jplace files (16S/18S specifically)


8/10/12

I was getting problems with the FASTA file on Edhar, even after fixing all the parameter files it said that QIIME needed…so I’ve upped my memory on my iMac and am trying again with ref-based OTU picking against SILVA

pick_reference_otus_through_otu_table.py -i /home/qiime/Desktop/Shared_Folder/xeno_data/xeno_raw_s_2_1_sequence.fasta -r /home/qiime/Desktop/Shared_Folder/Silva_108/rep_set/Silva_108_rep_set_Eukarya_only.fna -o /home/qiime/Desktop/Shared_Folder/xeno_data/xeno_18S/ -t /home/qiime/Desktop/Shared_Folder/Silva_108/taxa_mapping/Silva_108_taxa_mapping.txt

Running xeno data using Master markers (master branch 20120810) and Devel markers (devel branch 20120810). Going to compare the outputs and see how the markers used affect the taxonomy summary and scaffold mining (e.g. clarify the discrepancies I was seeing across old PhyloSift builds).


Next Steps

  1. Look at probability distributions for a subset of contigs, across time for PhyloSift analyses. Do they get better or worse?
  2. Run Illumina FASTA files through QIIME to mine 18S sequences from raw reads. See if we have any hits to Rhizaria.
  3. Look in the Eukaryote-specific .jplace files to see if we are getting hits to the foraminifera for the xeno data. Need to re-run on non-updated markers (delete .updated files from devel markers)
  4. Look for xeno data in raw reads – use 18S built marker packages (still haven’t resolved this – GitHub issue #322)
  5. Run raw reads through PhyloSift to see difference in taxonomy summary versus contigs (raw reads taking up too much memory to run through PhyloSift at the moment).
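For step 3, a quick way to see which marker .jplace files actually contain placements is to count the placed queries in each one. This is a rough sketch, assuming the files follow the standard jplace JSON layout (a top-level "placements" list whose entries name their queries under "n" or, with multiplicities, under "nm"). The function names and the idea of pointing it at a single directory are my own assumptions, not anything PhyloSift provides.

```python
import json
from pathlib import Path


def placed_queries(jplace_path):
    """Return the names of all query sequences placed in one .jplace file."""
    with open(jplace_path) as handle:
        data = json.load(handle)
    names = []
    for placement in data.get("placements", []):
        plain = placement.get("n", [])
        if isinstance(plain, str):  # tolerate a single name given as a string
            plain = [plain]
        names.extend(plain)
        # "nm" entries are [name, multiplicity] pairs
        names.extend(pair[0] for pair in placement.get("nm", []))
    return names


def summarize_directory(directory):
    """Map each .jplace file name in a directory to its number of placed queries."""
    return {
        path.name: len(placed_queries(path))
        for path in sorted(Path(directory).glob("*.jplace"))
    }
```

This only says whether a marker recruited any reads or contigs. Deciding whether a placement sits near the foraminifera still means looking at the edge numbers in the "p" rows against the reference tree.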

GOM re-de-multiplexing and looking at PhyloSift sprint issues

GAH!!! My GOM demultiplexing attempt #2 failed horribly–forgot to change the Mapping files for each barcode, so ended up with the same (incorrect) sample labels across all files. Re-de-multiplexing now with the corrected commands:

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-2_1_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-2_1_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_2_1/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_2.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-2_2_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-2_2_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_2_2/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_2.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-3_1_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-3_1_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_3_1/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_3.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-3_2_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-3_2_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_3_2/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_3.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-4_1_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-4_1_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_4_1/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_4.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-4_2_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-4_2_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_4_2/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_4.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-5_1_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-5_1_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_5_1/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_5.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-5_2_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-5_2_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_5_2/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_5.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-6_1_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-6_1_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_6_1/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_6.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-6_2_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-6_2_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_6_2/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_6.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-7_1_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-7_1_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_7_1/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_7.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-7_2_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-7_2_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_7_2/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_7.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-8_1_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-8_1_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_8_1/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_8.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-8_2_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-8_2_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_8_2/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_8.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-9_1_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-9_1_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_9_1/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_9.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-9_2_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-9_2_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_9_2/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_9.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-10_1_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-10_1_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_10_1/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_10.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-10_2_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-10_2_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_10_2/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_10.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-11_1_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-11_1_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_11_1/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_11.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-11_2_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-11_2_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_11_2/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_11.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-12_1_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-12_1_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_12_1/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_12.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20

split_libraries_fastq.py -i /home/ubuntu/fastx_trimmed_files/1926-KO-12_2_trimmed.txt -b /home/ubuntu/fastx_trimmed_files/1926-KO-12_2_barcode.txt -o /home/ubuntu/GOM_demultiplexed/KO_12_2/ -m /home/ubuntu/QIIME_mapping_files/1926_KO_12.txt --barcode_type=5 --max_barcode_errors=1.5 -q 20
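
The 22 commands above differ only in the sample number (2 to 12) and the read number (1 or 2), which is exactly how the mapping-file mix-up crept in when they were edited by hand. A small sketch of generating them from a loop instead, so the mapping file always tracks the sample number. The function name is hypothetical, the paths are copied from the commands above, and the long options are written with the standard double hyphen.

```python
def demultiplex_commands(samples=range(2, 13), reads=(1, 2)):
    """Build one split_libraries_fastq.py command per sample/read combination."""
    trimmed_dir = "/home/ubuntu/fastx_trimmed_files"
    out_dir = "/home/ubuntu/GOM_demultiplexed"
    mapping_dir = "/home/ubuntu/QIIME_mapping_files"
    commands = []
    for sample in samples:
        for read in reads:
            prefix = f"{trimmed_dir}/1926-KO-{sample}_{read}"
            commands.append(
                "split_libraries_fastq.py"
                f" -i {prefix}_trimmed.txt"
                f" -b {prefix}_barcode.txt"
                f" -o {out_dir}/KO_{sample}_{read}/"
                # the mapping file depends on the sample only, not the read
                f" -m {mapping_dir}/1926_KO_{sample}.txt"
                " --barcode_type=5 --max_barcode_errors=1.5 -q 20"
            )
    return commands


if __name__ == "__main__":
    # Print the commands for review; pipe to a shell only after checking them.
    for command in demultiplex_commands():
        print(command)
```

Printing rather than executing keeps this a dry run, so the list can be eyeballed (or diffed against the hand-written commands) before anything touches the data.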

PhyloSift stuff I did today:

  • Closed issue for updating website with dynamic FASTQ quality trimming info
  • Investigated build_marker some more. It seems the read conciler doesn’t like files with two or fewer sequences, so you have to generate a taxon map manually. I did that, and am now running a test of the new xeno marker set with the taxon map to see if I can get any results from raw Illumina data. Command run: ./phylosift.pl all ~/Desktop/PhyloSift/test_data/xeno_unassembled/xeno_raw_s_2_1_sequence.txt (on iMac). The build_marker fix may need to be hard coded eventually…esp. if users only have a few sequences and want to use it.

Looking for Xeno DNA in the metagenome

Trying to figure out if we actually have any Xeno DNA in the sequencing run. First thing I’m doing is looking through the .jplace files for the Parfrey marker genes. The closest relatives for the Xenophyophore are Rhizaria, and the species included in the Parfrey study are listed below:

PhyloSift Marker: Rhizaria Species

14-3-3
  • Reticulomyxa filosa
40S (none)
Actin
  • Ammonia sp. T7
  • Ovammina opaca
  • Reticulomyxa filosa
  • Corallomyxa tenera
  • Massisteria marina
  • Gymnophrys sp
  • Bodomorpha minima
  • Dimorpha sp.
  • Capsellina sp.
  • Plasmodiophora brassicae
  • Proleptomonas faecicola
  • Thaumatomonas seravini
  • Gymnochlora stellata
  • Allogromia sp.
  • Gromia oviformis
  • Spongomonas
  • Lotharella globosa
  • Clathrulina elegans
  • Hedriocystis reticulata
  • Euglypha rotunda
  • Amphisorus hemprichii
  • Globobulimina turgida
  • Bonamia ostreae
  • Haplosporidium costale, H. louisiana, H. nelsoni
  • Minchinia chitonis
  • Urosporidium crescens
  • Lecythium sp.
  • Sorosphaera veronicae
  • Spongospora subterranea
  • Collozoum inerme
  • Thalassicolla pellucida
Atub
  • Ovammina opaca
  • Reticulomyxa filosa
  • Corallomyxa tenera
  • Massisteria marina
  • Gymnophrys sp
  • Bodomorpha minima
  • Dimorpha sp.
  • Capsellina sp.
  • Proleptomonas faecicola
  • Thaumatomonas seravini
  • Rhabdammina cornuta
Btub
  • Ovammina opaca
  • Reticulomyxa filosa
  • Corallomyxa tenera
  • Massisteria marina
  • Gymnophrys sp
  • Bodomorpha minima
  • Dimorpha sp.
  • Capsellina sp.
  • Plasmodiophora brassicae
  • Proleptomonas faecicola
  • Thaumatomonas seravini
  • Allogromia sp.
  • Rhabdammina cornuta
  • Gromia oviformis
  • Spongomonas
  • Astrammina rara
Ef1a
  • Corallomyxa tenera
Ef2
  • Corallomyxa tenera
Enolase
  • Reticulomyxa filosa
Grc5
  • Corallomyxa tenera
  • Reticulomyxa filosa
Hsp70cyt
  • Reticulomyxa filosa
Hsp90
  • Reticulomyxa filosa
  • Massisteria marina
  • Gymnophrys sp
MetK
  • Reticulomyxa filosa
Rps22a (none)
Rps23a
  • Gymnochlora stellata
Tsec61
  • Plasmodiophora brassicae