More updates to PhyloSift website

Finished updating the outputs section of the PhyloSift website. Going to close the GitHub issue, but here’s what I still need to update:

  • Column 2 of sequence_taxa files – what is it?
  • Confirm that marker_summary.txt in main output directory summarizes the alignDir marker info
  • Clearer info on the .jnlp and .xml output files that Aaron is working on for the fat tree visualization.

PhyloSift website – update info about output files

Went through the updated list of output files with Guillaume. Here are the deets for all the files now being created in the PS_temp directory for each run:


protein coding markers

  • *.1.unmasked – aligned protein with no masking – not used in downstream analyses
  • *codon.updated.1.fasta – nucleotide, aligned and masked
  • *.newCandidate.aa.1 – same file (unaligned version of hits)
  • *.updated.1.fasta – protein, aligned and masked

.1 refers to the chunk number – so if you see duplicate files ending in .2, .3, etc., each one corresponds to another chunk.
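Here is a minimal Python sketch of the chunk convention. `chunk_of` and `group_by_chunk` are hypothetical helpers, not part of PhyloSift. The sketch assumes the chunk index is always the last purely numeric dot-separated field of a file name, so "sub1" in the codon jplace names is not mistaken for a chunk. A marker name made only of digits would break that assumption.

```python
from collections import defaultdict


def chunk_of(filename):
    """Return the chunk number of an output file, or None if it has none.

    Hypothetical helper. Assumes the chunk index is the last dot-separated
    field of the name that consists only of digits.
    """
    for field in reversed(filename.split(".")):
        if field.isdigit():
            return int(field)
    return None  # e.g. combined files such as sequence_taxa.txt


def group_by_chunk(filenames):
    """Group file names by chunk number (None collects un-chunked files)."""
    groups = defaultdict(list)
    for name in filenames:
        groups[chunk_of(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    files = [
        "enolase.updated.1.fasta",
        "enolase.codon.updated.1.fasta",
        "enolase.newCandidate.aa.1",
        "enolase.updated.2.fasta",
    ]
    for chunk, names in sorted(group_by_chunk(files).items()):
        print(chunk, names)
```

With the example names above, chunk 1 collects three files and chunk 2 collects one.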

16S/18S markers

  • .1.unmasked – aligned nucleotide with no masking
  • .short.1.fasta – alignment using cmalign, with masking
  • .long.1.fasta – will we have this too? And a separate file for unmasked long seqs?

Do we get two .unmasked files if we have a mix of short and long sequences?

No unaligned file in alignDir for 16S/18S data.


  • marker_summary.txt – how many hits per marker for each gene

Search mode – the --keep_search flag retains all the search info in the BLAST directory, including the temporary BLAST files.

--keep_search is currently undocumented – need to document it in the output section for all mode.

treeDir

  • enolase.codon.updated.sub1.1.jplace – nucleotide jplace
  • enolase.updated.1.jplace – aa jplace

How is the information from codon and aa trees used in phylosift summaries?

Main output directory – Krona reports

  • filename_allmarkers.html – all markers in treeDir with jplaces
  • filename.html – core markers DNGNGWU only
  • filename.jnlp – Java Web Start (JNLP) launch file for the fat tree visualization
  • filename.xml – fat tree viz itself?

Main output directory – summary files

  • marker_summary.txt – based off of the taxon summary files
  • run_info.txt – going to be updated in the next few days; lists commands and md5 sums and step completion status (start/end time and duration for each chunk at each step – search, align, place, summarize)
  • sequence_taxa_summary.1.txt – summary for a single chunk (here chunk 1)
  • sequence_taxa_summary.txt – combined info from all chunks
  • sequence_taxa.1.txt – summary for a single chunk (here chunk 1)
  • sequence_taxa.txt – combined information from all the chunks
  • taxa_90pct_HPD.txt – taxa within the 90% highest posterior density (HPD) region
  • taxasummary.txt
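The relationship between the per-chunk and combined sequence_taxa files can be sketched in Python. This is a hypothetical illustration, not PhyloSift's code. It assumes the combined sequence_taxa.txt is simply the per-chunk sequence_taxa.N.txt files concatenated in numeric chunk order. PhyloSift's actual merge step may differ, for example in how it handles header lines.

```python
import re
from pathlib import Path

# Matches per-chunk files such as sequence_taxa.1.txt, but not
# sequence_taxa_summary.1.txt or the combined sequence_taxa.txt.
CHUNK_FILE = re.compile(r"^sequence_taxa\.(\d+)\.txt$")


def chunk_files(outdir):
    """Return the per-chunk sequence_taxa files, sorted numerically by chunk."""
    found = []
    for path in Path(outdir).iterdir():
        match = CHUNK_FILE.match(path.name)
        if match:
            found.append((int(match.group(1)), path))
    # Numeric sort, so chunk 10 comes after chunk 9 rather than after chunk 1.
    return [path for _, path in sorted(found)]


def merge_chunks(outdir, combined_name="sequence_taxa.txt"):
    """Concatenate per-chunk files into one combined file (assumed format)."""
    combined = Path(outdir) / combined_name
    parts = [path.read_text() for path in chunk_files(outdir)]
    combined.write_text("".join(parts))
    return combined
```

For example, `merge_chunks("path/to/output")` (a placeholder path) would rebuild the combined file from whatever chunk files are present. The numeric sort matters because a plain string sort would put sequence_taxa.10.txt before sequence_taxa.2.txt.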


PhyloSift paper and web updates, continued

More progress toward the PhyloSift paper and web updates. Here are the things I need to follow up on this weekend:

Intro edits – say something about:

  • To test for an organism’s presence, people would typically do a BLAST search (a homology test). But that doesn’t tell you what to make of several hits, or anything about the evolutionary relationships of what’s in your sample.
  • People also do tree-based analysis right now (manual inspection) – we need a better way to do this (a method for the HTP era) that is also statistically robust (not just exploratory).

***Put each sentence of the intro on a separate line, for LaTeX purposes

***Go through Aaron’s Nov 1 email outline to make sure I mention all the necessary things in the intro


Test out the Bayes factor test and start putting together the web tutorial

Run all mode with the --bayes flag, and then run test_lineage mode with the relevant flags.

Aaron is going to prep an analysis for the paper, and then we can make a tutorial with this data

PhyloSift website updates

General questions:

  • What are our “levels” of PhyloSift markers – need to standardize terminology
    • “elite markers” – only DNGNGWU?
    • “core markers” – all devel markers?
    • “extended markers” – protein families; additional download
  • What is the ‘web’ folder that now appears in the PS download folder? – Eric’s scripts; Aaron will remove.
  • Still finding the --help dialogue flag structure really confusing. These listed options do nothing: “commands” (list the application’s commands) and “help” (display a command’s help screen).

Intro Tutorial – web updates needed

  • Update screenshot of output directory with newest collection of output files
  • Update the names of the krona files that get generated in the output directory
  • Add info about the automatic fat tree visualization that Aaron just added to outputs

Bayes factor tests – new page creation

  • Screenshot of equation used
  • Explanation about what Bayes factor tests do
  • Commands/workflow needed to run Bayes factor tests
  • How to interpret the outputs generated – biological context, uncertainty/detection thresholds
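For the equation item on that page, here is the textbook definition of a Bayes factor as a starting point. H_1 (e.g. “the lineage is present”) and H_0 (e.g. “it is absent”) are illustrative placeholders. It is an assumption, to be checked against the test_lineage code before it goes on the website, that PhyloSift computes exactly this quantity:

```latex
K = \frac{P(D \mid H_1)}{P(D \mid H_0)}
  = \frac{P(H_1 \mid D)}{P(H_0 \mid D)} \Bigg/ \frac{P(H_1)}{P(H_0)}
```

K is the factor that converts prior odds into posterior odds. K > 1 means the data D favour H_1 over H_0, and K < 1 means they favour H_0.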

Output files – major page update

  • Go through the outputs from the HMP data; update and explain the new file types

General Web Updates

  • Check all pages to make sure double dashes are inserted – already done for the Intro tutorial, phylosift RC file page, Monkey, Kangaroo, DBupdate
  • Update example command lines on Monkey, Kangaroo, DBupdate (e.g. with the new flag structure). Done – 11/9/12