Oyster River Protocol For Transcriptome Assembly

The Oyster River Protocol for (eukaryotic) transcriptome assembly is an actively developed, evidence-based method for optimizing transcriptome assembly. The manuscript corresponding to this protocol is here: https://peerj.com/articles/5428/. In brief, the protocol assembles the transcriptome using a multi-kmer, multi-assembler approach, then merges those assemblies into one final assembly.

Contact Information

  • Gitter (preferred)
  • Email (good): Matthew.MacManes@unh.edu
  • Twitter (good): @MacManes
  • Phone (discouraged): 603-862-4052
  • Office (I’m hiding under my desk): 434 Gregg Hall

Some method you’d like me to benchmark? File an issue

1. Installing the software

In general, the ORP can be successfully and easily installed on Linux operating systems. OSX might work, but I have not tried it and it is unsupported. Don’t try this on Windows.

Here are the instructions for installation. Getting stuff installed will be the hard part (the included makefile should do most/all of the hard work, though). Once you have things installed, it should be smooth sailing!

How to install the ORP

2. List of dependencies

Sorry there are so many. Assembly is complex. The makefile should take care of this.

  • Rcorrector, Trimmomatic, Trinity, SPAdes, TransABySS, MCL, Metis, OrthoFuser, BLAST, seqtk, BUSCO (make sure to install databases), TransRate (the ORP version packaged here).
  • Python modules numpy, scipy, biopython, cvxopt.
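If you want to confirm that the makefile left the tools on your PATH, a quick check along these lines may help. This is only a sketch and not part of the ORP: `missing_tools` is a hypothetical helper, and it simply reports which of the names you pass it cannot be found.

```shell
# Hypothetical helper (not part of the ORP): print each named executable
# that cannot be found on the current PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}
```

For example, after activating the conda environment, `missing_tools Trinity seqtk mcl blastn` would list whichever of those are not found. The executable names are assumptions and may differ between versions of each tool.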

3. oyster.mk Usage

After activating the orp_v2 conda environment, this command will run the entire ORP in one shot! You can add the `--dry-run` flag to the end to see the individual commands that it will run, if you are curious. The STRAND=RF setting enables strand-specific assembly in version 2.1.0 of the ORP.

You must use the full path to the oyster.mk script for it to work.

source activate orp_v2

/path/to/Oyster_River_Protocol/oyster.mk main \
STRAND=RF \
MEM=150 \
CPU=24 \
READ1=SRR2016923_1.fastq \
READ2=SRR2016923_2.fastq \
RUNOUT=SRR2016923
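
Because oyster.mk must be called by its full path, it can be convenient to resolve that path once and preview the run before committing to it. The following is a minimal sketch, not part of the ORP: `abs_script` and `ORP_DIR` are hypothetical names, and the install location is an assumption you should change to match your system.

```shell
# Hypothetical helper: turn a (possibly relative) directory plus a script
# name into an absolute path, failing if the script is not there.
abs_script() {
    dir=$1
    name=$2
    if [ ! -f "$dir/$name" ]; then
        echo "cannot find $name in $dir" >&2
        return 1
    fi
    # cd inside a subshell so the caller's working directory is untouched
    (cd "$dir" >/dev/null && printf '%s/%s\n' "$(pwd)" "$name")
}

# Assumed install location; change to wherever you cloned the repository.
ORP_DIR=${ORP_DIR:-$HOME/Oyster_River_Protocol}

# Preview the individual commands with --dry-run before a long run.
# Nothing is launched if oyster.mk cannot be found.
if OYSTER=$(abs_script "$ORP_DIR" oyster.mk); then
    "$OYSTER" main STRAND=RF MEM=150 CPU=24 \
        READ1=SRR2016923_1.fastq READ2=SRR2016923_2.fastq \
        RUNOUT=SRR2016923 --dry-run
fi
```

Dropping `--dry-run` from the last command would start the real assembly.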

4. strandeval.mk Usage

After activating the orp_v2 conda environment, this command will evaluate the strandedness of your assembly in ORP version 2.1.0. It should help you understand whether you assembled the reads using the proper flags. You can add the `--dry-run` flag to the end to see the individual commands that it will run, if you are curious. The evaluation script was modified from a similar script in the Trinity distribution (https://github.com/trinityrnaseq/trinityrnaseq/wiki/Examine-Strand-Specificity).

See Oyster River Strand Exam Tool for some help in interpreting the results.

You must use the full path to the strandeval.mk script for it to work.

source activate orp_v2

/path/to/Oyster_River_Protocol/strandeval.mk main \
ASSEMBLY=assembly.fasta \
CPU=24 \
READ1=SRR2016923_1.fastq \
READ2=SRR2016923_2.fastq \
RUNOUT=SRR2016923

5. report.mk Usage

After activating the orp_v2 conda environment, this command will generate a transcriptome assembly report in ORP version 2.1.0. You can add the `--dry-run` flag to the end to see the individual commands that it will run, if you are curious. It can be run on an assembly generated by any method.

Note: The LINEAGE= flag must be specified, and the database you specify must be in /path/to/Oyster_River_Protocol/busco_dbs. The Eukaryotic database is there by default.

source activate orp_v2

/path/to/Oyster_River_Protocol/report.mk main \
ASSEMBLY=assembly.fasta \
CPU=24 \
LINEAGE=eukaryota_odb9 \
READ1=SRR2016923_1.fastq \
READ2=SRR2016923_2.fastq \
RUNOUT=SRR2016923
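
Since report.mk expects the LINEAGE database to live under busco_dbs, a quick check before launching can save a failed run. This is a sketch under assumptions, not part of the ORP: `has_lineage` and `ORP_DIR` are hypothetical names, and the check assumes each database is unpacked as a directory named after the lineage.

```shell
# Hypothetical helper: succeed only if the named BUSCO lineage is present
# under <ORP install dir>/busco_dbs.
# Assumption: each database is a directory named after the lineage.
has_lineage() {
    orp_dir=$1
    lineage=$2
    [ -d "$orp_dir/busco_dbs/$lineage" ]
}

# Assumed install location; change to match your system.
ORP_DIR=${ORP_DIR:-$HOME/Oyster_River_Protocol}
LINEAGE=eukaryota_odb9

if has_lineage "$ORP_DIR" "$LINEAGE"; then
    echo "found $LINEAGE"
else
    echo "$LINEAGE is missing from $ORP_DIR/busco_dbs" >&2
fi
```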

6. Changelog

Version 2.1

  • Strand-specific libraries are now assembled properly. This is enabled by adding the STRAND= flag. Both RF and FR are options, though RF is the most common.
  • There is a new tool, strandeval.mk, which helps you evaluate the strandedness of your assembly.
  • There is a new tool, report.mk, which generates an assembly report for you.
  • There is a new tool, quant.mk, which facilitates the quantitation procedure.
  • Typing oyster.mk help, report.mk help, or strandeval.mk help will print a help message.

Version 2.0

  • The final assembly is now called $RUNOUT.ORP.fasta.
  • Shannon has been removed, and TransABySS has been added in its place. MANY users (myself included) have struggled with the RAM use and runtime of Shannon. TransABySS is much faster and uses much less RAM.
  • Diamond is leveraged for transcript recovery. Some users had noted that a few “real” transcripts were getting lost during the OrthoFuser steps. Diamond, which is run afterwards, recovers those.
  • The use of LinuxBrew has been removed, in favor of conda. Dependencies are now managed by conda. You will need to launch the orp_v2 conda environment before assembling.
  • cd-hit-est is now run as default.