Goby API development (20130609132956)



edu.cornell.med.icb.goby.algorithmic.algorithm Efficient algorithms to derive counts from mapped reads.
edu.cornell.med.icb.goby.algorithmic.data Data structures for algorithms to derive counts from mapped reads.
edu.cornell.med.icb.goby.alignments Classes to read and write alignment information stored in the Goby "compact" format.
edu.cornell.med.icb.goby.alignments.filters Filters used to decide whether alignment entries are of sufficient quality to be kept.
edu.cornell.med.icb.goby.alignments.processors Classes to process alignment entries.
edu.cornell.med.icb.goby.config Support for configuring Goby, particularly for defining paths to third party aligners.
edu.cornell.med.icb.goby.counts Support for writing and reading Goby compressed counts files.
edu.cornell.med.icb.goby.exception Exceptions specific to the Goby framework.
edu.cornell.med.icb.goby.modes Goby "mode" implementations.
edu.cornell.med.icb.goby.R Classes that interface with The R Project for Statistical Computing.
edu.cornell.med.icb.goby.readers Parsers for files in FASTA and FASTQ formats.
edu.cornell.med.icb.goby.readers.vcf Parsers for files in the VCF format.
edu.cornell.med.icb.goby.reads Support for the compact reads format.
edu.cornell.med.icb.goby.stats Classes that implement support for differential expression analysis.
edu.cornell.med.icb.goby.util Utility/Support classes for the Goby framework.
edu.cornell.med.icb.goby.util.barcode Support for matching barcodes in reads.
samples Source code in this package corresponds to the code snippets shown at http://goby.campagnelab.org.


Goby is a next-gen data management framework designed to facilitate the implementation of efficient next-gen data analysis pipelines. The program is distributed under the GNU General Public License (GPL). See the download page for the most recent distribution.

Goby provides compressed file formats that are efficient in both time and space, along with a few utilities that support the most common secondary data analyses. The file formats Goby defines and uses include:

compact reads
An alternative to FASTA/FASTQ that is fast to parse, unambiguous, compact, and chunkable. Chunkability means that a very large file can be processed in independent chunks: only the chunk of interest needs to be read, rather than the entire file. GobyWeb leverages this property to support parallel alignments.
compact alignments
An alternative to the ELAND text format, MAQ, or SAM. Goby alignments are chunkable, compact, unambiguous, and fast to parse.
counts
A representation of the histogram of read counts along a reference sequence, at single-base-pair resolution. This representation is highly space efficient: each count transition (a position where the count changes along the histogram) is generally encoded in about 13 bits.
count archives
An archive of counts, one histogram per reference sequence in an alignment. Archives can store histogram data for a complete genome. They are very space efficient, with only about 20Mb needed to store a histogram of reads aligned against the human genome at base pair resolution. In contrast, a wiggle plot stored at 20bp resolution needs about 45Mb.
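To illustrate what chunkability buys, here is a minimal sketch of a chunked container in Java. It is not Goby's on-disk layout. The 4-byte 0xFF marker, the 4-byte big-endian length prefix, and the class and method names (ChunkSketch, writeChunks, readSplit) are all hypothetical. The sketch also assumes payloads never contain 0xFF bytes, which holds for ASCII read data. A worker given a byte range scans forward to the first marker in its range and reads only the chunks that start there. Several workers can therefore split one large file without any of them traversing the whole file.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChunkSketch {
    // Hypothetical chunk marker. Assumes payloads never contain 0xFF bytes
    // (true for ASCII text), so scanning forward always finds a real marker.
    static final byte[] MARKER = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};

    // Writes each payload as: MARKER, 4-byte big-endian payload length, payload bytes.
    static byte[] writeChunks(List<byte[]> payloads) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] payload : payloads) {
            out.write(MARKER, 0, MARKER.length);
            out.write(ByteBuffer.allocate(4).putInt(payload.length).array(), 0, 4);
            out.write(payload, 0, payload.length);
        }
        return out.toByteArray();
    }

    // Returns the index of the first marker at or after 'from', or -1 if there is none.
    static int findMarker(byte[] file, int from) {
        for (int i = from; i + MARKER.length <= file.length; i++) {
            boolean match = true;
            for (int j = 0; j < MARKER.length && match; j++) {
                match = file[i + j] == MARKER[j];
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }

    // Reads every chunk whose marker starts in the byte range [start, end).
    // Bytes before 'start' are never examined. A worker may read past 'end' only
    // to finish its last chunk, so ranges that partition the file yield each chunk once.
    static List<byte[]> readSplit(byte[] file, int start, int end) {
        List<byte[]> payloads = new ArrayList<>();
        int pos = findMarker(file, start);
        while (pos >= 0 && pos < end) {
            int length = ByteBuffer.wrap(file, pos + MARKER.length, 4).getInt();
            int payloadStart = pos + MARKER.length + 4;
            payloads.add(Arrays.copyOfRange(file, payloadStart, payloadStart + length));
            int next = payloadStart + length;  // chunks are written back to back
            pos = next < file.length ? next : -1;
        }
        return payloads;
    }

    public static void main(String[] args) {
        List<byte[]> reads = new ArrayList<>();
        for (String s : new String[] {"ACGTAC", "GGTTAA", "TTTCCC"}) {
            reads.add(s.getBytes(StandardCharsets.US_ASCII));
        }
        byte[] file = writeChunks(reads);
        int mid = file.length / 2;
        // Two workers process the two halves of the file independently.
        for (byte[] p : readSplit(file, 0, mid)) {
            System.out.println("worker 1: " + new String(p, StandardCharsets.US_ASCII));
        }
        for (byte[] p : readSplit(file, mid, file.length)) {
            System.out.println("worker 2: " + new String(p, StandardCharsets.US_ASCII));
        }
    }
}
```

In this example the midpoint falls inside the second record's payload. Worker 1 therefore reads the first two records, and worker 2 skips ahead to the third record's marker.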

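The counts and count archive formats store a histogram as count transitions rather than one value per base. The sketch below shows that idea in Java: encode a per-base count array as transitions, then recover the count at any base by binary search. Everything in it is illustrative and not Goby's API. The class names, the int-array representation, and the HashMap keyed by reference name that stands in for an archive are assumptions. Goby's compact encoding of each transition in roughly 13 bits is not reproduced here, since each transition in the sketch costs two ints.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CountsSketch {
    // A histogram stored as count transitions: positions[i] is a base where the count
    // changes and counts[i] is the new count, which holds until the next transition.
    // Bases before the first transition have a count of zero.
    static final class TransitionHistogram {
        final int[] positions;
        final int[] counts;

        TransitionHistogram(int[] positions, int[] counts) {
            this.positions = positions;
            this.counts = counts;
        }

        // Count at one base, found by binary search over the transition positions.
        int countAt(int position) {
            int i = Arrays.binarySearch(positions, position);
            if (i < 0) {
                i = -i - 2;  // index of the last transition before 'position'
            }
            return i < 0 ? 0 : counts[i];
        }
    }

    // Keeps only the positions where the per-base count differs from the previous base.
    static TransitionHistogram encode(int[] perBase) {
        List<Integer> positions = new ArrayList<>();
        List<Integer> counts = new ArrayList<>();
        int previous = 0;
        for (int p = 0; p < perBase.length; p++) {
            if (perBase[p] != previous) {
                positions.add(p);
                counts.add(perBase[p]);
                previous = perBase[p];
            }
        }
        return new TransitionHistogram(
                positions.stream().mapToInt(Integer::intValue).toArray(),
                counts.stream().mapToInt(Integer::intValue).toArray());
    }

    public static void main(String[] args) {
        // Hypothetical archive: one transition histogram per reference sequence.
        Map<String, TransitionHistogram> archive = new HashMap<>();
        archive.put("chr1", encode(new int[] {0, 0, 3, 3, 3, 1, 0, 0, 2}));
        archive.put("chr2", encode(new int[] {5, 5, 0}));

        TransitionHistogram chr1 = archive.get("chr1");
        System.out.println("chr1 transitions: " + chr1.positions.length);
        System.out.println("chr1 count at base 4: " + chr1.countAt(4));
    }
}
```

For chr1 the nine per-base values collapse to four transitions (at bases 2, 5, 6, and 8), and the query at base 4 returns 3. Long runs of identical counts, which are common in real coverage data, cost nothing beyond their starting transition.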
In addition to these file formats, Goby provides utilities that implement common next-gen data computations. See http://goby.campagnelab.org/ for details.


Copyright © 2009-2013 Institute for Computational Biomedicine, All Rights Reserved.