Bioconductor Package Guidelines

Introduction

The Bioconductor project strives to promote high-quality, well-documented, and interoperable software. These guidelines help to achieve this objective; they are not meant to put undue burden on package authors, and authors having difficulty satisfying the guidelines should seek advice on the bioc-devel mailing list.

Package maintainers are urged to follow these guidelines as closely as possible when developing Bioconductor packages.

Correctness, Space and Time

Bioconductor packages must pass R CMD build (or R CMD INSTALL --build) and pass R CMD check with no errors and no warnings using a recent R-devel. Authors should also try to address all notes that arise during build or check.

Do not use filenames that differ only in case, as not all file systems are case sensitive.

The source package resulting from running R CMD build should occupy less than 2MB on disk. The package should require less than 5 minutes to run R CMD check. This includes the time required to build the Sweave vignette.

Package Name

Choose a descriptive name. An easy way to check whether your name is already in use is to check that the following commands fail to find a package with that name:

source("http://bioconductor.org/biocLite.R")
biocLite("mypackage")

License

The "License:" field in the DESCRIPTION file should preferably refer to a standard license (see opensource.org or Wikipedia) using one of R's standard specifications. Be specific about any version that applies (e.g., GPL-2). Core Bioconductor packages are typically licensed under Artistic-2.0. To specify a non-standard license, include a file named LICENSE in your package (containing the full terms of your license) and use the string "file LICENSE" (without the double quotes) in the "License:" field of your DESCRIPTION file.

Package Content

Packages must

  • Contain a Sweave-style vignette that demonstrates how to use the package to accomplish a task (more on this below).
  • Include examples in all man pages.
  • Specify one or more biocViews categories.
  • Contain a NAMESPACE file to define the functions, classes, and methods that are imported into the name space, and exported for users.
  • Contain (literature) references to the methods used as well as to other similar or related packages.
  • Make use of appropriate existing packages (e.g., biomaRt, AnnotationDbi, Biostrings) and classes (e.g., ExpressionSet, AnnotatedDataFrame, RangedData, RLE, DNAStringSet), and avoid duplication of functionality available in other Bioconductor packages.
  • Document data structures used and, if different from data structures used by similar packages, explain why a different data structure was used.
  • Contain only code that can be redistributed according to the package license. In particular, packages may not include any code from Numerical Recipes.
  • Not contain unnecessary files such as .DS_Store, .project, .svn, cache files, log files, etc.

Package Dependencies

Reuse, rather than re-implement or duplicate, well-tested functionality from other packages. Specify package dependencies in the DESCRIPTION file, using the following fields:

  • Imports: is for packages that provide functions, methods, or classes that are used inside your package name space. Most dependencies are listed here.
  • Depends: is appropriate when the package whose functionality you are using does not have a name space. In this case, use fully qualified variables (pkg::variable). Depends: is also appropriate when a package is used in the example section of a man page. It is very unusual for a package to list more than three packages as 'Depends:'.
  • Suggests: is appropriate for packages used in your vignette.

Packages should specify the R version on which they depend. This is usually the current development version.
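
As a rough illustration of the fields described above, the following sketch writes a small DESCRIPTION file to a temporary location and reads it back with read.dcf(). The package name MyPackage, the version numbers, and the packages named in each field are placeholders chosen for illustration, not recommendations.

```r
# Hypothetical DESCRIPTION content; every name and version is a placeholder.
desc_lines <- c(
    "Package: MyPackage",
    "Version: 0.99.0",
    "Depends: R (>= 2.11.0)",
    "Imports: methods, Biobase",
    "Suggests: Biostrings",
    "License: Artistic-2.0",
    "biocViews: Infrastructure"
)
desc_file <- tempfile("DESCRIPTION")
writeLines(desc_lines, desc_file)

# read.dcf() parses the 'Field: value' format used by DESCRIPTION files
# and returns a character matrix with one column per field.
desc <- read.dcf(desc_file)

# Split a comma-separated dependency field into individual entries.
split_field <- function(field) {
    trimws(strsplit(desc[1, field], ",", fixed = TRUE)[[1]])
}
imports  <- split_field("Imports")
depends  <- split_field("Depends")
suggests <- split_field("Suggests")
```

Reading the file back this way is a quick check that each dependency ended up in the intended field before running R CMD build.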

S4 Classes and Methods

We recommend the following structure/layout:

  1. All class definitions in R/AllClasses.R
  2. All generic function definitions in R/AllGenerics.R
  3. Methods are defined in a file named by the generic function. For example, all show methods would go in R/show-methods.R. This is not written in stone, but tends to provide a useful organization. Sometimes a collection of methods that provide the interface to a class are best put in a SomeClass-accessors.R file.

A Collate: field in the DESCRIPTION file may be necessary to order class and method definitions appropriately during package installation.
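
The layout above might look as follows for a very small package. This is only a sketch: the class Experiment, its slot values, and the generic nSamples are hypothetical names, and the pieces are shown in one block with comments marking the file each would live in.

```r
library(methods)

## --- R/AllClasses.R ---
# Hypothetical class holding a numeric matrix of measurements.
setClass("Experiment",
    representation(values = "matrix"),
    validity = function(object) {
        if (!is.numeric(object@values))
            return("'values' must be a numeric matrix")
        TRUE
    })

## --- R/AllGenerics.R ---
setGeneric("nSamples", function(object, ...) standardGeneric("nSamples"))

## --- R/nSamples-methods.R ---
setMethod("nSamples", "Experiment", function(object, ...) {
    ncol(object@values)
})

## --- R/show-methods.R ---
setMethod("show", "Experiment", function(object) {
    cat("Experiment with", nrow(object@values), "features and",
        nSamples(object), "samples\n")
})

# Usage: create an instance and call the accessor.
x <- new("Experiment", values = matrix(rnorm(12), nrow = 4))
n <- nSamples(x)
```

Because the methods refer to the class and the generic, the class and generic files need to be loaded before the method files; this is the ordering issue that arises during installation.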

Vectorized Calculations

Many R operations are performed on the whole object, not just the elements of the object (e.g., sum(x), not x[1] + x[2] + ...). In particular, relatively few situations require an explicit for loop.
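
For instance, a sum of squares can be written either with an explicit loop or as a single vectorized expression. The data in this sketch are arbitrary values chosen for illustration.

```r
x <- c(2.5, 4.0, 1.5, 3.0)

# Explicit loop: accumulates the sum of squares one element at a time.
total_loop <- 0
for (i in seq_along(x)) {
    total_loop <- total_loop + x[i]^2
}

# Vectorized equivalent: '^' acts on the whole vector, sum() reduces it.
total_vec <- sum(x^2)

# Element-wise work without a loop: centre each value on the mean.
centered <- x - mean(x)
```

Both approaches give the same total; the vectorized form is shorter and usually faster, since the iteration happens in compiled code.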

End-User Messages

  • message() communicates diagnostic messages (e.g., progress during lengthy computations) during code evaluation.
  • warning() communicates unusual situations handled by your code.
  • stop() indicates an error condition.
  • cat() or print() are used only when displaying an object to the user, e.g., in a show method.
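
The following sketch shows how the first three functions might be combined in one function. The function scaleColumns and its arguments are hypothetical, written only to illustrate when each kind of condition applies.

```r
# Hypothetical helper that divides each column of a numeric matrix by
# its standard deviation.
scaleColumns <- function(m, verbose = TRUE) {
    if (!is.matrix(m) || !is.numeric(m))
        stop("'m' must be a numeric matrix")         # error condition
    if (verbose)
        message("scaling ", ncol(m), " columns")     # progress / diagnostics
    sds <- apply(m, 2, sd)
    if (any(sds == 0)) {
        warning("constant columns left unscaled")    # unusual but handled
        sds[sds == 0] <- 1
    }
    sweep(m, 2, sds, "/")
}

m <- cbind(a = c(1, 2, 3), b = c(5, 5, 5))
scaled <- suppressWarnings(scaleColumns(m, verbose = FALSE))
```

One reason to prefer message() and warning() over cat() for this purpose is that callers can silence them with suppressMessages() and suppressWarnings(), which have no effect on output written with cat() or print().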

The Sweave Vignette

A vignette demonstrates how to accomplish non-trivial tasks embodying the core functionality of your package. A Sweave vignette is an .Rnw file that contains LaTeX and chunks of R code. Each R code chunk starts with a line <<>>= and ends with a line @. Each chunk is evaluated during R CMD build, prior to LaTeX compilation. Refer to the Writing package vignettes section of the Writing R Extensions manual for technical details.

A vignette provides reproducibility: the vignette produces the same results as copying the corresponding commands into an R session. It is therefore essential that the vignette embed R code between <<>>= and @; short-cuts (e.g., using a LaTeX verbatim environment, or using the Sweave eval=FALSE flag) undermine the benefit of vignettes.
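
To make the chunk delimiters concrete, the sketch below writes a tiny .Rnw file to a temporary location and extracts its R code with Stangle(). The file contents are a made-up minimal example; a real vignette would contain considerably more LaTeX and R code.

```r
# Minimal, hypothetical vignette source.
rnw <- c(
    "\\documentclass{article}",
    "\\begin{document}",
    "A code chunk follows.",
    "<<>>=",
    "x <- 1:10",
    "mean(x)",
    "@",
    "\\end{document}"
)
rnw_file <- tempfile(fileext = ".Rnw")
r_file   <- tempfile(fileext = ".R")
writeLines(rnw, rnw_file)

# Stangle() extracts the R code found between <<>>= and @; Sweave() would
# instead evaluate the chunks and write a .tex file for LaTeX.
Stangle(rnw_file, output = r_file, quiet = TRUE)
code <- readLines(r_file)
```

Only code placed inside chunks appears in the extracted file, which is why commands shown in a LaTeX verbatim environment are not checked or reproduced.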

All packages are expected to have at least one Sweave vignette.

Citations

Appropriate citations must be included in help pages (e.g., in the see also section) and vignettes; this aspect of documentation is no different from any scientific endeavor. The file inst/CITATION can be used to specify how a package is to be cited.
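
A citation entry of the kind that could be placed in inst/CITATION can be sketched with bibentry() from the utils package. All bibliographic details below are placeholders and do not refer to a real publication.

```r
# Possible contents of inst/CITATION. Title, author, journal and year
# are placeholders, not a real reference.
ref <- bibentry(
    bibtype = "Article",
    title   = "Title of the article describing the package",
    author  = person("Firstname", "Lastname"),
    journal = "Journal Name",
    year    = "2010"
)

# Plain-text rendering of the entry.
ref_text <- format(ref, style = "text")
```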

Version Numbering

All Bioconductor packages use an x.y.z version scheme. The following rules apply:

  • x is usually 0 for packages that have not yet been released.
  • y is even for packages in release, and odd for packages in devel.
  • z is incremented whenever committing changes to a package.

For more details, see Version Numbering Standards.
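
The parity rule for y can be checked mechanically. The helper versionBranch below is a hypothetical function written for illustration; it is not part of any Bioconductor tool.

```r
# Hypothetical helper: classify an x.y.z version string by the parity of y.
versionBranch <- function(version) {
    parts <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
    stopifnot(length(parts) == 3L, !anyNA(parts))
    if (parts[2] %% 2L == 0L) "release" else "devel"
}

versionBranch("1.4.2")   # "release": y = 4 is even
versionBranch("1.5.0")   # "devel": y = 5 is odd

# package_version() compares versions component by component, so
# z = 10 sorts after z = 9.
package_version("1.5.10") > package_version("1.5.9")   # TRUE
```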

C or Fortran code

If the package contains C or Fortran code, it should adhere to the standards and methods described in the System and foreign language interfaces section of the Writing R Extensions manual. In particular:

  • Use internal R functions, e.g., R_alloc and random number generators, over system-supplied ones.
  • Use C function registration (see the Registering native routines section).
  • Use R_CheckUserInterrupt in C level loops when there is a chance that they may not terminate for certain parameter settings or when their run time exceeds 10 seconds with typical parameter settings, and the method is intended for interactive use.
  • Make judicious use of Makevars and Makefile. These are often not required at all (see the Configure and cleanup section).

Use of external libraries whose functionality is redundant with libraries already supported is strongly discouraged. In cases where the external library is complex, the author may need to supply pre-built binary versions for some platforms.

Duplication of Packages in CRAN and Bioconductor

Authors are strongly discouraged from placing their package into both CRAN and Bioconductor. This avoids burdening the author with extra work and confusing the user.

Package Maintainer Responsibilities

Acceptance of packages into Bioconductor brings with it ongoing responsibility for package maintenance. These responsibilities include:

  1. Subscription to the bioc-devel mailing list.
  2. Response to bug reports and questions from users regarding your package, as posted on the bioconductor mailing list.
  3. Package maintenance through software release cycles, including prompt updates to software and documentation necessitated by underlying changes in R.

Package Guidelines (last edited 2010-02-04 19:50:53 by MartinMorgan)