- Introduction
- Package Name
- License
- Correctness
- Size Requirements
- Time Requirements
- Package Contents
- Package Dependencies
- Using S4 Classes and Methods
- The Sweave style vignette
- Citations
- Version Numbering
- C or Fortran code
- Duplication of packages in CRAN and Bioconductor
- Time allowed to get your package through the approval process
- Responsibilities of a BioC Package Maintainer
- Further Reading: Seth Falcon's thoughts on coding Style
Introduction
Package maintainers (and contributors) are urged to follow these guidelines as closely as possible when developing Bioconductor packages.
The Bioconductor project wishes to promote high-quality, well-documented, and interoperable software. These guidelines aren't meant to put undue burden on package authors. If you are having trouble meeting one of the guidelines, tell us!
Package Name
The package name should be at least three letters long and be descriptive. Of course, you should not pick a name already in use on CRAN, Omegahat or Bioconductor. An easy way to check this is to try to install a package of that name with:
source("http://bioconductor.org/biocLite.R")
biocLite("mypackage")
or with:
install.packages("mypackage", repos="http://www.omegahat.org/R")
Both should fail.
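As a complement to the install test above, the two naming rules can be sketched as a small R helper. The function name_looks_available and its arguments are hypothetical, and comparing names case-insensitively is an assumption made here for safety rather than a stated Bioconductor rule.

```r
# Hypothetical helper: TRUE when `name` has at least three characters and
# does not match any name in `taken` (compared case-insensitively, which is
# an assumption of this sketch).
name_looks_available <- function(name, taken) {
  nchar(name) >= 3 && !(tolower(name) %in% tolower(taken))
}

# Example with a hand-written list of existing names.
taken <- c("affy", "limma", "Biobase")
name_looks_available("mypackage", taken)  # not in the list, long enough
name_looks_available("biobase", taken)    # clashes with "Biobase"
```

In practice `taken` could be filled from a repository index, for example rownames(available.packages(repos = "https://cloud.r-project.org")) for CRAN only; Omegahat and Bioconductor would have to be checked separately.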
License
The "License:" field in the DESCRIPTION file should preferably refer to a standard license like one of the many Open Source Licenses (another good place to look is this comparison of free software licences). If a license is versioned (GPL, for example), be specific about the versions that apply (2, >=2, 3). Please have a look at what other Bioconductor software packages use. If you choose a license already in use in Bioconductor, then use exactly the same spelling (just copy/paste it into your DESCRIPTION file).
If you really want to use a "home-made" license (i.e. a license that you wrote yourself so that it best fits your needs) then include a LICENSE file in your package (containing the full terms of your license) and use the string "file LICENSE" (without the double quotes) in the "License:" field of your DESCRIPTION file.
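The two cases above (a standard license string, or "file LICENSE" plus a LICENSE file) can be checked with a short R sketch. The helper check_license_field is hypothetical; it relies only on base R's read.dcf to read the DESCRIPTION file.

```r
# Hypothetical helper: read the License field from <pkg_dir>/DESCRIPTION and,
# if it is "file LICENSE", make sure the LICENSE file is actually present.
check_license_field <- function(pkg_dir) {
  desc <- file.path(pkg_dir, "DESCRIPTION")
  license <- unname(read.dcf(desc, fields = "License")[1, "License"])
  if (is.na(license)) {
    stop("DESCRIPTION has no License field")
  }
  if (license == "file LICENSE" && !file.exists(file.path(pkg_dir, "LICENSE"))) {
    stop("License is 'file LICENSE' but no LICENSE file was found")
  }
  license
}

# Example: a throwaway package directory with a versioned standard license.
pkg <- tempfile("mypackage")
dir.create(pkg)
writeLines(c("Package: mypackage", "License: GPL (>= 2)"),
           file.path(pkg, "DESCRIPTION"))
check_license_field(pkg)
```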
Correctness
- The package can be built (with R CMD build or R CMD INSTALL --build) and passes R CMD check with no errors and no warnings using a recent R-devel.
- Authors should also try to deal with all notes that arise during build or check.
- Don't use filenames that differ only in case. This would lead to a file name case conflict when people extract the source tarball of your package (or check out its source from svn) on a case-insensitive file system. Note that this situation is not detected by R CMD check.
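Because R CMD check does not catch the case conflict described above, a quick check of your own can help. The following is a minimal sketch; find_case_conflicts is a hypothetical helper, not part of R or Bioconductor.

```r
# Hypothetical helper: group paths that become identical once case is ignored.
# Returns a list with one element per conflict, or an empty list if none.
find_case_conflicts <- function(paths) {
  key <- tolower(paths)
  dup_keys <- unique(key[duplicated(key)])
  split(paths, key)[dup_keys]
}

# Example with a hand-written file list; "R/Utils.R" and "R/utils.R" collide.
find_case_conflicts(c("R/Utils.R", "R/utils.R", "man/foo.Rd"))
```

For a real source tree, the input could come from list.files("mypackage", recursive = TRUE), where "mypackage" stands for your package directory.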
Size Requirements
- The source package resulting from running R CMD build should occupy less than 2MB on disk.
- Ideally, the directory containing package sources should also be less than 2MB.
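The size limit above can be checked from R once the tarball has been built. This is only a sketch: tarball_within_limit is a hypothetical helper, and treating 2MB as 2 * 1024^2 bytes is an assumption.

```r
# Hypothetical helper: TRUE when the file at `path` exists and is smaller
# than `limit_mb` megabytes (1 MB taken as 1024^2 bytes, an assumption).
tarball_within_limit <- function(path, limit_mb = 2) {
  size_bytes <- file.info(path)$size
  !is.na(size_bytes) && size_bytes < limit_mb * 1024^2
}

# Example with a small temporary file standing in for a source tarball.
fake_tarball <- tempfile(fileext = ".tar.gz")
writeBin(raw(1024), fake_tarball)
tarball_within_limit(fake_tarball)
```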
Time Requirements
- The package should require less than 5 minutes to run R CMD check. This includes the time required to build the Sweave vignette.
Package Contents
We require that the package
- Contains a Sweave style vignette that demonstrates how to use the package to accomplish a task (more on this below).
- Includes examples in the man pages.
- Specifies one or more biocViews categories.
- Contains a NAMESPACE file to define the functions, classes, and methods that should be exported for users.
- Uses only the newer .db style annotation packages. For example, don't depend on 'hgu95av2', instead use 'hgu95av2.db' etc.
- Contains (literature) references to the methods used as well as to other similar or related packages.
- Documents data structures used and, if different than the data structures used by similar packages, explains why a different data structure was used.
- Works with (i.e. uses and defines methods for) ExpressionSet and AnnotatedDataFrame objects for the appropriate kinds of data. Please no longer use the exprSet and phenoData classes, which have been deprecated in favor of ExpressionSet and AnnotatedDataFrame, respectively. (See the original announcement on the Bioc-devel mailing list.)
- Contains only code that can be redistributed according to the package license. In particular, packages may not include any code from Numerical Recipes.
Package Dependencies
Using well-tested functionality from other packages is encouraged over re-implementation or duplication. When you use functionality from another package, please think about whether you really need to attach that package to the global search path (i.e. the package is in the 'Depends' field of the DESCRIPTION file), or whether it is sufficient to import some or all of its contents into a namespace only used by the code in your package (i.e. the package is in the 'Imports' field of the DESCRIPTION file). The advantage of the second is less pollution of the search path seen by the user of your package.
There are three categories of package "dependencies" ('Depends', 'Imports', and 'Suggests'), and we recommend assigning packages to them as follows:
- Depends - packages whose objects are visible to the end-user. These packages are typically used in the examples section of the man pages.
- Imports - packages whose objects are hidden from the end-user. These packages contain functions that are used by your package's functions.
- Suggests - packages that are used to create the vignette or contain optional functionality that is guarded by require statements.
Anyone using more than 5 dependencies in their package should expect to give some very good justifications for this.
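As an illustration of the 'Suggests' category described above, the sketch below guards optional functionality with a require call and falls back to base R when the suggested package is missing. The helpers has_optional and row_medians are hypothetical, and matrixStats is used only as an example of a package that might be listed under 'Suggests'.

```r
# Hypothetical helper: TRUE if the optional package can be loaded.
has_optional <- function(pkg) {
  suppressWarnings(require(pkg, character.only = TRUE, quietly = TRUE))
}

# Hypothetical function: use the suggested package when it is installed,
# otherwise fall back to a slower base R implementation.
row_medians <- function(m) {
  if (has_optional("matrixStats")) {
    matrixStats::rowMedians(m)
  } else {
    apply(m, 1, median)
  }
}

m <- matrix(1:6, nrow = 2)  # rows are (1, 3, 5) and (2, 4, 6)
row_medians(m)
```

Either branch gives the same row medians, so the package keeps working whether or not the suggested package is present.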
Using S4 Classes and Methods
If you are using S4 classes or methods, add a 'Collate' field to your package's DESCRIPTION file. Generally, class definitions come first, generics second, methods third, and then everything else. For the files in the 'R' source code directory of your package, we recommend the following structure/layout:
- All class definitions in R/AllClasses.R
- All generic function definitions in R/AllGenerics.R
- Methods are defined in a file named by the generic function. For example, all show methods would go in R/show-methods.R. This is not written in stone, but tends to provide a useful organization. Sometimes a collection of methods that provide the interface to a class are best put in a someClass-accessors.R file.
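The layout above can be sketched in one runnable script, with comments marking the file each piece would live in. The class Experiment and the generic nValues are hypothetical names chosen for illustration; the order shown (classes, generics, methods) is the order a 'Collate' field would list the files in.

```r
library(methods)

# A matching DESCRIPTION entry might read (file names are assumptions):
# Collate: AllClasses.R AllGenerics.R nValues-methods.R show-methods.R

# --- R/AllClasses.R: class definitions come first ---
setClass("Experiment",
         representation(name = "character", values = "numeric"))

# --- R/AllGenerics.R: generics come second ---
setGeneric("nValues", function(object) standardGeneric("nValues"))

# --- R/nValues-methods.R: methods for the nValues generic ---
setMethod("nValues", "Experiment", function(object) length(object@values))

# --- R/show-methods.R: all show methods ---
setMethod("show", "Experiment", function(object) {
  cat("Experiment", object@name, "with", nValues(object), "values\n")
})

e <- new("Experiment", name = "demo", values = c(0.5, 1.2, 3.4))
nValues(e)
e
```

If the method files were collated before AllClasses.R, setMethod would be called for a class that is not yet defined, which is why the ordering matters.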
The Sweave style vignette
Please refer to the "1.4 Writing package vignettes" section of the Writing R Extensions manual for important technical details about vignettes.
- What's a "real" Sweave style vignette?
- A "real" Sweave style vignette is a .Rnw file that contains chunks of code that are evaluated by R at 'R CMD build' time or on demand by the user with the Sweave command. Those chunks of code are delimited by <<>>= and @. The code contained in those chunks should show a typical workflow, i.e. the commands (+ output) issued by a user during a typical interactive session with the package. The vignette should preferably demonstrate how to use the package to accomplish a non-trivial task.
- Why is it important?
- The key property of a vignette is "reproducibility": anybody should be able to copy and paste the R code from the vignette and get _exactly_ the same output. Reproducibility can only be guaranteed by using <<>>= / @ blocks in the .Rnw file, not \begin{verbatim} / \end{verbatim} blocks. An important requirement for any Bioconductor package is to provide at least 1 "real" Sweave style vignette.
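To make the difference concrete, the sketch below writes a tiny .Rnw file with one <<>>= / @ chunk and runs Sweave on it from R. The file name mypackage.Rnw, the chunk label, and the chunk contents are all made up for illustration; a real vignette would load the package and walk through a non-trivial task.

```r
# Contents of a minimal, hypothetical vignette. The code between <<...>>=
# and @ is evaluated by R, and its output is inserted into the .tex file.
rnw <- c(
  "\\documentclass{article}",
  "%\\VignetteIndexEntry{Using mypackage}",
  "\\begin{document}",
  "A first example:",
  "<<sum-example>>=",
  "x <- c(1, 2, 3)",
  "sum(x)",
  "@",
  "\\end{document}"
)

# Work in a temporary directory so no files are left in the current one.
old_wd <- setwd(tempdir())
writeLines(rnw, "mypackage.Rnw")
Sweave("mypackage.Rnw", quiet = TRUE)  # writes mypackage.tex
tex <- readLines("mypackage.tex")
setwd(old_wd)

# The generated LaTeX contains both the commands and their evaluated output.
tex
```

Had the same commands been typed inside a verbatim block, nothing would be evaluated, and the printed output could silently drift away from what the code really produces.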
Citations
Package authors are encouraged to use citations in their vignettes. Those who don't are likely to be asked why not. It's always a good idea to give credit to those whose work you have built on.
Version Numbering
All Bioconductor packages should use an x.y.z version scheme. The following rules apply:
- The y number should be odd for packages in devel and even for packages in release. This makes it easier for users to know whether the package they have installed is release or devel.
- We encourage package maintainers to increment z whenever committing changes to a package in devel. Any change committed to a released package, no matter how small, must bump z.
For more details, see Version Numbering Standards.
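The odd/even rule for y and the "bump z" rule can be sketched in R as follows. Both helpers, is_devel_version and bump_z, are hypothetical and only illustrate the arithmetic; they are not Bioconductor tools.

```r
# Split an x.y.z version string into its three integer components.
version_parts <- function(version) {
  parts <- unclass(package_version(version))[[1]]
  stopifnot(length(parts) == 3)
  parts
}

# Hypothetical helper: TRUE when y is odd, i.e. a devel version.
is_devel_version <- function(version) {
  version_parts(version)[2] %% 2 == 1
}

# Hypothetical helper: increment z, as required for any committed change.
bump_z <- function(version) {
  parts <- version_parts(version)
  parts[3] <- parts[3] + 1L
  paste(parts, collapse = ".")
}

is_devel_version("1.3.0")  # y = 3 is odd: devel
is_devel_version("1.4.2")  # y = 4 is even: release
bump_z("1.3.9")            # z goes from 9 to 10
```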
C or Fortran code
If the package contains C or Fortran code it should adhere to the standards and methods described in the Writing R Extensions manual. In particular:
- Use of internal R functions, especially random number generators, in preference to system-supplied ones.
- Use of R_alloc over malloc.
- Use of the C function registration mechanism.
Use of external libraries whose functionality is redundant with libraries already supported is strongly discouraged. In cases where the external library is complex, the author may need to supply prebuilt binary versions for some platforms.
Duplication of packages in CRAN and Bioconductor
All package authors are strongly discouraged from placing their package in both CRAN and Bioconductor. Doing so creates an extra burden on the package maintainers and can create problems for us if the two copies are not kept up to date with each other. Packages that are placed in both repositories are expected to be kept up to date and maintained at both locations. If they are not, we will be forced to drop the one that is in Bioconductor.
Time allowed to get your package through the approval process
For most packages, we will try very hard to get things turned around within 3-5 weeks, but sometimes things can happen that hold a package back. This is understandable, and in most cases the delays are not a problem. But if you submit a package to the issue tracker and then abandon it for more than 3 months with no changes and no word about what you are doing, then we reserve the right to purge you from our system. This does not mean that we don't want your package anymore, but it will reset you back to the very earliest stage of submitting your package. The next time you decide to continue the process, you will find yourself starting over again. We are not doing this to be mean, but only because we have limited hours to deal with the dozens of packages that are submitted at any one time. Unfortunately, a fair number of packages get abandoned in the issue tracker, which means we need some sort of cutoff for the purposes of fairness.
Responsibilities of a BioC Package Maintainer
- Subscribe to the bioc-devel mailing list.
- Respond to bug reports and questions from users regarding your package.
- Maintain your package and its capabilities as R and other Bioconductor packages evolve. Typically this involves some work every six months, when a new release is being prepared. We can assist by answering questions, but it is your responsibility to look for and make any needed changes. If you are unwilling or unable to do this, your package will be removed from the upcoming release. Users will still be able to find the older versions.
Further Reading: Seth Falcon's thoughts on coding Style
Seth's Coding Standards for Bioconductor packages.
