The intention of this page is to discuss how to unify the Bioconductor packages that deal with Illumina gene expression data. Currently, beadarray, BeadExplorer and lumi all offer ways to represent and analyse the summarised output of Illumina's software (BeadStudio), but they use slightly different class structures, which makes it difficult for users to switch between the packages.
When the data comes out of BeadStudio, there is one row per gene and one column per array. It therefore seems natural to use an ExpressionSet (or some extension of it).
beadarray implementation
In beadarray, I extend ExpressionSet and call the class ExpressionSetIllumina. It has the usual slots (assayData, phenoData, featureData, experimentData and annotation), and I also create a QC slot for storing quality control information from BeadStudio.
There seems to be no standard way of exporting data from BeadStudio, and users control which columns are exported. I therefore tried to accommodate the case where all the columns are exported at once, rather than forcing the user to export only certain columns. Narrays and arrayStDev seem to give the same information for each bead on an array, so they can probably be discarded from the class. It is also possible to export annotation information from BeadStudio, but I do not recommend that users do this: it increases the file size, it contains odd characters that are a hassle to deal with in R, and the annotation can be read in later anyway.
The current version of our assayData has the following elements by default:
| name | Description |
|---|---|
| exprs | expression value of each probe |
| se.exprs | standard error of each probe on each array |
| NoBeads | number of beads used to compute the expression values |
| Detection | p-value associated with each probe detected above background measurements |
| Narrays | |
| arrayStDev | |
| DiffScore | Illumina's differential expression score |
In the QC slot, I store exprs, se.exprs and NoBeads as before, and also controlType, which gives the type of control (e.g. negative, biotin, etc.) that each probe represents.
Given that we store a lot of the same information in assayData and QC, I think there is a good case for having them both in the same slot. Could we then use featureData to identify the type of probe?
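As a rough illustration of the merged-slot idea, here is a minimal sketch in Python (chosen only for illustration; it is not R/S4 code and not how beadarray or lumi is implemented). The function names `merge_assay_and_qc` and `select_probes`, the probe-type label `"regular"`, and all probe IDs and values are hypothetical. It assumes regular and control probe IDs do not collide, and it fills elements that exist only for regular probes (such as Detection) with `None` for the control probes.

```python
def merge_assay_and_qc(assay_data, qc_data, control_type):
    """Combine regular-probe and control-probe values into one set of elements.

    assay_data / qc_data: element name -> {probe id: [one value per array]}
    control_type:         control probe id -> type of control
    Returns (merged elements, probe id -> probe type).
    """
    merged = {}
    for name, rows in assay_data.items():
        combined = dict(rows)
        for probe in control_type:
            # Elements missing from the QC data are stored as None (assumption).
            combined[probe] = qc_data.get(name, {}).get(probe)
        merged[name] = combined
    # Stand-in for a featureData column that identifies the type of probe.
    probe_type = {probe: "regular" for probe in assay_data["exprs"]}
    probe_type.update(control_type)
    return merged, probe_type


def select_probes(merged, probe_type, element, wanted):
    """Return the rows of one element for probes of the requested type."""
    return {p: v for p, v in merged[element].items() if probe_type[p] == wanted}


# Made-up data for two arrays.
assay_data = {
    "exprs": {"GeneA": [120.0, 135.5], "GeneB": [80.2, 77.9]},
    "se.exprs": {"GeneA": [4.1, 3.8], "GeneB": [2.9, 3.3]},
    "NoBeads": {"GeneA": [30, 28], "GeneB": [25, 31]},
    "Detection": {"GeneA": [0.001, 0.002], "GeneB": [0.04, 0.06]},
}
qc_data = {
    "exprs": {"neg1": [55.0, 52.3], "biotin1": [9000.0, 9100.5]},
    "se.exprs": {"neg1": [1.2, 1.1], "biotin1": [150.0, 140.2]},
    "NoBeads": {"neg1": [27, 33], "biotin1": [29, 26]},
}
control_type = {"neg1": "negative", "biotin1": "biotin"}

merged, probe_type = merge_assay_and_qc(assay_data, qc_data, control_type)
negatives = select_probes(merged, probe_type, "exprs", "negative")
```

With this layout, the control probes are recovered by filtering on the probe-type annotation rather than by looking in a separate slot.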
lumi representation
The LumiBatch class in the lumi package has a similar implementation: it also extends the ExpressionSet class.
The assayData slot includes the following matrices:
| name | Description |
|---|---|
| exprs | expression value (mean of bead replicates) (required) |
| se.exprs | expression standard deviation of bead replicates (required) |
| beadNum | bead replicate number of each type of probe (optional) |
| detection | p-value associated with each probe detected above background measurements (optional) |
Other slots of the LumiBatch class:
QC: a list that keeps the statistical summary of the samples.
controlData: a data.frame that keeps the control probe measurements, which BeadStudio usually outputs separately.
history: a data.frame that records the previous operations on the object.
Currently, the featureData of a LumiBatch object keeps the mapping from the Illumina TargetID or ProbeID to the nuID. I think the "DiffScore" element in the ExpressionSetIllumina class could be included in the featureData.
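To make the naming differences between the two assayData layouts concrete, here is a small Python sketch (for illustration only, not R code from either package). The element names come from the two tables above. Pairing NoBeads with beadNum and Detection with detection is my reading of their descriptions, and the helper `to_lumi_names` is hypothetical.

```python
# beadarray element name -> lumi element name (pairing inferred from the tables)
BEADARRAY_TO_LUMI = {
    "exprs": "exprs",
    "se.exprs": "se.exprs",
    "NoBeads": "beadNum",
    "Detection": "detection",
}


def to_lumi_names(assay_data):
    """Rename beadarray-style elements to lumi-style names.

    Returns (renamed elements, names that have no lumi counterpart).
    """
    renamed, unmatched = {}, []
    for name, value in assay_data.items():
        if name in BEADARRAY_TO_LUMI:
            renamed[BEADARRAY_TO_LUMI[name]] = value
        else:
            unmatched.append(name)
    return renamed, unmatched


# Placeholder values; only the element names matter here.
beadarray_elements = {
    "exprs": [],
    "se.exprs": [],
    "NoBeads": [],
    "Detection": [],
    "Narrays": [],
    "arrayStDev": [],
    "DiffScore": [],
}
renamed, unmatched = to_lumi_names(beadarray_elements)
```

Here `unmatched` holds Narrays, arrayStDev and DiffScore, which are the elements a common class would need an explicit decision on.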
Some key points I would like to discuss
Given that we store a lot of the same information in assayData and QC, I think there is a good case for having them both in the same slot. Could we then use featureData to identify the type of probe?
When a common class is agreed on, should it be stored in Biobase or some other common package so that we can ensure lumi and beadarray are always using the same class version?
Since both packages are reading the same data format and creating the same object, why not make the function to read BeadStudio output the same in both packages too? If so, how would that work in practice?
There are also two-channel arrays being produced by Illumina (SNP, methylation, DASL), and the beadarraySNP package already deals with SNP data in the SNPSetIllumina class. Therefore, do we also need to consider adopting an NChannelIlluminaSet object instead?
A plug-in to generate Bioconductor-compatible outputs directly from BeadStudio
As discussed before, there seems to be no standard or fixed way to export data from BeadStudio. Besides the differences between versions of BeadStudio, users can also manipulate the output format. As such, writing a universal parser is very difficult, although 'lumi' is currently able to accommodate all the variations we have seen.
To solve the problem at its root, Northwestern is trying to write a plug-in for BeadStudio so that users can output data in a fixed format that is easy to load into R. A prototype of this plug-in is working. Once we reach a consensus on the Illumina object structure, a beta version of the plug-in can be released.
