The design below keeps in mind the four major use cases previously outlined:
http://wiki.fhcrc.org/bioc/Discussion_of_some_likely_Use_Cases_for_Short_read_Sequencing
The present design is to parse the relevant data (from UCSC, biomaRt, or a custom source), insert that data into a database, and return an object that holds a connection to that database. The most general entry point is a function that takes a data.frame built from a custom source, so the more specialized parsing functions call this same function. Once the database is built and the connection is returned wrapped in an AnnotationObject, a set of methods can provide access to the contents of the database tables.
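
The flow described above (general loader, specialized parsers that delegate to it, and a wrapper around the connection) might be sketched as follows. This is only an illustration under several assumptions: it is written in Python with the standard `sqlite3` module rather than in R, a list of dicts stands in for the data.frame, and the names `make_annotation_from_records`, `make_annotation_from_ucsc_rows`, `columns`, and `rows` are hypothetical and not part of any existing package. The example rows are made-up placeholder values, and no data is actually fetched from UCSC.

```python
import sqlite3


class AnnotationObject:
    """Hypothetical wrapper around a connection to the annotation database."""

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table

    def columns(self):
        # A zero-row SELECT is enough to read the column names.
        cur = self.conn.execute(f'SELECT * FROM "{self.table}" LIMIT 0')
        return [d[0] for d in cur.description]

    def rows(self, chrom=None):
        # Return all rows, optionally restricted to one chromosome.
        sql = f'SELECT * FROM "{self.table}"'
        params = ()
        if chrom is not None:
            sql += " WHERE chrom = ?"
            params = (chrom,)
        return self.conn.execute(sql, params).fetchall()


def make_annotation_from_records(records, table="annotation", db_path=":memory:"):
    """General loader (assumed name): records is a list of dicts,
    standing in for the data.frame mentioned in the text."""
    if not records:
        raise ValueError("need at least one record to infer the columns")
    cols = list(records[0])
    conn = sqlite3.connect(db_path)
    col_sql = ", ".join(f'"{c}"' for c in cols)
    conn.execute(f'CREATE TABLE "{table}" ({col_sql})')
    placeholders = ", ".join("?" for _ in cols)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(r[c] for c in cols) for r in records],
    )
    conn.commit()
    return AnnotationObject(conn, table)


def make_annotation_from_ucsc_rows(header, rows, **kwargs):
    """Specialized parser (assumed name): reshapes already-fetched rows
    and delegates to the general loader."""
    records = [dict(zip(header, row)) for row in rows]
    return make_annotation_from_records(records, **kwargs)


# Placeholder transcript rows, for illustration only.
tx = make_annotation_from_ucsc_rows(
    ["name", "chrom", "strand", "txStart", "txEnd"],
    [("tx1", "chr1", "+", 100, 500), ("tx2", "chr2", "-", 200, 900)],
)
chr1_rows = tx.rows(chrom="chr1")
```

Keeping the specialized parsers as thin adapters means that table creation and insertion live in one place, which seems to be the intent of routing everything through the data.frame-based function.
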
Here are some types of annotations that I propose we consider turning into AnnotationObjects. We may decide that some of these types are similar enough to each other that we want to lump categories together.
- Transcript annotations:
  - contains: transcript positions, exon boundaries.
  - used for: potentially any use case.
  - key fields: name,chrom,strand,txStart,txEnd,exonStarts,exonEnds
- SNP annotations:
  - contains: SNP locations.
  - used for: potentially any use case.
  - all fields: chrom,chromStart,chromEnd,name,strand,refNCBI,refUCSC,observed,molType,class,valid,avHet,avHetSE,func,locType,weight
- Genomic Locations: This could become several types, or we may want to consolidate them:
  - contains: locations for features like histones, TFBSs, CpG islands, etc.
  - used for: potentially any use case.
  - CpG fields: chrom,chromStart,chromEnd,name,length,cpgNum,gcNum,perCpg,perGc,obsExp
  - TFBS fields: chrom,chromStart,chromEnd,name,score,strand,zScore
  - Histone fields: chrom,chromStart,chromEnd,name,score,strand,signalValue,pValue,qValue
- Disease associations:
  - contains: locations for regions associated with a particular disease marker.
  - used for: potentially any use case.
  - key fields: name,chrom,chromStart,chromEnd
- ESTs:
  - contains: aligned data from EST databases.
  - used for: RNA-seq.
  - all fields: matches,misMatches,repMatches,nCount,qNumInsert,qBaseInsert,tNumInsert,tBaseInsert,strand,qName,qSize,qStart,qEnd,tName,tSize,tStart,tEnd,blockCount,blockSizes,qStarts,tStarts
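
One way to judge whether the Genomic Locations types listed above could be consolidated is to compare their field lists directly. The sketch below, in Python, is only an aid to that discussion: the field lists are copied from the CpG, TFBS, and Histones entries, while the helper names `shared_fields` and `extra_fields` are hypothetical.

```python
# Field lists copied from the Genomic Locations entries above.
FIELDS = {
    "CpG": "chrom,chromStart,chromEnd,name,length,cpgNum,gcNum,perCpg,perGc,obsExp".split(","),
    "TFBS": "chrom,chromStart,chromEnd,name,score,strand,zScore".split(","),
    "Histones": "chrom,chromStart,chromEnd,name,score,strand,signalValue,pValue,qValue".split(","),
}


def shared_fields(field_lists):
    """Fields present in every list, kept in the order of the first list."""
    lists = list(field_lists)
    common = set(lists[0]).intersection(*lists[1:])
    return [f for f in lists[0] if f in common]


def extra_fields(fields, shared):
    """Fields of one type that are not part of the shared core."""
    return [f for f in fields if f not in shared]


core = shared_fields(FIELDS.values())
# core == ['chrom', 'chromStart', 'chromEnd', 'name']

extras = {kind: extra_fields(cols, core) for kind, cols in FIELDS.items()}
# extras['TFBS'] == ['score', 'strand', 'zScore']
```

The shared core here is the same set of fields listed as key fields for disease associations, which suggests one possible basis for a common table or base class, with type-specific columns handled separately.
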
