Bearing in mind the 4 major use cases previously outlined:

http://wiki.fhcrc.org/bioc/Discussion_of_some_likely_Use_Cases_for_Short_read_Sequencing

The present design is to parse the relevant data (from UCSC, biomaRt, or a custom source), insert that data into a database, and return an object that holds a connection to that database. The most general entry point is the function that parses a custom source: it takes a data.frame, and the more specialized parsing functions call it in turn. Once the database has been built and the connection has been returned wrapped in an AnnotationObject, a set of methods provides access to the contents of the database tables.
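To make the flow above concrete, here is a minimal sketch of it. It is written in Python with the standard sqlite3 module purely for illustration; it is not the R implementation. Every name in it (make_annotation_object, make_annotation_object_from_tsv, and the methods on AnnotationObject) is hypothetical, and a list of dicts stands in for the data.frame.

```python
import sqlite3


class AnnotationObject:
    """Hypothetical wrapper around a connection to the annotation database."""

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table

    def columns(self):
        # cursor.description is populated even when no rows are returned.
        cur = self.conn.execute(f'SELECT * FROM "{self.table}" LIMIT 0')
        return [d[0] for d in cur.description]

    def rows(self, chrom=None):
        # Optional filter on chrom; assumes the table has a chrom column.
        sql = f'SELECT * FROM "{self.table}"'
        params = ()
        if chrom is not None:
            sql += " WHERE chrom = ?"
            params = (chrom,)
        return self.conn.execute(sql, params).fetchall()


def make_annotation_object(records, table="annotation"):
    """General entry point. `records` is a list of dicts with identical keys,
    standing in for the rows of a data.frame."""
    if not records:
        raise ValueError("records must not be empty")
    fields = list(records[0])
    conn = sqlite3.connect(":memory:")
    col_defs = ", ".join(f'"{f}"' for f in fields)
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
    placeholders = ", ".join("?" for _ in fields)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(r[f] for f in fields) for r in records],
    )
    conn.commit()
    return AnnotationObject(conn, table)


def make_annotation_object_from_tsv(text, table="annotation"):
    """A specialized parser: it only reshapes its input, then delegates
    to the general function."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split("\t")
    records = [dict(zip(header, ln.split("\t"))) for ln in lines[1:]]
    return make_annotation_object(records, table)
```

The point of the sketch is the layering: only the general function touches the database, so a parser for a new source needs nothing more than a way to produce the tabular input.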

Here are some types of annotations that I propose we consider turning into Annotation Objects. We may decide that some of these types are similar enough to each other that we want to lump them together.

Transcript annotations:
  • contains: transcript positions, exon boundaries.
  • used for: potentially any use case.
  • key fields: name,chrom,strand,txStart,txEnd,exonStarts,exonEnds
SNP annotations:
  • contains: SNP locations
  • used for: potentially any use case.
  • all fields: chrom,chromStart,chromEnd,name,strand,refNCBI,refUCSC,observed,molType,class,valid,avHet,avHetSE,func,locType,weight
Genomic Locations: This could be split into several types, or we may want to consolidate them:
  • contains: locations for features like histones, TFBSs, CpG islands etc.
  • used for: potentially any use case.
  • CpG fields: chrom,chromStart,chromEnd,name,length,cpgNum,gcNum,perCpg,perGc,obsExp
  • TFBS fields: chrom,chromStart,chromEnd,name,score,strand,zScore
  • Histone fields: chrom,chromStart,chromEnd,name,score,strand,signalValue,pValue,qValue
Disease associations:
  • contains: locations for regions associated with a particular disease marker.
  • used for: potentially any use case.
  • key fields: name, chrom, chromStart, chromEnd
ESTs:
  • contains: aligned data from EST databases.
  • used for: RNA-seq.
  • all fields: matches,misMatches,repMatches,nCount,qNumInsert,qBaseInsert,tNumInsert,tBaseInsert,strand,qName,qSize,qStart,qEnd,tName,tSize,tStart,tEnd,blockCount,blockSizes,qStarts,tStarts
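
Some of the fields listed above hold several positions per record, for example exonStarts and exonEnds in the transcript annotations. The sketch below shows one way such a record could be expanded into one row per exon before it goes into a database table. It is an illustration in Python rather than part of the proposed R code; the function names are hypothetical, and it assumes the two fields arrive as comma-separated strings with a possible trailing comma, as they do in UCSC table dumps.

```python
def split_positions(value):
    """Split a comma-separated list such as "100,300," into [100, 300].
    Empty pieces (e.g. after a trailing comma) are ignored."""
    return [int(p) for p in value.split(",") if p.strip()]


def expand_exons(transcript):
    """Return one (name, chrom, strand, exon_index, start, end) tuple per exon.
    exon_index is 1-based and follows the order in which exons are listed."""
    starts = split_positions(transcript["exonStarts"])
    ends = split_positions(transcript["exonEnds"])
    if len(starts) != len(ends):
        raise ValueError("exonStarts and exonEnds differ in length")
    return [
        (transcript["name"], transcript["chrom"], transcript["strand"], i, s, e)
        for i, (s, e) in enumerate(zip(starts, ends), start=1)
    ]


# Made-up example record using the key fields of a transcript annotation.
example = {
    "name": "tx1", "chrom": "chr1", "strand": "+",
    "txStart": 100, "txEnd": 500,
    "exonStarts": "100,300,", "exonEnds": "200,500,",
}
exon_rows = expand_exons(example)
```

Whether to store exons this way or to keep the packed strings is a design choice; one row per exon makes range queries on exon boundaries simpler at the cost of a second table.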

here (last edited 2009-10-30 17:41:50 by MarcCarlson)