SGDI Installation and Admin Guide

Date: 24 October 2007
Authors: Seth Falcon and Jeff Gentry

Introduction

This document is intended to serve as a guide to administrators for installation of the Software for Genomic Data Integration (SGDI) web application.

Installing an SGDI Instance

An SGDI instance consists of a database in a PostgreSQL RDBMS and a Zope Product that provides a web interface to the datasets in the database. These instructions detail the installation of all required software to produce one or more SGDI instances.

Supported Platforms

The SGDI web application has been developed and tested on GNU/Linux. We expect the system to operate without significant modification on any Unix-like operating system (e.g. BSD or OSX).

Required Software

The SGDI software requires several other applications to install properly. These are all provided in a single bundle, but a complete list follows.

  1. PostgreSQL (>= 8.1)
  2. Python (>= 2.4.2)
  3. PyXML (>= 0.8.4) - A Python module
  4. mxDateTime (>= 2.0.6) - A Python module
  5. psycopg (= 1.1.21, not psycopg2) - A Python module
  6. Zope (>= 2.9.1 and < 2.10, but not 3.x)
  7. NuxUserGroups - A Zope product
  8. exUserFolder - A Zope product (note that we distribute an SGDI-patched version and not the standard version)
  9. ZPsycopgDA - A Zope product
  10. R (>= 2.6.0)
  11. The SGDI R packages SGDI and ontoElicitor
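
If you choose to supply your own copies of any of these components rather than using the bundle, it can help to compare an installed version against the minimums listed above. The following is a minimal sketch of such a check, not part of the SGDI bundle. The helper names version_ge and zope_ok are our own, and the sketch assumes a sort that supports the -V (version sort) option, as GNU coreutils does.

```shell
# version_ge HAVE WANT: succeed if dotted version HAVE is >= WANT.
# ASSUMPTION: sort supports -V (GNU coreutils); version_ge is a
# hypothetical helper name, not something shipped with SGDI.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# zope_ok VERSION: succeed if VERSION satisfies ">= 2.9.1 and < 2.10".
zope_ok() {
    version_ge "$1" 2.9.1 && ! version_ge "$1" 2.10
}
```

For example, version_ge 8.1.4 8.1 succeeds, while zope_ok 2.10.1 fails because that release falls outside the supported Zope range.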

Note for Mac users: In its default state, OSX will not be able to build R properly, and you must prepare your Mac for this to be possible. Please visit the R for Mac OSX FAQ for instructions on how to make sure you are prepared to run this installer.

Installation

The SGDI bundle provides all of the appropriate software, but more importantly also provides a makefile which will construct a self-contained environment with everything built inside of it. Administrators may choose to use their own versions and/or installation techniques, but to ensure version compatibility we recommend building via the bundle make functionality. The bundle file is named SGDIbundle_X.Y.Z.tar.gz where X.Y.Z is the version number of the bundle. To get started, unpack the tarball, which will uncompress into a directory named SGDIinstall.
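
The unpacking step can be sketched as a small shell function that extracts the bundle and confirms that the SGDIinstall directory appeared. The function name unpack_bundle and its optional destination argument are our own additions for illustration. The sketch assumes a tar that understands -z and -C, as GNU and BSD tar both do.

```shell
# unpack_bundle TARBALL [DESTDIR]: extract the bundle and print the path of
# the resulting SGDIinstall directory. unpack_bundle is a hypothetical helper.
unpack_bundle() {
    bundle=$1
    dest=${2:-.}
    tar -xzf "$bundle" -C "$dest" || return 1
    if [ ! -d "$dest/SGDIinstall" ]; then
        echo "expected $dest/SGDIinstall after unpacking $bundle" >&2
        return 1
    fi
    printf '%s\n' "$dest/SGDIinstall"
}
```

You would call it as unpack_bundle SGDIbundle_X.Y.Z.tar.gz, substituting the version number of the bundle you downloaded.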

The first thing to take care of is that several default identities and passwords for administrative access to various portions of the software should be changed for increased security. To do this, edit the following variables in the file Makevars in the SGDIinstall directory:

  1. SGDIadmin and SGDIadpass. These are the Zope administrator user and password values. The default is sgadmin.
  2. OWNER_PW, VEIL_PW, and USER_PW. These are the passwords for the three required Veil accounts (whose names are by default sgdi_owner, sgdi_veil and sgdi_user, and are safe to leave as default). The passwords should be changed from the defaults.
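
If you prefer to script these changes rather than edit Makevars by hand, something like the following sketch may help. It assumes that Makevars holds one assignment per line in the form NAME=value (optionally with spaces around the equals sign). We have not verified that layout here, so check your copy of the file first. The function name set_makevar is our own.

```shell
# set_makevar FILE NAME VALUE: rewrite the line that assigns NAME in FILE.
# ASSUMPTION: FILE contains lines of the form "NAME=value" or "NAME = value".
# set_makevar is a hypothetical helper, not part of the SGDI bundle.
set_makevar() {
    file=$1
    tmp=$(mktemp) || return 1
    # Name and value travel through the environment so that awk does not
    # interpret backslashes inside a password.
    if MV_NAME=$2 MV_VALUE=$3 awk '
        BEGIN { n = ENVIRON["MV_NAME"]; v = ENVIRON["MV_VALUE"] }
        $0 ~ ("^[ \t]*" n "[ \t]*=") { print n "=" v; found = 1; next }
        { print }
        END { exit !found }
    ' "$file" > "$tmp"; then
        cat "$tmp" > "$file"    # overwrite in place, keeping file permissions
        rm -f "$tmp"
    else
        rm -f "$tmp"
        echo "variable $2 not found in $file; file left unchanged" >&2
        return 1
    fi
}
```

For example, set_makevar SGDIinstall/Makevars OWNER_PW 'a-new-password' replaces the existing OWNER_PW line, and the function refuses to touch the file if no such line exists.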

There is another pair of variables which may need to be changed. By default, PostgreSQL and Zope run on TCP ports 5432 and 8080 respectively. If either or both of these ports are already in use on your system, you will want to configure alternate ports by changing the PGPORT and/or the ZOPEPORT variables. To verify that nothing is running on these ports, you can do:

telnet localhost PORTNUMBER

Lastly, by default the build installs itself to the directory $HOME/SGDIlive. If you wish this to install somewhere else, you should change the SGDITOP variable.

Please be careful: Make sure that nothing is already listening on the specified ports, and in particular that you do not have a PostgreSQL server running on PGPORT, as this will cause problems with the build process. Also make sure that the SGDITOP variable points to a non-existent directory so that you do not overwrite an existing instance. If you do run into trouble with this, you will need to first solve the initial problem (i.e. shut down the running PostgreSQL server or change PGPORT in the former case, or remove the existing directory or change the SGDITOP variable in the latter) and then run the build again.
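
These checks can be run together before starting the build. The sketch below is one way to do so and is not part of the bundle. The function names port_in_use and preflight are our own, and the port test relies on the /dev/tcp pseudo-device provided by most builds of bash, so it assumes bash is installed.

```shell
# port_in_use PORT: succeed if something accepts TCP connections on
# 127.0.0.1:PORT. ASSUMPTION: bash is available and was built with
# /dev/tcp support; port_in_use is a hypothetical helper name.
port_in_use() {
    bash -c "exec 3<>/dev/tcp/127.0.0.1/$1" 2>/dev/null
}

# preflight PGPORT ZOPEPORT SGDITOP: report every problem found and
# fail if there was at least one.
preflight() {
    status=0
    for port in "$1" "$2"; do
        if port_in_use "$port"; then
            echo "port $port is already in use" >&2
            status=1
        fi
    done
    if [ -e "$3" ]; then
        echo "$3 already exists" >&2
        status=1
    fi
    return $status
}
```

With the default settings you would run preflight 5432 8080 "$HOME/SGDIlive" and start the build only if it succeeds.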

You will need to make sure you have an appropriate LD_LIBRARY_PATH variable set up in your current environment. For instance, if the SGDITOP was set to the default, you would do (in bash, other shells may vary):

export LD_LIBRARY_PATH=$HOME/SGDIlive/SGDIExternal/postgres-dist/lib:$LD_LIBRARY_PATH
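
One caveat with the line above: if LD_LIBRARY_PATH was previously unset, the result ends in a colon, and on Linux the dynamic linker treats an empty entry as the current directory. A slightly more defensive variant is sketched below. The function name prepend_lib_path is our own.

```shell
# prepend_lib_path DIR: put DIR at the front of LD_LIBRARY_PATH without
# leaving a trailing ":" when the variable was previously unset or empty.
# prepend_lib_path is a hypothetical helper name.
prepend_lib_path() {
    LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH
}
```

With the default SGDITOP this would be called as prepend_lib_path "$HOME/SGDIlive/SGDIExternal/postgres-dist/lib".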

At this point you should be able to enter the directory SGDIinstall/External and simply type make. This process may take some time, particularly on a machine with older hardware, but it will compile and build all of the required software and provide your SGDI instance with a few default datasets to test out.

Note for OSX users: You will need to do the following:

  1. Make sure your LD_LIBRARY_PATH is already set before building.
  2. Set MACOSX_DEPLOYMENT_VERSION to 10.4 (using export as above).

Zope setup

After the make command finishes, you will have a PostgreSQL database and a Zope instance in your SGDIlive directory. The Zope instance lives in SGDIlive/zopeSGDI. To start the Zope instance, you will need to run SGDIlive/zopeSGDI/bin/zopectl start. At this point you can point a browser to localhost:PORT/manage (where PORT is either 8080 or the port you specified with ZOPEPORT) and enter the Zope Administrator user and password (specified with SGDIadmin and SGDIadpass).

At this point you will see the Zope Management Interface (ZMI). The first step in the ZMI is to initialize the Z Psycopg Database Connection, which allows Zope to talk with the PostgreSQL server. See the dropdown below Accelerated HTTP Cache Manager in the figure below. The presence of both SGDI and Z Psycopg Database Connection in the dropdown indicates that these products were installed correctly; if either is absent, nothing will work, and you need to check the log for the Zope instance to see what is wrong.

initDrop.png

Now you will need to create a ZPsycopg database adapter. Do this by clicking on the Z Psycopg Database Connection option in the dropdown and filling in the connection parameters. In the Enter a Database Connection String field, you will need to enter:

dbname=sgdi user=SGDI_OWNER password=OWNER_PW

admPsyConnStr.png

Where SGDI_OWNER and OWNER_PW are the values from the Makevars file. Also, if you have specified an alternate port via the PGPORT variable, you will also need to specify port=PGPORT in that string.
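
The rule for the optional port can be sketched as a small shell function that prints the string to paste into the ZMI field. The function name build_conn_str is our own, and the sketch assumes that none of the values contain spaces or quotes.

```shell
# build_conn_str DBNAME USER PASSWORD [PORT]: print a connection string,
# appending port=... only when a port is supplied.
# ASSUMPTION: no value contains spaces or quotes; build_conn_str is a
# hypothetical helper name.
build_conn_str() {
    printf 'dbname=%s user=%s password=%s' "$1" "$2" "$3"
    if [ -n "${4:-}" ]; then
        printf ' port=%s' "$4"
    fi
    printf '\n'
}
```

Calling build_conn_str with only the first three arguments gives a string for the default port. Add the value of PGPORT as the fourth argument if you changed it.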

After clicking the Add button and returning to the ZMI root folder, you should see an entry for the connection in the objects list.

admRootPsyco.png

Clicking on the database connection object will bring you to a tabbed interface.

admPsycoTabs.png

Click on the test tab and issue the command select * from sgdi_expr_control_table;

admDoCount.png

You should see three experiments listed (the picture below does not depict the default; there are 20 experiments in this particular instance).

admQueryRes.png

At this point, you know that the default example datasets are installed in PostgreSQL and are visible through the Zope-PostgreSQL interface. You will need to establish access privileges below, using the SGDI product.

SGDI Zope Product Activation

You create an SGDI instance by using the dropdown again.

initDrop.png

Now you will give it a name and a title.

admSGDIinstInit.png

A new entry will then appear in the main page of the ZMI. Here we used wedsDemo as the name.

admRootSGDIopt.png

Click on the new SGDI instance, with the cylinder icon, and you will get a tabbed interface.

The final step for setting your instance up is to click the Contents tab, and then click again on the index.html file. You will see a text entry box with the current contents of the front page, with some default information. You should edit this as you see fit, and then click on the Save Changes button.

Adding Users and Groups to the Database

Any user who might connect to your SGDI web client will need to have an entry in the database. To create users, you will use the addVeilPerson command. For this example, we will add four users: tom, dick, harry and bob. First you will need to connect to the database within your R session:

library(SGDI)
db <- dbConnectVeilAdmins("SGDI_DB", "OWNER_PW", "VEIL_PW", port=PGPORT)

You can add any user individually:

addVeilPerson(db, "tom", "PASSWORD")

Replace the PASSWORD string with the user's password. You can also add several at the same time:

newUsers <- c("dick", "harry", "bob")
newPWs <- c("kcid", "yrrah", "bob")
mapply(addVeilPerson, newUsers, newPWs, MoreArgs=list(db=db))
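
If your user list lives in a file, one option is to generate the R calls from the shell and then source the result in an R session where db is already connected. The sketch below assumes a hypothetical input format of one user:password pair per line. The function name gen_add_users and the file names are our own, and entries containing a double quote or backslash are skipped rather than escaped.

```shell
# gen_add_users FILE: print one addVeilPerson() call per "user:password"
# line of FILE. ASSUMPTION: the "user:password" file format is our own
# convention; gen_add_users is a hypothetical helper name.
gen_add_users() {
    while IFS=: read -r user pw || [ -n "$user" ]; do
        [ -n "$user" ] || continue          # ignore blank lines
        case "$user$pw" in
            *[\"\\]*)
                # Would break the generated R string literal, so skip it.
                echo "skipping $user: quote or backslash in entry" >&2
                continue
                ;;
        esac
        printf 'addVeilPerson(db, "%s", "%s")\n' "$user" "$pw"
    done < "$1"
}
```

For example, gen_add_users users.txt > add_users.R writes the calls to a file, which you can then run with source("add_users.R") after connecting as shown above.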

Users can also be part of groups. Groups allow multiple users to share the same set of privileges. For instance, Tom may want to share data with Dick and Harry but not with Bob. We could create a group named tomsdata and add the three of them:

addVeilGroup(db, "tomsdata")
sapply(c("tom", "dick", "harry"), function(x)
      addPersonToVeilGroup(db, "tomsdata", x))

Note that there is a group named allusers to which every user is automatically added. You can effectively make a dataset public by allowing the allusers group to have access to it.

Loading Datasets into the Database

The SGDI database can be populated with datasets from the SGDI project. In this section we describe how to do that, and provide sample scripts to demonstrate how to populate the database. There are currently no other supported methods for populating the database.

Details for the R data packages

  • Microarray expression data represented using an instance of the ExpressionSet class.
  • The annotation slot must be a string giving the name of the annotation data package appropriate for the data.

Example session

As an example, we will demonstrate how to load two data packages.

Tarball                       Annotation           Chip Type
author2006_1.0.1.tar.gz       hgu95av2             Affymetrix
scientist2005_1.2.3.tar.gz    scientist2005anno    cDNA

Please note that we are using a convention where a spotted array chip has the annotation package named scientist2005anno. This is related to the nature of our public legacy data, where there was a one-to-one mapping between dataset and spotted array chip. If you have custom chips shared between multiple experiments, there is no reason why those experiments could not use the same annotation package (as with the Affymetrix-based experiments).

  1. Install all data packages and associated annotation data packages in R. If you have a directory containing the package tarballs you could use the following shell command:

    cd /dir/with/tarballs/
    for p in *.tar.gz; do
      R CMD INSTALL "$p"
    done

    If the data packages are available in a CRAN style repository, you can use the following from inside an R session:

    URL <- "http://REPOSITORY-URL-HERE"
    install.packages(c("author2006", "scientist2005"), repos=URL, dependencies=TRUE)

  2. Here is an example R session script that loads the example packages. Each package must have the following four phenotype variables defined: species, tissue.type, experiment.type, and sample.type.

    library(SGDI)
    db <- dbConnectVeilAdmins("SGDI_DB", "OWNER_PW", "VEIL_PW", port=PGPORT)

    Replace the values of SGDI_DB, OWNER_PW, VEIL_PW and PGPORT with the values from the install. If PGPORT was left as default, you do not need to include it.

    platforms <- c("hgu95av2", "scientist2005anno")
    z <- sapply(platforms, function(x) loadPlatformAnnotations(x, db))
    
    packages <- c("author2006", "scientist2005")
    x <- sapply(packages, function(x) loadDataPackage(x, db,
    query=FALSE, normalization=theNormalization))

    Note here that theNormalization is the normalization method that was used when constructing the dataset (ExpressionSet, oGtypeExSet, etc). You can pass a single string if all assays in the set use the same method (the typical case), or a named vector (where the names are the assay names) if different assays have different normalization methods.

Public Datasets

A number of public datasets have been collected by the SGDI project and built into R data packages. These have been made available in a CRAN style repository at the URL http://packages.sgdi.org. Likewise, annotation packages for the spotted array datasets in that repository are also available at this URL and follow the naming convention described above (scientist2005anno for a package named scientist2005). Please see the instructions above on how to install these packages if you are interested in using them.

You can see what packages are available using the available.packages() function in R:

available.packages(contriburl=contrib.url("http://packages.sgdi.org/"))[,"Package"]

Access Control

To assign (or remove) access rights to datasets for your users, go into the ZMI and enter the SGDI instance. The default tab is Manage Data Access Control. Here you can assign access control in two different ways: on a per-dataset level or on a per-group level (you can combine the two as well). You can click on a dataset, which will provide you with a list of all the available groups, and allow groups to have access by checking the box next to their name. Similarly, you can select the name of a group, which will then display a list of all of the datasets, and choose which datasets that group has access to.

Ontology Support

SGDI supports the use of ontologies to enable sample selection across multiple datasets. This requires loading an ontology into the database as well as mappings of phenotypic information between datasets you wish to be covered by the ontology and the ontological terms themselves.

Ontologies can be constructed using the ontoElicitor tool from the SGDI project. We have included two prebuilt ontologies (one for breast cancer and another for ovarian cancer) and currently also provide mappings for several public breast cancer datasets. Mappings can be constructed using the SGDI web client which will then allow the user to download the mapping file.

If you would like to use one of our ontologies, they exist in the SGDI R package in the dataset sgdiOntos as the matrices sgdi_breast and sgdi_ovarian. To load one of these (or any other ontology) into the database, the following procedure should be used (we will use sgdi_breast for this example):

library("SGDI")
db <- dbConnectVeilAdmins("SGDI_DB", "OWNER_PW", "VEIL_PW", port=PGPORT)

data(sgdiOntos)
onto2pgsql(sgdi_breast, db)

If you choose to use the breast ontology as well as our public datasets, mappings have been provided between 19 (currently) breast cancer datasets and the sgdi_breast ontology. Mappings are serialized in XML files and their names are of the format ONTOLOGYNAME_DATASET_sugmaps.xml. The SGDI R package has a directory ontoMaps which currently has one subdirectory, breast, which contains the mappings that we are distributing. To see a full listing of the files:

mapDir <- system.file("ontoMaps", "breast", package="SGDI")
dir(mapDir)
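
Because each mapping file name encodes both the ontology and the dataset, the two parts can be recovered from the name alone, for instance to check which datasets the distributed maps cover. The sketch below does this in the shell. The function names map_dataset and map_ontology are our own, and the split assumes that the dataset name itself contains no underscore.

```shell
# map_dataset FILE: print the DATASET part of ONTOLOGYNAME_DATASET_sugmaps.xml.
# ASSUMPTION: the dataset name contains no underscore, so it is the last
# "_"-separated field before "_sugmaps.xml". Hypothetical helper name.
map_dataset() {
    base=$(basename "$1" _sugmaps.xml)
    printf '%s\n' "${base##*_}"
}

# map_ontology FILE: print the ONTOLOGYNAME part (everything before the
# dataset name). Hypothetical helper name.
map_ontology() {
    base=$(basename "$1" _sugmaps.xml)
    printf '%s\n' "${base%_*}"
}
```

For sgdi_breast_chang2003_sugmaps.xml these print chang2003 and sgdi_breast respectively.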

To load one of these maps into the database (for example, sgdi_breast_chang2003_sugmaps.xml), do the following:

library("SGDI")

mapDir <- system.file("ontoMaps", "breast", package="SGDI")
mapFile <- file.path(mapDir, "sgdi_breast_chang2003_sugmaps.xml")
readAndLoadExprOntoMap(mapFile, db)

Note that the dataset associated with this map must already be loaded in the database or an error will occur. You could also load these all at the same time:

maps <- dir(mapDir, full.names=TRUE)
sapply(maps, function(x) readAndLoadExprOntoMap(x, db))

Ongoing Administration

User and Group management

Adding users and groups was demonstrated above. To remove groups, remove users, or remove users from groups, the syntax is almost identical, except that the function names use drop instead of add. To remove a user:

db <- dbConnectVeilAdmins("SGDI_DB", "OWNER_PW", "VEIL_PW", port=PGPORT)
dropVeilPerson(db, "tom")

And to remove a group:

dropVeilGroup(db, "tomsdata")

If you wish to remove a user from a group:

dropPersonFromVeilGroup(db, "tomsdata", "dick")

Dataset Management

Adding datasets is done via the loadDataPackage command, as you saw above. You will need to ensure that the platform is already loaded into the database before loading your package. Removing a dataset is done with the removeEset command. You will need to know the name of the dataset you wish to remove, and then do the following:

db <- dbConnectVeilAdmins("SGDI_DB", "OWNER_PW", "VEIL_PW", port=PGPORT)
removeEset(db, "scientistYear")

InstallationGuide (last edited 2008-01-22 22:49:25 by JeffGentry)