HOWTO: A basic query using SGDI
| Date: | 1 Nov 2007 |
|---|---|
| Author: | Jeff Gentry |
Introduction
This document is intended to serve as a brief tutorial for performing a basic query with the SGDI software via the use of an (artificial) example. While it is not exhaustive in showing potential functionality, this type of query should be quite common for most users.
Logging In
Your first step is to point your browser to the URL for the SGDI instance you wish to use. For this example we will use http://sgdi-dev.fhcrc.org/sgdi_public/index.html. Here you will see some basic information about this instance, and a button that says Log In. Press this button to authenticate with the server. Upon pushing the Log In button, you will receive an authentication challenge. Enter your name and password (provided by your administrator) and press Ok.
Selecting a Workspace
Here you will see a screen which will let you select a workspace to use. An SGDI workspace should be viewed as one line of thought. It represents a set of samples from a set of experiments and a corresponding set of reporters (e.g. genes), which can then be used to extract the appropriate data. If you have no workspaces defined, a message will alert you to this effect; otherwise your current workspaces will be listed here. You can also create a new workspace on this screen. Create a new workspace here and select it to move on.
Selecting Experiments
You will now come to the basic SGDI page. On the left side you will have a variety of options, many related to workspace management but the first several devoted to selection (which are also repeated in the main frame). The hierarchy of
- Experiment Selection Page
- Select Samples Via Ontology
- Gene Expression Reporter Collapse Method
- Gene Expression Reporter Selection Page
- Display Selected Data
should be viewed as the appropriate order in which to perform these steps (with the possible exception of the second item, Select Samples Via Ontology, which will be discussed later).
For now, we need to select experiments to explore, so click the "Experiment Selection Page" link. Here you will see a listing of all the datasets that you have permission to see, along with descriptions of the type of tissue they relate to, the microarray platform, and the number of samples. The latter three pieces of information can be filtered (e.g. show only experiments from the 'hgu95av2' platform) or sorted (by clicking on the hyperlink in the column heading). Check the boxes for the experiments that you wish to look at. For this example, we will use alizadeh2000, armstrong2002 and blalock2004, as they come standard as example packages in every SGDI install.
Before hitting Update Workspace & Return, there are two options at the bottom that you should make note of. The first option specifies whether you want to allow the use of any reporter from any of the platforms used, or only the reporters that the platforms have in common. If you select the option to use only shared reporters, a gene that is not represented on one of the platforms (e.g. the alizadeh2000 spotted array chip) but is present on the others will not be included. The default behavior is to use all reporters from all platforms.
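The difference between the two reporter options can be pictured as a set union versus a set intersection. The following is only a minimal sketch of that idea, not SGDI's actual implementation; the platform names, gene lists and the `available_reporters` helper are all hypothetical.

```python
# Hypothetical gene lists per platform; these are NOT real platform contents.
platform_reporters = {
    "platform_a": {"CHUK", "TP53", "BCL2"},
    "platform_b": {"TP53", "BCL2"},
    "platform_c": {"CHUK", "TP53", "BCL2", "MYC"},
}


def available_reporters(platforms, shared_only=False):
    """Return the genes usable across the given platforms.

    shared_only=False mimics "use all reporters from all platforms" (union);
    shared_only=True mimics "only use shared reporters" (intersection).
    """
    sets = [set(genes) for genes in platforms.values()]
    if not sets:
        return set()
    if shared_only:
        return set.intersection(*sets)
    return set.union(*sets)


all_reporters = available_reporters(platform_reporters)
shared_reporters = available_reporters(platform_reporters, shared_only=True)
print(sorted(all_reporters))
print(sorted(shared_reporters))
```

In this sketch CHUK is dropped under the shared-only option because the hypothetical `platform_b` does not carry it, which mirrors the spotted array case described above.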
The other option is which samples should be selected by default - all or none. The default is for no samples to be automatically included in these datasets, which is convenient when selecting samples via an ontology, but the other option (selecting all samples by default) tends to be more convenient when selecting samples manually (as we are here). Select the option of Default to all samples selected, and now click on the Update Workspace & Return button.
Selecting Assays and Samples
You will find that you are back at the main page, but with some key differences. The experiments you selected will now be listed on the main page, along with the number of currently selected samples. You will also have tabs across the main frame listing the selected experiments. Clicking on either one of these tabs or the hyperlinks from the selected experiments list will bring you to a specific page for that experiment. We will do this for each experiment to select samples.
For this example, let's start with the alizadeh2000 dataset. You will find at the top a bit of information about the dataset, including the species, platform, total selected samples and the number of reporters associated with the platform.
Some experiments have multiple assays or sub-experiments. These are datasets which are logically connected to the primary experiment. In the current version of SGDI, an experiment will either simply have the one gene expression dataset or it will have the gene expression (exprs) as well as a dataset of identical proportions consisting of standard errors (se_exprs). For the experiments with two assays, a user can select both assays, or just one individually. Assays are listed in the top portion of the experiment page. You can check the ones you want individually or check the top box to toggle all/none.
If you scroll down you will see a table that describes any known clinical information about the samples of this set. You can select samples manually (via the check all/uncheck all functionality or simply checking the boxes that you wish) or by specific values (e.g. only ER+ samples). If you're selecting by values, you will want to look at the section between the description and the clinical information, as it contains two important options. These options are used to chain selections together by giving one the ability to use AND/OR and NOT operators. Any selections made will be based on these operators. The default is to use OR only, but one can select AND and toggle the NOT operator.
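One way to think about chaining selections with AND/OR and NOT is as set operations on sample identifiers. The sketch below assumes that each new criterion is combined with the current selection, and that NOT inverts the new criterion before it is combined; SGDI's real semantics may differ. The sample records, the `batch` variable and the `apply_selection` helper are hypothetical.

```python
# Hypothetical clinical table; variable names and values are made up.
samples = [
    {"id": "s1", "subcategory": "A", "batch": "1"},
    {"id": "s2", "subcategory": "A", "batch": "2"},
    {"id": "s3", "subcategory": "B", "batch": "1"},
    {"id": "s4", "subcategory": "B", "batch": "2"},
]


def apply_selection(all_samples, current, variable, value, op="OR", negate=False):
    """Combine the current selection with a new criterion.

    op is "OR" (the default, as in the tutorial) or "AND";
    negate=True applies NOT to the new criterion before combining.
    """
    all_ids = {s["id"] for s in all_samples}
    matched = {s["id"] for s in all_samples if s.get(variable) == value}
    if negate:
        matched = all_ids - matched
    if op == "OR":
        return current | matched
    if op == "AND":
        return current & matched
    raise ValueError(f"unknown operator: {op}")


# Start empty, select subcategory A (OR), then require NOT batch 2 (AND).
selected = apply_selection(samples, set(), "subcategory", "A")
selected = apply_selection(samples, selected, "batch", "2", op="AND", negate=True)
print(sorted(selected))
```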
To select a value, simply click on the dropdown list and make the selection that you wish. If you wish to make further selections, do so now (applying the appropriate operators as described above), and press Apply Selection when you are finished. For this example, we will select only the Hematopoetic cell lines samples from the subcategory variable. When you press Apply Selection you will be taken to the main screen again, but the information for alizadeh2000 will have been updated to reflect that only 13 samples are selected.
At this point, select the armstrong2002 experiment. You will find that the structure of this page is identical to the previous. Select the MLL values from the tumor.type variable, which will select 20 samples. Lastly, go to the blalock2004 page and select the hippocampal AD values from tumor.type, which will select 22 samples.
Selecting Samples Via Ontology
This functionality is not covered in this tutorial but will become increasingly important as your SGDI instance grows. The SGDI software allows ontologies to define relationships between the clinical information of many datasets where differing terms are used for the same meaning. Use of this functionality will be discussed in another tutorial.
Reporter Collapsing
In many cases, there might be multiple reporters which map to a single gene. You can choose to collapse this down to a single reporter per gene. Currently one method is available, which will select the reporter with the largest variance for every selected gene.
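The largest-variance rule can be illustrated with a few lines of Python. This is a sketch of the general technique only, assuming population variance and first-wins tie breaking, neither of which is confirmed for SGDI; the reporter IDs, expression values and the `collapse_by_variance` helper are hypothetical.

```python
from statistics import pvariance

# Hypothetical reporter-to-gene mapping and expression values per reporter.
reporter_to_gene = {"r1": "CHUK", "r2": "CHUK", "r3": "TP53"}
expression = {
    "r1": [1.0, 1.1, 0.9],
    "r2": [0.5, 2.0, 3.5],
    "r3": [2.0, 2.5, 3.0],
}


def collapse_by_variance(reporter_to_gene, expression):
    """Keep, for every gene, the single reporter with the largest variance."""
    best = {}  # gene -> (reporter, variance)
    for reporter, gene in reporter_to_gene.items():
        var = pvariance(expression[reporter])
        # Strict ">" means the first reporter seen wins a tie (an assumption).
        if gene not in best or var > best[gene][1]:
            best[gene] = (reporter, var)
    return {gene: reporter for gene, (reporter, _) in best.items()}


collapsed = collapse_by_variance(reporter_to_gene, expression)
print(collapsed)
```

Here the hypothetical reporter `r2` is kept for CHUK because its values vary far more across samples than those of `r1`.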
Reporter Selection
At this point you should select the reporters that you wish to investigate across your selected samples. Clicking on the Reporter Selection Page link will take you to a page which lists different mechanisms for selecting reporters as well as the now familiar AND/OR and NOT options. You can currently select reporters by gene symbol (e.g. CHUK), by chromosome ("all genes on chromosome 13"), KEGG pathway ("all genes in the apoptosis pathway"), GO term ("all genes associated with digestion") and potentially other mechanisms. You can also upload a selection from a text file. Using the AND/OR and NOT functionality you can chain together complex constructs to get the exact reporter selection you wish ("All genes on chromosome 13 AND in the apoptosis pathway").
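A compound reporter query such as "all genes on chromosome 13 AND in the apoptosis pathway" amounts to intersecting two annotation-derived gene sets. The sketch below uses a made-up annotation table with placeholder gene names; it does not reflect real chromosome or KEGG annotations, and the helper functions are hypothetical.

```python
# Hypothetical annotation table: gene -> (chromosome, set of pathways).
annotations = {
    "GENE_A": ("13", {"apoptosis"}),
    "GENE_B": ("13", {"cell cycle"}),
    "GENE_C": ("2", {"apoptosis"}),
}


def genes_on_chromosome(chromosome):
    """All genes annotated to the given chromosome."""
    return {g for g, (c, _) in annotations.items() if c == chromosome}


def genes_in_pathway(pathway):
    """All genes annotated to the given pathway."""
    return {g for g, (_, p) in annotations.items() if pathway in p}


# "chromosome 13 AND apoptosis" is an intersection;
# "chromosome 13 AND NOT apoptosis" is a set difference.
chr13_and_apoptosis = genes_on_chromosome("13") & genes_in_pathway("apoptosis")
chr13_not_apoptosis = genes_on_chromosome("13") - genes_in_pathway("apoptosis")
print(sorted(chr13_and_apoptosis), sorted(chr13_not_apoptosis))
```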
For this example, select KEGG Pathway by pushing the radio button. The format of the KEGG selection screen is the same as the other options. There is a text box to enter in a specific example if you know what pathway you would like to use, or you can select a pathway (or pathways) by hitting the Select from list link, and finally there is an option to upload from a local file. For this example, type apoptosis into the text box and hit the Submit button. You will see a description of the number of reporters you have selected (which you can see in more detail by hitting that hyperlink), and the ability to apply your selection or to cancel it. Select Apply Selection.
You will now be back at the reporter selection page, where you could continue to make more selections if you desired. However, at this point, we are done selecting reporters and wish to see the output of this (artificial) query. Click on the Display Selected Data link.
Displaying Data
After clicking the Display Selected Data link, you will have multiple options. You can choose to extract a CSV file of the data or display in a browser (and if you choose the latter, you can export to a CSV later). You can choose to display with samples as columns or reporters as columns (this becomes important due to restrictions on the number of columns in many spreadsheet applications), and you can choose to view either a combined dataset or an individual dataset. You can also choose which datasets you wish to display the results for. Individually select the datasets or use the top checkbox to select (or unselect) all of them. Select Display In Browser, Display By Samples and use the top checkbox to select all experiments.
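The two display orientations are transposes of one another, which matters when one dimension exceeds a spreadsheet's column limit. The following sketch writes the same small matrix to CSV in both orientations; the `to_csv` helper, the header labels and the data are hypothetical and do not describe the file SGDI actually produces.

```python
import csv
import io


def to_csv(matrix, reporters, samples, samples_as_columns=True):
    """Serialize an expression matrix to CSV text.

    matrix[i][j] holds the value for reporters[i] in samples[j].
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    if samples_as_columns:
        # One row per reporter, one column per sample.
        writer.writerow(["reporter"] + list(samples))
        for name, row in zip(reporters, matrix):
            writer.writerow([name] + list(row))
    else:
        # Transposed: one row per sample, one column per reporter.
        writer.writerow(["sample"] + list(reporters))
        for name, column in zip(samples, zip(*matrix)):
            writer.writerow([name] + list(column))
    return buffer.getvalue()


reporters = ["r1", "r2"]
samples = ["s1", "s2", "s3"]
matrix = [[1, 2, 3], [4, 5, 6]]

by_samples = to_csv(matrix, reporters, samples, samples_as_columns=True)
by_reporters = to_csv(matrix, reporters, samples, samples_as_columns=False)
print(by_samples)
print(by_reporters)
```

With many reporters and few samples, the samples-as-columns layout keeps the column count small; the reverse holds when samples outnumber reporters.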
You will now be presented with a listing of every unique clinical variable across all of the selected experiments. If you wish to display the values of these variables, select the ones you wish to see. You may recall that we specifically chose values from subcategory and tumor.type, and we may wish to see these alongside the sample information. Select these two variables and click the Select Variables & Display Output button.
You will now be presented with an HTML table displaying your output. If you wish to download a CSV file with this information, click on the Download To CSV File button.
