How to Analyze GCAT Microarray Data Once You Have It.

Full Protocol - by Dr. Denise Wallack, Muhlenberg College, PA

General Advice, Collected Emails from GCAT-L


Data Analysis with ScanAlyze

Preparation for Data Analysis

1. Find a computer with at least 256 Mb RAM that you and your students can use for the data analysis. If you do not have access to this much RAM, downloading data into ScanAlyze takes a VERY long time.

2. Download ScanAlyze from the Software section of the Eisen homepage <http://rana.lbl.gov/EisenSoftware.htm> onto the computer that you have chosen for your data analysis. You will need to register and receive a password in order to download software.

3. Receive your data from Malcolm either by CD-ROM or FTP.

4. For additional information, the Scanalyze User Manual can be downloaded as PDF from <http://rana.lbl.gov/EisenSoftware.htm>.

Using ScanAlyze

1. Open the ScanAlyze2 program from the Start button (bottom left of the computer screen): click Start, select Programs, then select ScanAlyze2 from the list of programs, and finally the ScanAlyze2 program. An image control form will appear, which allows you to load the image and control how the image is displayed.

Loading the Image:

1. Click on the Channel 1 Load button. Channel 1 represents the Cy3 green fluorescent color. Select the appropriate disk location for your TIFF file of data from Malcolm for the Cy3 channel, and select the file. Depending upon RAM and the size of the file, it could take a while for the file to load. Be patient.

2. Click on the Channel 2 Load button and repeat what you just did, but this time load the Cy5 file, since Channel 2 represents the Cy5 red fluorescent color.

Displaying the Image:

1. Move all the way to the bottom of the Image box and you will find a button labelled Redraw. If you click on that button, a pseudocolor image of the array will be displayed. You can go back and adjust the appearance of the image in a few ways. Click on the x at the top right corner of the image to go back and draw a new version.

2. On the Image Display box, you will notice a region called Gain and one called Norm, each with a set of values that you can adjust by clicking on the arrow, holding down the left mouse button and sliding up and down. Alternatively, you can type a numerical value in the box to the right of the scale of numbers. The Gain value controls the brightness of the image, and the Normalization value controls the balance between the two images (Cy3 and Cy5). The gain value may be adjusted to make the best possible image. DO NOT ADJUST THE NORM VALUE because this changes the raw data, not just the image.

3. Adjust the gain value so that you can see the most spots against the black background. (You can see the image after each adjustment by hitting Redraw. Close each image by hitting the x in the right corner to get back and try a new value.)

4. You can also adjust the colors by selecting the View/Options menu from top of the Image box and selecting colors. (This is particularly important for colorblind researchers who can't differentiate between red and green, but can be used by anyone who would like a different color combination.)

5. From the same Options screen, you can choose Display and change to Scaled to see the spots looking more equal to each other or Ratio which displays spots based on the ratio of fluorescent dyes. Both of these options make the spots more uniform than Normal which uses red and green to represent the differences in fluorescent intensity.

6. Once you have decided on an image that you like which displays the most spots, hit Save Image, although there does not appear to be a way to reopen saved images at a later date, so it may be an unnecessary step.

Gridding the Image:

The next step is to form a grid to tell the computer where to look for pixels of color to calculate the expression pattern. This is the most important step and also the most tedious.

The grid control form should have appeared automatically when you displayed your image. It is this form that you will use to make a grid with each spot in your image circled.

Our chips consist of 32 blocks. Each block has 24 spots across and 12 spots down. There are also some blobby spots to the upper right of each 24 X 12 block resulting from a problem when printing the grids. Ignore these extra spots.

1. To begin, click on New Grid and you will get the New Grid Form displayed. This form allows you to tell the program how many sectors (grids to add) your array has and to set the number of columns and rows for the sectors, spacing between the columns and rows, spacing between the sectors (tip spacing), and the width and height of the spots, as well as the position of the left side and top of the grid.

The parameters you should use are:

2. Click Create and grids will appear with rows and columns of little circles that you want to line up with the spots of your array. This is the part that takes time and practice. You can manipulate the grids as a whole or individual circles of the grids (spots) in a number of ways.

Positioning the Grids:

1. You can move the grid by left-clicking the mouse on the grid to click and drag the image to an appropriate location. When you let go of the mouse button, you will see the image. Drag your grid to the appropriate sector that you plan to analyze.

2. You can stretch and compress the grid with eight "handles" which are small boxes at the corners and in the middle of each side of the grid. By left clicking and dragging any of these handles, you can make the grid bigger or smaller so that all of the spots of your array are contained in the grid. You might find it easier as you adjust the size of the grid to select your grid by clicking on select by grid. Then move your cursor to the grid, and then hold down the Shift key and click your left mouse button on the grid. It should turn a different color to show that it has been selected.

3. You can now hit the Hide Grid button and the grid part will disappear, with only the box around it and the handles displayed. You can get back the grid by selecting the grid as above by holding down the shift key and clicking the mouse on the grid and then hitting the Show Grid button. You can unselect the grid by hitting the Unselect button.

4. What you will notice is that the microarray printers are not exact, making your rows and columns not quite straight. You will need to make adjustments so that the circles of the grid line up with the spots of your image.

5. You can make additional adjustments to the whole grid by selecting the grid as above and using the Action Buttons as described below:

6. In lining up your grids, there are a few landmarks to utilize:

If you think of the grids as numbered consecutively from left to right, starting at the upper left corner, and returning to the left-most grid each time you start a new row, you should have grid #1 at the upper left corner, and grid #32 at the lower right corner.

"Bogus" set of 6 or so blobby spots which floats above each grid; there should always be room for about 3 rows of spots between the "bogus" row and the first row of the real grid. Unfortunately, this bogus row can't always be seen, and sometimes the left-most "seen" spot in the bogus row won't correspond to the real left-most spot, but the vertical distance between the bogus row and the first real row should be constant.

Also, there are Cy3 and Cy5 dye standards in the control plate that got spotted down, and you should always make sure that these spots are in the correct place within their grid...then you can use these grids as anchors too. There are 4 Cy5 (Red) spots; within each grid they are always in Row 11, position #14 from the left, and are in grids #13, 14, 15, and 16 (5 ul, 2.5 ul, 1 ul and 0.1 ul of dye respectively...you usually can't see the spot in grid 16). Likewise there are the same amounts of Cy3 (Green) dye in the same grid position in grids #17, 18, 19 and 20.

7. After each grid is made, check VERY carefully that there aren't spots outside of the grid on any of the 4 sides of the grid...if there are, try moving the grid to accommodate those spots and see if the new location makes more sense relative to the bogus row, relative to the neighboring grids, etc.

8. To delete grids in Scanalyze, you first need to be sure the grids are all selected (i.e., they are a deep magenta color, as opposed to light pink when unselected). You can easily select all the grids by first hitting the "Unselect" button and then hitting the "All" button under "Select by". Then just hit "Delete Grids" and they should disappear. You can select one grid at a time by pushing the "Grid" button under "Select by" and then clicking on that single grid while hitting the Shift key. Anything you select (whether by spot or grid) can be unselected by clicking again while hitting the Shift key (i.e., it's a toggle function). You can also select regions of a grid by dragging the mouse over the region while depressing the Shift key. If you ever think there are some selected things lurking around and you don't want them to be selected any more, then the Unselect button will just unselect everything.

Adjusting spots within the grids:

1. You will probably also have to make adjustments to individual circles of the grid or groups of circles of the grid. This can be done with the same move key and arrows described for the grid, but this time, you will want to select by spot rather than grid.

To select a single spot, hit Select By spot, move your cursor to the spot of interest and hit the Shift key and your mouse left button at the same time. The selected circle will change color. You can move a group of circles in the grid by choosing Select By Spot, and holding down the Shift key while you hold down the left button of your mouse and drag the mouse to draw a rectangle around the spots you want to select. The place you start your mouse drag will be one corner of the box and the place you stop will be the corner diagonally across from it. When you release the button of the mouse the color of the selected spots will change. Once you have selected the appropriate spots of the grid, you can move them with the arrow keys.

2. Move the spots until every spot of the array image is surrounded by a circle (spot) of the grid.

3. As you work, you can magnify the image by selecting the + key in the Image Control Form. You can zoom back out by selecting the - key. You can restore to normal image by clicking on the magnification label ___% and changing it to 100%. You can view different parts of the image by using horizontal and vertical scroll bars.

4. When you have the grid close to where it should be, you can use the Refine button to get automatic adjustments to bring the grid spots closer to the image spots. You will have to hit the Refine button repeatedly to get adjustments made.

5. Once you have the grid lined up to your satisfaction, hit Save Grid. (You can also hit this if you need to stop work for a period of time and come back to it.) Make sure that you save to the K: drive in your folder, not the C: drive.

6. If you need to return to a grid you already worked on, after reloading the image, select Load Grid and choose the file that is your grid.

Calculating and Saving the Data:

1. Once the grid is adjusted appropriately, hit Save Data and the data will be calculated and saved for your array.

2. You can open your data file in Excel to see the data table.

3. If you want, you can insert the godlist of genes as a column of the data table for analysis, although data analysis is easier after the data are loaded onto SMD.

Sending Data to SMD:

1. You will need to FTP 4 files for each microarray: xxx_532_nm.tif, xxx_635_nm.tif, (alternatively, these can be called xxx_cy3.tif and xxx_cy5.tif) xxx.sag, and xxx.dat, where xxx is a unique identifier for the slide; the actual slide number, e.g., "yO1n035.sag", works best. Note that the O in yO1 is the capital letter, while in n035 it is a zero...sorry about the confusing nomenclature!!

In addition to the files, you need to also make a batch file for the data sets to be loaded onto the SMD database. This can be done using the template batch file in a spreadsheet program like Excel.

Here is the template file:
<http://bio.davidson.edu/Biology/GCAT/protocols/Batch_File_Template_PC.xls>

You may submit multiple experiments with the same batch file. Just use a new row in the Excel spreadsheet for each experiment.

There is a batch file help page in SMD which may be helpful:

<http://genome-www4.stanford.edu/Microarray/help/batch_load.html>

2. Make sure that your file names are consistent with the names listed in the batch file.

3. For your sample batch file, if you set the format of the Expt Date field to "text", you can get it to display 2000-12-25. The date has to be displayed in this format of yyyy-mm-dd. If you don't set it to text, Excel will screw up the formatting for you!

4. The "SMD Experiment Name" must be unique in the whole SMD database, so they should never have the same name (if it's an experiment redo, you'd have to name it "ExptRedo" or "Expt2" or something). You might want to start off with a College name as has been done with Pomona and Swarthmore.

5. The main thing about the batch file is that all the headings have to be there even if you don't have entries for some of them. You should enter "Normalization Type" as "Computed", and when this is true, then "Norm Value" should always be left blank. "Computed" normalization means that when you enter the data into SMD, a program will adjust all the red (Cy5) values by a constant normalization value such that the average spot ratio across the ENTIRE chip is 1.0. For most uses, the computed value is fine, but there are some experiments where you might want to put in a particular normalization value, and that is where you would put "User Defined" for "Normalization Type", and then enter a value for "Norm Value".

6. You must have entries in ALL fields except "Norm Value", which should be kept blank, and "SMD Experiment Description", which is optional (this can be used to describe expt'l details, refer to lab notebook pages, describe how the scan looked, etc.). The "SMD Experiment Name" is different--it is required and has to be unique for each slide; this is the name that will show up on the cluster-grams, so it should be descriptive but on the short side!

7. Use the category "GCAT" and choose a subcategory from the list at SMD's "List Data" list.

8. For the Experiment description, you might want to include the name of your college, the student(s) who did the experiment, notebook pages, and other information for keeping detailed track of experiments.

9. FTP the four files and the batch file to the GCAT FTP site.

10. E-mail Barbara Dunn at Stanford to let her know that you are sending data. She does not work on Mondays.
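
Before you FTP anything, it can help to check the file names and the batch file rows against the rules in the steps above. The following is only a rough sketch in Python, not part of SMD or ScanAlyze. It assumes you have exported the batch file rows into dictionaries keyed by the headings quoted above ("SMD Experiment Name", "Expt Date", "Normalization Type", "Norm Value"). The function names are made up for illustration, and the real template contains more headings than are checked here.

```python
import re

# yyyy-mm-dd, the date format the batch file requires (e.g. 2000-12-25)
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def expected_files(slide_id):
    """Return the four file names to FTP for one slide (wavelength naming)."""
    return [
        f"{slide_id}_532_nm.tif",  # Cy3 scan
        f"{slide_id}_635_nm.tif",  # Cy5 scan
        f"{slide_id}.sag",         # grid file
        f"{slide_id}.dat",         # data file
    ]


def check_batch_rows(rows):
    """Return a list of problems found in batch-file rows.

    Each row is a dict keyed by column heading. Names are only checked for
    duplicates within this batch file; uniqueness across the whole SMD
    database cannot be checked from here.
    """
    problems = []
    seen = set()
    for i, row in enumerate(rows, start=1):
        name = row.get("SMD Experiment Name", "").strip()
        if not name:
            problems.append(f"row {i}: missing SMD Experiment Name")
        elif name in seen:
            problems.append(f"row {i}: duplicate SMD Experiment Name {name!r}")
        seen.add(name)

        if not DATE_RE.match(row.get("Expt Date", "")):
            problems.append(f"row {i}: Expt Date is not in yyyy-mm-dd format")

        norm_type = row.get("Normalization Type", "")
        if norm_type == "Computed" and row.get("Norm Value", "").strip():
            problems.append(f"row {i}: Norm Value must be blank when Computed")
    return problems
```

For example, expected_files("yO1n035") lists the four files for slide yO1n035, and check_batch_rows returns an empty list when no problems are found.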


Email Advice

How Much RAM do I NEED?

We have found that our Windows (NT or 2000) machines typically require 256Mb to be useful when analyzing array scans. If this is a problem for some of you, I would suggest that you still continue to do the scans at 10um resolution in order to make sure that you have at least 60 pixels per feature. I'm not sure of the size of the spots on the arrays that Pat provided. On our yeast mini-chips, the feature size is ~150um diameter so a 20um scan would only generate about 50 pixels per feature. Not good enough.
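
The pixels-per-feature figures above can be roughly reproduced with a little arithmetic. This is a back-of-the-envelope sketch in Python that assumes each spot is a perfect circle and ignores partial pixels at the edge, so it gives an estimate only. The function name is made up for illustration.

```python
import math


def pixels_per_feature(spot_diameter_um, pixel_size_um):
    """Estimate how many scan pixels fall inside one circular spot."""
    spot_area = math.pi * (spot_diameter_um / 2) ** 2  # area in square microns
    pixel_area = pixel_size_um ** 2
    return spot_area / pixel_area


# ~150 um spots (the yeast mini-chips) at 10 um and 20 um scan resolution
for resolution in (10, 20):
    print(resolution, "um scan:", round(pixels_per_feature(150, resolution)), "pixels")
```

With these assumptions a 150um spot covers roughly 177 pixels at 10um and roughly 44 pixels at 20um, which is in line with the "about 50" quoted above and below the 60-pixel target.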

Yeast Chips from Pat Brown

As far as rows and columns in each block, there should be 24 spots across and 11 spots down. There should be 32 of these "blocks" total. I also forgot to tell you that when we printed our first plate on this run, it was off center and made some very blobby spots, so we stopped the run and then started printing again a little ways away from the blobs. So you will always see some blobby spots to the upper right of each 24 X 11 "block", but just ignore them.

Scanalyze Software Advice

Scanalyze User Manual can be downloaded as PDF from
rana.lbl.gov
(Eisen home page) by going to the Software section.

Regarding using the godlist in scanalyze, I don't do it that way--I just use scanalyze (I use genepix now) to grid, and then to generate the data (.dat) and grid (.sag) files, and then I input the tif, dat and sag files into SMD. I then use SMD for seeing clickable images, generating .pcl files to use in "Cluster", and otherwise looking at my data. Do you really need to have Scanalyze use the godlist?

Here are some hints for gridding this yO1 print run. I know it's really hard, especially with really low signal, so you need to use landmarks as much as possible. You can use the "bogus" set of 6 or so blobby spots which floats above each grid as an anchor point (it was our mistake, but it can come in handy!)...there should always be room for about 3 rows of spots between the "bogus" row and the first row of the real grid.

Unfortunately, this bogus row can't always be seen, and sometimes the left-most "seen" spot in the bogus row won't correspond to the real left-most spot, but the vertical distance between the bogus row and the first real row should be constant.

There is a yO1 test hyb available for everyone to look at. It is the first slide of the print run (yO1n001), and so it is quite blobby (as are the first 10 slides or so of all print runs), but it might be good for people to see. It can be viewed on SMD under Public Search, experimenter BDUNN, category GCAT (although it's the only WORLD-viewable array I have, so you don't have to select the category).

After each grid is made, check VERY carefully that there aren't spots outside of the grid on any of the 4 sides of the grid...if there are, try moving the grid to accommodate those spots and see if the new location makes more sense relative to the bogus row, relative to the neighboring grids, etc.

If you think of the grids as numbered consecutively from left to right, starting at the upper left corner, and returning to the left-most grid each time you start a new row, you should have grid #1 at the upper left corner, and grid #32 at the lower right corner.

Also, there are Cy3 and Cy5 dye standards in the control plate that got spotted down, and you should always make sure that these spots are in the correct place within their grid...then you can use these grids as anchors too. There are 4 Cy5 (Red) spots; within each grid they are always in Row 11, position #14 from the left, and are in grids #13, 14, 15, and 16 (5 ul, 2.5 ul, 1 ul and 0.1 ul of dye respectively...you usually can't see the spot in grid 16). Likewise there are the same amounts of Cy3 (Green) dye in the same grid position in grids #17, 18, 19 and 20.
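
If it helps to keep the landmarks straight while gridding, the grid numbering and dye standard positions described in the two paragraphs above can be written down as a small lookup. This is just an illustrative Python sketch. The function names are made up, and the number of grids in each row of the slide is not stated here, so it is passed in as a parameter (the value 4 in the example is an assumption, not a fact about these chips).

```python
DYE_ROW = 11                       # row within the grid holding the dye standard
DYE_POSITION = 14                  # position from the left within that row
DYE_VOLUMES_UL = [5, 2.5, 1, 0.1]  # ul of dye, in grid order


def dye_standards():
    """Map grid number -> (dye, ul of dye) for the eight dye standard spots."""
    standards = {}
    for offset, volume in enumerate(DYE_VOLUMES_UL):
        standards[13 + offset] = ("Cy5", volume)  # red, grids 13-16
        standards[17 + offset] = ("Cy3", volume)  # green, grids 17-20
    return standards


def grid_row_col(grid_number, grids_per_row):
    """Return the 1-based (row, column) of a grid on the slide.

    Grids are numbered left to right, starting at the upper left corner.
    grids_per_row must be supplied by the user (it is an assumption here).
    """
    if not 1 <= grid_number <= 32:
        raise ValueError("grid number must be between 1 and 32")
    index = grid_number - 1
    return index // grids_per_row + 1, index % grids_per_row + 1


for grid, (dye, volume) in sorted(dye_standards().items()):
    print(f"grid {grid}: {dye}, {volume} ul, row {DYE_ROW}, position {DYE_POSITION}")
```

For example, if there were 4 grids in each row, grid_row_col(32, 4) would place grid #32 in row 8, column 4, i.e. the lower right corner.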

You should enter "Normalization Type" as "Computed", and when this is true, then "Norm Value" should always be left blank. "Computed" normalization means that when you enter the data into SMD, a program will adjust all the red (Cy5) values by a constant normalization value such that the average spot ratio across the ENTIRE chip is 1.0. For most uses, the computed value is fine, but there are some experiments where you might want to put in a particular normalization value, and that is where you would put "User Defined" for "Normalization Type", and then enter a value for "Norm Value".
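
To make the idea of "Computed" normalization concrete, here is a minimal Python sketch of the arithmetic described above: find one constant, divide every red (Cy5) value by it, and the average spot ratio across the chip becomes 1.0. This is not SMD's actual code. The function names are made up, and the sketch assumes a plain arithmetic mean of the red/green ratios over all spots with a positive green value; SMD may choose the spots and define the average differently.

```python
def computed_norm_value(red, green):
    """Return the constant that makes the mean red/green ratio equal 1.0."""
    ratios = [r / g for r, g in zip(red, green) if g > 0]
    if not ratios:
        raise ValueError("no spots with a positive green (Cy3) value")
    return sum(ratios) / len(ratios)


def normalize_red(red, norm_value):
    """Divide every red (Cy5) value by the same normalization constant."""
    return [r / norm_value for r in red]


# Three made-up spots where red is much brighter than green overall
red = [200.0, 400.0, 600.0]
green = [100.0, 100.0, 100.0]

norm = computed_norm_value(red, green)   # mean of 2, 4 and 6
adjusted = normalize_red(red, norm)
print("norm value:", norm)
print("adjusted red values:", adjusted)
```

Note that the raw red values are left untouched and only the adjusted copy changes, which is the same reason for leaving Norm alone in Scanalyze and normalizing later in SMD.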

As far as deleting grids in Scanalyze, you first need to be sure the grids are all selected (i.e., they are a deep magenta color, as opposed to light pink when unselected). You can easily select all the grids by first hitting the "Unselect" button and then hitting the "All" button under "Select by". Then just hit "Delete Grids" and they should disappear. You can select one grid at a time by pushing the "Grid" button under "Select by" and then clicking on that single grid while hitting the Shift key. Anything you select (whether by spot or grid) can be unselected by clicking again while hitting the Shift key (i.e., it's a toggle function). You can also select regions of a grid by dragging the mouse over the region while depressing the Shift key. If you ever think there are some selected things lurking around and you don't want them to be selected any more, then the Unselect button will just unselect everything.

**Do not alter the Normalization in Scanalyze (this is a sliding scale labelled "Norm" in the Image Control window); it should always be set to 1. If you change the Norm, it actually alters the raw data (whereas "Gain" does not), and you can always play with the Normalization once it's in SMD without altering the raw data.

**Don't over-flag in Scanalyze; only flag obviously bad spots--e.g., spots that bleed into each other, spots with bright dust specks in them, "torn off" or ripped-looking spots, or any spots that just don't look right. You can always filter out data based on channel intensities, intensity:background ratios, etc., once the data is in SMD. Err on the side of keeping as much valid raw data as possible, and then using filters later.

Tips for Scanning Microarrays

**When scanning, try not to have more than 1 to 3 spots saturated for intensity in either channel (on our scanner, saturated spots will look white and have an intensity of 65,000). We bring up the PMT for each channel so that 1 - 3 spots are saturated and then do the final high-resolution scan at those PMTs.

**When generating the tif files from the scanner, try to get the edges of the scanned image as close to the spots as possible (i.e., try not to have very much blank slide in the final scan that you save)...this will make the tif files smaller and not take as long to FTP.
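
The saturation rule in the first scanning tip above can be expressed as a simple count. This Python sketch assumes you already have a list of spot intensities for one channel from a preview scan. The 65,000 threshold is the value quoted for our scanner and may differ on yours, and the function names and the wording of the hints are made up for illustration.

```python
SATURATION_LEVEL = 65000  # intensity quoted above for a saturated spot


def count_saturated(intensities, level=SATURATION_LEVEL):
    """Count the spots at or above the saturation level in one channel."""
    return sum(1 for value in intensities if value >= level)


def pmt_hint(intensities, max_saturated=3, level=SATURATION_LEVEL):
    """Suggest a PMT change so that 1 to max_saturated spots are saturated."""
    n = count_saturated(intensities, level)
    if n == 0:
        return "raise PMT"
    if n > max_saturated:
        return "lower PMT"
    return "ok"


# Made-up preview intensities for one channel
preview = [1200, 30500, 65000, 65535, 800]
print(count_saturated(preview), "saturated spots:", pmt_hint(preview))
```

Run the check once for each channel, since the PMT is set separately for Cy3 and Cy5.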




© Copyright 2001 Department of Biology, Davidson College, Davidson, NC 28036
Send comments, questions, and suggestions to: macampbell@davidson.edu