Sunday 24 April 2016


FGT Part 3 - Data Normalization for Microarray Data



Why is it important to normalize microarray gene expression data before carrying out data analysis?

The goal of a microarray experiment is to identify or compare gene expression patterns by measuring the expressed mRNA levels across samples. Assuming that the measured intensity for each arrayed gene represents its relative expression level, biologically relevant patterns of expression can be identified by comparing measured expression levels between different states on a gene-by-gene basis.

In a microarray experiment, RNA is isolated from the samples (which could come from different tissues, developmental stages, disease states, or drug-treated samples), labelled, hybridised to the arrays, and washed; the fluorescence of the dyes on the hybridised target-probe pairs is then scanned. This produces a greyscale image, which is analysed to identify the array spots and to measure their relative fluorescence intensities for each probe set.

Every step in a transcriptional profiling experiment can contribute to the inherent 'noise' of array data. Variation in biosamples, RNA quality, and target labelling are normally the biggest noise-introducing steps in array experiments. Careful experimental design and initial calibration experiments can minimise these problems.

Because of the nature of the process (biochemical reactions and optical detection), subtle variations between arrays, reagents, and environmental conditions can lead to slightly different measurements for the samples. These variations affect the measurements in two ways: (1) systematic variation, which affects a large number of measurements simultaneously, and (2) stochastic components, or noise, which are entirely random. Noise cannot be avoided, only reduced, whereas systematic variation can shift both the shape and the centre of the distribution of the measured data. When these effects are significant, gene expression analysis can reach false conclusions, because the differences being compared arise not from biology but from systematic error.

Normalisation therefore adjusts for the systematic error and variation introduced by the process. By adjusting the distribution of the measured intensities, normalisation makes arrays comparable and so enables downstream analysis to select genes that are significantly differentially expressed between classes of samples. Failure to normalise correctly will invalidate all subsequent analysis of the data.

What kinds of normalisation can be applied to a dataset?

Normalisation removes systematic biases from the data, usually by applying some form of scaling:

1. Scaling by physical grid position to remove spatial biases

Scaling by grid position is needed when there are significant intensity differences between grid positions. The problem can usually be spotted visually: we expect differences between grid positions to be random, so when a patch or pattern of grids shows consistently different intensities, those grids can be scaled up or down to match the others. This is also called fitted-surface or per-pin normalisation, and the problem most often arises in the dual-channel approach.
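As a minimal sketch of the idea (not a production method; real pipelines often fit a smooth surface per grid rather than a single factor), each print-tip grid can be rescaled so its median matches the array-wide median. The function name and array layout here are my own assumptions:

```python
import numpy as np

def per_grid_normalise(intensities, grid_ids):
    """Scale each print-tip grid so its median matches the global median.

    intensities : 1-D array of spot intensities for one array
    grid_ids    : 1-D array giving the grid (print tip) each spot belongs to
    """
    out = intensities.astype(float).copy()
    global_median = np.median(out)
    for g in np.unique(grid_ids):
        mask = grid_ids == g
        # Multiply the whole grid by one factor so its median lines up
        out[mask] *= global_median / np.median(out[mask])
    return out
```

After this step, a systematically dim or bright grid no longer stands out against its neighbours.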

2. Removing intensity-dependent biases

This typically uses loess regression on the log-ratios as a function of average intensity, transforming the data towards a more linear trend. Consider excluding elements that are flagged absent across all experiments before fitting.
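To illustrate the idea without pulling in a full loess implementation, the sketch below computes the MA values (log-ratio M versus average log-intensity A) for two-channel data and removes the intensity-dependent trend using a running median over intensity-sorted points as a crude stand-in for the loess fit. The function name and window size are assumptions for illustration:

```python
import numpy as np

def ma_normalise(red, green, window=11):
    """Remove the intensity-dependent trend from two-channel log-ratios.

    M = log-ratio, A = average log-intensity; a proper pipeline would fit
    a loess curve to (A, M), here a running median stands in for it.
    """
    M = np.log2(red) - np.log2(green)            # log-ratio per spot
    A = 0.5 * (np.log2(red) + np.log2(green))    # average log-intensity
    order = np.argsort(A)                        # spots by increasing A
    trend = np.empty_like(M)
    half = window // 2
    for rank, idx in enumerate(order):
        lo, hi = max(0, rank - half), min(len(M), rank + half + 1)
        trend[idx] = np.median(M[order[lo:hi]])  # local estimate of the bias
    return M - trend, A                          # normalised log-ratios
```

After normalisation the log-ratios are centred on zero at every intensity level, so a dye-dependent curve in the MA plot no longer masquerades as differential expression.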

3. Scaling intensity values to remove per-chip variation

Per-chip scaling: log-transform the data, then scale each array, for example by its mean or median, so that all arrays are centred on the same value. However, this does not address differences in the shape of the distributions; linear scaling cannot correct them. A more powerful method is quantile normalisation, which I will discuss in more detail below.
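The per-chip step is simple enough to sketch directly (function name and matrix layout are my own assumptions; rows are probes, columns are arrays):

```python
import numpy as np

def per_chip_median_centre(raw):
    """Log2-transform, then centre each array (column) on zero.

    Every array ends up with the same median, but differences in the
    shape of the distributions remain uncorrected.
    """
    logged = np.log2(raw)
    return logged - np.median(logged, axis=0)   # subtract each chip's median
```

This makes all chip medians equal, which is exactly the limitation noted above: two arrays can share a median yet still have very differently shaped distributions.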

4. Per-gene normalisation

Each gene's values are expressed relative to that gene's own level across arrays (a distance from its typical value), so they reflect change rather than absolute intensity. GeneSpring commonly assumes this normalisation has already been applied.
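One common realisation of this idea, assumed here for illustration, is median-centring each gene across arrays (the function name is mine, and rows are genes, columns are arrays):

```python
import numpy as np

def per_gene_centre(expr):
    """Centre each gene (row) on its own median across arrays.

    The resulting values express how far each measurement is from that
    gene's typical level, rather than its absolute intensity.
    """
    return expr - np.median(expr, axis=1, keepdims=True)
```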

Quantile Normalisation
Quantile normalisation replaces the lowest value on each chip with the lowest value of the reference distribution, the second-lowest with the second-lowest, and so on until all values have been replaced. As a result, every array ends up with exactly the same (reference) distribution, although of course the probe set occupying a given intensity rank can differ between samples. The method assumes that most genes are not differentially expressed, so the underlying distributions should be very similar across arrays.
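The rank-by-rank replacement described above can be sketched in a few lines. Here the reference distribution is the mean of the sorted values across arrays, a common choice; the function name is an assumption, ties are handled naively, and rows are probes, columns are arrays:

```python
import numpy as np

def quantile_normalise(X):
    """Force every array (column) onto the same reference distribution.

    Each column's value at rank r is replaced by the mean, across all
    columns, of the values at rank r.
    """
    ranks = np.argsort(np.argsort(X, axis=0), axis=0)  # rank of each value in its column
    reference = np.sort(X, axis=0).mean(axis=1)        # mean value at each rank
    return reference[ranks]                            # substitute by rank
```

After this step, sorting any column yields the identical reference distribution, which is exactly the property the method is designed to enforce.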


