FGT Part 7 - RNA Sequencing ~ MasMatin

RNA sequencing (RNA-seq) is the alternative to microarrays. It measures a population of RNA by generating cDNA with adaptors and sequencing it.

Why choose RNA-seq over microarrays?

In functional genomics, we are interested not only in the differential expression of genes, but also in the combinations of exons that give rise to the RNA population. This is difficult because most expressed mRNAs never cover the entire exonic region of a gene. At the moment, we can see the level of expression, but we do not know the splice variants or how they interact with transcription levels. We cannot understand how the genome behaves just by looking at expression levels.

Beyond that, we are also interested in the importance of non-coding RNA, which is relevant to coding RNA as well. One example is microRNA (miRNA), a short RNA (~22 nucleotides) whose role is in the RNA interference pathway. miRNA controls the expression of mRNA by promoting cleavage, destabilizing the poly-A tail (which increases the speed of degradation), and making ribosome binding less efficient. However, measuring miRNA requires special treatment.

Currently, microarray technology cannot answer those questions. This is because a microarray is limited by its design and is therefore unable to detect novel genes. If novel genes are discovered too rapidly, array designs will have trouble keeping up.

RNA-seq could solve those problems.

How does it work?

RNA-seq grew out of SAGE (serial analysis of gene expression). In SAGE, mRNA was processed into cDNA tags, which are short oligonucleotide sequences that each correspond to a certain mRNA. These tags (similar in concept to expressed sequence tags, or ESTs) were concatenated, amplified, and sequenced. The result was then mapped to a reference genome, and the tags were counted.

Before attempting RNA-seq, the RNA sample needs to be prepared. Because most of the RNA in a cell (by some estimates up to 99%) is rRNA, it needs to be removed. This can be done by rRNA depletion or by poly-A selection of mRNA using magnetic beads. For mRNA analysis, cDNA is generated by reverse transcription from a primer. The resulting cDNA fragments are then given adaptors and a barcode (for multiplexing), and sequenced.

What do you need to consider when designing an RNA-seq experiment?
The aim of the experiment is to find differentially expressed genes. Therefore, experiments must be designed to accurately measure both the counts of each transcript and the variances that are associated with those numbers. The primary things we need to consider are the same as for microarrays: (1) the number of replicates needed to estimate within- and among-group variation, (2)

1. Sequence length
The first thing to consider is the sequence length, or how long each read needs to be. Reads need to be long enough because short reads produce a high number of hits when mapped to the genome. Around 35-50 bp is long enough to analyse a complex genome. Shorter reads take more time to reconstruct, but longer reads cost more money. Longer reads should be considered when analysing splice sites.
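To see why very short reads map ambiguously, here is a rough back-of-the-envelope sketch. It assumes a genome of uniformly random bases, which real genomes are not (repeats make things worse), and the 3 Gb genome size and the function name `expected_random_hits` are illustrative assumptions, not part of the original notes.

```python
def expected_random_hits(read_length: int, genome_size: int) -> float:
    """Expected number of exact matches of a random read in a random genome.

    Assumes each base is uniformly A/C/G/T and counts both strands.
    """
    positions = 2 * (genome_size - read_length + 1)  # start positions, two strands
    return positions / 4 ** read_length


GENOME_SIZE = 3_000_000_000  # assumption: roughly human-sized genome

for length in (10, 16, 20, 35, 50):
    hits = expected_random_hits(length, GENOME_SIZE)
    print(f"{length:>2} bp read: ~{hits:.3g} chance matches expected")
```

Under these assumptions a 10 bp read is expected to match thousands of places by chance, while a 35 bp read is expected to match essentially nowhere by chance, which is in line with the 35-50 bp guideline.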

2. Sequencing Depth

The depth of sequencing means how many reads need to be generated (or how many rounds of sequencing need to be done). The required depth is estimated from the predicted abundance of the transcript of interest (whether it is present at low or high copy number). Variation due to the sampling process makes a large contribution to the total variance among individuals for transcripts represented by few reads. This means that a treatment effect on genes with shallow coverage is unlikely to be identified amidst the high sampling noise. More reads increase the sensitivity of the sequencing to detect rare species. But sensitivity is limited by the number of unique molecules in the library: once every unique molecule has been sampled, further sequencing adds no new information.
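The effect of sampling noise on rare transcripts can be illustrated with a small sketch. It assumes that read counts follow Poisson sampling only (biological variation is ignored), so the coefficient of variation is 1/sqrt(expected reads). The function names and the example abundance of one transcript per million library molecules are assumptions made for illustration.

```python
import math


def expected_reads(library_fraction: float, depth: int) -> float:
    """Expected read count for a transcript making up `library_fraction` of the library."""
    return library_fraction * depth


def poisson_cv(mean_reads: float) -> float:
    """Coefficient of variation (sd / mean) of a Poisson count with the given mean."""
    return 1.0 / math.sqrt(mean_reads)


RARE_FRACTION = 1e-6  # assumption: one in a million library molecules

for depth in (5_000_000, 20_000_000, 80_000_000):
    mean = expected_reads(RARE_FRACTION, depth)
    print(f"depth {depth:>11,}: ~{mean:.0f} reads, sampling CV ~{poisson_cv(mean):.0%}")
```

In this simplified model, quadrupling the depth only halves the relative sampling noise, which is why rare transcripts are expensive to measure precisely.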

3. Single or Paired End Sequencing
Paired-end sequencing can give information on (1) fragment length and (2) exon prediction. By knowing where the two reads of a pair start and end, we can determine the size of the fragment they came from. It also helps exon prediction: if the two reads of a pair map to exons, those exons come from the same fragment, so they should lie next to each other in the genome and belong to the same transcript.
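A minimal sketch of the fragment-size idea, assuming each mate has already been mapped and is described by 0-based, end-exclusive genome coordinates. The `Mate` type, the function names, and the 2x threshold are hypothetical choices for illustration, not taken from any aligner.

```python
from typing import NamedTuple


class Mate(NamedTuple):
    start: int  # 0-based leftmost mapped position
    end: int    # exclusive end of the mapped read


def genomic_span(mate1: Mate, mate2: Mate) -> int:
    """Distance on the genome from the outer edge of one mate to the other."""
    return max(mate1.end, mate2.end) - min(mate1.start, mate2.start)


def suggests_intron(span: int, expected_fragment: int, factor: float = 2.0) -> bool:
    """Flag pairs whose genomic span is much larger than the library fragment size.

    The cDNA fragment contains no introns, so a large span hints that the
    mates sit in different exons with an intron between them.
    """
    return span > factor * expected_fragment


pair_same_exon = (Mate(1_000, 1_050), Mate(1_200, 1_250))
pair_two_exons = (Mate(1_000, 1_050), Mate(4_800, 4_850))

for m1, m2 in (pair_same_exon, pair_two_exons):
    span = genomic_span(m1, m2)
    print(span, suggests_intron(span, expected_fragment=250))
```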

4. Creating Library: Random run or multiplexing?
Next-generation sequencing platforms sequence the sample in a flow cell. Using separate flow cells makes the results difficult to compare because of artefacts specific to each flow cell, environmental conditions, and so on. One way to solve this is multiplexing: giving a unique tag or barcode to each sample and mixing the samples together so that they are read in a single flow cell. This is limited only by the number of unique barcode labels available.
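After a multiplexed run, the reads have to be sorted back to their samples by barcode. The following is a simplified sketch that assumes the barcode sits at the start of each read and must match exactly; real platforms often read the barcode separately and tolerate mismatches. The barcode sequences and sample names are made up.

```python
from collections import defaultdict

# Hypothetical barcode-to-sample table; all barcodes must have the same length.
BARCODES = {"ACGT": "sample_A", "TGCA": "sample_B"}


def demultiplex(reads, barcodes):
    """Group reads by sample, stripping the leading barcode from each read."""
    barcode_length = len(next(iter(barcodes)))
    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:barcode_length], read[barcode_length:]
        # Reads with an unknown barcode are kept aside rather than guessed at.
        sample = barcodes.get(tag, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)


pooled_reads = ["ACGTGGGATC", "TGCAAAACCG", "ACGTTTTGGA", "NNNNCCCCAA"]
print(demultiplex(pooled_reads, BARCODES))
```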

General Workflow
1. Mapping to Reference
Mapping sequence reads to the genome produces continuous exon islands of mapped reads separated by introns. If a reference genome is not available, one can be generated through high-throughput sequencing (HTS), or we can use available transcript evidence to build gene models and use these models as the reference. Another option is de novo transcript assembly.

2. Quantification
Once the reads have been mapped to the genome, exons can be predicted from the islands of expression. Novel exons can be predicted by comparing the islands against the annotation in current databases. Splice events can be predicted from the sequence using mate pairs or by sequencing across junctions.
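The idea of islands of expression can be sketched as follows, assuming we already have a per-base read depth along a stretch of genome. The function name, the `min_depth` threshold, and the toy coverage values are illustrative assumptions; real exon prediction also uses junction-spanning reads and annotation.

```python
def coverage_islands(coverage, min_depth=1):
    """Return (start, end) intervals, end-exclusive, where depth >= min_depth."""
    islands = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth >= min_depth and start is None:
            start = pos                    # an island begins
        elif depth < min_depth and start is not None:
            islands.append((start, pos))   # the island ends before this base
            start = None
    if start is not None:                  # island runs to the end of the region
        islands.append((start, len(coverage)))
    return islands


# Toy per-base depth: two covered stretches separated by an uncovered gap.
depth_track = [0, 3, 5, 4, 0, 0, 2, 2, 0]
print(coverage_islands(depth_track))
```

Each returned interval is a candidate exon, and the uncovered gaps between them are candidate introns.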

3. Normalisation
Because libraries contain different numbers of sequences, some RNAs may have more reads in one sample simply because that library is larger, resulting in under- or over-representation of transcripts. To put samples on the same scale, read counts are typically expressed as reads per million library reads (RPM), in other words as the proportion of the library that comes from each transcript. But because longer transcripts accumulate more reads than shorter ones, the data also need to be adjusted for length, giving reads per kilobase per million mapped reads (RPKM).
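The two scalings can be written out as a short sketch. The function names and the example counts are made up for illustration; the formulas are the standard RPM and RPKM definitions.

```python
def rpm(read_count: int, library_size: int) -> float:
    """Reads per million library reads."""
    return read_count * 1_000_000 / library_size


def rpkm(read_count: int, transcript_length_bp: int, library_size: int) -> float:
    """Reads per kilobase of transcript per million library reads."""
    return read_count * 1_000_000_000 / (transcript_length_bp * library_size)


# Two hypothetical transcripts with the same raw count in a 20-million-read library.
LIBRARY_SIZE = 20_000_000
for name, count, length_bp in (("short_tx", 500, 1_000), ("long_tx", 500, 4_000)):
    print(name, rpm(count, LIBRARY_SIZE), rpkm(count, length_bp, LIBRARY_SIZE))
```

Both transcripts have the same RPM, but the longer one gets a lower RPKM because the same number of reads is spread over four times as many bases.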

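4. Identification of Differentially Expressed Transcripts
This step is similar to microarray analysis, but the distribution of the measurements is different: RNA-seq produces read counts rather than continuous intensities, so tools built for count data, such as DESeq, are used.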

Advantages vs Disadvantages
Overall, microarrays and RNA-seq compare quite well. Microarrays are relatively inexpensive, mature, and established, but they are limited by the properties of DNA hybridisation and by the design of the array. Meanwhile, RNA-seq offers the ability to discover novel transcripts with high sensitivity (because it counts unique molecules rather than measuring a signal-to-background ratio). In addition, RNA-seq is not limited by design, so it can keep pace as knowledge advances.


Sunday, 24 April 2016
