Friday, 29 April 2016


SDSSB Comments 9 - Design and Aesthetics


“...art is about asking questions, questions that may not be answerable” (Maeda, 2012)

Synthetic Biology has been able to bring different species together: artists, designers, scientists, and engineers. I think an important discussion in Agapakis (2013) and Ginsberg (2014) concerns the differences in “design mindset” between these species in their respective fields. Both argue that designing synthetic biology within the paradigm of industrialization will limit its future to the so-called “myopic & monolithic” consumptive industrial biotechnology. Agapakis (2013) explains that, while synthetic biology brings analytic science to technology and innovation, design will bring technology to society. Opening a dialogue between these species would provoke more questions and discussion about where the future of synthetic biology is heading. The explorative imagination of art and design (bioartists) would open new ideas and possible futures for synthetic biology (Yetisen, 2015).

Another key point from the papers is the idea that synthetic biology will have a different, or maybe its own, definition of design. While engineers always long for standardization and predictability, we cannot ignore the fact that designing a complex living system will never be fully predictable. Acknowledging the unpredictability of designing living systems brings another perspective on design, “speculative design”, which is more likely to work within the social and environmental context of the real world:

“we should approach the design of biological systems with more humility and with design principles that are more biological, emphasizing not control but adaptability, not streamlining but robustness, and not abstraction but complexity” (Agapakis, 2013)

An interesting part is on p.xviii of Ginsberg (2014):

“Some people assumed that our aim is outreach: a public relations activity on behalf of synthetic biology to beautify, package, sanitize, and better communicate the science.”

It is proof that some people involved in synthetic biology (especially those with a certain political or industrial standing) still view the translation of science and technology as one-way, like the central dogma mentioned in Agapakis (2013). Art and design truly can bring dialogue and more future possibilities to synthetic biology. But when synthetic biology is heavily commercialised, will this bioart, the “expressions of discord and controversy” (Yetisen, 2015), be heard by those with policy-making power?

Additional references

Maeda, J. 2012. How art, technology and design inform creative leaders. TED Talk. Available: https://www.ted.com/talks/john_maeda_how_art_technology_and_design_inform_creative_leaders


SDSSB Comments 8 - Past and Futures


Brown et al. (2006) give a good example of how scientific expectations, both hype and disappointment, have shaped the history of scientific development in the case of the haematopoietic stem cell (HSC), described as one of the most valuable stem cells in the bioeconomy. I think that, in 2006, Brown et al. were addressing the shift of interest from HSCs towards human embryonic stem cells (hESCs), which promised a great deal. From the history of HSCs, we can see that there was a lot of expectation when new information or technology emerged (even while it was still poorly understood), but disappointment came when the expectation was not met. Of course, there was a lot of political and economic background to the announcement of new technologies. The same goes for Nordmann & Rip (2009) on the ethical aspects of nanotechnology, where more pressing developments receive less attention than the more hyped futuristic issues. Thus, this history of hype and disappointment may shape future decisions and trends in an emerging field.

Danielli’s prediction of the future of biology is quite fascinating, and accurate. Interestingly, the response to this perspective seems different from the response when the human genome project was announced. A perspective that seemed futuristic at the time was met with enormous hype when the human genome was announced. But what was achieved in 2000 was only the beginning of the genomic era, yet it was overly hyped by the press release authors. It is good to bear in mind that the claims and hopes stated in a press release might take much longer to become reality.

It is not easy to form balanced expectations of a given technological advance because of personal bias. But as Nordmann & Rip (2009) propose, more interdisciplinary discussion and a “reality check” might help us reach balanced expectations. Predicting the future is of course important for both policymakers and business, but will “promissory capitalization” or “biovalue” shape our scientific discovery trends in the future? Would it limit our creativity to explore? So how should we think and choose to innovate in the future? In this case, I would like to quote Joi Ito’s (2012) idea of compasses over maps:

“The idea is that in a world of massive complexity, speed, and diversity, the cost of mapping and planning details often exceeds the cost of just doing something–and the maps are often wrong”

 References

Ito, J. 2012. Compasses Over Maps. MIT Media Lab Blog. Available from: http://blog.media.mit.edu/2012/07/compasses-over-maps.html

SDSSB Comments 7 - Synthetic Biology and the Public Good


Calvert and Emma (2013) start the discussion with an interesting argument: “science is part of society and society is part of science”. Even so, the prevailing paradigm still distinguishes scientists, engineers, and policy makers from the public. Indeed, scientists, engineers, policy makers, and industry all regard “the public” as an important entity to deal with. Joly and Rip (2007) reported how public opinion had an important influence on the development of science and policy, using the case of genetically modified vines at INRA. Another report (Hayden, 2014) shows that synthetic biology firms depend heavily on “public acceptance” of their products. In this particular case, the public was boldly positioned as “the consumer”. Thus, there is a perspective in which the developers of a disruptive technology like synthetic biology are desperately trying to win “public acceptance”.

Calvert and Emma (2013) and Wickson et al. (2010) argue for a change of perspective on the position of the public in the development of a new and disruptive technology. Calvert and Emma (2013) suggest changing the frame from “public acceptance” to “public good”. This is achieved by recognizing the public as heterogeneous groups of citizens and engaging them in the development of the technology. Synthetic biology as a public good should be developed through ongoing concern for, and dialogue with, the public interest.

But of course, to achieve this two-way dialogue, as implied in Wickson et al. (2010), we need citizens who actively engage in democratising science and technology development. It is therefore a reminder of how much scientific literacy is needed to realize the ideal of public engagement in the development of synthetic biology. In reality, there are many parts of the world where this (citizens actively engaging in scientific development and policy) still cannot be achieved, for many reasons. I therefore think it is important for academia, entrusted by the public as an “agent of change”, to be modest and responsible for its actions and innovations so that they contribute to the public good.

SDSSB Comments 6 - Governance and Regulation


The 1975 Asilomar conference was presented in a unique press narration, entertaining and satirical, in the popular Rolling Stone magazine. The conference aimed to agree on a regulation for a new disruptive technology. The emerging recombinant DNA technology was predicted, and has since been proven, to have a great impact on today’s biotechnology, with considerable risks as well. Michael Rogers, a journalist, told the story of the congress somewhat like that of a group of nerds discussing the end of the world at an isolated conference. What shocked me, though, was the response on page 39: “But what about the press?” (Rogers, 1975). It is as if the scientists and the public were different species.

Hurlbut et al. (2015) reflect on the 1975 Asilomar Conference to criticise the upcoming NAS-NAM plan for CRISPR, the disruptive gene-editing technology, and its ethical, legal, and social implications. Hurlbut argues that the Asilomar conference is an example of an important regulation of a disruptive technology that did not involve the opinion of the public. Thus, the governance of gene-editing technology, and the NAS-NAM plan, should be more democratic. To achieve that, the discussion should take note of four themes: envisioning futures, distribution, trust, and provisionality (page 3).

I agree that science policy should involve the wider public, because we all have rights and would be affected by the impact of the technology. Scientists can be depicted as arrogant, paranoid, and enclosed in their “research world”. But I think the scientific culture has changed between Asilomar 1975 and today’s academia. The interdisciplinarity of today’s academia has brought critical minds to address new technologies and challenges. Good education has given scientific literacy to the public, which is a key point for public contribution to policy making. To govern a technology with considerable uncertainty in both its applications and implications, a thorough discussion between politicians, scientists, and the public should be well designed. Decisions should be made through thorough analysis by world leaders, drawing on the expertise of scientists and taking public opinion into account.

SDSSB Comments 5 - Bioethics


Knowledge is value-neutral; its value depends on its user. Or does it? Douglas & Savulescu (2010) address three concerns about synthetic biology: (1) that it is playing God(?), (2) the distinction between living things and machines, and (3) the misuse of knowledge, which could lead to bioterrorism or warfare. On playing God, I think that as long as something is within the reach of human knowledge, it is not in the domain of God. It is true that the “openness” of synthetic biology could lead to many safety risks, but comparing them to nuclear warfare is too much. Become paranoid or embrace the possibilities? Proceed with caution and develop risk-reduction strategies, but don’t let fear limit our creativity.

What I found more interesting is:

“...that we will misjudge the moral status of the new entities that synthetic biologists may produce” (Douglas & Savulescu, 2010, p. 689)

Humans have always tried to define and categorize what is a living being and what is not, and what their rights and moral status are. What is a person, and what is the value of life? Harris (1999) defines a person as a “creature capable of valuing its own existence”, which explains its right to exist. What is interesting is that individuals can have different moral significance: from potential person, to pre-person, to actual person. So, how do we know whether beings other than humans value their existence? Is it right to assign graded moral significance? What about animals and the creations of synthetic biology?

Regan (1985) argues that theories of animal rights (indirect duties, utilitarianism, contractarianism) should be applicable to human rights too; if they are not, then they are wrong. Regan’s view is that all subjects of a life have inherent value as individuals, so discrimination and the weighing of benefits cannot be used to violate their rights.

So, as synthetic biologists, how are we going to address the moral status of our “creations”? To be honest, I don’t know where to stand. Can logic judge what is right and what is wrong? Is it time to hear what our hearts say? Should we question our humanity?


SDSSB Comments 4 - Synthetic Biology as Open Science?


I envy Drew Endy’s vision of synthetic biology. I personally think that iGEM and the BioBricks Foundation started because of Endy’s personal will to open up biology and make it easier to engineer, since he himself was not a “life sciences-trained” academic. In Endy’s plenary talk (2008), there are two solutions to make this dream happen: (1) involve more people, and (2) develop better tools. And I think it has gone well. The growing SynBio community has driven innovation towards accessible tools and open repositories such as wetware.org. The need for a ‘standard exchange format’ has been addressed by SBOL, a collaborative effort by community members (Galdzicki et al., 2014).

The promise of synthetic biology was one of the driving forces of the DIY-bio movement in the past decade. But it is not just the affordable tools or the “easiness” of synthetic biology that drives the DIY-bio movement. The concept of boundaries (Meyer, 2013, p. 129) between amateurs and professionals, big-bio and small-bio, is an important driver of the rise of the open biology movement. DIYBio is an expression of breaking these boundaries.

Current DIYBio communities were mostly born from previously established hackerspaces or makerspaces, whose members mostly have backgrounds outside biology. As Jorgensen (2012) said, “...the press had a tendency to overestimate our capabilities and underestimate our ethics”. Given limited access to technology and different regulations around the world, I wonder how many DIYBio groups actually do synthetic biology.

The DIYBio community is shaped by its members, each with different backgrounds and visions but sharing Do-It-With-Others (DIWO) principles. What I found interesting in Meyer (2013) is that the European DIYBio groups state that they see themselves as different from the US community. So, how does geography affect the cultural differences between DIYBio movements? Is there really a different view between the US, EU, and Asian communities? Will it affect the practice of sharing and openness, or even the safety and security approach, in each DIYBio community?

SDSSB Comments 3 - Ways of Owning


In the last decade, systems and synthetic biology have advanced biology with novel ideas and applications that interest the public, enterprise, and academia. As a “hot” topic, Nelson (2014) reported on the current issues we have today: the two cultures debating whether this domain should be “publicly owned” or “privately owned”. But the questions to be asked are (1) what can be patented? and (2) how will it affect society and innovation?

Calvert (2008) gives an insight into the commodification (the transformation of goods into objects of trade) of biological entities. As also stated in Pottage, a commodity should be well defined before being disclosed in a patent. The problem with biology, and life itself, is that it is dynamic and complex. The reduction of biology to its molecular parts does not capture the emergent properties of living systems, which are the very goods we seek to commodify. Systems biology, which addresses the holistic interactions of biological systems, does not seem to suit the patenting system, whereas synthetic biology, through its modularity and “predictability”, is more suitable for patents.

Pottage (2009) gives an insight into how intellectual property became an important issue for many sectors, using Venter’s patent on the protocell. As a minimal-genome chassis, the protocell would be a potential core technology to be used as a platform in synthetic biology. Interestingly, as stated on p. 173, the patent may be aimed at gaining control of all minimal-genome technology. What I understand from the paper is that a patent enables the inventor to disclose (make known to the public) and protect their invention in the market (p. 167). But restrictive licensing of a core technology may result in a “tragedy of the anticommons” and become a hindrance to innovation, so Venter’s patent deserves more attention on this point.

Protecting intellectual property is important for scientists and innovators. In this era, where science and technology have become very valuable for business, I see that academia tends to put a lot of effort into patenting research outputs. Will this perspective change our research trends in the future? I personally believe that the collaborative power of the “crowd” and open-source licensing are the powerful drivers of innovation in systems and synthetic biology in the future.

SDSSB Comments 2 - Systems Biology and Science Policy


As science and technology are deemed important to the progress of humanity, scientific work has moved from “individual artists” in their own laboratories to today’s global scientific society, with its own culture and policies. Science has become an important thing: it is a country’s asset, the driver of new businesses and industries, and a way for academics to make a living. Therefore, science policy can affect many aspects of life, not just the scientific community.

As implied in Macilwain (2011), the investigation of how living things work has undergone a change of approach, evolving from the reductionist view to the more holistic view of systems biology. It promises a more sophisticated and complete understanding of human biology, enabling advances in predictive, personalized, preventive and participatory medicine (Hood et al., 2009). This has led funding bodies to “invest” in systems biology, creating new centres around the globe.

The sad story of the MIB (Bain et al., 2014) shows how science policy has a big impact on the development of a research field and the people working in it. Indeed, modelling the yeast cell is a key to understanding how a human cell works, and ultimately “how to battle cancer”. Living cells are not the easiest thing to work with, but promises to the funding agencies need to be fulfilled. It is then, through research grant reports, that policy makers decide which sectors should be pushed forward and which should not.

Today’s science policy makers are government and industry. They want research outputs that benefit the country or the business. But the nature of science itself is uncertainty. Scientists jump into uncharted waters, trying to gain new knowledge for humanity. This knowledge may not be beneficial in an obvious way, yet even “failures” give important information for science to progress.

Is our science policy the best way to make progress in systems biology? Working in systems biology to model living systems does indeed have uncertain outcomes, but if science policy only funds research with a “clear beneficial output”, it may limit the possibility of exploring the “uncharted waters” of biology. Is our science culture (where PIs and the researchers they employ compete for funding) already settled, or does it need to be revolutionized?

SDSSB Comments 1 - From Breeding Experiments to Synthetic Biology


Rather than just using a philosophical context to define and “limit” a field, I find it more interesting to ask what drove the scientific community to give birth to a new field, both politically and technologically. That is why, to understand more about systems and synthetic biology, we have to look back at the history of its root: molecular biology.
The two papers, Abir-Am (1997) and Bud (1998), give rather different perspectives on the history of molecular biology, but both agree that the foundation of recombinant DNA technology in the 1960s was a pivotal point in the development of the biotechnology age. What is interesting, though, is that the two papers (especially Abir-Am (1997)) show how the World Wars and the Cold War played an important role in boosting the development of the life sciences. Abir-Am (1997) describes the history of molecular biology in three phases, each influenced by the big “wars”, and shows how the transdisciplinary exact sciences (chemistry, physics, and mathematics/computer science) transformed biology into the new age of biotechnology. Meanwhile, I think Bud (1998) is more conservative, tracing biotechnology back to the early fermentation technologies, with the development of new genetic techniques bringing out the “new biotechnology”. Nevertheless, the dynamic change of science and technology demands upgrades to research facilities, which led to proposals for new laboratories, and this will keep happening in the future.
In the end, what drives the new age of molecular biology today is no longer war but business and industry. I wonder whether there were political reasons why the authors wrote these papers. In the last paragraph of Bud (1998), I wonder whether he is suggesting that genetics-based biotechnology, being inspired by traditional biotechnology, does not need extra controls and therefore gives companies more flexibility to develop their industries in the field.

Sunday, 24 April 2016


FGT Part 9 - Genomic Profiling

Many diseases are associated with changes in the genome; cancer and genetic disorders are examples. Changes in the genome can take the form of gain or loss of genetic material, or rearrangement of whole chromosomes or smaller sections of a chromosome. At the same time, differences between the genomes of individuals are an important source of diversity. It is therefore interesting to try to profile genomic differences between individuals.

One example is profiling breast cancer. This profiling involves classification analysis: we take many breast cancer samples from different patients, profile them with microarrays, and group samples by their expression patterns. This gives a classification of groups of patients whose disease shares common properties. We then combine this with other data (genotyping information) to determine how gene expression changes and to figure out what causes the change.
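A minimal sketch of the grouping step, not the actual pipeline used in any breast cancer study: hierarchical clustering of patient expression profiles so that patients with similar profiles fall into the same group. The expression matrix here is random placeholder data and the choice of correlation distance and three clusters is an assumption for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
n_patients, n_genes = 20, 500
expr = rng.normal(size=(n_patients, n_genes))   # rows = patients, columns = genes

# Correlation distance is a common choice for expression profiles.
dist = pdist(expr, metric="correlation")
tree = linkage(dist, method="average")

# Cut the tree into (say) three candidate patient subtypes.
subtype = fcluster(tree, t=3, criterion="maxclust")
print(subtype)
```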

An early application of genome profiling took many samples from cell lines representing genetic diversity, grew these cells in culture → treated them with drugs (die or survive) → connected the drug response to an expression profile for each cell line.

Spectral karyotyping
- attach fluorescent labels to chromosomes
- selection against the fluorescent label? → the chromosome will lose it?

Leukaemia example
- a chromosomal translocation generates a fusion protein at the junction point that is responsible for the disease state
- here one wants to know where the junction point is

Array Comparative Genome Hybridization (aCGH)
aCGH makes use of microarrays, mostly as dual-channel arrays (a format that is almost obsolete except for special applications like aCGH). The idea is to put two differently labelled samples onto one array, hybridise them, and look at the differences between them: we compare the sample with an unmodified genome (control) as reference. The goal is to find regions of change that are common across samples. We expect a 1:1 ratio where the control and tumour chromosome content is the same. By ordering the normalised measurements along the chromosome, we can detect loss or gain by looking for shifts in the ratio: for example, roughly a 1.5-fold increase when one chromosome is gained in a diploid cell, or a 2-fold decrease when one chromosome is lost.

This technology is very cheap, so it is good for population studies, and it is widely accessible with relatively good resolution. In population studies we might have lots of samples and lots of genotypes, from which we can see emerging patterns. But for understanding cancer the technology becomes problematic, because in cancer the genome gets destabilised: some patterns are random and others are the changes that started the process. So how do we identify the latter (this requires bringing lots of data together)? We need to find 'change points', the patterns that changed initially and drive the disease.
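An illustrative sketch of looking for shifts in the ratio along a chromosome, using simulated probe intensities: compute log2(tumour/control) for ordered probes and smooth with a rolling median so a copy-number gain stands out. Real aCGH analysis uses dedicated segmentation algorithms (e.g. circular binary segmentation); the window size and threshold here are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_probes = 300
tumour = rng.normal(loc=1.0, scale=0.1, size=n_probes)
control = rng.normal(loc=1.0, scale=0.1, size=n_probes)
tumour[100:200] *= 1.5            # simulate a one-copy gain in the middle of the chromosome

log_ratio = np.log2(tumour / control)

window = 15
smoothed = np.array([np.median(log_ratio[max(0, i - window): i + window])
                     for i in range(n_probes)])
gained = smoothed > 0.3           # crude threshold for a putative gain (log2(1.5) ~ 0.58)

idx = np.flatnonzero(gained)
if idx.size:
    print(f"putative gain between probes {idx.min()} and {idx.max()}")
```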

Single Nucleotide Polymorphism (SNP) Arrays
SNP arrays use the same array technology, but instead of printing oligonucleotides that represent parts of the transcript, the oligos represent genomic variants (SNPs). Therefore, we can identify copy number variation in a population.

Affymetrix SNP data enable (1) identification of copy number variants by comparative genomic hybridization, (2) determination of ploidy status, and (3) detection of loss of heterozygosity, where one parental chromosome is missing, which can lead to duplication of the other parental chromosome.

Exome Sequencing
Exome sequencing looks at SNPs in exonic regions, on the assumption that SNPs in coding (protein-encoding) transcripts are the ones most likely to lead to functional changes (which might be wrong), so disease-associated SNPs mostly occur there. The workflow is to identify exons and sequence them → compare to a reference genome. If a library of known SNPs is available, differences between the reference and the sequenced exons can be looked up in the database → this gives confidence about whether SNPs are credible or are sequencing errors.
- needs the reference!
- a kind of hybrid between microarray and sequencing
- cuts the necessary sequencing down about 20-fold by concentrating on exonic regions
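A toy sketch of the look-up step described above: variants called from exome reads are checked against a set of known SNPs, which gives more confidence that a call is real rather than a sequencing error. The chromosome positions, alleles, and the `known_snps` catalogue are all invented for illustration.

```python
known_snps = {("chr1", 15211, "T"), ("chr2", 48700, "A")}   # pretend dbSNP-style catalogue

called_variants = [("chr1", 15211, "T"),    # matches a known SNP -> higher confidence
                   ("chr1", 15320, "G")]    # novel -> could be real or a sequencing error

for variant in called_variants:
    status = "known SNP" if variant in known_snps else "novel (check coverage/quality)"
    print(variant, "->", status)
```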

Conclusion
- aCGH: cheap, measure lots of samples but relatively low resolution
- SNP Arrays: good resolution but expensive
- Exome sequencing:  more info but more expensive

Proteomics -2D Differential Gel Electrophoresis

The technique separates proteins according to two independent properties in two discrete steps: (1) Isoelectric focusing (IEF), which separates proteins according to their isoelectric points (pI), and (2) SDS-polyacrylamide gel electrophoresis (SDS-PAGE), which separates them according to their molecular weights (MW).

The power of 2-D electrophoresis as a biochemical separation technique has been recognized since its introduction. Its application, however, has become increasingly significant as a result of a number of developments in separation techniques, image analysis, and protein characterization.

2-D Fluorescence Difference Gel Electrophoresis (2-D DIGE) is a variant of two-dimensional gel electrophoresis that offers the possibility to include an internal standard so that all samples—even those run on different gels—can easily be compared and accurately quantitated.



FGT Part 8 - Emerging Technologies


Technology is what drives functional genomics: it allows us to ask new questions at the genome level. This is possible because high-throughput technology lets us ask questions in parallel, giving more power to infer new insights.

But how do we evaluate the robustness of a technology?
Different risk and benefit conditions apply at each stage of the technology life cycle:

1. Adopt Early: R & D Stage
While a technology is still in the research and development stage, it depends heavily on external funding (from industry or academia) to survive. Investing at this stage carries high risk but also promises high benefit if the technology succeeds: adopting here means investing in order to gain early access. Because the technology is new, the analysis processes have not yet been developed, which gives a challenging opportunity to solve problems that others have not been able to answer, and to develop analysis methods before the competition. It is risky because the technology may fail and disappear if nobody is interested.

2. Adopt in Ascent
You still have the opportunity to become an early adopter, but the technology is more available and the analysis processes are rapidly maturing. The risk is lower because the technology is less likely to fail, having passed the R & D phase.

3. Adopt in Maturity
Risks are low because the technology is widely available, methods are well established, and commercial developers are stable because they have a good income. Many problems have already been addressed, both in terms of technology development and biological application.

4. Adopt in Decline
Generally a bad idea, because access to the technology might be lost at any time. Most problems should already have been answered, or better alternatives have probably been developed. Technology development and expertise are declining, which lowers the value of the technology.

Current Status of FGT
Expression microarrays and SNP microarrays are in the mature phase but not yet in decline, although their use is slowing. Standard RNA-seq and ChIP-seq are in ascent towards maturity. Mass spectrometry is coming out of the R & D phase and into ascent. ChIP-on-chip, Roche 454, Helicos, and SOLiD are in decline and some have been discontinued.

Emerging Technology
1. High-Throughput Platform: L-1000
Microarray profiling costs a lot of money, which limits the number of samples and replicates. Even though it is cheaper than RNA-seq, it is still a constraint. What we learned from the microarray platform is that not all genes change across a dataset; gene clusters that change together have already been identified. This means we don't have to measure all the genes: we can measure one gene as a proxy for a representative gene set.

The L-1000 is a bead-based technology: it builds on microarray chemistry, but here each bead carries the oligonucleotides for running the test. What is awesome is that the technology runs in plates (NOT microarrays), which means assays can be run en masse, making them very inexpensive. This high-throughput approach is very valuable for pharmaceutical companies that want to compare all the drugs in their collection and work out relationships among these drugs. Measuring a reduced gene set is actually an old idea; what is new is the scaling up and running in parallel. Data processing for the platform is very similar to microarrays: normalisation, gene expression, clustering. It has broken records on GEO, submitting in a very short time almost as much data as the entire worldwide collection of human profiling records from the last ten years, despite so far appearing in only a few experiments!
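A hedged sketch of the idea behind measuring only a proxy gene set: if a gene's expression tracks a small set of measured "landmark" genes, a model trained on existing full profiles can infer it from the landmarks alone. This is not the platform's actual inference pipeline; it is a least-squares toy with random placeholder data.

```python
import numpy as np
from numpy.linalg import lstsq

rng = np.random.default_rng(2)
n_samples, n_landmarks = 100, 10
landmarks = rng.normal(size=(n_samples, n_landmarks))      # measured landmark genes
true_weights = rng.normal(size=n_landmarks)
target_gene = landmarks @ true_weights + rng.normal(scale=0.1, size=n_samples)

# Fit the relationship on "historical" full profiles...
w, *_ = lstsq(landmarks, target_gene, rcond=None)

# ...then infer the unmeasured gene in a new sample from its landmark values only.
new_landmarks = rng.normal(size=n_landmarks)
print("inferred expression:", new_landmarks @ w)
```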

2. New ChIP-seq & Related Technologies
These new technologies have been developed to probe the epigenome. Two of them are:

a. ChIP-Exonuclease

ChIP-exo overcomes a problem of ChIP-seq, where the sequence bound by the TF can only be localised to within the purified fragment. In ChIP-seq, an antibody pulls out the protein of interest; the bound DNA is purified, sequenced, and analysed to identify binding peaks, from which a binding motif is determined. The resolution is relatively coarse: it cannot tell exactly where the protein binds if there are two binding sites in close proximity, because the TF site is only a few bp while the fragments are typically a few hundred bp. ChIP-exo addresses the problem by using single-stranded exonuclease digestion of the ChIP fragments, which effectively localises the exact site of TF binding: the exonuclease runs from 5' to 3' and chops the DNA up to the region where the cross-linked TF protects it.

b. Conformation Capture Technologies
Conformation capture can analyse the folded structure of the chromosome. This 3D structure is hard to measure. The technology captures the spatial connections between pieces of chromatin in the genome, ligates and captures them, and then sequences them. In effect it makes an artificial joint between pieces of material that are held together by a common complex.

The technology has to deal with the N^2 problem of the genome: if any point in the genome can potentially connect to any other point, we would need on the order of N^2 sequencing to see every connection, which is far too much. So how do we reduce the complexity? If we capture only known regulatory sequences (roughly 1% of the genome), we can still see how they connect to each other and infer the 3D structure, and it becomes possible to sequence them at a reasonable resolution. In essence, the technology focuses on regulatory regions to capture the 3D structure of the genome.
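A back-of-the-envelope calculation of the N^2 reduction described above, using assumed round numbers (3 Gb genome, 10 kb bins, 1% regulatory fraction) rather than figures from the lecture.

```python
genome_bp = 3_000_000_000
bin_bp = 10_000
bins = genome_bp // bin_bp                      # ~300,000 bins

all_vs_all_pairs = bins * (bins - 1) // 2       # every bin against every other bin
regulatory_bins = bins // 100                   # keep only ~1% of the genome
reduced_pairs = regulatory_bins * (regulatory_bins - 1) // 2

print(f"all-vs-all pairs: {all_vs_all_pairs:.2e}")
print(f"regulatory pairs: {reduced_pairs:.2e}")   # roughly 10,000x fewer combinations
```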

3. Single Cell Technologies
Right now, we are very interested in analysing single cells. Why? Because all the technologies above use a mixture of cells, so the sample is a weighted average. This means that contamination with another cell type could change the transcription profile and lead to false conclusions. By profiling one cell, we can really measure transcript changes in a single cell.

But single-cell sequencing is still limited by the affinity of reverse transcriptase for its template and suffers from amplification biases. There are only about 10^5-10^6 mRNA molecules per cell, expressing roughly 10k genes, and this small amount needs to be amplified.

Microfluidics allows individual cells to be separated into wells and processed one cell at a time: RNA is taken from single cells, probes are generated, and the sequencing reactions are carried out in parallel in tiny volumes. This is ideal for processing the small amount of RNA from one or a few cells, and it increases single-cell amplification efficiency. The data remain challenging, however, because there are only a few transcripts per cell and the technology still struggles to achieve representative amplification (rare transcripts are never amplified enough).

Single-cell technology allows us to make new discoveries about cells at very high resolution. This could open a new era in biology.

For example, consider the analysis of 'transcriptional bursting' in single cells going through the cell cycle. Genes involved in the cell cycle become very active and then inactive again, which confounds the statistical analysis used to judge differentially expressed genes: the apparent hits turn out to be cell-cycle-related genes. Measuring this allows predictions to be made and the measurements to be adjusted for the cell's cycle stage.

To control the quality of the platform, spike-in controls (RNAs with known concentrations) can be used to measure the quality of amplification. Another approach is to remove amplification bias by counting unique molecules instead of reads, i.e. asking how many molecules were in the starting population that was amplified (which represents the molecules in that cell). For example, two transcripts might start at a ratio of X to 2X but end up distorted because one amplifies more than the other; collapsing the data down to individual molecules in individual cells within individual reactions removes this distortion. The strategy is to attach a barcode during each individual reverse transcription, so that each barcode marks one molecule in one reaction from one cell → the barcodes can then be deconvolved during analysis. Single-cell profiles obtained this way showed a 90.9% correlation to the averaged, mixed cell population.
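A minimal sketch of counting unique molecules rather than reads: reads that share the same cell barcode, molecular barcode (UMI) and gene are collapsed to one molecule, which removes amplification bias. The barcodes and gene names here are invented for illustration.

```python
from collections import defaultdict

# (cell_barcode, umi, gene) for each sequenced read
reads = [("CELL1", "AACGT", "GeneA"),
         ("CELL1", "AACGT", "GeneA"),   # PCR duplicate of the read above
         ("CELL1", "GGTTA", "GeneA"),   # same gene, different starting molecule
         ("CELL2", "AACGT", "GeneB")]

molecules = defaultdict(set)
for cell, umi, gene in reads:
    molecules[(cell, gene)].add(umi)    # duplicates collapse into one UMI

for (cell, gene), umis in molecules.items():
    print(cell, gene, "molecules:", len(umis))
```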


Drop-seq
- problem: having to sort cells with a microfluidic system and flow chambers is expensive and difficult to do
- Drop-seq uses a flow chamber where cells are captured in individual droplets with barcoded beads → reverse transcription and amplification take place inside the droplets
- each droplet contains an individual cell and barcoded oligonucleotides; synthesis happens in the droplet → the droplets are then broken up and everything is sequenced in bulk → the barcode tells which read came from which droplet (and therefore which cell)
- this can generate a transcriptional profile for many individual cells in parallel

Spatial, in situ hybridisation
- do in situ hybridisation on individual components inside 'dots' representing individual cells → one can spatially resolve which cells are next to each other on a dish and link position to gene expression

Conclusion
Rapid progress is being made in single-cell amplification. The best approaches are composites of available technologies: RNA-seq, microfluidics, novel cDNA labelling, etc. Future technology may be a multi-omics approach (RNA, DNA, chromatin, protein). We already have technology for measuring the accessibility of chromatin, so can we then ask which regions are accessible to TFs? Can we measure single-cell methylation? Future technology will enable us to ask integrated questions about single cells.




FGT Part 7 - RNA Sequencing


RNA sequencing is the alternative to microarrays: it measures a population of RNA by generating cDNA with adaptors and sequencing it.

Why choose RNA seq over microarray?

In functional genomics, we are interested not only in differential expression of genes but also in the combinations of exons that give rise to the RNA population. This is difficult because most expressed mRNAs never cover the entire exonic region. At the moment we can measure the level of expression, but we do not know the splice variants or how they interact with transcription levels. We cannot understand how the genome behaves just by looking at expression levels.

We are also interested in the importance of non-coding RNA, which is relevant to coding RNA as well. One example is microRNA (miRNA): a short RNA (~22 nucleotides) whose role is in the RNA interference pathway. miRNA controls the expression of mRNA by promoting cleavage, destabilizing the poly-A tail (increasing the degradation speed), and making ribosome binding less efficient. But we need special treatment to measure miRNA.

So, currently, microarray technology cannot answer those questions, because a microarray is limited by its design and therefore cannot detect novel genes. If the rate of discovery of novel genes is too rapid, microarray designs will have trouble keeping up.

And, RNA seq could solve those problems.

How does it work?

RNA-seq was born from SAGE (serial analysis of gene expression). In SAGE, mRNA is processed into cDNA tags: short oligonucleotide sequences that each correspond to a particular mRNA. These tags (similar in spirit to expressed sequence tags, ESTs) were concatenated, amplified, and sequenced. The results were then mapped to a reference genome, and the tags were counted.

Before attempting RNA-seq, the RNA sample needs to be prepared. Because ~99% of the RNA population in the cell is rRNA, it needs to be removed; this can be done by rRNA depletion or by poly-A selection of mRNA using magnetic beads. For mRNA analysis, cDNA is generated using a cDNA primer. The resulting cDNA is then given adaptors and a barcode (for multiplexing), and sequenced!

What do you need to consider when designing an RNA-seq experiment?
The aim of the experiment is to find differentially expressed genes. Therefore, experiments must be designed to accurately measure both the counts of each transcript and the variances associated with those counts. The primary consideration is the same as for microarrays: the number of replicates needed to estimate within- and among-group variation, together with the choices below.

1. Sequence length
The first consideration is the sequence length, i.e. how long a read needs to be. Reads need to be long enough, because very short reads give a high number of ambiguous hits when mapped to the genome. Around 35-50 bp is long enough to analyse a complex genome. Shorter reads take more effort to reconstruct, while longer reads cost more money. Longer reads are worth considering when analysing splice sites.

2. Sequencing Depth

The depth of sequencing means how many reads, or rounds of sequencing, need to be done. The required depth is estimated from the predicted abundance of the transcripts of interest (are they rare or abundant?). Variation due to the sampling process makes a large contribution to the total variance among individuals for transcripts represented by few reads. This means that identifying a treatment effect on genes with shallow coverage is unlikely amid the high sampling noise. More reads increase the sensitivity of the sequencing for detecting rare species, but depth is limited by the number of unique molecules: if no more unique molecules are present, no more sequencing by synthesis can happen.

3. Single- or Paired-End Sequencing
Paired-end sequencing gives information on (1) fragment length and (2) exon prediction. By knowing where a read pair starts and ends, we can determine the exact size of the fragment, and if the two ends of a pair correspond to exons, those exons should be near each other in the genome.

4. Creating the Library: Separate Runs or Multiplexing?
Next-generation sequencing platforms sequence the sample in a flow cell. Using separate flow cells makes results difficult to compare because of per-flow-cell artefacts, environmental conditions, etc. One way to solve this is multiplexing: give unique tags or barcodes to each sample and mix the samples together to be read in a single flow cell. This is limited only by the number of unique barcode labels available.

General Workflow
1. Mapping to a Reference
Mapping sequence reads to the genome produces continuous exon islands of mapped reads separated by introns. If a reference genome is not available, one can be generated through HTS, or we can use available transcript evidence to build gene models and use those models as references, or use de novo transcript assembly.

2. Quantification
Once the reads have been mapped to the genome, exons can be predicted from the islands of expression. Novel exons can be predicted by annotating the sequence against current databases. Splice events can be predicted from the sequence using mate pairs or by sequencing across junctions.

3. Normalisation
Because libraries contain different numbers of sequences, some samples will have more reads overall, resulting in under- or over-representation of transcripts. To put comparisons on the same scale, reads are typically expressed as reads per million library reads (RPM), in other words the transcript's proportion of the library. But because longer transcripts accumulate more reads than shorter ones, the data also need to be adjusted for length, scaling to reads per kilobase per million (RPKM).
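A short sketch of the scaling described above (reads per million, then correcting for transcript length to give reads per kilobase per million). The counts and gene lengths are made-up numbers.

```python
counts = {"GeneA": 1500, "GeneB": 300, "GeneC": 45}      # mapped reads per gene
lengths_bp = {"GeneA": 3000, "GeneB": 1000, "GeneC": 1500}

library_size = sum(counts.values())
rpm = {g: c / library_size * 1e6 for g, c in counts.items()}          # scale for library size
rpkm = {g: rpm[g] / (lengths_bp[g] / 1000) for g in counts}           # then for transcript length

for gene in counts:
    print(f"{gene}: RPM={rpm[gene]:.1f}  RPKM={rpkm[gene]:.1f}")
```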

4. Identification of Differentially Expressed Transcripts
This is similar to microarray analysis, but the distribution of the count measurements is different, which is why dedicated methods such as DESeq are used.

Advantages vs Disadvantages
Overall, microarrays and RNA-seq compare quite well. Microarrays are limited by the properties of DNA hybridisation; they are relatively inexpensive, mature, and established, but they are limited by the design of the array. Meanwhile, RNA-seq offers the ability to discover novel transcripts with high sensitivity (because it counts unique molecules rather than relying on a signal-to-background ratio). In addition, RNA-seq is not limited by design and can therefore develop rapidly as knowledge advances.







FGT Part 5 - Design of Microarray Experiments


Design considerations:
- identify the key aims
- identify constraining factors
- choose protocols for sample isolation and processing
- decide on the analysis
- decide on the validation
- aim to randomise nuisance factors


1. Replication
Averaging replicates gives better estimates of the mean, and replicates allow statistical inferences to be made.

Biological vs technical replication: technical replicates come from the same sample on different chips; biological replicates come from different samples. In practice, replicates lie on a scale between purely biological and purely technical.

3. Level of Inference
There is always a compromise between precision and generality. At what level do conclusions need to be made: just this technical sample, all experiments in this cell line, or all mice? More general inferences capture more variance, and more variability means more replicates are needed.

4. Statistical issues
a. Level of variability
Statistically significant does not always mean biologically significant.
b. Multiple testing and the False Discovery Rate (FDR)
Usually a t-test is applied to each probeset. For each test, the p-value is the probability that the test would produce a result at least as extreme, assuming the null hypothesis is true. With many tests we expect around 5% of the truly null genes to come out as false positives at a 5% threshold. FDR control, which accounts for the number of tests applied, is used to avoid a high number of false positives (a minimal Benjamini-Hochberg sketch follows this list).
c. Effect size
How large a change do we want to detect?
d. Power
Our ability to discover the truth; more replication gives more power.
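A minimal Benjamini-Hochberg sketch (an assumed, generic implementation, not course-provided code): given per-gene p-values from many tests, compute FDR-adjusted q-values.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Return FDR-adjusted q-values for a list of p-values."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)       # p * m / rank
    # enforce monotonicity from the largest p-value downwards
    q = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.clip(q, 0, 1)                     # map back to the original gene order
    return out

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals))
```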

Common Design Principles
1. Single Factor
Vary a single factor at a time, for example with or without drug. For dual-channel arrays, place the comparisons of interest near each other. A short time course can be treated as a single-factor experiment.

- Paired Samples
Microarray experiments with paired designs are often encountered in a clinical setting where, for example, samples are isolated from the same patients before and after treatment. Why might it be attractive to employ a paired design in a microarray experiment?

Pairing reduces variability between biological replicates while still capturing the variability in response between patients.

- Pooling vs Amplification
Multiple isolations are pooled to give enough biological material; pooling gives a more robust estimate of the expression level, but it can be dominated by one unusual sample. Pool only when necessary and consider amplification as an alternative. Making sub-pools is a compromise, e.g. pooling 15 samples into 3 x 5. Amplification is an alternative that overcomes limited sample availability, but it is not possible to amplify without introducing bias.

- Dual-Channel Dye Swaps

- Missing measurements

- Practical Design
- usually limited by cost and sample availability
- consider other experiments for informal estimation of parameters
- usually 3-5 replicates for a well-known strain, or 30-200 for human population inference
- consider an extendable design or a pilot experiment

Experimental Design
Biological questions: Which genes are expressed in a sample? Which genes are differentially expressed (DE) in a treatment, mutant, etc.? Which genes are co-regulated in a series of treatments?
Selection of the best biological samples and reference: comparisons with a minimum number of variables; sample selection for the maximum number of expressed genes; an alternative reference is pooled RNA of all time points (saves chips).
Develop a validation and follow-up strategy for expected expression hits, e.g. real-time PCR and analysis of transgenics or mutants.
Choose the type of experiment: common reference, e.g. S1 x S1+T1, S1 x S1+T2; paired references, e.g. S1 x S1+T1, S2 x S2+T1; loop and pooling designs; many other designs.
At least three (two) biological replicates are essential. Biological replicates use independently collected biosamples; technical replicates often use the same biosample or RNA pool.

FGT Part 4 - Identifying Differential Gene Expression in Microarrays

Describe the strengths and weaknesses of filtering on fold change to identify differentially expressed genes from gene expression microarray data.

Fold Change
Fold Filtering

When analysing a microarray gene expression dataset it is important to assess the general quality of the data. Describe three methods by which data quality can be assessed. For each method indicate how low and high quality data can be distinguished. 
Check Spike-In
Visual inspection of distribution using scatter plots
Check internal control genes
Check replicate variability

Describe how you might use a scatter plot or MA (M vs A) plot to visually assess the quality of microarray gene expression data.
M = log2(R/G), the log ratio intensity, i.e. the difference between the log intensities.
A = (1/2) log2(R*G), the average log intensity.
We assume the bulk of genes sit around M = 0, because most genes are not differentially expressed.
If the cloud is shifted or curved, apply normalisation.
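A sketch of an MA plot following the definitions above: M = log2(R/G), A = (1/2) log2(R*G), with most points expected near M = 0. The two-channel intensities here are random placeholders, not real array data.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
R = rng.lognormal(mean=7, sigma=1, size=5000)            # red-channel intensities
G = R * rng.lognormal(mean=0, sigma=0.2, size=5000)      # green channel, mostly similar to red

M = np.log2(R / G)           # log ratio
A = 0.5 * np.log2(R * G)     # average log intensity

plt.scatter(A, M, s=2, alpha=0.3)
plt.axhline(0, color="red")  # most genes are expected to sit around M = 0
plt.xlabel("A (average log intensity)")
plt.ylabel("M (log ratio)")
plt.show()
```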

Non-parametric statistical tests can be used as an alternative to parametric statistical tests for the identification of differentially expressed genes in microarray gene expression profiling experiments. Describe in principle how a non-parametric test might be performed, and indicate one advantage and one disadvantage of using such a test as an alternative to a parametric test.
Parametric & Non parametric test

Biological consideration
Pooling

Volcano plots are often used in the analysis and interpretation of gene expression experiments. What is a volcano plot and how can it be used to aid the identification of differentially expressed genes?

Describe how functional enrichment analysis can be applied to the results of a microarray experiment. Briefly outline the principle underlying the calculation of functional enrichment statistics.

FGT Part 3 - Data Normalization for Microarray Data



Why is it important to normalize microarray gene expression data before carrying out data analysis?

The goal of a microarray experiment is to identify or compare gene expression patterns through detection of the expressed mRNA levels between samples. Assuming that the measured intensity for each arrayed gene represents its relative expression level, biologically relevant patterns of expression can be identified by comparing measured expression levels between different states on a gene-by-gene basis.

In a microarray experiment, RNA is isolated from the samples (which could be from different tissues, developmental stages, disease states, or drug-treated samples), labelled, hybridized to the arrays, and washed, and then the intensity of the fluorescent dyes (the hybridized target-probe duplexes) is scanned. This results in image data (a grayscale image), which is then analyzed to identify the array spots and to measure their relative fluorescence intensities (for each probe set).

Every step in a transcriptional profiling experiment can contribute to the inherent 'noise' of array data. Variation in biosamples, RNA quality, and target labelling are normally the biggest noise-introducing steps in array experiments. Careful experimental design and initial calibration experiments can minimize these problems.

Because of the nature of the process (biochemical reactions and optical detection), subtle variations between arrays, reagents, and environmental conditions may lead to slightly different measurements for the samples. These variations affect the measurements in two ways: (1) systematic variation, which affects a large number of measurements simultaneously, and (2) stochastic components, or noise, which are totally random. Noise cannot be avoided, only reduced, while systematic variation can shift the shape and centre of the distribution of the measured data. When these effects are significant, gene expression analysis can reach false conclusions, because the variation being compared does not arise from biology but from systematic error.

Therefore, normalisation adjusts for systematic error or variation introduced by the process. By adjusting the distributions of the measured intensities, normalization facilitates comparison and thus enables further analysis to select genes that are significantly differentially expressed between classes of samples. Failure to normalise correctly will invalidate all downstream analysis of the data.

What kind of normalisation could be applied to a dataset?

Normalisation removes systematic biases from data. Normalisation usually applies a form of scaling to the data:

1. It could do scaling by physical grid position to remove spatial biases

Scaling by grid position is needed when there are significant differences between grid positions. This problem can usually be inspected visually: we expect intensities to be randomly distributed across grid positions, so when we see a patch or pattern of grids with different intensities, we can scale those grids up or down to match the others. This is also called surface fitting or per-pin normalisation, and it is sometimes needed in the dual-channel approach.

2. It could remove intensity-dependent biases
This uses loess regression, which essentially transforms the data to follow a more linear trend. Consider excluding elements that are uninformative, e.g. flagged absent across all experiments.

3. It could scale intensity values to remove per-chip variation
Per-chip scaling: log-transform the data and then scale, for example by the mean or median, so that all chips have the same centre. However, this does not address differences in the shape of the distribution; linear scaling does not correct distribution differences. A more powerful method is quantile normalisation, which is discussed below.

4. Per gene normalisation
Uses distances. Genespring commonly assumes this normalisation has been applied.

Quantile Normalisation
Quantile normalisation replaces the values on each chip, rank by rank, with the corresponding values of a reference distribution: the smallest value on each chip is replaced with the smallest reference value, and so on until all values have been replaced. This forces all chips to have the same (reference) distribution. Of course, the probe set occupying a given intensity rank can differ between samples. The approach assumes that most genes do not change, so the underlying distributions should be quite similar.
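A sketch of quantile normalisation as described above: rank each array, then replace the i-th smallest value on every array with the i-th value of a reference distribution (here the mean of the sorted arrays, a common choice; other references are possible). The data are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(4)
data = rng.lognormal(size=(1000, 4))            # rows = probes, columns = arrays

order = np.argsort(data, axis=0)                # ranks per array
sorted_vals = np.sort(data, axis=0)
reference = sorted_vals.mean(axis=1)            # reference distribution

normalised = np.empty_like(data)
for j in range(data.shape[1]):
    normalised[order[:, j], j] = reference      # put reference values back by rank

# every array now has exactly the same distribution
print(np.allclose(np.sort(normalised, axis=0), reference[:, None] * np.ones((1, 4))))
```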




Saturday, 23 April 2016


FGT Part 2 - A Closer Look at Microarray Technology


Describe the general properties of the complexity of mRNA populations. Why is it important to consider these properties when planning a gene expression experiment?

A population of mRNA extracted from a cell contains not just many different RNA species, but also each at a different expression level. Many of the most highly expressed genes are housekeeping genes, which are always expressed at high levels to maintain cell activity. Other highly expressed genes are characteristic of the cell type: for example, mitochondrial genes are highly expressed in metabolically active cells, and developmental pluripotency associated 5A (Dppa5a) in pluripotent cells. Meanwhile, most genes have low expression levels, because they are not being expressed.

Hybridisation allows RNA expression levels to be measured

So, in an mRNA population, the most abundant molecules are dominated by a few genes. The problem is that the genes we are interested in are often expressed at low levels. In a gene expression experiment, we want to compare mRNA populations between samples and see which genes are differentially expressed. We will probably see a similar mRNA abundance distribution even in different cells, and this also lets us spot contamination (because we expect similar trends).

When the change happens at a lower abundance level (compared with the housekeeping genes, for example), it will be hard to detect. This is especially true in microarray experiments.

Therefore, we must consider using the right amount of probe in a microarray experiment, because adding more probe increases the signal intensity but also increases the background intensity.

This happens because the basis of a microarray is hybridization. A probe set will complement its target (specific binding), but even though the array can separate species through complementarity with a particular mRNA, non-specific binding will occur at random. Statistically, this non-specific binding is dominated by the most abundant mRNA species in the sample, which contribute to the overall background. Increasing the amount of probe also increases the chance of non-specific binding, and this limits the detection level of a microarray experiment for genes expressed at low levels.

You cannot add more probe to detect low-expression genes on a microarray; it just increases the background. When we want to detect something rare, we can make more probe for that low-abundance species, but in doing so we also label all the other, uninteresting species, and they all contribute background.

So, one thing to consider is that microarrays rely on the balance between specific and non-specific binding, because a microarray measures specific signal against a background of non-specific hybridisation. DNA hybridisation kinetics limit the amount of signal that can be obtained from weakly expressed genes, and adding more labelled probe does not generally increase detection, because common transcripts always capture most of the label.

Nevertheless, we can sometimes see high numbers of transcription factor transcripts in an mRNA population, which are connected to genes.

Outline how PMA (Present, Marginal, Absent) calls are generated from Affymetrix arrays.

To probe genes, oligonucleotides of length 25 bp are used. Typically, an mRNA molecule of interest (usually related to a gene) is represented by a probe set composed of 11-20 probe pairs of these oligonucleotides. Each probe pair is composed of a perfect match (PM) probe, a section of the mRNA molecule of interest, and a mismatch (MM) probe that is created by changing the middle (13th) base of the PM, with the intention of measuring non-specific binding.

In an Affymetrix array, each probe set refers to one specific mRNA or gene. A probe set is a collection of 25 bp oligonucleotides complementary to a specific mRNA, gene, or predicted exon. Each probe pair is composed of two types: the perfect match (PM) probe and the mismatch (MM) probe, which differs from the PM by only a single nucleotide. The idea is that the PM captures specific binding while the MM captures non-specific binding, so the hybridisation intensity of the mismatch probe is subtracted from that of the perfect match probe to obtain the corrected intensity of specific hybridisation.

In an Affymetrix array, the hybridisation result is scanned into an image file (DAT), which is then summarised into a CEL file containing a single intensity value for each probe cell. A summary algorithm then generates two types of information: a signal value (the intensity) and a flag expressing the confidence that the gene is expressed.
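
As a rough sketch of what this looks like in practice, the Bioconductor affy package can read CEL files and summarise them; the folder name below is a hypothetical placeholder for wherever the CEL files live:

  library(affy)                              # Bioconductor package for Affymetrix 3' arrays
  raw  <- ReadAffy(celfile.path = "data/")   # hypothetical folder containing the CEL files
  eset <- mas5(raw)                          # summarise each probe set into a single signal value
  head(exprs(eset))                          # expression matrix: probe sets x arrays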

What does each call tell you about the expression level of a given gene?

The PMA flag indicates confidence that a particular measurement comes from the signal (match) rather than from the background (mismatch). This is done by comparing the signal between the PM and MM probes: essentially, it asks how many PM values are higher (brighter) than their corresponding MM values. If you picked intensities at random, you would expect about 50%. A Wilcoxon signed rank test is then used to generate a p-value, and this significance value from the PMA call can be used as a confidence threshold for whether the probe set is:

  • P = present, high value, real measurement of expression
  • M = marginal, intermediate value, probably expressed
  • A = absent, small value, no evidence that signal is different from the background

So, standard image analysis results in a signal value (intensity) and a confidence that the value comes from the perfect match (specific) signal.
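
A hedged sketch of how you might obtain these flags in R with the affy package (the directory is hypothetical; in the affy versions I have used, the detection p-values end up in the se.exprs slot):

  library(affy)
  raw   <- ReadAffy(celfile.path = "data/")  # hypothetical CEL files, as before
  calls <- mas5calls(raw)                    # signed rank test on the PM/MM pairs of each probe set
  head(exprs(calls))                         # the "P" / "M" / "A" flags
  head(assayData(calls)[["se.exprs"]])       # the corresponding detection p-values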

Even so, designing mismatch probes turned out to be not so clever because, for example, humans have a lot of Single Nucleotide Polymorphisms. The match/mismatch idea can break down when the mismatch probe hybridises almost as well as the perfect match probe: subtracting one from the other then leaves no signal and leads to false conclusions. Hence, today we mostly use statistical approaches like RMA (which includes quantile normalisation) that only use the PM probes.


How could you use a PMA call alongside fold change filtering of genes to identify differentially expressed genes?
To define a measure of expression representing the amount of the corresponding mRNA species, it is necessary to summarise the probe intensities for each probe set.
Fold change takes the ratio of a gene's average expression levels under two conditions. It is usually calculated as the difference on the log2 scale. 
Let $x_{ij}$ be the log-transformed expression measurement of the $i$th gene on the $j$th array under the control ($i = 1,\dots,n$ and $j = 1,\dots,m_0$), and $y_{ik}$ be the log-transformed expression measurement of the $i$th gene on the $k$th array under the treatment ($k = 1,\dots,m_1$). We define $\bar{x}_i = \frac{1}{m_0}\sum_{j=1}^{m_0} x_{ij}$ and $\bar{y}_i = \frac{1}{m_1}\sum_{k=1}^{m_1} y_{ik}$.
Fold change is computed by
$$FC_i = \bar{y}_i - \bar{x}_i \qquad (1)$$
As for the traditional t test, it is usually calculated on the log2 scale to adjust for the skewness in the original gene expression measurements. The t statistic is then computed by
$$t_i = \frac{\bar{y}_i - \bar{x}_i}{\sqrt{s_i^2\,(1/m_0 + 1/m_1)}} \qquad (2)$$
where $s_i^2$ is the pooled variance of $x_{ij}$ and $y_{ik}$. Comparing (1) and (2), it is obvious that fold change and the t statistic are based on two contradicting assumptions.
The underlying assumption of fold change is that all genes share a common variance (on the log2 scale), which is implied by the omission of the variance component in (1). On the other hand, the inclusion of $s_i^2$ in (2) suggests that the t test assumes gene-specific variances.
In order for a gene to be flagged as DE, the double filtering procedure would require the gene to have extreme test scores under the common variance assumption as well as under the gene-specific variance assumption. It is analogous to using the intersection of the rejection regions defined by fold change and t statistic.
Assuming a common variance for all the genes apparently is an oversimplification. The assumption of gene-specific variances, however, leads to unstable estimates due to limited replicates from each gene. A more realistic assumption might lie in between the two extremes, i.e., modeling gene variances by a mixture of two components, one being a point mass at the common variance, another being a continuous distribution for the gene-specific variances. Under this mixture variance assumption, a DE gene could have a large fold change or a large t statistic, but not necessarily both. Taking intersection of the rejection regions flagged by fold change and t statistic, as is adopted by the double filtering procedure, might not be the best strategy under the mixture variance assumption.
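
As a rough illustration of such double filtering in R (hypothetical file layout and arbitrary thresholds; arrays 1-3 are assumed to be controls and 4-6 treated), one could combine fold change, an ordinary t test, and the Present calls like this:

  library(affy)
  raw   <- ReadAffy(celfile.path = "data/")  # hypothetical experiment: arrays 1-3 control, 4-6 treated
  eset  <- rma(raw)                          # log2 expression values (PM-only model)
  calls <- mas5calls(raw)                    # P / M / A detection flags
  expr  <- exprs(eset)
  fc    <- rowMeans(expr[, 4:6]) - rowMeans(expr[, 1:3])                # log2 fold change, cf. (1)
  pval  <- apply(expr, 1, function(v) t.test(v[4:6], v[1:3])$p.value)   # per-gene t test, cf. (2)
  present <- rowSums(exprs(calls)[, 4:6] == "P") >= 2                   # Present in most treated arrays
  de <- present & abs(fc) > 1 & pval < 0.05                             # arbitrary example thresholds
  head(rownames(expr)[de])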

MAS5 vs RMA
A significant challenge with Affymetrix expression data is to provide an algorithm that combines the signals from the multiple Perfect-Match (PM) and Mismatch (MM) probes that target each transcript into a single value that sensitively and accurately represents its concentration. 

MAS5.0 does this by calculating a robust average of the (logged) PM-MM values [1]; increased variation is observed at low signal strengths and is at least in part due to the extra noise generated by subtracting the MM values from their PM partners [2].
A number of alternatives (e.g. RMA [3]) have been proposed that ignore the MM values, and consequently do not suffer from this source of variation. RMA successfully reduces the variance of low abundance transcripts and has been shown, using controlled datasets in which known quantities of specific mRNAs have been added to a common reference pool, to better distinguish differentially expressed transcripts from those that are unchanging [24].

  • MAS5 normalises each array independently and sequentially; RMA as the name suggests (robust multi-array) uses a multi-chip model
  • MAS5 uses data from mismatch probes to calculate a "robust average", based on subtracting mismatch probe value from match probe value
  • RMA does not use the mismatch probes, because their intensities are often higher than the match probes, making them unreliable as indicators of non-specific binding
  • RMA values are in log2 units, MAS5 are not (so values are not directly comparable)
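
A quick way to see these differences for yourself, assuming a hypothetical folder of CEL files, is to summarise the same arrays both ways and plot one against the other (remembering that MAS5 values need to be logged first):

  library(affy)
  raw <- ReadAffy(celfile.path = "data/")    # hypothetical CEL files
  m5  <- mas5(raw)                           # per-array, PM minus MM, natural scale
  r   <- rma(raw)                            # multi-array model, PM only, already log2
  plot(log2(exprs(m5)[, 1]), exprs(r)[, 1],
       xlab = "MAS5 (log2)", ylab = "RMA (log2)",
       main = "The same array summarised two ways")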

Detection calls

In addition to expression summaries, the Affymetrix software also generates a p-score that assesses the reliability of each expression level. This is produced by using a signed rank test to consider the significance of the difference between the PM and MM values for each probeset [9]. Informally, MAS 5.0 can be seen to return two values: the first, an estimate of transcript concentration, and the second, a measure of how much the software 'believes' the first. Of potential confusion is the fact that this value is referred to as the 'detection p-value', and is subsequently used to generate a 'detection call', which flags the transcript as 'Present', 'Marginal' or 'Absent' (P/M/A).
Continue reading FGT Part 2 - A Closer Look to Microarray Technology

Thursday, 21 April 2016

,

FGT Part 1 - Introduction


Today, we have managed to sequence the genomes of living things. Well, not all of them, but at least we have quite comprehensive sequences for human and mouse.

So, does that mean we know how life works? How living things are encoded by ACGTs?

Well, not really. Sequencing the genome is just the start. What's next is to understand what it means. And, that is what Functional Genomics is all about.

Right now, we are getting new tools to understand what the genome does. We now have high-throughput technologies that generate huge amounts of data and allow us to ask genome-scale questions. In other words, we can screen what happens, and what changes, in the genome under different conditions. We can ask, "Can we find a gene that's important for process X?"

Truly, discovery in functional genomics is both driven and limited by technology development: technology drives and limits our ability to ask questions at scale, and it is what limits our knowledge right now. So, in this post, I will talk about several Functional Genomic Technologies that are available right now, how to use them, and how to handle the data they generate.

Microarray Technology (Gene Expression Array) - The Foundation for Functional Genomics
To start with, functional genomic technologies allow us to make large-scale measurements of transcription. Suppose you want to identify the genes expressed by a human cancer cell line in response to a drug treatment. What you need to do is: (1) treat the cancer cell line with the drug and keep an untreated line for comparison, (2) purify the cells, (3) extract the RNA and make labelled cDNA/cRNA from it, so that in each tube we have tens of thousands of different RNA molecules. Then, we want to know how much of each different RNA is expressed in comparison with the control.

What we can do is use microarray technology to do transcriptional profiling. A DNA microarray uses a collection (library) of DNA spotted on a solid surface, where each spot contains a different specific DNA sequence called a probe. These probes can be short sections of genes or other parts of the genome.

First, we couple the RNA samples (the target) with labels, which could be radioactive labels (in the old days) or fluorescent tags/dyes. Then, we hybridise the target with the probes. Because of complementary base pairing, the sample will hybridise with its correct probe partner, and we then wash off the non-specific binding. Using high-resolution image analysis, we can capture the intensity of the hybridised probes and try to solve the puzzle: (1) which probes show differential signal, and (2) how the intensities change: which go up and which go down. Then we can analyse the summary results and statistics. This way, we can make a comparative statement about what is going on in the treated versus the untreated cells.

Overview of DNA microarray process (wikipedia)

Spotted vs Oligonucleotide Microarrays 
The difference between spotted and oligonucleotide microarrays is the probe used in the analysis. In a spotted array, the probes are oligonucleotides, cDNAs, or small PCR product fragments that correspond to mRNAs. Oligonucleotide microarrays use short sequences designed to match known or predicted open reading frames.

So, basically, spotted microarrays are more specific and usually made in-house, where they can be modified for each experiment. It is therefore better to use a spotted microarray if you already know which genes you expect to change or want to target. Meanwhile, oligonucleotide arrays can give whole-genome results and are usually used to screen what is happening across the genome.

Two-Channel vs One-Channel Detection
There are different ways to label and see the differences between samples: two-channel or one-channel.

In two-channel detection, each of the two samples is labelled with a different fluorophore dye (usually Cy3 and Cy5), mixed together, and hybridised on a single microarray. This way, the relative intensities of each fluorophore can be used to identify up-regulated and down-regulated genes.

In single-channel detection, you don't mix the samples, so it's one microarray per sample. But this only indicates relative abundance when comparing against other samples, because each sample will pick up protocol- or batch-specific bias during amplification, labelling, and hybridisation.

So it looks obvious to choose two-channel detection, right? Actually, it's not. Why? Because if you want to make multiple comparisons, you will need many more array combinations with two-channel detection, while you only need one array per sample with one channel. And microarrays are still helluva expensive. In conclusion, it is often not feasible.

In fact, one of the most used microarray platforms today is the Affymetrix GeneChip (I hope to make a tutorial on it too), which is a one-channel platform. There are other advantages in using one-channel detection: (1) because each array is exposed to only one sample, an aberrant sample will not affect the raw data of the other samples, and (2) it is more flexible to compare with other experiments, as long as batch effects are accounted for.
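
A back-of-the-envelope sketch in R, with made-up numbers, of why all-pairs two-colour designs blow up (common-reference designs are a separate story):

  n <- 6                 # hypothetical number of samples
  choose(n, 2)           # two-channel, every pair hybridised directly: 15 arrays
  n                      # one-channel: 6 arrays, comparisons done in silico afterwards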




A Rapidly Changing Field: Why You Should Learn Some R Programming

Both strategies have complications in processing the raw data, but the statistical approaches for each platform have developed rapidly in recent years. This rapid development makes it hard to maintain good graphical interface software (it would be outdated as soon as something new is published). Instead, people just upload their code scripts (usually in R) through open-source platforms. That is why basic R, and maybe some UNIX commands, will be necessary to keep up to date with this technology. In the coming tutorial, we will use Bioconductor and R to analyse example Affymetrix data.
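
For reference, this is roughly how Bioconductor and the relevant packages were installed around the time of writing (newer R releases use BiocManager::install() instead):

  source("https://bioconductor.org/biocLite.R")
  biocLite(c("affy", "limma", "GEOquery"))   # array import, differential expression, GEO access
  library(affy)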

The Gene Expression Omnibus (GEO) 
Right now, lots of data have already been generated, and they are stored in the Gene Expression Omnibus. Have a look :)
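
If you want to pull a deposited dataset straight into R, the GEOquery package can do it; the accession below is a placeholder, not a real dataset:

  library(GEOquery)
  gse  <- getGEO("GSE0000")   # placeholder accession; substitute a real GSE ID from the GEO website
  eset <- gse[[1]]            # getGEO returns a list of ExpressionSet objects
  dim(exprs(eset))            # probes x samples of the deposited expression matrix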

Key Microarray Technologies
Here are some key applications of microarray technology; we will discuss them more in the next part of this post.

  • Expression Arrays (3')
    • Probe: cDNA oligo dT primed
    • Target: 3' Expressed Sequence Tag (EST)
  • Exon Arrays
    • Probe: cDNA random primed
    • Target: EST / Exon fragments
  • Array comparative genomic hybridization (aCGH)
    • Probe: Genomic DNA
    • Target: Genomic Fragments
  • ChIP-on-ChIP / epigenetic modification
    • Probe: Affinity purified genomic DNA fragments
    • Target: Genomic Fragments
  • SNP arrays
    • Probe: Genomic allelic variants
    • Target: Genomic Sequence
  • Protein arrays
    • Probe: Antibody/Protein binding protein
    • Target: Protein fragments
  • Antibody arrays
    • Probe: Protein Samples
    • Target: Antibodies!


High-Throughput Sequencing (HTS): A Close Relative
Nucleic-acid based arrays are quite mature, but in the coming years low-cost high-throughput sequencing technologies will be used as an alternative to (or a replacement for?) the technique. One early technology was Serial Analysis of Gene Expression (SAGE), which basically generates cDNA from an RNA population, tags the molecules, and concatenates the tags for sequencing. We can then count the number of cDNA tags and map them to the genome. We will see some of the newest technologies in the next part.




Closing for the First Part
Really, functional genomics allows us to ask genome-scale questions, but to make it tell you something biologically relevant you cannot rely on statistics alone; it is a combination of measurement and annotation.

Next: Microarray Technologies!

Continue reading FGT Part 1 - Introduction