From One Genome, Many Types of Cells. But How?

New York Times
Published: February 23, 2009

One of the enduring mysteries of biology is that a variety of specialized cells collaborate in building a body, yet all have an identical genome. Somehow each of the 200 different kinds of cells in the human body (in the brain, liver, bone, heart and many other structures) must be reading off a different set of the hereditary instructions written into the DNA. [With all due respect, a "mystery" usually means something without a rational explanation. The question of "How?" (how one genome leads to differentiation) has at least one explanation, published in a peer-reviewed science journal on June 20, 2008; see the first of the 12 pages reproduced below, with free full text and corollary material available here - February 24, 2009]

The system is something like a play in which all the actors have the same script but are assigned different parts and blocked from even seeing anyone else’s lines. The fertilized egg possesses the first copy of the script; as it divides repeatedly into the 10 trillion cells of the human body, the cells assign themselves to the different roles they will play throughout an individual’s lifetime.

How does this assignment process work? The answer, researchers are finding, is that a second layer of information is embedded in the special proteins that package the DNA of the genome. This second layer, known as the epigenome, controls access to the genes, allowing each cell type to activate its own special genes but blocking off most of the rest. A person has one genome but many epigenomes. And the epigenome is involved not just in defining what genes are accessible in each type of cell, but also in controlling when the accessible genes may be activated.

In the wake of the decoding of the human genome in 2003, understanding the epigenome has become a major frontier of research. [I pointed out in my "Google Tech Talk" on YouTube that "epigenetics" and "epigenomics" are somewhat unfortunate terms, since "epigenetics" (see Wikipedia) was coined by Waddington in 1942 and has been redefined countless times ever since. Moreover, both "epigenetics" and "epigenomics" are domain names of an (excellent!) company in Germany, and thus the use of the terms not only adds a further layer of confusion, but may also infringe on trademarks. These are the reasons why the International HoloGenomics Society was formed (by 60+ Founders and hundreds of members from 4 continents, see "Founders"), where "HoloGenomics" is defined as "Genomics plus EpiGenomics expressed in Informatics" - AJP]

Since the settings on the epigenome [and also on the genome - needing a holistic approach - AJP] control which genes are on or off, any derangement of its behavior is likely to have severe effects on the cell.

There is much evidence that changes in the epigenome [and also in the genome] contribute to cancer and other diseases. The epigenome [and also the genome] alters with age: identical twins often look and behave a little differently as they grow older because of accumulated changes to their epigenomes [and to their genome]. Understanding such changes could help address or retard some of the symptoms of aging. And the epigenome [with the genome] may hold the key to the dream of regenerative medicine, that of deriving safe and efficient replacement tissues from a patient’s own cells.

Because the epigenome is the gateway to understanding so many other aspects of the cell’s regulation, some researchers have criticized the “piecemeal basis” on which it is being explored and called for a large epigenome project similar to the $3 billion program in which the human genome was decoded. At present the National Institutes of Health has a small, $190 million initiative, called the Epigenome Roadmap, with the money going to individual researchers.

As is often the case, academic researchers oppose a large, centralized project if the money seems likely to come out of their grants. But it is also true that such projects often fail unless carefully timed and thought out.

“Definitely this is a genome-sized thing, and I believe it will have benefits beyond what are foreseen at present,” says Richard A. Young, a biologist at the Whitehead Institute in Cambridge. But Steven Henikoff of the Fred Hutchinson Cancer Research Center in Seattle says the present methods for studying the epigenome are not yet ready to be scaled up. “It’s too early to mount a technology development that would be large scale,” he says. [In the opinion of this Founder of the International HoloGenomics Society, both Young and Henikoff are right. As explained in the Recursive Principle, we are on the brink of gracefully letting go of two obsolete axioms (the Central Dogma and the JunkDNA misnomer) that have held back progress for over half a century. Thus, Dr. Young is absolutely correct about the enormity of the challenge. On the other hand, Dr. Henikoff is also right that at this point an "epigenome" project may fail "unless carefully timed and thought out". Even the immature terminology (there is no consensus on what a "gene" is, and some hold-outs still stick to the "JunkDNA" misnomer) indicates that the theory of (holistic) genome function must "harden" first, much as quantum mechanics had to be developed once the old axiom of physics (that atoms, the smallest units of the elements, would not split) failed, because they did split. It would have been way too early and outright dangerous to start releasing nuclear energy without some core understanding of nuclear physics. Thus, a key feature of the International HoloGenomics Society is that its Founders are entirely cross-disciplinary, ranging from clinical genomists to perhaps the greatest mathematician alive (UK, Trinity College, Cambridge), and including a Program Administrator of the National Science Foundation. As Genomics became Informatics, Neural Net specialists are also among the Founders, and this would be a crucial time, were the Founders asked...
- to corroborate efforts of NIH with much more mathematics and computation-savvy agencies such as NSF and DOE. (The sabbatical of the NSF Program Officer who is an IHGS Founder eliminates a conflict of interest) - AJP]

The epigenome consists of many million chemical modifications, or marks as they are called, that are made along the length of the chromatin, the material of the chromosomes. The chromatin includes the double-stranded ribbon of DNA and the protein spools around which it is wound. Some of the marks that constitute the epigenome are made directly on the DNA, [this is one example of conceptual confusion; if the "epigenome" is a "second layer", is it, or isn't it, separate from the DNA? As Francis Collins openly called for, it is time to "rethink long held beliefs" about what "genes" are, what "Junk DNA" isn't, etc., with definitions much more solid than in traditional biology. That is, if we ever want to "get mathematical enough to write software for hologenomic applications" - AJP] but most are attached to the short tails that stick out from the protein spools. Marks of a certain kind generally extend through a large region or domain of the DNA that covers one or more genes. They are recognized by chromatin regulator proteins that perform the tasks indicated by each kind of mark.

In some marked domains, the regulators cause the DNA to be wound up so tightly that the genes are permanently inaccessible. The center and tips of the chromosomes are sites of such repressive domains. So is one of the two X chromosomes in every woman’s cells, a step that ensures both male and female cells have the same level of activity of the X-based genes.

In other domains, the marks are more permissive, allowing the gene regulators called transcription factors to find their target sites on the DNA. The transcription factors then recruit other members of the complex transcription machinery that begins the process of copying the genes and making the proteins the cell needs. A third kind of domain must be established ahead of the transcription machinery to let it roll along the DNA and transcribe the message in the underlying gene.

Only a handful of domains are known so far, so it is something of a puzzle that more than 100 kinds of marks have been found in the epigenome, along with specialist protein machines that attach or remove each mark. Some biologists think so many marks are needed to specify a few kinds of domain because the system is full of backups. [It is utterly dangerous to draw a direct parallel between the hologenome and computers, even by "borrowing terms" such as "backup". At the dawn of neural net research, von Neumann wrote the book "The Computer and the Brain", concluding on the last page that the mathematical language of computers and that of the brain are profoundly different. It took this scientist 25 years to identify the intrinsic mathematical language of cerebellar neural networks (tensor geometry), which is light-years apart from Boolean algebra. Now there is a rising tide of understanding the structuro-functional properties of the hologenome in terms of fractal geometry, which has very little to do with either von Neumann computers or even classical parallel computing - AJP]

The epigenome’s role in marking up the genome seems to have been built on top of a more ancient packaging role. The packaging would have been needed by one-celled organisms like yeast that keep their genome in a special compartment, the nucleus. For multi-celled organisms to evolve, the chromatin’s packaging system presumably adapted during the course of evolution to index the genome for the needs of different types of cell.

The DNA packaging system alone is an extraordinary technical feat. If the nucleus of a human cell were a hollow sphere the size of a tennis ball, the DNA of the genome would be a thin thread some 24 miles long. The thread must be packed into the sphere with no breakages, and in such a way that any region of it can be found immediately.

The heart of the packaging system is a set of special purpose proteins known as histones. Eight histones lock together to form a miniature spool known as a nucleosome. The DNA twists almost twice round each nucleosome, with short spaces in between. Some 30 million nucleosomes are required to package all the DNA of ordinary cells.
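The figure of some 30 million nucleosomes can be checked with a back-of-the-envelope calculation. The sketch below is only illustrative: the genome size, the length of DNA wrapped around each spool and the average spacer length are assumed round numbers that do not appear in the article, and the spacer in particular varies between cell types.

```python
# Rough estimate of how many nucleosomes are needed to package a human genome.
# All three constants are assumptions for illustration, not values from the article.
DIPLOID_GENOME_BP = 6.4e9  # assumed base pairs of DNA in an ordinary (diploid) human cell
CORE_DNA_BP = 147          # assumed DNA wrapped around one eight-histone spool
LINKER_DNA_BP = 50         # assumed average spacer between neighboring spools


def estimate_nucleosomes(genome_bp, core_bp=CORE_DNA_BP, linker_bp=LINKER_DNA_BP):
    """Return the approximate number of nucleosomes needed to package genome_bp."""
    return genome_bp / (core_bp + linker_bp)


if __name__ == "__main__":
    count = estimate_nucleosomes(DIPLOID_GENOME_BP)
    print(f"about {count / 1e6:.0f} million nucleosomes")
```

Under these assumptions the estimate lands in the low thirties of millions, the same order as the figure quoted above.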

For years, biologists assumed that the histones in their nucleosome spools provided a passive framework for the DNA. But, over the last decade, it has become increasingly clear that this is not the case. The histone tails that jut out from the nucleosomes provide a way of marking up the genetic script. Although one kind of mark is attached directly to the bases in the DNA, more than a hundred others are fixed onto specific sites on the histones’ tails. When the DNA has to replicate, for cell division, the direct marks pass only to the two parent strands and all the nucleosomes are disassembled, yet the cell has ingenious methods for reconstituting the same marks on the two daughter genomes. The marks are called epigenetic, and the whole system the epigenome, because they are inherited across cell division despite not being encoded in the DNA.

How is the structure of the epigenome determined? The basic blueprint for the epigenomes needed by each cell type seems to be inherent in the genome, but the epigenome is then altered by other signals that reach the cell. The epigenome is thus the site where the genome meets the environment. [A "definition" that differs from the previous one. According to recent research, prions can bind directly to DNA, though prions are often cited as "epigenomic factors". The epigenome should be the layer in between the environment and the DNA; thus the recent prion finding directly contradicts the provisional definition, since the "second layer" is not there - AJP]

The organization of the epigenomes seems to be computed from information inherent in the genome [How is "genome information" defined? Is the definition "borrowed" from the good old Shannon Information Theory? How can one "compute organization from information"? What do we mean by "organization"? Structural properties, or properties of access/retrieval? What is meant by "computing" in the context of the hologenome? - AJP]. “Most of the epigenetic landscape is determined by the DNA sequence,” says Bradley Bernstein, a chromatin expert at Massachusetts General Hospital. The human genome contains many regulatory genes whose protein products, known as transcription factors, control the activity of other genes. It also has a subset of master regulatory genes that control the lower-level regulators. The master transcription factors act on each other’s genes in a way that sets up a circuitry. The output of this circuitry shapes the initial cascade of epigenomes that are spun off from the fertilized egg.

The other shapers of the epigenome are the chromatin regulators, protein machines that read the marks on the histone tails. Some extend marks of a given kind throughout a domain. Some bundle the nucleosomes together so as to silence their genes. Others loosen the DNA from the nucleosome spools so as to ease the path of the transcription machinery along a gene.

Biologists had long assumed that once the chromatin regulators had shaped an epigenome, their work could not be undone because a cell’s fate is essentially irreversible. But a remarkable experiment by the Japanese biologist Shinya Yamanaka in June 2007 underlined the surprising power of the master transcription factors.

By inserting just four of the master regulator genes into skin cells, he showed the transcription factors made by the genes could reprogram the skin cell’s epigenome back into that of the embryonic cell from which it had been derived. The skin cell then behaved just like an embryonic cell, not a skin cell. Until then, biologists had no idea that the epigenome with its millions of marks could be recast so simply or that transcription factors could apparently call the shots so decisively.

But subsequent research has shown the chromatin regulators are not pushovers. Only one in a million of the skin cells treated with the four transcription factors reverts fully to the embryonic state. Most get stuck in transitional states, as if the chromatin regulators are resisting a possibly cancerous change in the cell’s status. “The take-home story is that yes, the transcription factors are really critical players in determining cellular state, but epigenetics [definition?- AJP] is important, too,” Dr. Bernstein said.

The ideal of regenerative medicine is to convert a patient’s normal body cells first back into the embryonic state, and then into the specific cells lost to disease. But to prepare such cells safely and effectively, researchers will probably need to learn how to control and manipulate the chromatin of the epigenome as well as the transcription factors that shape cell identity.

The treatment of many diseases may also lie in drugs that manipulate the epigenome. Rett syndrome, a form of autism that affects girls, is caused by a mutation in the gene for an enzyme that recognizes the chromatin marks placed directly on the DNA. At least in mice, the neurons resume normal function when the mutation is corrected. In several forms of cancer, tumor-suppressor genes turn out to have been inactivated not by mutation, the usual known cause, but by the incorrect placement of marks that invite chromatin regulators to silence the genes.

Drugs developed by Peter A. Jones of the University of Southern California reverse the chromatin silencing of these antitumor genes. Two have recently been approved by the Food and Drug Administration for a blood malignancy, myelodysplastic syndrome.

Besides governing access to the genome, the epigenome also receives a host of signals from the environment. A family of enzymes called sirtuins monitors the nutritional state of the cell, and one of them removes a specific mark from the chromatin, providing a possible route for the genome to respond to famine conditions. Accumulating errors in the epigenome’s regulation [Is the epigenome regulated, or does it regulate the "genes", or both (or neither)? - AJP] could allow the wrong genes to be expressed, a possible cause of aging.

A principal new technique for studying the marks on an epigenome is to break the chromosomes into fragments, which are then treated with antibodies that bind to a specific mark. The DNA fragments so designated are decoded and matched to sites on the human genome sequence. This provides a genome-wide map of how a particular mark is distributed in a particular epigenomic state. The ChIP-seq maps, as they are called, have been very useful but are far from capturing the full detail of the epigenome, a dynamic structure that can change in minutes.
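The logic of such a map can be sketched in a few lines of code. What follows is a toy illustration, not a real analysis pipeline: the fragment coordinates are made up, the mark names are used only as example labels, and the function names and the 500-base bin size are hypothetical choices. In real work the fragments are sequencing reads aligned to the reference genome by dedicated software.

```python
from collections import Counter

# Hypothetical toy data: (chromosome, start, end, marks carried by the fragment).
FRAGMENTS = [
    ("chr1", 100, 300, {"H3K4me3"}),
    ("chr1", 250, 450, {"H3K4me3", "H3K27ac"}),
    ("chr1", 1200, 1400, {"H3K27me3"}),
    ("chr1", 1350, 1550, {"H3K27me3"}),
    ("chr2", 50, 250, {"H3K4me3"}),
]


def pull_down(fragments, mark):
    """Mimic the antibody step: keep only fragments that carry the given mark."""
    return [fragment for fragment in fragments if mark in fragment[3]]


def coverage_map(fragments, bin_size=500):
    """Count fragments per genomic bin, keyed by (chromosome, bin start)."""
    counts = Counter()
    for chrom, start, end, _marks in fragments:
        midpoint = (start + end) // 2
        counts[(chrom, (midpoint // bin_size) * bin_size)] += 1
    return dict(counts)


def chip_seq_map(fragments, mark, bin_size=500):
    """Select fragments by mark, then summarize where on the genome they fall."""
    return coverage_map(pull_down(fragments, mark), bin_size)


if __name__ == "__main__":
    for mark in ("H3K4me3", "H3K27me3"):
        print(mark, chip_seq_map(FRAGMENTS, mark))
```

Each call yields one map for one mark in one sample, which is why a single snapshot of this kind cannot capture a structure that changes within minutes.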

Individual researchers have made considerable progress but may not be able to assemble the comprehensive set of epigenomic marks and states that would be most useful to those developing new approaches to disease and aging. “I think the effort needs to be organized,” Dr. Young said. “It would benefit from being larger than it is.”

[I regard Nicholas Wade as one of the best, if not the best, among science writers in our postmodern times of Genomics, that is, Post-ENCODE genomics, which we at IHGS call "HoloGenomics". My comments are not at all meant as criticism of his heroic brilliance in coping with the "Genome Revolution" of our times. If anything, since he indicates that NYT will publish a much-needed series of articles, the comments are to flesh out the need to do more than just roll out the fresh data (which are so many that, e.g., there has not even been room to mention methylation once). I hope my comments will make it dawn on readers that an entirely new era is upon us. "Unprepared", one might say. In some sense, yes. For instance, the old NIH model funded proposals for experimentation, guaranteed to result in (some) data (useful or not). When I published my work modeling the fractal (thus, recursive) growth of a Purkinje neuron (1989), my ongoing NIH grant was terminated, since recursion violated, at that time, both the "Central Dogma" and the "Junk DNA" axioms, both now obsolete. NSF and DARPA were much friendlier to Neural Net research (the Cold War needed sonar pattern-detectors of Soviet submarines), and I could use Neural Nets at NASA for automated "reconfiguration" of supersonic F15 flights (it was known since 1983 that a masterful pilot could land an F15 fighter on one wing, see the breath-taking and now declassified YouTube video, but his luck depended only on his counter-intuitive brilliance; the process was not automated). With NASA, I also put together a proposal for NIH-NSF-NASA cooperation, but there was no emergency compelling "business unusual". Now there is. As the global economy is "spiraling crash-bound on one wing", as a Panel on January 22nd elaborated (see Churchill Club YouTube), the "Genome Based Economy" needs to be ignited with after-burner super-strength; as Dietrich Stephan stressed, starting with a massive, genome-based prevention program to avoid a health-care crash.
Some of us are actually quite well prepared (see "Originator Founders" of IHGS, which started as the PostGenetics Society and became the first organization to officially abandon the Junk DNA misnomer as a scientific notion, 8 months prior to the US government doing so). Some of us never believed either the "Central Dogma" or the "Junk DNA" obsolete dogma, and worked a decade or two in preparation for today. "The Principle of Recursive Genome Function" was aimed precisely at the very problem now in the limelight of NYT: "Jacob reminisced in his Memoirs about '... one of the oldest problems in biology: in organisms made up of millions, even billions of cells, every cell possesses a complete set of genes: how then, is it that all the genes do not function in the same way in all tissues?' [14]. This profound question is examined below" (see in between Figs. 1 and 2 of "The Principle"), Feb. 24, 2009]