Newsletter of HoloGenomics

Genomics, Epigenomics integrated into Informatics:

HoloGenomics

A Compilation by Andras J. Pellionisz, see Contact, Bio and References here

Any/all IP-related Contact is solely through

Attorney Kevin Roe, Esq. FractoGene Legal Department

155 E Campbell Ave, Campbell, CA 95008

Secured contact to Dr. Pellionisz

regarding Academic, Board, Non-Profit activities:

andras_at_pellionisz_dot_com or cell Four-Zero-Eight - 891- Seven - One - Eight - Seven

Skip to HoloGenomics Table of Contents


Genome is Fractal? - "Yeah, for sure!"

(Eric Schadt, Double-degree mathematician, Ph.D. in Biomathematics, Sept 15, 2014)

listen here

Mendelspod interview with Eric Schadt, Director of the $600 M Institute of Genomics and Multiscale Biology, NYC (Sept. 15, 2014)

Q [Theral Timpson, Mendelspod]: I have read that you are a Ph.D. in bio-mathematics?

A [Eric Schadt, Ph.D. in bio-mathematics]: Yes, bio-mathematics.

Q: I have recently met a Hungarian-American scientist, András Pellionisz, and he says that we need to bring math into biology and genetics and he says that

THE GENOME IS A FRACTAL.

Do you buy any of that?

A [Eric Schadt]: YEAH, FOR SURE!

[What is the significance of Eric Schadt's confirmation of FractoGene (the utility derived from a fractal genome growing fractal organisms)? Eric Schadt's credentials - a double-degree mathematician with a Ph.D. in bio-mathematics, a sterling record at Merck and Pacific Biosciences, and now Director of the $600 M "Mount Sinai Institute for Genomics and Multi-scale Biology" - together with his unbiased, straight-as-an-arrow academic and personal integrity, would be extremely difficult to beat globally for forming an independent professional judgement based on top command of both bio-mathematics and information theory & technology. His confirmation comes two times seven years after the Human Genome Project: in 2000-2007, ENCODE-I first concluded that "Junk DNA is anything but" and the Central Dogma was shown to be one of the most harmful mistakes "of the history of molecular biology"; the wilderness years of 2007-2014 followed. From 2000 to 2014 genomics essentially existed as "a new science without valid or even universally agreed upon definitions of theoretical axioms". Characteristically, even Eric Lander heralded globally that "nothing is true of the most important fundamental assumptions" (yet in 2009 he put the Hilbert-fractal of the genome on the cover of Science, just two weeks after George Church invited Dr. Pellionisz to his Cold Spring Harbor Meeting). Detractors had to swallow their (sometimes ugly) words - against the mathematically solid and software-enabling FractoGene, the only "alternative theories" on offer were a random sample of metaphors: that "genome regulation is turning genes on and off", or that "the genome is a language" (found not to be true twenty years ago, see Flam 1994).

Eric Schadt's academic endorsement of FractoGene consistently goes back to 2010 (if the fractal genotype were found to be experimentally linked to the fractal phenotype, "it would be truly revolutionary"). Well, since 2011 the compromised fractal globule has been linked by scores of top-notch independent experimental studies worldwide to cancer, autism, schizophrenia and a slew of auto-immune diseases.

What will the "academic endorsement" result in? First, as in the case of Prof. Schadt, leading academic centers are likely to gain intellectual leadership through schools of advanced studies, where non-profit applications (already over a thousand, see below) are streamlined by a thought leader of non-linear dynamics as the intrinsic mathematics of living systems. Second, the IP (augmented by trade secrets since the last CIP in 2007) is likely to result in a for-profit application monopoly (in force over the US market till mid-March of 2026).]


[Dr. Pellionisz is legally permitted to practice Compensated Professional Services (Analysis, Advisorship, Consultantship, Board Membership, etc) as long as there is no "Conflict of Interest", through Secured Contact (see above).

Communication regarding Intellectual Property of any kind, including but not limited to patents, trade secrets, know-how associated with Dr. Pellionisz must be strictly gated by "Attorney Kevin Roe, Esq. FractoGene Legal Department" (see above)]

Skip to Most Recent News (2014-2012)


Archives

2014

2011-2013

2010

2009

2008

2007 Post-Encode

2007 Pre-Encode

2006

2005

1972-2004


The Decade of Genomic Uncertainty is Over

The FractoGene Decade (2002-2012)

Pellionisz' FractoGene, 2002 (early media coverage)


Pellionisz' "FractoGene" patent, priority date 2002, patent issued in 2012 (see 2002 priority date, 2007 CIP filing in Google Patents 8,280,641 , and also recursive fractal iteration utility disseminated in peer-reviewed paper and Google Tech Talk YouTube (Is IT Ready for the Dreaded DNA Data Deluge?), both in 2008, presented in September 2009 in Cold Spring Harbor. The issued patent is in force till late March, 2026. The invention drew utility from RELATING genomic- and organismic fractal properties. "Methods" were as described in the body of application, plus ~750 pages of "Incorporation by Reference" ("should be treated as part of the text of the application as filed", see US Law 2163.07(b)). State of Art Methods beyond CIP of Oct. 18, 2007 are handled as "Trade Secrets", as customary in the strongest combination of Intellectual Property Portfolios.

"Evidence for" and/or "Consistent with"??

As is evident from the title of the paper above, its authors clearly refer to "evidence". Other authors of independent experimental investigations - an escalating number of them after the initial decade - consider their results "consistent with" the fractal organization found either in the genome and/or in physiological/pathological (e.g. cancerous) organisms.

With the two kinds of claim rapidly diverging in value ("evidence for" becoming extremely precious, while "consistent with" is generally regarded as almost meaningless), authors are respectfully requested to clarify their (sometimes unclear or ambiguous) claims: do they consider themselves in the valuable category of "providing evidence for", or in the almost meaningless general class of "consistent with"? Clarification sent to HolGenTech_at_gmail_dot_com will help proper citation, if any. - Dr. Pellionisz

By 2012, independent researchers arrived at the breakthrough consensus, overdue since 2002. ENCODE 2007, followed by ENCODE 2012, replaced the mistaken axioms of "Junk DNA" and "Central Dogma" with the "nolo contendere assumption" of "The Principle of Recursive Genome Function" (2008), which requires the "nearest neighbor organization" of the Hilbert-fractal of the genome, experimentally found at the later date of 2009. The independent illustrations above of both the genome and organisms exhibiting fractal properties put the challenge plainly in their RELATION. Methods, e.g. relating genomic fractal defects to the fractality of tumors in the genome disease of cancer, constitute secured intellectual property:


Eric Lander (Science Adviser to the President and Director of Broad Institute) et al. delivered the message
on Science Magazine cover (Oct. 9, 2009) to the effect:

"Mr. President; The Genome is Fractal !"


"Something like this (disruptions in the fractal structures leading to phenotypic change)" were shown to be true (starting in 2011 November, see top-ranking independent experimentalist's publications, cited below).

"Yeah, of course" - it is now "truly revolutionary".

There are only two questions for everyone:

(a) "What is in it for me?"

(b) "What is the deal?"


Proof of Concept (Clogged Fractal Structure Linked to Cancer) was already available
at the Hyderabad Conference (February 15, 2012)
Dozens of additional Independent Experimental Proof of Concept Papers were cited in
Hyderabad Proceedings


The genome is replete with repeats. If the fractal structure is compromised
(see the laser beam pointing at where the "proximity" is clogged),
the resulting syndromes are already linked to cancer(s), autism, schizophrenia, auto-immune diseases, etc.


Table of Contents

(Aug 10) Genome researchers raise alarm over big data
(July 25) The case for copy number variations in autism
(July 25) Intricate DNA flips, swaps found in people with autism
(July 25) The mystery of the instant noodle chromosomes
(July 22) Can ‘jumping genes’ cause cancer chaos?
(July 21) Why you should share your genetic profile [the Noble Academic Dream and the Harsh Business Climate]
(July 20) Why James Watson says the ‘war on cancer’ is fighting the wrong enemy
(July 19) National Cancer Institute: Fractal Geometry at Critical Juncture of Cancer Research
(July 15) Apple may soon collect your DNA as part of a new ResearchKit program
(July 10) Sequencing the genome creates so much data we don’t know what to do with it
(July 07) The living realm depicted by the fractal geometry (endorsement of FractoGene by Gabriele A. Losa)
(July 03) Google and Broad Institute Team Up to Bring Genomic Analysis to the Cloud
(June 19) GlaxoSmithKline, Searching For Hit Drugs, Pours $95M Into DNA 'Dark Matter'
(June 09) Recurrent somatic mutations in regulatory regions of human cancer genomes (Nature Genetics, dominant author Michael Snyder)
(May 22) Big Data (Stanford): 2013 Nobelist Michael Levitt (multi-scale biology) endorses the Fractal Approach to new school of genomics
(Apr 15) Eric Schadt - Big Data is revealing about the world’s trickiest diseases
(Apr 15) IBM Announces Deals With Apple, Johnson And Johnson, And Medtronic In Bid To Transform Health Care
(Apr 09) An 'evolutionary relic' of the genome causes cancer
(Mar 31) Time Magazine Cover Issue - Closing the Cancer Gap
(Mar 31) We have run out of money - time to start thinking!
(Mar 27) The Genome (both DNA and RNA) is replete with repeats. The question is the mathematics (fractals)
(Mar 21) On the Fractal Design in Human Brain and Nervous Tissue - Losa recognizes FractoGene
(Mar 16) Cracking the code of human life: The Birth of BioInformatics & Computational Genomics
(Feb 26) Future of Genomic Medicine Depends on Sharing Information - Eric Lander to Bangalore
(Feb 25) Genetic Geometry Takes Shape (and it is fractal, see FractoGene by Pellionisz, 2002)
(Feb 19) The $2 Trillion Trilemma of Global Precision Medicine
(Feb 11) BGI Pushing for Analytics
(Feb 10) Who was next to President Obama at the perhaps critical get-together (2011)?
(Feb 03) Round II of "Government vs Private Sector" - or "Is Our Understanding of Genome Regulation Ready for the Dreaded DNA Data Tsunami?"
(Jan 31) Houston, We've Got a Problem!
(Jan 27) Small snippets of genes may have big effect in autism
(Jan 27) Autism genomes add to disorder's mystery
(Jan 27) Hundreds of Millions Sought for Personalized Medicine Initiative
(Jan 22) SAP Teams with ASCO to Fight Cancer
(Jan 15) Human Longevity, Genentech ink deal to sequence thousands of genomes
(Jan 13) UCSC Receives $1M Grant from Simons Foundation to Create Human Genetic Variation Map
(Jan 12) Silencing long noncoding RNAs with genome-editing tools with full .pdf
(Jan 08) Who Owns the Biggest Biotech Discovery of the Century?
(Jan 07) NIH grants aim to decipher the language of gene regulation
(Jan 07) End of cancer-genome project prompts rethink: Geneticists debate whether focus should shift from sequencing genomes to analysing function
(Jan 07) Variation in cancer risk among tissues can be explained by the number of stem cell divisions
(Jan 04) Finding the simple patterns in a complex world (Barnsley: "cancers are fractals")
(2015 Jan 02) A fractal geometric model of prostate carcinoma and classes of equivalence

For archived HoloGenomics News articles see Archives above


Genome researchers raise alarm over big data

Storing and processing genome data will exceed the computing challenges of running YouTube and Twitter, biologists warn.

Erika Check Hayden

07 July 2015

The computing resources needed to handle genome data will soon exceed those of Twitter and YouTube, says a team of biologists and computer scientists who are worried that their discipline is not geared up to cope with the coming genomics flood.

Other computing experts say that such a comparison with other ‘big data’ areas is not convincing and a little glib. But they agree that the computing needs of genomics will be enormous as sequencing costs drop and ever more genomes are analysed.

By 2025, between 100 million and 2 billion human genomes could have been sequenced, according to the report1, which is published in the journal PLoS Biology. The data-storage demands for this alone could run to as much as 2–40 exabytes (1 exabyte is 10^18 bytes), because the amount of data that must be stored for a single genome is about 30 times larger than the genome itself, to make up for errors incurred during sequencing and preliminary analysis.

The team says that this outstrips YouTube’s projected annual storage needs of 1–2 exabytes of video by 2025 and Twitter’s projected 1–17 petabytes per year (1 petabyte is 10^15 bytes). It even exceeds the 1 exabyte per year projected for what will be the world’s largest astronomy project, the Square Kilometre Array, to be sited in South Africa and Australia. But storage is only a small part of the problem: the paper argues that computing requirements for acquiring, distributing and analysing genomics data may be even more demanding.
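A quick back-of-envelope check of that 2–40 exabyte range (a sketch, not the paper's model; the ~20 GB of retained data per genome is an assumption chosen here so that the arithmetic lands in the quoted range):

# Rough reproduction of the 2-40 exabyte storage projection under an assumed
# ~20 GB of retained (compressed) data per sequenced genome.
BYTES_PER_GENOME = 20e9          # assumption, not a figure from the paper
EXABYTE = 1e18

for n_genomes in (100e6, 2e9):   # the 2025 projection: 100 million to 2 billion genomes
    total_bytes = n_genomes * BYTES_PER_GENOME
    print(f"{n_genomes:.0e} genomes -> {total_bytes / EXABYTE:.0f} EB")
# 1e+08 genomes -> 2 EB
# 2e+09 genomes -> 40 EB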

Major change

“This serves as a clarion call that genomics is going to pose some severe challenges,” says biologist Gene Robinson from the University of Illinois at Urbana-Champaign (UIUC), a co-author of the paper. “Some major change is going to need to happen to handle the volume of data and speed of analysis that will be required.”

Narayan Desai, a computer scientist at communications giant Ericsson in San Jose, California, is not impressed by the way the study compares the demands of other disciplines. “This isn’t a particularly credible analysis,” he says. Desai points out that the paper gives short shrift to the way in which other disciplines handle the data they collect — for instance, the paper underestimates the processing and analysis aspects of the video and text data collected and distributed by Twitter and YouTube, such as advertisement targeting and serving videos to diverse formats.

Nevertheless, Desai says, genomics will have to address the fundamental question of how much data it should generate. “The world has a limited capacity for data collection and analysis, and it should be used well. Because of the accessibility of sequencing, the explosive growth of the community has occurred in a largely decentralized fashion, which can't easily address questions like this," he says. Other resource-intensive disciplines, such as high-energy physics, are more centralized; they “require coordination and consensus for instrument design, data collection and sampling strategies”, he adds. But genomics data sets are more balkanized, despite the recent interest of cloud-computing companies in centrally storing large amounts of genomics data.

Coordinated approach

Astronomers and high-energy physicists process much of their raw data soon after collection and then discard them, which simplifies later steps such as distribution and analysis. But genomics does not yet have standards for converting raw sequence data into processed data.

The variety of analyses that biologists want to perform in genomics is also uniquely large, the authors write, and current methods for performing these analyses will not necessarily translate well as the volume of such data rises. For instance, comparing two genomes requires comparing two sets of genetic variants. “If you have a million genomes, you’re talking about a million-squared pairwise comparisons,” says Saurabh Sinha, a computer scientist at the UIUC and a co-author of the paper. “The algorithms for doing that are going to scale badly.”
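Sinha's "million-squared" point in three lines of Python (a trivial illustration, not their algorithm):

n = 1_000_000                # genomes
pairs = n * (n - 1) // 2     # distinct pairwise genome comparisons
print(f"{pairs:.2e}")        # ~5.00e+11 comparisons for a million genomes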

Observational cosmologist Robert Brunner, also at the UIUC, says that, rather than comparing disciplines, he would have liked to have seen a call to arms for big-data problems that span disciplines and that could benefit from a coordinated approach — such as the relative dearth of career paths for computational specialists in science, and the need for specialized types of storage and analysis capacity that will not necessarily be met by industrial providers.

“Genomics poses some of the same challenges as astronomy, atmospheric science, crop science, particle physics and whatever big-data domain you want to think about,” Brunner says. “The real thing to do here is to say what are things in common that we can work together to solve.”

Nature doi:10.1038/nature.2015.17912

[During the summer of 2015 practically all Big IT companies of the world signed up for "Genomics turned Informatics" - originally heralded by LeRoy Hood in 2002. Microsoft has now joined the fray in Silicon Valley, alongside Intel, Apple and a reorganized Google Genomics, all claiming a slice of the silicon pie. The analysis will show how the present challenge differs from previous disruptive science/technology endeavors: it needs much more cohesion than at any time in the history of basic-science breakthroughs translated into immediate applications - Andras_at_Pellionisz_dot_com]


Intricate DNA flips, swaps found in people with autism

Jessica Wright

A surprisingly large proportion of people with autism have complex rearrangements of their chromosomes that were missed by conventional genetic screening, researchers reported 2 July in the American Journal of Human Genetics1.

The study does not reveal whether these aberrations are more common in people with autism than in unaffected individuals. But similar chromosomal rearrangements that either duplicate or delete stretches of DNA, called copy number variations, are important contributors to autism as well as to other neuropsychiatric disorders. These more complex variations are likely to be no different, says lead researcher Michael Talkowski, assistant professor of neurology at Harvard University.

Talkowski’s team found intricate cases of molecular origami in which two duplications flank another type of structural variation, such as an inversion or deletion.

“This is going to become an important class of variation to study in autism, long term,” Talkowski says.

The finding is particularly important because current methods of genetic analysis are not equipped to detect this type of chromosomal scrambling. The go-to method for clinical testing — which compares chopped-up fragments of an individual’s DNA with a reference genome on a chip — can spot duplications or deletions. But this method cannot tell when a DNA sequence has been flipped or moved from one chromosomal location to another, for example.

Variations like this even confound genome-sequencing technologies. Last year, for example, researchers published the results of two massive projects that sequenced every gene in thousands of people with autism. But because these genetic jumbles often fall outside gene-coding regions, they remained unnoticed.

“The complexity of genomic variation is far beyond what current genomic sequencing can see,” says James Lupski, professor of molecular and human genetics at the Baylor College of Medicine in Houston, Texas, who was not involved in the study. “We don't have the analysis tools to see it, even though it's right there before our very eyes.”

Complex chromosomes:

Researchers have long had hints that complex variations exist, but they had no idea how prevalent they are. In 2012, using a method that provides a rough picture of the shape of chromosomes, Talkowski and his team found pieces of DNA swapped between chromosomes in 38 children who have either autism or another neurodevelopmental disorder2.

Lupski’s team also found examples in which two duplications bracket a region that appears in triplicate3. Then last year, Talkowski and his colleagues reported one example of a chromosomal duplication that flanks a flipped, or inverted, section of DNA4.

In the new study, the researchers looked at 259 individuals with autism and found that as many as 21, or 8 percent, harbor this type of duplication-inversion-duplication pattern. And a nearly equal number of individuals have other forms of rearrangement, such as deleted segments sandwiched between duplications.

The researchers were able to reveal these complex variants by sequencing each genome in its entirety. The traditional method chops up the genome into fragments that are about 100 bases long. When mapped back to a reference genome, however, these short fragments may miss small duplications or rearrangements.

The new method instead generates larger fragments, containing roughly 3,700 nucleotides apiece. Scientists then sequence the 100 nucleotides at the ends of each fragment. When mapped back to a reference genome, the large fragments reveal structural changes. For example, when a pair of sequenced ends brackets more DNA than is found in the reference sequence, that fragment may contain a duplication.

Because the approach generates multiple overlapping fragments, researchers also end up with about 100 pieces of sequence that include the junctions, or borders, of the rearranged fragments. The abundance of overlapping sequences provides significantly more detail than the standard method, which covers each nucleotide only a few times.
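A minimal sketch of the discordant-pair logic described above (illustrative only: the threshold and function names here are invented for the example, and the authors' actual pipeline is considerably more elaborate):

# Toy classifier for one mate pair from a ~3,700 bp jumping library: compare the
# reference distance between the two mapped ends against the expected fragment size.
EXPECTED_INSERT = 3700   # nominal fragment length from the article
TOLERANCE = 1000         # assumed allowance before a pair is called discordant

def classify_pair(left_end_ref_pos, right_end_ref_pos):
    """Classify a mate pair by the reference span bracketed by its mapped ends."""
    span = right_end_ref_pos - left_end_ref_pos
    if span < EXPECTED_INSERT - TOLERANCE:
        return "possible duplication/insertion"  # fragment holds more DNA than the reference span
    if span > EXPECTED_INSERT + TOLERANCE:
        return "possible deletion"               # fragment holds less DNA than the reference span
    return "concordant"

# Example: the two ends map only 1,200 bp apart on the reference
print(classify_pair(10_000, 11_200))             # -> possible duplication/insertion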

“The researchers have found a more novel way to sequence and dug in to an insane degree — it’s work that almost no one else would want to try to attempt, because it’s so difficult,” says Michael Ronemus, research assistant professor at Cold Spring Harbor Laboratory in New York, who was not involved in the study. “The findings give us a sense of how common these things might be in human genomes in general.”

Whether these rearrangements are important contributors to autism and neurodevelopmental disorders is still an open question — one that Talkowski and his colleagues are gearing up to address. The genomes they sequenced came from the Simons Simplex Collection, a database that includes the DNA of children with autism and their unaffected parents and siblings. (The collection is funded by the Simons Foundation, SFARI.org’s parent organization.)

The researchers are using their methods to sequence the genomes of the children’s relatives. This experiment will reveal whether complex variants are more common in people with autism than in unaffected family members.

Already, there are hints that the rearrangements contribute to autism risk in some individuals. Overall, the variants in the study duplicate 27 genes, introduce 3 mutations and in one case fuse two genes together. (The particular genes involved depend on where the mix-up occurs in the genome.) Sequencing studies have tied one of the duplicated genes, AMBP, to autism. And a regulatory gene that is disrupted by the rearrangement, AUTS2, also has strong links to the disorder.

News and Opinion articles on SFARI.org are editorially independent of the Simons Foundation.

References:

1: Brand H. et al. Am. J. Hum. Genet. 97, 170-176 (2015) PubMed

2: Talkowski M.E. et al. Cell 149, 525-537 (2012) PubMed

3: Carvalho C.M. et al. Nat. Genet. 43, 1074-1081 (2011) PubMed

4: Brand H. et al. Am. J. Hum. Genet. 95, 454-461 (2014) PubMed


The case for copy number variations in autism

Meredith Wadman

17 March 2008

Following a series of papers in the past two years, what seems irrefutable is that copy number variations ― in which a particular stretch of DNA is either deleted or duplicated ― are important in autism1,2.

Already, "CNVs are the most common cause of autism that we can identify today, by far," notes Arthur Beaudet, a geneticist at the Baylor College of Medicine in Houston.

What confronts researchers now is uncovering when and how CNVs influence autism. Do these variations cause the disease directly by altering key genes, or indirectly, in combination with other distant genes, or are they coincidental observations with no link to the disease?

The answer seems to be all of the above.

"In some cases these CNVs are causing autism; in some they are adding to its complexity; and in some they are incidental," says Stephen Scherer, director of the Center for Applied Genomics at The Hospital for Sick Children in Toronto. "We need to figure out which are which."

In February, Scherer published the latest CNV paper identifying 277 CNVs in 427 unrelated individuals with autism3. In 27 of these patients, the CNVs are de novo, meaning that they appear in children with autism, but not in their healthy parents.

Among the key findings in that paper are de novo CNVs on chromosome 16, at the same spot previously identified by a report published in January by Mark Daly and his colleagues.

Hot spots:

Different teams have documented a few of these 'hot spots' on the genome where CNVs are seen in up to one percent of people with autism ― and virtually never in those without it.

There are intriguing suggestions that CNVs uncovered at these hot spots may not be autism-specific. For example, three of the patients found to have a duplication on chromosome 16 in the January paper have been diagnosed with developmental delay and not autism.

A laundry list of other CNVs has been identified, each in only a single individual with autism, making it difficult to tag them as a cause of the disease.

"[When] people publish big lists of regions, there's an implicit thing that if my kid has this, it's going to have autism," says Evan Eichler, a Howard Hughes Medical Institute investigator at the University of Washington in Seattle. But, "there's no proof," he notes.

To replicate lone findings in other individuals with autism, some researchers are trying to screen much larger samples of individuals with autism.

"Screening 5,000 families instead of 500 would really be of huge benefit," says Jonathan Sebat of the Cold Spring Harbor Laboratory in New York. Sebat and Mike Wigler propelled the field forward last year with a a high-profile list of de novo CNVs4. Their team is gearing up to scan 1,500 families with just one affected child ― in whom de novo mutations are more likely to turn up.

Scherer's group is screening the most promising CNVs from their February paper ― those they identified in two or more unrelated people, or that overlap with a gene already suspected in autism ― in a larger sample of nearly 1,000 patients.

Complex scenarios:

The team is drilling down to find smaller changes: deletions or duplications shorter than 1,000 bases in length. But the answers are unlikely to be simple.

For instance, Scherer found one 277 kilobase deletion at the tip of chromosome 22 in a girl with autism. Another team had reported in 2006 [5] that mutations in this region cause autism in several families by crippling one of the body's two copies of the gene coding for SHANK3, a protein that is crucial for healthy communication between brain cells. In the same girl, however, Scherer also found something new: a duplication of a chunk of genome on chromosome 20 that is five times as big as the deletion on chromosome 22.

If the chromosome 22 deletion hadn't already been documented ― and if Scherer's study hadn't resolved down to 277 kilobases ― it would have been easy to assume that the chromosome 20 duplication was entirely responsible for the girl's autism.

As it stands, however, "probably some of the genes that are being duplicated on chromosome 20 are adding complexity to her autism," Scherer says, noting that the girl's symptoms include epilepsy and abnormal physical features.

The fact that the same hot spot has been implicated in different cognitive disorders adds to the complexity. A given CNV "is not always associated just with autism," says Eichler. "That's what's messing with people's minds."

Eichler raises another issue that researchers need to resolve: nomenclature.

Copy number variations are a subset of a bigger category of mutations called structural variations. These include other changes such as inversions and translocations of large chunks of sequence, which don't lead to a net gain or loss in sequence as deletions and duplications do, but can still have significant consequences for cognitive function6.

"Copy number is not as good a term," says Eichler. "Structural variation includes inversion and translocation, [and is] a much more encompassing term."

References:

1: Jacquemont M.L. et al. J. Med. Genet. 43, 843-849 (2006) PubMed

2: Weiss L.A. et al. N. Engl. J. Med. 358, 667-675 (2008) PubMed

3: Marshall C.R. et al. Am. J. Hum. Genet. 82, 477-488 (2008) PubMed

4: Sebat J. et al. Science 316, 445-449 (2007) PubMed

5: Durand C.M. et al. Nat. Genet. 39, 25-27 (2006) PubMed

6: Bonaglia M.C. et al. Am. J. Hum. Genet. 69, 261-268 (2001) PubMed

[A biophysicist to mathematicians: Please note that this article, with its conclusion that "irrefutable is that copy number variations ― in which a particular stretch of DNA is either deleted or duplicated ― are important in autism", originated in 2008 - the proverbial 7 years ago. Biophysicists are overjoyed when the eminently measurable "repeats" are "irrefutably" linked to "mysterious" diseases such as autism, cancer and a slew of auto-immune diseases; see the summary in Pellionisz (2012) and Pellionisz et al. (2013). Gaining a mathematical handle is, indeed, a major step towards software-enabling algorithms that engage vast computer power to unlock "genomic mysteries". However, mathematicians tend to drill down to the definition of any new mathematical-looking entity. In the seven years up to the above article, CNV (Copy Number Variation) was never mathematically defined in a generally accepted manner. Some "define" a "copy" as a string of 1,000 bases; others as a string of 10,000, or 100,000, or even 1,000,000 bases. Too many "definitions" is "no definition". FractoGene is based on the universally accepted fact that the human genome is replete with repeats of different lengths - and since Pellionisz (2009) the measurable characteristics of control versus diseased genomes are their Zipf-Mandelbrot-Fractal-Parabolic-Distribution-Curves. After the proverbial 7 years, we stand ready for deployment. Andras_at_Pellionisz_dot_com]
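How such a rank-frequency distribution of repeats can be computed is sketched below in a deliberately generic way: a simple k-mer rank-frequency ("Zipf-Mandelbrot style") curve. This is emphatically not the FractoGene method (which rests on proprietary IP and trade secrets); the k-mer size and the toy sequence are arbitrary choices for illustration.

# Generic rank-frequency curve of k-mer "repeats" in a DNA string; a
# Zipf-Mandelbrot law f(r) ~ C / (r + q)**s appears near-linear on log-log axes.
from collections import Counter
import math

def rank_frequency(seq, k=8):
    """Count all k-mers and return (rank, count) pairs, most frequent first."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return list(enumerate(sorted(counts.values(), reverse=True), start=1))

def log_log_points(ranked):
    """Log-log coordinates of the rank-frequency curve."""
    return [(math.log10(r), math.log10(c)) for r, c in ranked]

# Toy usage with an artificial repeat-rich sequence
toy_sequence = ("ACGTACGTGGCCA" * 500) + ("TTAGGG" * 300)
print(log_log_points(rank_frequency(toy_sequence))[:3])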


The mystery of the instant noodle chromosomes

July 23, 2015

[Figure: an example of a hierarchically folded globule. Credit: L. Nazarov]

A group of researchers from the Lomonosov Moscow State University tried to address one of the least understood issues in modern molecular biology, namely, how strands of DNA pack themselves into the cell nucleus. The scientists concluded that packing of the genome in a special state called the "fractal globule", apart from other known advantages of this state, allows the genetic machinery of the cell to operate with maximum speed due to comparatively rapid thermal diffusion. The article describing their results was published in Physical Review Letters, one of the most prestigious physics journals, with an impact factor of 7.8.

Fractal globule is a mathematical term. If you drop a long spinning fishing line on the floor, it will immediately coil into such an unimaginably vile tangle that you will either have to unravel it for hours, or run to the store for a new one. An entangled state like this is an example of the so-called equilibrium globule. The fractal globule is a much more convenient state. Sticking with the fishing-line example, a fractal globule is a lump where the line is never fastened into a knot; instead it is just curled into a series of loops, with no loops tangled with each other. Such a structure - a set of free loops of different sizes - can be unraveled by just pulling its two ends.

Due to this structure of loops, or crumples, which resembles the structure of an instant noodle block, the Soviet physicists Alexander Grosberg, Sergey Nechayev and Eugene Shakhnovich, who first predicted it back in 1988, named this structure the "crumpled globule". In recent years it is more often called a fractal globule. On the one hand, this new name just sounds more sophisticated and serious than "crumpled globule"; on the other hand, it fully reflects the properties of such a globule because, like all fractals, its structure - in this case a set of loops of different sizes - is repeated at small and large scales.
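The 2-D Hilbert curve is a convenient toy stand-in for such a self-similar, knot-free, space-filling packing: points that are neighbors along the one-dimensional "sequence" stay neighbors in space at every scale. The sketch below only illustrates that locality property; it is not the Moscow group's 3-D chromatin model.

def hilbert_d2xy(order, d):
    """Map a 1-D index d to (x, y) on a 2**order x 2**order Hilbert curve."""
    x = y = 0
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (d // 2)
        ry = 1 & (d ^ rx)
        if ry == 0:                      # rotate the quadrant when needed
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        d //= 4
        s *= 2
    return x, y

order = 6                                # a 64 x 64 grid of 4,096 "loci"
path = [hilbert_d2xy(order, d) for d in range(4 ** order)]
steps = [abs(x1 - x2) + abs(y1 - y2) for (x1, y1), (x2, y2) in zip(path, path[1:])]
print(set(steps))                        # {1}: every sequence neighbor stays a spatial neighbor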

For a long time the predicted crumpled globule state remained a purely theoretical object. However, the results of recent studies indicate that the chromosomes in the cell nucleus may be packed into a fractal globule. There is no consensus on this issue in the scientific community, but the specialists working in this area are intrigued by this possibility, and during the last 5-7 years there has been a flood of research on fractal globule packing of the genome.

The idea that chromatin (that is to say, a long strand consisting of DNA and attached proteins) in a cell nucleus may be organized in a fractal globule makes intuitive sense. Indeed, chromatin is essentially a huge library containing all the hereditary information "known" to a cell, in particular all the information about the synthesis of all the proteins which the organism is in principle able to produce. It seems natural that such a huge amount of data, which should be preserved and kept readable in a predictable way, should be somehow organized. It makes no sense to let the strands carrying different parts of the information become entangled and knotted around each other; such an arrangement is akin to gluing or tying together the volumes in a library: obviously, it makes the contents of the books much less accessible to a visitor.

In addition, it seems natural that a strand in a fractal globule has, in the absence of knots, a greater freedom of movement, which is important for genome function: gene transcription regulation requires that individual parts of the genome meet each other at the right time, "activating" the signal for reading and marking the place where the reading should start. Moreover, all of this must happen quickly enough.

"According to the existing theories if the polymer chain is folded into a regular equilibrium globule, the mean square of the chain link thermal displacement increases with time as time to the power 0.25",—says Mikhail Tamm, a senior researcher at the Department of Polymer and Crystal at the Physics Faculty of the Lomonosov Moscow State University.

According to Mikhail Tamm, he and his colleagues managed to come up with a somewhat similar theory for a link of a polymer chain folded in a fractal globule.

"We were able to evaluate the thermal dynamics inherent to this type of conformation. The computer simulations we have conducted are in good agreement with our theoretical result",—says Mikhail Tamm.

Scientists from the Lomonosov Moscow State University developed a computer modeling algorithm that allows them to prepare a chromatin chain packed in a fractal globule state and to monitor the thermal processes taking place there. Importantly, they managed to model a very long chain, consisting of a quarter of a million units, which is the longest accessible so far.

According to Mikhail Tamm, chains in the modeling need to be long in order to get meaningful results, but the modeling of long chains is usually hampered by the fact that it takes them a very long time to equilibrate, and without proper equilibration the results on thermal diffusion, as well as other characteristics of the chains, are unreliable.

The researchers were able to solve this problem through a combination of properly constructed software and access to CPU time on the MSU supercomputer "Lomonosov", and to assess the dynamics of the thermal motion in a fractal globule. It was found that the links of a chromatin chain packed in a fractal globule move faster than in a comparable equilibrium one. Indeed, the mean square thermal displacement of a link no longer grows in proportion to time to the power 0.25, but as time to the power 0.4, which means that the movement of the links turns out to be much faster. This seems to be an additional argument in support of the fractal globule model of chromatin.
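The exponents quoted above can be read off as the slope of a log-log fit of the mean square displacement against time, MSD(t) ~ t**alpha. A minimal sketch with synthetic data (the fitting routine and the data are illustrative, not the group's analysis):

import math

def fit_exponent(times, msd):
    """Least-squares slope of log(MSD) versus log(t), i.e. the exponent alpha."""
    xs = [math.log(t) for t in times]
    ys = [math.log(m) for m in msd]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

times = [10, 100, 1_000, 10_000]
fractal_globule_like = [t ** 0.4 for t in times]      # synthetic curve, alpha = 0.4
equilibrium_like = [t ** 0.25 for t in times]         # synthetic curve, alpha = 0.25
print(round(fit_exponent(times, fractal_globule_like), 2))   # 0.4
print(round(fit_exponent(times, equilibrium_like), 2))       # 0.25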

The researchers hope that their work will help to provide better insight in the functioning of the gene storage and expression machinery in the cell nucleus.

"From the point of view of dynamics, we would like to understand what are the built-in characteristic times, what processes can occur simply due to thermal motion, and which ones inevitably require the use of active elements to speed up the functioning of DNA",—summed up Mikhail Tamm.

More information: Physical Review Letters DOI: 10.1103/PhysRevLett.114.178102


Can ‘jumping genes’ cause cancer chaos?

Category: Science blog July 10, 2015 Kat Arney

[Fig. 2. of the science article linked below]

Statistically speaking, your genome is mostly junk.

Less than two per cent of it is made up of actual genes – stretches of DNA carrying instructions that tell cells to make protein molecules. A larger (and hotly debated) proportion is given over to regulatory ‘control switches’, responsible for turning genes on and off at the right time and in the right place. There are also lots of sequences that are used to produce what’s known as ‘non-coding RNA’. And then there’s a whole lot that is just boring and repetitive.

As an example, the human genome is peppered with more than half a million copies of a repeated virus-like sequence called Line-1 (also known as L1).

Usually these L1 repeats just sit there, passively padding out our DNA. But a new study from our researchers in Cambridge suggests that they can start jumping around within the genome, potentially contributing to the genetic chaos underpinning oesophageal cancer.

Let’s take a closer look at these so-called ‘jumping genes’, and how they might be implicated in cancer.

Genes on the hop

The secret of L1’s success is that it’s a transposon – the more formal name for a jumping gene. These wandering elements were first discovered in plants by the remarkable Nobel prize-winning scientist Barbara McClintock, back in 1950. [As we know, Barbara McClintock's discovery was denied in the most unprofessional manner from 1950 till 1983, when she received her Nobel prize. Thirty-three years (a full generation) was so bad that Dr. McClintock could consider herself lucky to have survived the systemic denial. The set-back to science from that denial was much longer than 33 years, however. Consider that science actually proceeded "to fight the wrong enemy", to borrow a phrase from Nobelist Jim Watson. How many people died miserable deaths over the negligence? Andras_at_Pellionisz_dot_com]

They’re only a few thousand DNA ‘letters’ long, and many of them are damaged. But intact L1 transposons contain all the instructions they need to hijack the cell’s molecular machinery and start moving.

Firstly, their genetic code is ‘read’ (through a process called transcription) to produce a molecule of RNA, containing instructions for both a set of molecular ‘scissors’ that can cut DNA, together with an unusual enzyme called reverse transcriptase, which can turn RNA back into DNA.

Together these molecules act as genetic vandals. The scissors pick a random place in the genome and start cutting, while the L1 RNA settles itself into the resulting gap. Then the reverse transcriptase gets to work, converting the RNA into DNA and weaving the invader permanently into the fabric of the genome.

This cutting and pasting is a risky business. Although many transposons will land safely in a stretch of unimportant genomic junk without causing any problems, there’s a chance that one may hopscotch its way into an important gene or control region, affecting its function.
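A toy estimate of how often such random hops would matter, assuming (purely for illustration) that insertion sites are uniform, that roughly 2 per cent of the genome is protein-coding, and that a tumour carries around 100 new insertions (the average reported further below):

# Probability that at least one of ~100 uniform random L1 insertions lands in
# the ~2% protein-coding fraction; closed form 1 - 0.98**100 ~= 0.87.
import random

CODING_FRACTION = 0.02
INSERTIONS_PER_TUMOUR = 100
TRIALS = 100_000

hits = sum(
    any(random.random() < CODING_FRACTION for _ in range(INSERTIONS_PER_TUMOUR))
    for _ in range(TRIALS)
)
print(f"Monte Carlo: {hits / TRIALS:.2f}")
print(f"Closed form: {1 - (1 - CODING_FRACTION) ** INSERTIONS_PER_TUMOUR:.2f}")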

So given that cancers are driven by faulty genes, could hopping L1 elements be responsible for some of this genetic chaos?

In fact, this idea isn’t new.

More than two decades ago, scientists in Japan and the US published a paper looking at DNA from 150 bowel tumour samples. In one of them they discovered that an L1 transposon had jumped into a gene called APC, which normally acts as a ‘brake’ on tumour growth. This presumably caused so much damage that APC could no longer work properly, leading to cancer.

Because every L1 ‘hop’ is a unique event, it’s very difficult to detect them in normal cells in the body. But tumours grow from individual cells or small groups of cells, known as clones. So if a transposon jump happens early on during cancer development, it will probably be detectable in the DNA of most – if not all – of the cells in a tumour.

Thanks to advances in DNA sequencing technology, it’s now possible to detect these events – something that researchers are starting to do in a range of cancer types.

Jumping genes and oesophageal cancer

In the study published today, the Cambridge team – led by Rebecca Fitzgerald and Paul Edwards – analysed the genomes of 43 oesophageal tumour samples, gathered as part of an ongoing research project called the International Cancer Genome Consortium.

Surprisingly, they found new L1 insertions in around three quarters of the samples. On average there were around 100 jumps per tumour, although some had up to 700. And in some cases they had jumped into important ‘driver’ genes known to be involved in cancer.

The findings also have relevance for other researchers studying genetic mutations in cancer. Due to technical issues with analysing and interpreting genomic data, it looks like new L1 insertions are easily mistaken for other types of DNA damage, and may be much more widespread than previously thought.

So what are we to make of this discovery?

Finding evidence of widespread jumping genes doesn’t prove that they’re definitely involved in tumour growth, although it certainly looks very suspicious, and there are a lot of questions still to be answered.

For a start, we need to know more about how L1 jumps affect important genes, and whether they’re fuelling tumour growth.

It’s also unclear why these elements go on the move in cancer cells in such numbers: are they the cause of the genetic chaos, or does their mobilisation result from something else going awry as cancer develops for other reasons?

Looking more widely, and given that it seems to be particularly tricky to correctly identify new L1 jumps in DNA sequencing data, it’s still relatively unknown how widespread they are across many other types of cancer.

Finding the answers to these questions is vital. Rates of oesophageal cancer are rising, particularly among men, yet survival remains generally poor. As part of our research strategy we’ve highlighted the urgent need to change the outlook for people diagnosed with the disease, through research into understanding its origins, earlier diagnosis and more effective treatments.

By understanding what’s going on as L1 elements hopscotch their way across the genome, we’ll gain more insight into the genetic chaos that drives oesophageal cancer.

In turn, this could lead to new ideas for better ways to diagnose, treat and monitor the disease in future. Let’s jump to it.

Reference:

Paterson et al. Mobile element insertions are frequent in oesophageal adenocarcinomas and can mislead paired end sequencing analysis. BioMed Central Genomics. DOI: 10.1186/s12864-015-1685-z.

[It is sinking in deeper and deeper that Nonlinear Dynamics (Chaos & Fractals) is lurking behind cancer. The Old School view, brutally oversimplified into "genes" and "Junk DNA", is becoming untenable. Hundreds of millions are dying of the most dreadful illness ("the disease of the genome", a.k.a. "cancer") - and some may still hide in denial that the sole cause of cancer is a handful of "genes" ("oncogenes") going wild. While the linked science article does not dip into the mathematics, its cited Fig. 2 shows an obviously "non-random" pattern - look at most of the evolving fractals. Andras_at_Pellionisz_dot_com]


Why you should share your genetic profile [the Noble Academic Dream and the Harsh Business Climate]

Fifteen years ago, a scrappy team of computer geeks at UC Santa Cruz assembled the first complete draft of the human genome from DNA data generated by a global consortium, giving humanity its first glimpse of our genetic heritage.

And then we did something the private corporation competing with us never would have done: We posted the draft on the Web, ensuring that our genetic blueprint would be free and accessible to everyone, forever.

This opened the door to global research and countless scientific breakthroughs that are transforming medicine. Today, every major medical center offers DNA sequencing tests; we can sequence anybody’s genome for about $1,000.

This is a game-changer. The era of precision medicine is upon us.

Consider the 21st century war on cancer: When a patient is diagnosed with cancer, her doctor compares her tumor’s genome to those in an enormous worldwide network of shared genomes, seeking matches that point to the best treatment strategies and the best outcomes.

This is not fantasy. UC Santa Cruz already manages more than 1 quadrillion bytes of cancer-genomics data — the world’s largest collection of genomic data from the most diverse collection of cancerous tumors ever assembled for general scientific use.

A multinational consortium of children’s hospitals is enabling members to compare each child’s cancer genome to this huge set of pediatric and adult cancer genomes. This is how we will decode cancer. It’s how we will tailor treatment to individual patients. It will save lives.

But this will come to pass only if we work together.

Competition among medical centers can make them reluctant to share data with each other. There are ethical and privacy considerations for patients. We need to overcome these challenges, build a secure network of data-sharing, and usher in the long-sought era of precision medicine.

Patients can help by asking their doctors and medical centers to share their genetic profiles — securely — with researchers around the world through the Global Alliance for Genomics and Health. The alliance has mobilized hundreds of institutions worldwide to build the definitive open-source Internet protocols for sharing genomic data. Our goal is to speed doctors’ ability to tailor treatments to the genetic profiles of individual patients.

The power of this data network will be only as strong as it is vast. The bigger the pool of samples, the greater the likelihood of finding molecular matches that benefit patients, as well as patterns that shed new light on how normal cells become malignant. Genomics can help us decode diseases from asthma and arthritis to Parkinson’s and schizophrenia.

Fifteen years ago, when we released that first sequence of our genome, humanity’s genetic signature became open-source. I remember the feelings of awe and trepidation I experienced that day, realizing that we were passing through a portal through which we could never return, uncertain exactly what it would mean for humanity.

Today, the meaning is clear. We are finally realizing the promise of genomics-driven precision medicine.

David Haussler is professor of biomolecular engineering, director of the Genomics Institute at UC Santa Cruz, and a co-founder of the Global Alliance for Genomics and Health.

[David Haussler, a longtime colleague and friend, is one of the towering Giants of Genome Informatics. His uniquely profuse school at the Genomics Institute at UC Santa Cruz, which has turned out perhaps the largest number of brilliant Ph.D. graduates (at Stanford, throughout Academia, and some even in business), puts the University of California at Santa Cruz (and its parent organization, the University of California System) at a special juncture of history.

There is no doubt that his Academic Dream ("let's all pitch in for free") is the Noblest goal of a High Road. We all believe in dreams and wish good luck to Dave. Incidentally, the dream of Al Gore to create a "free for all Information Superhighway" (the Internet) was based on a similarly Noble Aspiration. I took part (at that time at NASA Ames Research Center in Silicon Valley) in putting together a "Blue Book" that outlined the future of the Internet - on a $2 Bn government budget. It was Bill Clinton who released the Internet (originally a shoe-string defense information network project, designed to survive even if the Soviets blew up major information hubs like NYC, D.C., Chicago, or even Colorado Springs). The defense backbone of the Internet is now stronger than ever - but President Clinton's decision to release massive development to Private Industry exploded the $2 Bn National Budget to levels where, a few days ago, the valuation of just a single company (Google) catapulted by $17 Bn on a single day.

With "one thousand dollar sequencing, a million dollar interpretation", it is easy to do the math for the budget necessary to build a "1 million human DNA fully sequenced" for a genome-based "precision medicine".

Since the Private Sector (led by Craig Venter) announced such a plan even before the US Government floated a sentence in the "State of the Union", we are talking about a $2 Trillion ticket (one Trillion from Government, one from Private Industry, predictably with not much overlap). This makes sense, since the US Health Care System ("Sick Care System", rather, as branded by Francis Collins) is in the yearly $2 Trillion range; to effectively change it, one would require commensurate funds. The promise to ask $200 Million from Congress, even if granted, would amount to a mere 0.01% of the needed expenditure.
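The arithmetic behind the "$2 Trillion ticket", spelled out (a sketch of the commentary's own back-of-envelope numbers; the per-genome interpretation cost is the commentary's round figure, not a measured cost):

# "One thousand dollar sequencing, a million dollar interpretation", scaled to a
# one-million-genome cohort, once for Government and once for Private Industry.
SEQUENCING = 1_000             # dollars per genome
INTERPRETATION = 1_000_000     # dollars per genome (the commentary's round figure)
COHORT = 1_000_000             # genomes per sector
SECTORS = 2                    # Government + Private Industry, assumed not to overlap

total = SECTORS * COHORT * (SEQUENCING + INTERPRETATION)
print(f"${total / 1e12:.1f} trillion")   # -> $2.0 trillion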

The University of California System, on a Sacramento budget and with severe restrictions in its Charter, may be unlikely to catch the tiger of Global Private Industry by the tail. One might argue that even the entire budget of the NIH (a yearly $30 Bn) might be unrealistic for this colossal task. On the other hand, in the Private Sector the combined valuation of Apple, Google and Microsoft is already above the $1 Trillion range - and it is predicted that Google or Apple might reach that valuation alone.

Granted, Google presently spends on "Google Genomics" only "on the side" - at best. However, they have already clinched a business model (see in this column) in which for-profit users of Google Genomics (such as Big Pharma, which can easily afford it) are already obligated to pay license fees to the Broad Institute for their proprietary software toolkit. (It infuses massive domain expertise into the art & technology of "handling data deluge" of any kind by Google.) It is interesting to note that the amount of genomic data at Google presently amounts to a mere 1/3 of "YouTube". As I predicted in my 2008 Google Tech Talk YouTube, the problem is NOT "Information Technology", but "Information Theory".

It is predicted herein that massive amounts will be paid to people with cancer for their extremely precious "genomic data along with medical profile". Individuals might never get a penny of it directly, just as you use Google for "free" (you pay when you buy as a result of a "click-through"). The existing business model and cash flow are worked out through the monstrous advertising business and the coupled "recommendation engines". With cancer, when you opt for genome-based therapy, you will get a "cut" (a virtual payment) if you "freely donate" your genomic data and health profile. Surely, while arriving at a deal with the advertising business is fairly straightforward, forging viable business models with the colossal Health Care System is a bit more involved. However, it has already started; see in this column that Google could even work out a business deal with the non-profit Broad Institute.

Working with Intellectual Property holdings is a breeze - Andras_at_Pellionisz_dot_com ].


Why James Watson says the ‘war on cancer’ is fighting the wrong enemy

Andrew Porterfield | May 26, 2015 | Genetic Literacy Project

Since President Richard Nixon asked Congress for $100 million to declare a “war on cancer” in 1971, hundreds of billions of dollars worldwide have been dedicated to research unlocking the mystery of the various forms of the disease, and how to treat it. But some suggest the war may be being fought on the wrong front.

To be sure, our understanding of genetics, cellular growth and cancers has grown exponentially. We know how cancer can be linked to mutations of genes that either encourage abnormal cell growth, or wreck the internal system of checks and balances that normally stymie that growth. We have narrowed the number of those genes down to several hundred. And, we know about genes that can halt abnormal development. We’re inserting them into cancerous cells in trials. Perhaps most significantly, we’re at a stage in which cancer specialists prefer to refer to cancers by genetic makeup, instead of by the traditional organ of first appearance.

But for many cancers, none of this is working. To be sure, overall cancer death rates have decreased, by 1.8 percent a year for men, and 1.4 percent a year for women in recent decades. But death rates from some cancers have remained stubbornly constant, while others have risen. Additionally, the National Cancer Institute estimates that the number of people with cancer will increase from 14 million to 22 million over the next 20 years.

The thing about war is: if you’re fighting and the enemy’s numbers are increasing (or at least not dropping very much), victory probably isn’t near.

A spreading, migrating issue

One issue might be the fact that primary tumors—cancers that first appear in the body, and are recognized by that location, be it the liver, lung, brain or colon—aren’t the reason most people die from cancer. Most people die because of cancer cells that break off from primary tumors, and settle in other parts of the body. This process of metastasis is responsible for 90 percent of cancer deaths. However, only 5 percent of European government cancer research funds, and 2 percent of U.S. cancer research funds, are earmarked for metastasis research.

So, for as much as we understand the genetics of primary, initial tumors, we know far less about the cancers that truly kill. And to James Watson–the molecular biologist, geneticist and zoologist, best known as one of the co-discoverers of the structure of DNA in 1953–that’s a central problem with cancer research. In a recent “manifesto” published in Open Biology, Watson asked for another war:

The now much-touted genome-based personal cancer therapies may turn out to be much less important tools for future medicine than the newspapers of today lead us to hope. Sending more government cancer monies towards innovative, anti-metastatic drug development to appropriate high-quality academic institutions would better use National Cancer Institute’s (NCI) monies than the large sums spent now testing drugs for which we have little hope of true breakthroughs. The biggest obstacle today to moving forward effectively towards a true war against cancer may, in fact, come from the inherently conservative nature of today’s cancer research establishments.

Watson, who shared a Nobel Prize with Francis Crick and Maurice Wilkins for discovering the structure of DNA, is well known for his pronouncements, which often have been labeled immodest, insulting and worse. But in this case, he also may be right.

What do other scientists say?

Mark Ptashne, a cancer researcher at Memorial Sloan Kettering Cancer Center in New York, agrees that money is being misspent on the wrong kind of drugs. Cancer cells are smart enough to work around the drugs. And cancer cells that have migrated and reformed (metastasized) may be quite different from their original parent tumor cells. Still other cancers have metastasized, but from where is unknown. Finally, in the brain, most adult tumors there are metastatic. This all means that even if a treatment is effective for a primary cancer, it likely won’t be for a metastatic one.

Metastasis is extremely complicated. Very slowly, institutions are starting to look more closely at metastasis, and provide more research funding for it. But, as the Memorial Sloan Kettering Cancer Center warned, it could take a long time before treatments arise. But it’s probably going to take more than 2-5 percent of government cancer research funding.

Dig in for a long war.

Andrew Porterfield is a writer, editor and communications consultant for academic institutions, companies and non-profits in the life sciences. He is based in Camarillo, California. Follow @AMPorterfield on Twitter.

--

[Jim Watson has been on record, with the Royal Society, since at least 2013: "Still dominating NCI's big science budget is The Cancer Genome Atlas (TCGA) project, which by its very nature finds only cancer cell drivers as opposed to vulnerabilities (synthetic lethals). While I initially supported TCGA getting big monies, I no longer do so. Further 100 million dollar annual injections so spent are not likely to produce the truly breakthrough drugs that we now so desperately need." - Andras_at_Pellionisz_dot_com ]


National Cancer Institute: Fractal Geometry at Critical Juncture of Cancer Research

[Dr. Simon Rosenfeld at the National Cancer Institute is on record with an open access text (likely to be removed, but already mirrored here), as well as a compromised, truncated pdf paper reflecting on a "Critical Juncture" of Cancer Research. Excerpts below from the open access text (published under the running title of the review article "Fractal Geometry and Nonlinear Analysis in Medicine and Biology") demonstrate another endorsement of the FractoGene approach. FractoGene papers are linked here to the free full pdf files of the original peer-reviewed articles cited in the open access text. Note that 40 of the 50 references point to "fractal". Andras_at_Pellionisz_dot_com]

Conclusion

Complex hierarchy of perfectly organized entities is a hallmark of biological systems. Attempts to understand why's and how's of this organization lead inquiring minds to various levels of abstraction and depths of interpretation. In this paper, we have attempted to convey the notion that there exists a set of comparatively simple and universal laws of nonlinear dynamics which shape the entire biological edifice as well as all of its compartments. These laws are equally applicable to individual cells, as well as to biochemical networks within the cells, as well as to the societies of cells, as well as to the societies other than the societies of cells, as well as to the populations of individual organisms. These laws are blind, automatic, and universal; they do not require existence of a supervisory authority, system-wide informational infrastructure or some sort of premeditated intelligent design. In large populations of individuals interacting only by stimulus-response rules, these laws generate a large variety of emergent phenomena with self-organization and swarm intelligence being their natural manifestations.
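
As a minimal illustration of the review's point that comparatively simple nonlinear recursions can generate complex, emergent behavior, the sketch below (in Python; purely illustrative, not taken from the review) iterates the classic logistic map and shows the transition from a fixed point through periodic cycles to chaos as a single parameter grows.

# Illustrative sketch only: one simple nonlinear recursion, x -> r*x*(1-x),
# produces a fixed point, then periodic cycles, then chaos as r increases -
# simple local rules, complex global behavior.

def logistic_orbit(r, x0=0.2, n_transient=500, n_keep=8):
    """Iterate the logistic map and return the last n_keep values."""
    x = x0
    for _ in range(n_transient):      # discard the transient
        x = r * x * (1.0 - x)
    orbit = []
    for _ in range(n_keep):
        x = r * x * (1.0 - x)
        orbit.append(round(x, 4))
    return orbit

for r in (2.8, 3.2, 3.5, 3.9):        # fixed point, period-2, period-4, chaos
    print(r, logistic_orbit(r))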

References

Mandelbrot B (1983) The Fractal Geometry of Nature. Freeman, San Francisco.

Leonardo da Vinci. Trattato della Pittura. ROMA MDCCCXVII. Nella Stamperia DE ROMANIS. A cura di Guglielmo Manzi Bibliotecario della Libreria Barberina.

Mandelbrot B (1977) Fractals: Form, Chance and Dimension. W.H. Freeman & Company, San Francisco.

Belaubre G (2006) L’irruption des Géométries Fractales dans les Sciences. Editions Académie Européenne Interdisciplinaire des Sciences (AEIS), Paris.

Loud AV (1968) A quantitative stereological description of the ultrastructure of normal rat liver parenchymal cells. J Cell Biol 37: 27-46. [Crossref]

Weibel ER, Stäubli W, Gnägi HR, Hess FA (1969) Correlated morphometric and biochemical studies on the liver cell. I. Morphometric model, stereologic methods, and normal morphometric data for rat liver. J Cell Biol 42: 68-91. [Crossref]

Mandelbrot B (1967) How long is the coast of Britain? Statistical self-similarity and fractional dimension. Science 156: 636-638. [Crossref]

Paumgartner D, Losa G, Weibel ER (1981) Resolution effect on the stereological estimation of surface and volume and its interpretation in terms of fractal dimensions. J Microsc 121: 51-63. [Crossref]

Gehr P, Bachofen M, Weibel ER (1978) The normal human lung: ultrastructure and morphometric estimation of diffusion capacity. Respir Physiol 32: 121-140. [Crossref]

Rigaut JP (1984) An empirical formulation relating boundary length to resolution in specimens showing ‘‘non-ideally fractal’’ dimensions. J Microsc 13: 41–54.

Rigaut JP (1989) Fractals in Biological Image Analysis and Vision. In: Losa GA, Merlini D (Eds) Gli Oggetti Frattali in Astrofisica, Biologia, Fisica e Matematica, Edizioni Cerfim, Locarno, pp. 111–145.

Nonnenmacher TF, Baumann G, Barth A, Losa GA (1994) Digital image analysis of self-similar cell profiles. Int J Biomed Comput 37: 131-138. [Crossref]

Landini G, Rigaut JP (1997) A method for estimating the dimension of asymptotic fractal sets. Bioimaging 5: 65–70.

Dollinger JW, Metzler R, Nonnenmacher TF (1998) Bi-asymptotic fractals: fractals between lower and upper bounds. J Phys A Math Gen 31: 3839–3847.

Bizzarri M, Pasqualato A, Cucina A, Pasta V (2013) Physical forces and non linear dynamics mould fractal cell shape. Quantitative Morphological parameters and cell phenotype. Histol Histopathol 28: 155-174.

Losa GA, Nonnenmacher TF (1996) Self-similarity and fractal irregularity in pathologic tissues. Mod Pathol 9: 174-182. [Crossref]

Weibel ER (1991) Fractal geometry: a design principle for living organisms. Am J Physiol 261: L361-369. [Crossref]

Losa GA (2012) Fractals in Biology and Medicine. In: Meyers R (Ed.), Encyclopedia of Molecular Cell Biology and Molecular Medicine, Wiley-VCH Verlag, Berlin.

Santoro R, Marinelli F, Turchetti G, et al. (2002) Fractal analysis of chromatin during apoptosis. In: Losa GA, Merlini D, Nonnenmacher TF, Weibel ER (Eds.), Fractals in Biology and Medicine. Basel, Switzerland. Birkhäuser Press 3: 220-225.

Bianciardi G, Miracco C, Santi MD et al. (2002) Fractal dimension of lymphocytic nuclear membrane in Mycosis fungoides and chronic dermatitis. In: Losa GA, Merlini D, Nonnenmacher TF, Weibel ER (Eds.), Fractals in Biology and Medicine. Basel, Switzerland, Birkhäuser Press.

Losa GA, Baumann G, Nonnenmacher TF (1992) Fractal dimension of pericellular membranes in human lymphocytes and lymphoblastic leukemia cells. Pathol Res Pract 188: 680-686. [Crossref]

Mashiah A, Wolach O, Sandbank J, Uzie IO, Raanani P, et al. (2008) Lymphoma and leukemia cells possess fractal dimensions that correlate with their interpretation in terms of fractal biological features. Acta Haematol 119: 142-150. [Crossref]

Brú A, Albertos S, Luis Subiza J, García-Asenjo JL, Brú I (2003) The universal dynamics of tumor growth. Biophys J 85: 2948-2961. [Crossref]

Baish JW, Jain RK (2000) Fractals and cancer. Cancer Res 60: 3683–3688.

Tambasco M, Magliocco AM (2008) Relationship between tumor grade and computed architectural complexity in breast cancer specimens. Hum Pathol 39: 740-746. [Crossref]

Sharifi-Salamatian V, Pesquet-Popescu B, Simony-Lafontaine J, Rigaut JP (2004) Index for spatial heterogeneity in breast cancer. J Microsc 216: 110-122. [Crossref]

Losa GA, Graber R, Baumann G, Nonnenmacher TF (1998) Steroid hormones modify nuclear heterochromatin structure and plasma membrane enzyme of MCF-7 Cells. A combined fractal, electron microscopical and enzymatic analysis. Eur J Histochem 42: 1-9. [Crossref]

Landini G, Hirayama Y, Li TJ, Kitano M (2000) Increased fractal complexity of the epithelial-connective tissue interface in the tongue of 4NQO-treated rats. Pathol Res Pract 196: 251-258. [Crossref]

Roy HK, Iversen P, Hart J, Liu Y, Koetsier JL, et al. (2004) Down-regulation of SNAIL suppresses MIN mouse tumorigenesis: modulation of apoptosis, proliferation, and fractal dimension. Mol Cancer Ther 3: 1159-1165. [Crossref]

Losa GA, De Vico G, Cataldi M, et al. (2009) Contribution of connective and epithelial tissue components to the morphologic organization of canine trichoblastoma. Connect Tissue Res 50: 28-29.

Li H, Giger ML, Olopade OI, Lan L (2007) Fractal analysis of mammographic parenchymal patterns in breast cancer risk assessment. Acad Radiol 14: 513-521. [Crossref]

Rangayyan RM, Nguyen TM (2007) Fractal analysis of contours of breast masses in mammograms. J Digit Imaging 20: 223-237. [Crossref]

De Felipe J (2011) The evolution of the brain, the human nature of cortical circuits, and intellectual creativity. Front Neuroanat 5: 1-16. [Crossref]

King RD, Brown B, Hwang M, Jeon T, George AT; Alzheimer's Disease Neuroimaging Initiative (2010) Fractal dimension analysis of the cortical ribbon in mild Alzheimer's disease. Neuroimage 53: 471-479. [Crossref]

Werner G (2010) Fractals in the nervous system: conceptual implications for theoretical neuroscience. Front Physiol 1: 15. [Crossref]

Losa GA (2014) On the Fractal Design in Human Brain and Nervous Tissue. Applied Mathematics 5: 1725-1732.

Smith TG Jr, Marks WB, Lange GD, Sheriff WH Jr, Neale EA (1989) A fractal analysis of cell images. J Neurosci Methods 27: 173-180. [Crossref]

Smith TG Jr, Bejar TN (1994) Comparative fractal analysis of cultured glia derived from optic nerve and brain demonstrated different rates of morphological differentiation. Brain Res 634: 181–190.

Smith TG Jr, Lange GD, Marks WB (1996) Fractal methods and results in cellular morphology--dimensions, lacunarity and multifractals. J Neurosci Methods 69: 123-136. [Crossref]

Smith TG (1994) A Fractal Analysis of Morphological Differentiation of Spinal Cord Neurons in Cell Culture. In: Losa et al., (Eds.), Fractals in Biology and Medicine, Birkhäuser Press, Basel, vol.1.

Milosevic NT, Ristanovic D (2006) Fractality of dendritic arborization of spinal cord neurons. Neurosci Lett 396: 172-176. [Crossref]

Milosevic NT, Ristanovic D, Jelinek HF, Rajkovic K (2009) Quantitative analysis of dendritic morphology of the alpha and delta retinal ganglions cells in the rat: a cell classification study. J Theor Biol 259: 142-150. [Crossref]

Ristanovic D, Stefanovic BD, Milosevic NT, Grgurevic M, Stankovic JB (2006) Mathematical modelling and computational analysis of neuronal cell images: application to dendritic arborization of Golgi-impregnated neurons in dorsal horns of the rat spinal cord. Neurocomputing 69: 403–423.

Jelinek HF, Milosevic NT, Ristanovich D (2008) Fractal dimension as a tool for classification of rat retinal ganglion cells. Biol Forum 101: 146-150.

Bernard F, Bossu JL, Gaillard S (2001) Identification of living oligodendrocyte developmental stages by fractal analysis of cell morphology. J Neurosci Res 65: 439-445. [Crossref]

Pellionisz A, Roy GR, Pellionisz PA, Perez JC (2013) Recursive genome function of the cerebellum: geometric unification of neuroscience and genomics. In: Manto M, Gruol DL, Schmahmann JD, Koibuchi N, Rossi F (Eds.), Handbook of the Cerebellum and Cerebellar Disorders. Springer Verlag, Berlin, pp. 1381-1423.

Pellionisz AJ (2008) The principle of recursive genome function. Cerebellum 7: 348-359. [Crossref]

Di Ieva A, Grizzi F, Jelinek H, Pellionisz AJ, Losa GA (2015) Fractals in the Neurosciences, Part I: General Principles and Basic Neurosciences. The Neuroscientist XX(X) 1–15.

Pellionisz A (1989) Neural geometry: towards a fractal model of neurons. Cambridge: Cambridge University Press.

Agnati LF, Guidolin D, Carone C, Dam M, Genedani S, et al. (2008) Understanding neuronal molecular networks builds on neuronal cellular network architecture. Brain Res Rev 58: 379–99. [Crossref]


Apple may soon collect your DNA as part of a new ResearchKit program

By Andre Revilla — May 7, 2015

Building a database of the human genome, mostly in an effort to study it, is nothing new. Since we first gained the ability to study DNA, scientists have been keen to examine as many samples as possible, in an effort to learn more about disease in the human body and about degenerative disorders such as Parkinson’s disease. Now Apple is joining groups ranging from Google to the U.S. government in expressing an interest in collecting a library of DNA samples.

Apple will be teaming up with scientists to collect DNA as part of its ResearchKit program, which launched in March. The program would collect consumers’ health information through a secure portal, with the added opportunity for users with certain conditions to take part in a number of clinical studies. According to the MIT Technology Review’s report, Apple has two currently planned studies, one at the University of California in San Francisco, and the other with Mount Sinai Hospital in New York.

Related: Apple offering medical trials through Research Kit

Users would participate by spitting into a collection kit and returning it to an Apple-approved laboratory. The report reads, “The data would be maintained by scientists in a computing cloud, but certain findings could appear directly on consumers’ iPhones as well.” Integrating apps that partner with DNA collection on a platform as popular as iOS would place Apple in a good position to lead the charge in a new realm of genetic databasing.

“Nudging iPhone owners to submit DNA samples to researchers would thrust Apple’s devices into the center of a widening battle for genetic information,” the MIT review states.

The studies are aimed at investigating 100 or so “medically important disease genes.” The future of the connected world is fascinating, and, as the review points out, we could one day swipe our genetic information at pharmacies to receive information on the drugs we’re picking up. Apple has not commented on the report.

[There is a veritable "feeding frenzy" around "DNA Data Banks" and "DNA APIs", as well as the inevitable trend that high-powered mobile devices will be used for actual user-friendly applications of genomic data (e.g. the new iPhone with up to 256 GIGABytes of flash memory!!!). There are several contenders in a horse-race for the above highly lucrative goals, separately and, especially if possible, together. Google Genomics publishes about its "DNA API" (without telling details). There is hardly any question that Google is a super-expert in such APIs from a computing viewpoint. However, a most logical company, eminently suitable for this cardinal role, could be Illumina - the strongest USA data-source of genomic information. Illumina, however, with its presently known priorities, may not have this crucial item on its agenda & schedule. That would be regrettable, since such an asset could very significantly boost the valuation of Illumina - making it more resistant to any further "hostile take-over attempt" by Roche/Genentech. (Genentech would also be very suitable for the above role(s), but as a fully owned subsidiary of the genomically leading Big Pharma (Roche) it seems unlikely that Roche is going to push this agenda.)

This leaves a most interesting and very suitable company, whenever Google Genomics triggers an "Apple Genomics". (Somewhat unlikely, since Apple makes its most cardinal business decisions super-secretly, though the visibly half-ready Apple HQ2 makes the world wonder how the cash-mega-rich Apple is going to expand its horizon.) What are the pros and cons of Apple launching an "Apple Genomics"?

No company in the world could possibly beat Apple in "user-friendly design of advanced computer systems". The new line of "wearables" (iWatch) already compels Apple to massively expand its API to accommodate the myriad sensors, detectors, and personal data collection and storage needs. This is a huge plus, and Apple could emerge (after some rather feeble forays into Old School Genomics many years ago) as the undisputed hardware/software integrator in the historic R&D "explosion" of New School Genomics. A further positive factor is that Apple and Illumina are already on record trying out this new field. (If Illumina would ever submit to an M/A, imho a merger of Illumina with Apple might make more sense than Illumina under Roche.)

Some factors lessen the likelihood of a major business decision. One is that "Calico" already drains resources - though the pursuit of "eternal youth" and "practical, user-friendly applications of today's genomic data" represent no real internal competition.

Perhaps the most serious challenge is that Apple is not famous for the cross-disciplinary domain-expertise of genomics AND informatics. This challenge, however, can be very easily and quickly overcome in the highly incestuous Silicon Valley. andras_at_pellionisz_dot_com ]


Sequencing the genome creates so much data we don’t know what to do with it

The Washington Post

Robert Gebelhoff, July 7

Get ready for some incomprehensibly big numbers. [Not really - in my 2008 Google Tech YouTube (see slide below), I pointed out, the proverbial 7 years ago, that "Genome information exploded over 25 orders of magnitude" - but a googol is defined by 100 zeros. Thus, the IT (Information Technology) is definitely ready (though we are talking about billions of dollars). The problem was very clear even in my 2008 YouTube: "Information Theory" was not ready (some still don't have it) to interpret even a single full human DNA. Note that the entire DNA of both Dr. Jim Watson and Dr. Craig Venter had been sitting on the shelves (hard drives, rather) for years - but without software-enabling algorithmic approaches (such as FractoGene), "crunching A,C,T,G-s amounted to billions of dollars wasted". - Andras_at_Pellionisz_dot_com]



Scientists are predicting that genomics — the field of sequencing human DNA — will soon take the lead as the biggest data beast in the world, eventually creating more digital information than astronomy, particle physics and even popular Internet sites like YouTube. [Okay, take "particle physics" as probably the best example. Would anyone waste billions of dollars on building a super-collider (generating myriads of trajectories) before Quantum Theory was developed? That theory needed the entire Copenhagen group, working busily for decades, to be built as an entirely new chapter in physics, mathematics and even philosophy. - Andras_at_Pellionisz_dot_com]

The claim, published Tuesday in a PLOS Biology study, is a testament to the awesome complexity of the human genome, but it also illustrates a pressing challenge for the 15-year-old field. As genomics expands at an exponential rate, finding the digital space to store and manage all of the data is a major hurdle for the industry.

[The rumors were true: Scientists edited the genomes of human embryos for the first time]

Michael Schatz, co-author of the study and a professor at Cold Spring Harbor Laboratory in New York, called the data challenge one of the most important questions facing biology today.

"Scientists are really shocked at how far genomics has come," Schatz said. "Big data scientists in astronomy and particle physics thought genomics had a trivial amount of data. But we're catching up and probably going to surpass them."

[Worm spends four years burrowing through man’s brain (but at least we’ve sequenced its genome)]

To give some idea as to the amount of data we're talking about, consider YouTube, which generates the most data of any source per year — around 100 petabytes, according to the study. A petabyte is a quadrillion (that's a 1 followed by 15 zeroes) bytes, or about 1,000 times the average storage on a personal computer.

Right now, all of the human data generated through genomics — including around 250,000 sequences — takes up about a fourth of the size of YouTube's yearly data production. [We do not have major problems with YouTube, do we? It even generates money. Do not get scared of what Information Technology can do (think of meteorology, war games, the above-mentioned particle physics, financial data and calculations). Get scared of the scarcity of software-enabling Information Theory to interpret a single genome! - Andras_at_Pellionisz_dot_com]. If the data were combined with all the extra information that comes with sequencing genomes and recorded on typical 4-gigabyte DVDs, Schatz said the result would be a stack about half a mile high.

[If you could print out the whole Internet, how many pages would it be?]

But the field is just getting started. Scientists are expecting as many as 1 billion people to have their genomes sequenced by 2025. The amount of data being produced in genomics daily is doubling every seven months, so within the next decade, genomics is looking at generating somewhere between 2 and 40 exabytes a year.

An exabyte — just try to wrap your mind around this — is 1,000 petabytes, or about 1 million times the amount that can be stored on a home computer. In other words, that aforementioned stack of DVDs would easily start reaching into space.
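
For readers who want a feel for these magnitudes, the short sketch below (Python; the 4 GB capacity and roughly 1.2 mm thickness per DVD are illustrative assumptions, not figures from the article) converts the quoted petabyte and exabyte volumes into DVD-stack heights.

# Back-of-the-envelope arithmetic for the data volumes quoted above.
# Assumptions (illustrative only): one DVD holds 4 GB and is about 1.2 mm thick.

GB = 10**9
PB = 10**15
EB = 10**18
DVD_BYTES = 4 * GB
DVD_THICKNESS_M = 0.0012

def stack_height_km(total_bytes):
    """Height, in kilometers, of a stack of 4 GB DVDs holding total_bytes."""
    return (total_bytes / DVD_BYTES) * DVD_THICKNESS_M / 1000.0

print(stack_height_km(100 * PB))          # ~30 km for YouTube's ~100 PB per year
for eb in (2, 40):                        # 2-40 EB per year projected for genomics
    print(eb, "EB ->", stack_height_km(eb * EB), "km")   # ~600 to ~12,000 km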

[The triumph of genomic medicine is just beginning]

The study gives a good illustration of how the microscopic details of human genetics rival the complexity of the far-reaching science of the universe. The mountain of data used to analyze human DNA is so large that Schatz jokes people will eventually have to substitute the term "astronomical" with a more appropriate word: "genomical."

"With all of this information, something new is going to emerge," he said. "It might show patterns of how mutations affect different diseases."

IBM's Watson Genomics initiative, for example, is crunching data on the entire genomes of tumors, with the hope of generating personalized medicine for cancer patients.

[Personalized cancer vaccines have already helped treat three patients]

At some point, scientists might be able to save space by not storing sequences in full, similar to the way data is managed in particle physics, where information is read and filtered while it is generated. But at this point, the study says, such data cropping isn't as practical because it's hard to figure out what future data physicians will need for their research — especially when looking at broader human populations.

Right now, most genome research teams store their data on on-site hard drive infrastructure. The New York Genome Center, for example, is generating somewhere between 10 and 30 terabytes of data a day and storing it in an on-site system. They move old data they don't regularly use to cheaper and slower storage.

[The ultimate irony of mindless data-hoarding is likely to be that the information will be most efficiently stored in DNA. Full circle: spending billions, but accomplishing what, exactly? We know since Thomas Kuhn that "knowledge never automatically transpires into understanding." - Andras_at_Pellionisz_dot_com]

"At this point, we're continuously expanding file storage," said Toby Bloom, deputy scientific director at the center. "The biggest hurdle is keeping track of what we have and finding what we need."

Organizations like Bloom's are eyeing the possibility of moving the data to cloud storage, but she said that's currently not as cost effective as expanding their physical storage infrastructure.

But size is not the only problem the field faces. Biological data is being collected from many places and in many different formats. Unlike Internet data, which is formatted relatively uniformly, the diverse sets of genomic data make it difficult for people to use them across datasets, the study says.

Companies like Amazon and Google are developing the infrastructure to put genomic data on public clouds, which would be especially helpful for smaller centers with limited IT staff, but could also help foster collaboration.

Google recently announced a partnership with the Broad Institute of MIT and Harvard aimed at providing its cloud services for scientists combined with a toolkit developed by the institute that can be used to analyze the data. The concept is to put a bunch of the world's genomic data on Google's servers, where scientists from all over can collaborate on a single platform.

"It's extremely likely to see (the cloud model) going forward," Schatz said. "It just makes more sense.""[Do not forget that according to Google, for-profit users, like Big Pharma, must pay license fees to Broad Institute, a Charitable Organization :-) Andras_at_Pellionisz_dot_com ]


The living realm depicted by the fractal geometry (endorsement of FractoGene by Gabriele A. Losa)

[Excerpts] In some recent reports, rather exciting, it has been argued that there is a trend towards a “Unified Fractal Model of the Brain” [46]. These authors suggested that the amount of information necessary to build just a tiny fraction of the human body, that is, just the cerebellum of the nervous system, was a task for which 1.3% of the information that the genome [in the form of "genes", insert by AJP] could contain was totally insufficient. “Fractal genome grows fractal organism; yielding the utility that fractality, e.g. self-similar repetitions of the genome can be used for statistical diagnosis, while the resulting fractality of growth, e.g. cancer, is probabilistically correlated with prognosis, up to cure” [47].

The brain is now accepted as one of nature’s complete networks [48], while the hierarchical organization of the brain, seen at multiple scales from genes to molecular micronetworks and macronetworks organized in building neurons, has a fractal structure as well [49] with various modules that are interconnected in small-world topology [50]. The theoretical significance is that the fractality found in DNA and organisms, for a long time “apparently unrelated,” was put into a “cause and effect” relationship by the principle of recursive genome function [47].


[46] Pellionisz A, Roy GR, Pellionisz PA, Perez JC (2013) Recursive genome function of the cerebellum: geometric unification of neuroscience and genomics. In: Manto M, Gruol DL, Schmahmann JD, Koibuchi N, Rossi F (Eds.), Handbook of the Cerebellum and Cerebellar Disorders. Springer Verlag, Berlin, pp. 1381-1423.

[47] Pellionisz AJ (2008) The principle of recursive genome function. Cerebellum 7: 348-359.

[49] Di Ieva A, Grizzi F, Jelinek H, Pellionisz AJ, Losa GA (2015) Fractals in the Neurosciences, Part I: General Principles and Basic Neurosciences. The Neuroscientist 20(4) 403-417.

[50] Pellionisz A (1989) Neural geometry: towards a fractal model of neurons. Cambridge: Cambridge University Press.

[In the recent series of top-level endorsements of FractoGene ("Fractal Genome Governs Growth of Fractal Organisms"), Gabriele Losa is the most established leader of "fractals in biology and medicine". Dr. Losa organized a series of International Meetings in Switzerland, published in four volumes. Thus, the acknowledgement by Dr. Losa that the already rather large field studying fractality of the DNA, or fractality of the organisms, simply overlooked their "cause and effect" relationship reminds us of a saying by Mandelbrot himself: "to see things that everybody is looking at but nobody notices". "FractoGene" could not be published since it reversed BOTH of the cardinal axioms of Old School Genomics (the "Central Dogma" and "Junk DNA" misnomers that Dr. Mattick labeled as "the biggest mistake in the history of molecular biology").

The most striking part of my revelation in 2002 was the utility of the discovery: FractoGene also reversed the notion of "utility". In the Old School, the only useful (tiny) parts of the DNA were believed to be the "genes" (protein-coding segments, amounting to less than 1% in the human genome); even within the "genes", the function of "introns" was either entirely denied, or the "non-coding" introns were misrepresented as "spacers" separating "genes".

My discovery deployed a measurable utility derived from a fact that has always been in plain sight: both the DNA and the organisms it governs are "replete with repeats". In a "cause and effect" relationship, the statistical correlation of repeats (fractals) in DNA and in the organisms it governs yielded precious utility for diagnosis, and probabilistic predictions of the relationship of fractals yielded prognosis. (A generic, purely illustrative sketch of fractal measurement on a DNA string follows after this note.) The "Best Methods" were amply "incorporated by reference" from thousands of pages of literature, both on fractals (e.g. Mandelbrot, Losa, etc.) and from advanced textbooks of statistical and probabilistic mathematics. Thus, US 8,280,641 (issued only after an over-a-decade struggle with the US Patent Office, costing me over $1 M of personal money) was submitted as a patent to establish a priority date (Aug. 1, 2002); because of USPTO delays, 8,280,641 is in force until late March of 2026.

Once the regular patent was submitted, peer-reviewed scientific publications ensued. An invited Keynote Lecture followed in 2003, then a peer-reviewed scientific publication (with the late M.J. Simons, 2006); the latter went on record both citing the original "heureka" diagram of the FractoGene discovery (Fig. 3) and making theoretical predictions. These theoretical predictions were later verified by independent experimental biologists. Once the most recent CIP to the 2002 filing was done (2007), FractoGene was presented in the peer-reviewed scientific publication "The Principle of Recursive Genome Function" (2008), along with wide public dissemination by a Google Tech Talk YouTube (2008).

The Principle of Recursive Genome Function was immediately accepted (presented in 2009 at Cold Spring Harbor at the invitation of Prof. George Church, without objection by the participants, most notably Jim Watson). Two weeks after the Cold Spring Harbor presentation, Eric Lander (and a dozen co-workers) put the Hilbert fractal on the cover of Science Magazine, amounting to a message from the Science Adviser to Obama: "Mr. President, the Genome is Fractal!")

Now, after the proverbial 7-year delay, FractoGene is endorsed by the top (double-degree) biomathematician (Eric Schadt), by a fresh Stanford Nobelist in multi-scale biology (Michael Levitt), and now by the top expert in "fractals in biology & medicine" (Prof. Gabriele A. Losa). Non-profit academics compromise only their literacy of the published science by NOT citing the above references (publicly available for free download). However, as Genome Informatics becomes intertwined with Intellectual Property (representing occasionally very substantial efforts, e.g. since 1989, against a massive head-wind and with documented losses), for-profit users are advised to consider infringements. andras_at_pellionisz_dot_com ]
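
For orientation only - emphatically not the patented FractoGene method, whose claims are defined in US 8,280,641 - the sketch below shows one generic, textbook way to attach a fractal measure to a DNA string: map the sequence to a two-dimensional "DNA walk" and estimate the box-counting dimension of the resulting path (the function names and toy sequence are hypothetical).

# Generic illustration (NOT the patented method): DNA walk + box-counting dimension.

from math import log

STEP = {"A": (1, 0), "T": (-1, 0), "G": (0, 1), "C": (0, -1)}

def dna_walk(seq):
    """Cumulative 2-D walk: one unit step per nucleotide."""
    x = y = 0
    points = [(0, 0)]
    for base in seq.upper():
        dx, dy = STEP.get(base, (0, 0))   # ignore N's and other symbols
        x, y = x + dx, y + dy
        points.append((x, y))
    return points

def box_count_dimension(points, box_sizes=(1, 2, 4, 8, 16)):
    """Slope of log(occupied boxes) versus log(1/box size)."""
    samples = []
    for s in box_sizes:
        boxes = {(px // s, py // s) for px, py in points}
        samples.append((log(1.0 / s), log(len(boxes))))
    n = len(samples)
    mx = sum(a for a, _ in samples) / n
    my = sum(b for _, b in samples) / n
    num = sum((a - mx) * (b - my) for a, b in samples)
    den = sum((a - mx) ** 2 for a, _ in samples)
    return num / den                      # least-squares slope

toy_seq = "ACGT" * 2500 + "AAAAGGGG" * 500   # hypothetical sequence with repeats
print(round(box_count_dimension(dna_walk(toy_seq)), 2))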


Google and Broad Institute Team Up to Bring Genomic Analysis to the Cloud

By Christina Farr

JUNE 24, 2015

Google has teamed up with one of the world’s top genomics centers, the Broad Institute of MIT and Harvard, to work on a series of projects it claims will propel biomedical research.

For the first joint project, engineers from both organizations will bring “GATK,” the Broad Institute’s widely-used genome analysis toolkit, onto Google’s cloud service and into the hands of researchers.

“The limiting factor is no longer getting the DNA sequenced,” said Dr. Barry Starr, a Stanford geneticist and a contributor to KQED. “It is now interpreting all of that information in a meaningful way.”

The Broad Institute alone analyzed a massive 200 terabytes of raw data in a single month. In the past decade, the institute has genotyped more than 1.4 million biological samples.

Google isn’t the only tech company vying to use cloud-based technology to store and analyze this massive volume of genetic information. This is a point of competition between Google, IBM, Amazon, and Microsoft. ["Competition" of Google, IBM, Amazon, Microsoft? Does not sound at all like an "Open Source Non-Profit Charity". This horserace will largely depend on the Intellectual Property acquired from New School Genome Informatics - andras_at_pellionisz_dot_com]

But Google is now the only public cloud provider to offer the GATK toolkit as a service. By making the software available in the cloud, researchers can run it on large data-sets without access to local computing — and that frees up both time and resources.

“GATK was already available to researchers and tens of thousands have used the software to analyze their data,” said Starr. “Google adds the power of being able to handle much more data at a time.”

Google Genomics’ product manager Jonathan Bingham told KQED two groups will benefit most from this partnership: small research groups who lack sophisticated computing, and any individual who wants to analyze large genomic data sets without needing to download them.

“Broad Institute has got a tremendous amount of expertise working with large numbers of biological samples and huge volumes of genomic data,” Bingham explained. “Meanwhile, Google has built the infrastructure and tools to process and analyze the data and keep it secure.”

The toolkit will be available for free to nonprofits and academics. Businesses will need to pay to license it from the Broad Institute.

Some genetics experts say this announcement is evidence that the health industry is increasingly willing to embrace cloud computing. In the past, health organizations have been hesitant due to concerns about compliance and security.

“This suggests that the genomics industry has moved beyond the cloud debate,” said Jonathan Hirsch, president and co-founder of Syapse, a Silicon Valley-based company that wants to bring more genomics data into routine clinical use.

“It is OK for researchers and clinicians to do genomics work in the cloud, and trust that cloud provider’s hardware and software.”

In the future, Bingham said there may be opportunities to work on projects to further our genetic understanding of cancer and diabetes.

But for now, he said, the organizations are focused on “general purpose” tools that aren’t specific to a disease and can be used by researchers everywhere.


GlaxoSmithKline, Searching For Hit Drugs, Pours $95M Into DNA 'Dark Matter'

GlaxoSmithKline wants to better understand biology so it can discover more medicines, like every other drugmaker. It also wants to quit wasting money on drug candidates that look promising in the lab, but flop years later when given to hundreds or thousands of real people.

Today, London-based GSK is betting that one way around the problem will come from “the living genome” or what some call the “dark matter” of the genome. These mysterious stretches in the genetic instructions don’t contain genes that provide code for making proteins, but they do appear to provide important controls over what genes do in different cells, in different states of health and disease, and in response to different environments.

Rather than invest in its own labs, which have been downsized and re-organized in many ways, GSK is investing $95 million over the next five years, and potentially that amount and more over the subsequent five years, in a new nonprofit research center in Seattle called the Altius Institute for Biomedical Sciences. The institute, whose name means “higher” in Latin, is led by John Stamatoyannopoulos, a professor of genome sciences at the University of Washington. He was a leader in the international ENCODE consortium that published a batch of influential papers in the journal Nature in 2012. The findings elevated the importance of regulatory regions in the genome, and even raised some thoughtful questions about the basic definition of a “gene.”

Stam, as he is known for short, will lead a team of 40-80 molecular biologists, chemists, and computer scientists who will seek to find meaning in regions of the genome that control what they call “the cell’s operating system.” GSK is hoping that this understanding of gene control will help it find better molecular targets for drugs, and help it select the right compounds, right doses, target tissues, and all kinds of other aspects critical in drug R&D.


While the breathtaking advances in faster/cheaper DNA sequencing are making it possible to compare genomes from many people to look for differences that play a role in wellness and disease, Altius isn’t focused so much on the underlying sequences on their own. It will not set up a factory-style efficient genome sequencing center—it will contract that work out to others. The Altius group plans to use, and continuously improve technologies around imaging, chemistry, and computation to extract meaningful information from what Stamatoyannopoulos calls “the living genome.”

“The problem is that the genome only encodes some upstream potentiality, and doesn’t read out what the organism is actually doing,” Stamatoyannopoulos said. “It’s packaged in different ways in different cells…we are reading how the cell is working, and using the genome as a scaffold for all the things it does.” Looking at the downstream manifestation of the genome, in cells, he said, “is going to be much more relevant to clinical medicine.”

Lon Cardon, a senior vice president of alternative discovery and development at GlaxoSmithKline, said he and his team were fascinated by the ENCODE consortium’s series of publications starting in September 2012. “The light went on for us,” he said. Historically, pharma has looked at molecular targets as “static” entities, when the reality is much more fluid and dynamic in different cell and tissue types. Better understanding of what the targets are doing in live cells is essential to fundamental R&D challenges, Cardon said.

At the time of the ENCODE team’s public pronouncements, genomics leader Eric Lander at the Broad Institute likened it to Google Maps. The earlier Human Genome Project, he told The New York Times, “was like getting a picture of Earth from space. It doesn’t tell you where the roads are, it doesn’t tell you what traffic is like at what time of the day, it doesn’t tell you where the good restaurants are, or the hospitals or the cities or the rivers.” He called ENCODE a “stunning resource.”

The scientific consortium has continued to march ahead the past several years, but opinions are mixed on whether regulatory regions of the genome are ready for prime time in drug discovery.

“The maps being created from these efforts are absolutely helping lock into cell specific regulatory networks that when combined with methylation data and eQTL [expression quantitative trait loci] data can be very powerful in tuning you into causal regulators that are important for disease,” said Eric Schadt, the director of the Icahn Institute for Genomics and Multiscale Biology in New York.

David Grainger, a partner at Index Ventures in London, said, “John Stam clearly has a record of doing exciting stuff, and I’m sure he will do so again in Altius. Whether any of that will translate into value for a drug developer, only time will tell. Genomics and the control of gene expression would not necessarily have been an area I would have chosen for what is, in effect, company-funded blue skies research. But I look forward to them proving me wrong.”

GSK, like its industry peers, has been experimenting not just with different scientific approaches to discovery, but with various models for financing creative, motivated teams outside of its own walls. It has a corporate venture capital fund (SR One) that invests in biotech startups, a tight relationship with a venture firm (Avalon Ventures) that builds startups it might buy, and it tried (and closed) a number of internal centers for excellence. The idea of a big drug company putting big resources behind a semi-independent nonprofit institute isn’t exactly new—Merck & Co. did something similar in 2012 when it enlisted Peter Schultz to run the California Institute for Biomedical Research in San Diego.

In the past, pharma companies might have just written a check to sponsor research at an academic center like the University of Washington, sit back, and hope for good results to flow back to the company. But those arrangements haven’t borne much fruit. GSK could have just acquired as much of the intellectual property and technology as it could, and brought it in-house, but it was afraid that it might slow things down in a fast-moving field, Cardon said. In all likelihood, it will be easier to recruit the people it wants into a new organization with startup-like focus and urgency. Speed is of the essence in a field going through exponential advances in technology. “We want to stay ahead of that game,” Stamatoyannopoulos said.

While staying small and nimble, the institute will get some big company advantages. Altius will be able to use some of GSK’s fancy instruments, like imaging, chemistry, and robotics tools that it couldn’t possibly corral in an academic institution.

The institute and the company expect to have what sounds like an open-door relationship. Some GSK scientists will be able to go on periodic leaves from their regular jobs to work at the Seattle institute, taking what they learn back to the mother ship. Scientists at the institute say they have retained their academic freedom, including the right to publish all of their discoveries without prior review by GlaxoSmithKline, with one exception: when the work applies to proprietary compounds of the parent company.

Clearly, GSK is hoping for a return on its investment. The company is getting the first shot at licensing discoveries from Altius, and the right to spin companies out of it. The knowledge from Altius, ideally, should influence decision-making with a number of its experimental drugs.

The new center is expected to get up and running later this year in offices just north of Seattle’s famed Pike Place Market. Stamatoyannopoulos said he will retain his faculty position at the UW Genome Sciences department and continue to oversee grant work he has there, including some of the ENCODE consortium efforts. The institute will have its own board of directors and its own scientific advisory board, but it isn’t yet naming names or even saying how many members will be in each group. The agreement between the institute and the company covers a 10-year term, with $95 million of company support for the basic-science and technology exploratory phase in the first 5 years and with additional funding in the latter years for specific drug discovery/development projects. The second half of the collaboration is expected to provide funding on par with the first five years, but could be even bigger, Stamatoyannopoulos said.

Incidentally, Stamatoyannopoulos said he and his team don’t use the “dark matter” analogy anymore when describing their work on the regulatory regions of the genome, mainly because they have shed light on where that regulatory DNA is. But there’s still plenty of mystery. “There of course is an enormous amount to learn–but now we have the flashlights and searchbeams,” Stamatoyannopoulos said in an e-mail. “I usually use ‘living genome’ to distinguish from research that focuses just on DNA sequence (the ‘dead genome’), which doesn’t change, while the cell’s regulatory network does back flips in response to its environment or a drug.”

Luke Timmerman is the founder and editor of Timmerman Report, a subscription publication for biotech insiders.

["The Principle of Recursive Genome Function" was published in a peer reviewed scientific publication seven years ago (also popularized on Google Tech Talk YouTube, visited by more than seventeen thousand viewers) and a full free pdf of the peer reviewed paper is available for everyone (see list of publications). While maintaining an obsolete view that genome only encodes some upstream potentiality, and doesn’t read out what the organism is actually doing is the prerogative of any scientist - though any Editor who is convinced otherwise should not let this misimpression spread -any peer-reviewed scientific publication should demonstrate and acknowledge the knowledge of existing literature on the crucial matter of "Recursive Genome Function". The above two articles clinch the trend that Big IT and Big Pharma fiercely compete now for the "high ground". This columnist is already on the Board of USA and India-based Companies, and is available. andras_at_pellionisz_dot_com]


Recurrent somatic mutations in regulatory regions of human cancer genomes (Nature Genetics, senior author Michael Snyder)

[Popular journalist coverage:

Stanford Team IDs Recurrently Mutated Regulatory Sites Across Cancer Types

Jun 08, 2015 | a GenomeWeb staff reporter]

To identify the regulatory mutations, Mike Snyder's laboratory at Stanford first established an analysis workflow for whole-genome data from 436 individuals from the TCGA. They used two algorithms, MuTect and VarScan 2, to identify SNVs from eight different cancer subtypes.

Next, they annotated the mutation set with gene and regulatory information from the gene annotation project Gencode and RegulomeDB, a database of regulatory data that includes data on transcription factors, epigenetic marks, motifs, and DNA accessibility.

Overall, they found that mutations in coding exons represented between 0.036 percent and 0.056 percent of called mutations for each cancer type, while mutations in putative regulatory regions represented between 31 percent and 39 percent of called mutations for each cancer type. The large fraction of regulatory mutations "underscores the potential for regulatory dysfunction in cancer," the authors wrote.
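
As a rough sketch of the bookkeeping such a workflow involves (hypothetical coordinates and a deliberately simplified interval test - not the study's actual pipeline), the Python fragment below classifies called SNVs as coding, regulatory or other, and reports the fraction in each class.

# Simplified illustration (not the study's pipeline): classify SNV calls by
# whether they fall in coding-exon or putative-regulatory intervals.

from collections import defaultdict

def in_any(pos, intervals):
    """True if pos falls inside any (start, end) half-open interval."""
    return any(start <= pos < end for start, end in intervals)

def classify(snvs, coding, regulatory):
    """snvs: {(chrom, pos), ...}; coding/regulatory: {chrom: [(start, end), ...]}."""
    counts = defaultdict(int)
    for chrom, pos in snvs:
        if in_any(pos, coding.get(chrom, [])):
            counts["coding"] += 1
        elif in_any(pos, regulatory.get(chrom, [])):
            counts["regulatory"] += 1
        else:
            counts["other"] += 1
    total = sum(counts.values())
    return {region: n / total for region, n in counts.items()}

# Toy example with made-up coordinates
coding = {"chr1": [(1000, 1200)]}
regulatory = {"chr1": [(500, 900), (1300, 1500)]}
snvs = {("chr1", 1100), ("chr1", 600), ("chr1", 1400), ("chr1", 2000)}
print(classify(snvs, coding, regulatory))   # 25% coding, 50% regulatory, 25% other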

The team identified a number of recurrently mutated genes and regulatory regions, and they replicated a number of known findings of recurrent mutations in driver genes, including mutations in the coding regions of TP53, AKT1, PIK3CA, PTEN, EGFR, CDKN2A, and KRAS.

They also identified recurrent mutations in the known TERT promoter and recurrent mutations at eight new loci in proximity to, and therefore potentially regulators of, known cancer genes, including GNAS, INPP4B, MAP2K2, BCL11B, NEDD4L, ANKRD11, TRPM2 and P2RY8.

In addition, they found positive selection for mutations in transcription factor binding sites. For instance, mutations in the binding sites of CEBP factors were "enriched and significant across all cancer types," the authors wrote. In addition, they found enrichment for mutations in transcription factor binding sites that were likely either to "destroy the site or increase affinity of the site for transcription factor binding," the authors wrote. Such mutations could either inactivate tumor suppressor genes or activate oncogenes.

"Overall, we expect that many regulatory regions will prove to have important roles in cancer, and the approaches and information employed in this study thus represent a significant advance in the analysis of such regions," the authors wrote.

---

ABSTRACT OF ORIGINAL PAPER: Aberrant regulation of gene expression in cancer can promote survival and proliferation of cancer cells. Here we integrate whole-genome sequencing data from The Cancer Genome Atlas (TCGA) for 436 patients from 8 cancer subtypes with ENCODE and other regulatory annotations to identify point mutations in regulatory regions. We find evidence for positive selection of mutations in transcription factor binding sites, consistent with these sites regulating important cancer cell functions. Using a new method that adjusts for sample- and genomic locus–specific mutation rates, we identify recurrently mutated sites across individuals with cancer. Mutated regulatory sites include known sites in the TERT promoter and many new sites, including a subset in proximity to cancer-related genes. In reporter assays, two new sites display decreased enhancer activity upon mutation. These data demonstrate that many regulatory regions contain mutations under selective pressure and suggest a greater role for regulatory mutations in cancer than previously appreciated.
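
The abstract's "method that adjusts for sample- and genomic locus-specific mutation rates" is not spelled out in the excerpt above; the toy sketch below (Python, illustrative only, with invented numbers) conveys the general idea: score a site's recurrence against a background expectation built from each sample's own mutation rate, here with a simple Poisson tail approximation.

# Toy illustration only (the paper's actual method is more sophisticated):
# compare the observed number of mutated samples at a site with a background
# expectation assembled from per-sample and per-locus mutation rates.

from math import exp

def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    term = exp(-lam)              # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)
    return max(0.0, 1.0 - cdf)

def recurrence_pvalue(mutated_samples, per_sample_rates, locus_weight=1.0):
    """
    mutated_samples: samples carrying a mutation at this site.
    per_sample_rates: background per-base mutation probability of each sample.
    locus_weight: multiplier for locus-specific mutability (context, coverage).
    """
    lam = locus_weight * sum(per_sample_rates)
    return poisson_tail(mutated_samples, lam)

# 436 samples with an invented 1e-5 per-base background rate: 5 mutated samples
# at one position is far beyond the background expectation (tiny p-value).
rates = [1e-5] * 436
print(recurrence_pvalue(5, rates, locus_weight=2.0))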

["Seven years of hesitation" is famous in science. So is for The Principle of Recursive Genome Function (Pellionisz 2008) and the illustration of (fractal) recursive misregulation as the basis of cancer (Pellionisz Google Tech YouTube, 2008). The double paradigm-shift (reversal of both axioms of Old School Genomics) is now validated by first class, independent experimental results. While the Principle of Recursive Genome Function is not widely quoted after 7 years, Dr. Snyder et al. (2007) was among the first pioneers to go on record abolut the need of re-definition of genes and genome function. Now, with clear evidence that intergenic and even intronic non-coding sequences, in a recursive mode are responsible for the most dreaded genome regulation disease (cancer), it seems difficult to find alternative comprehensive theoretical framework for genome (mis)regulation. Andras_at_Pellionisz_dot_com]


Big Data (Stanford) 2015: Nobelist Michael Levitt (multi-scale biology) endorses the Fractal Approach to new school of genomics

Big Data at Stanford (2015) leveled the field of post-ENCODE genomics. On one hand, the insatiable demand for dwindling resources to generate Big (and Bigger) Data clearly ran into financial and data-privacy constraints. This was rather clear from presentations by NIH (putting the $200 M nose of the camel into the $2 Trillion "Precision Medicine Initiative" by sequencing AND interpreting the full DNA of up to 2 million humans: 1 million people by the government's effort, questionably overlapping with another 1 million by an alternative private effort). In rather sharp contrast, NSF, asked whether it leaves paradigm-shift genomic R&D either to strategic DARPA projects or to the Private Sector, could only refer to a $28 M NSF program ("INSPIRE") that seems insufficient and rather hard even to qualify for. On the other hand, several start-up companies showed up (e.g. DNA Nexus, Seven Bridges, SolveBio, YouGenix - one CEO is a new member of the International Hologenomics Society), all eager to ramp up their genome-interpretation business much quicker than the already committed Big IT (Google Genomics, Amazon Web Services, IBM-Watson, Samsung, Sony, Apple, Siemens, SAP etc.). In the forefront, therefore, are key algorithms (just as "search engine algorithms" determined in the Age of the Internet which company would emerge as a leader). From this viewpoint, it may be remarkable that FractoGene - already on record with no opposition from Nobelist Jim Watson upon its presentation at Cold Spring Harbor in 2009, and already enjoying repeated support by "multi-scale biologist" Eric Schadt - was endorsed at Big Data 2015 by Nobelist Michael Levitt (Stanford, "multi-scale biology"). Dr. Levitt provided an unsolicited public endorsement, calling it a "very good idea".


Eric Schadt - Big Data is revealing about the world’s trickiest diseases

Technically Brooklyn

April 16, 2015

If you learned about cystic fibrosis during biology class in high school, it was probably described as an inevitable condition of those whose genes included a specific set of mutations. It was thought to be inevitable because no one had ever found anyone with those mutations that didn’t have it. On the other hand, no one was checking people’s genes to see if they had the mutations when they didn’t show symptoms.

During the 2015 Lynford Lecture at NYU Poly, Mt. Sinai Hospital’s Eric Schadt explained how a big data methodology revealed a remarkable truth: When scientists look at large sets of genomic data of broad pools of test patients, they find small numbers of people with the genetic markers that would make them genetically predisposed to various diseases, and yet they weren’t symptomatic.

The remarkable finding here is that genetics do not necessarily represent an individual’s fate and somehow these individuals’ bodies worked out ways around their genetic disadvantages.

Schadt refers to these people as “heroes” and he believes that by studying them the medical profession can find new strategies of care for patients who are symptomatic.

Schadt is the director of the Icahn Institute for Genomics and Multiscale Biology, among other appointments, at Mt. Sinai. His talk served as an exploration of a data-driven approach to determining strategies of care, as an argument for a network-oriented approach to determining multiple interventions against disease, and as an argument for encouraging non-expert investigation of biological problems.

For this latter point, we have the example of Suneris, a company whose completely novel approach to stopping bleeding was discovered by a college freshman, not a doctor.

Here are some other compelling points from Schadt’s talk:

Bias. A huge stumbling block in the healthcare system is the bias toward acute care. Acute care is treating problems. That’s what hospitals are set up to treat and that’s what they get paid the best to deal with. It is not, however, what is best for patients.

Lots of apps, lots of data. A lot of data is getting collected by something like 50,000-100,000 mobile apps that in one way or another relate to health. With all this data, it’s possible to start getting very serious about targeted, specific prevention strategies for individuals that treat them as a whole person.

Locus of power. In 5-10 years time, there will be far more data about your health outside of medical centers than inside them.

Massive new studies powered by apps. Mt. Sinai just launched an app in collaboration with Apple to study asthma sufferers and help them manage their condition as they do so. It’s in the App Store. Within six hours of announcing it with Tim Cook, Mt. Sinai had enrolled several thousand people, a number that would take traditional studies years to achieve.

Informed consent. Schadt called the informed consent process built into the app its “crowning achievement.” Subsequent testing showed that users who went through their informed consent dialogue understood what they were agreeing to better than people who went through an informed consent process with another person.

Data yields surprises. By building a complete model based on multiple networks and developing it to the point that they were able to model how different genes might express themselves under different conditions and different treatments, Mt. Sinai scientists were able to find a drug that had been indicated for an entirely different use, relating to irritable bowel syndrome. Big data makes it possible to find treatments by just running different inputs through models, regardless of indication or researcher assumptions.

[Eric Schadt is a double-degree mathematician, with a Ph.D. in Biomathematics from UCLA. He started to turn "Big Pharma" (Merck) towards Information Technology, then became the Chief Scientist of the Silicon Valley genome-sequencing company Pacific Biosciences, to interpret genome information; in 30 minutes of compute time he identified the Haiti epidemic strain. With $600 M, he established the Mount Sinai Center of Genomics and Multiscale Biology in Manhattan. Moved North to a suburb (454), now lectured in Brooklyn. The almost 2-hour-long video could be a Ph.D. thesis on the challenges of the IT-led paradigm shift from sick-care to health-care. He not only abandons the obsolete "gene/junk" dogma, but now also considers the "pathways" concept obsolete. A strong supporter of the fractal approach - expected to analyze parallel self-similar recursions. There are too many highly relevant comments in Eric's lecture to list them all. Suffice it to mention that at BGI (China), for every single genome analyzer there are about 50 (fifty) software developers; in the USA this number is 1-3 (about twenty times less). Another bullet-point mentions that very soon there will be a lot more health data OUTSIDE the hospitals than within them. As an NYU Medical Center professor, I can state with some authority that such a "data center" will not be in Manhattan (real estate is way too expensive). Likewise, in the article below (IBM-Apple), in Silicon Valley it is actually very easy to tell where it will be located (hint: I have worked for some years as a Senior Research Council Advisor of the National Academy to NASA Ames Research Center; "next door" is one of the busiest Internet hubs...) andras_at_pellionisz_dot_com]


IBM Announces Deals With Apple, Johnson And Johnson, And Medtronic In Bid To Transform Health Care

IBM Almaden Research Center, Silicon Valley, California

Apple Second Campus, Silicon Valley

Forbes, April 15, 2015

Experts in health care and information technology agree on the future’s biggest opportunity: the creation of a new computational model that will link together all of the massive computers that now hold medical information. The question remains: who will build it, and how?

IBM is today staking its claim to be a major player in creating that cloud, and to use its Watson artificial intelligence – the one that won on the TV game show Jeopardy – to make sense of the flood of medical data that will result. The new effort uses new, innovative systems to keep data secure, IBM executives say, even while allowing software to use them remotely.

“We are convinced that by the size and scale of what we’re doing we can transform this industry,” says John Kelley, Senior Vice President, IBM Research. “I’m convinced that now is the time.”

Big Blue is certainly putting some muscle into medicine. Some 2,000 employees will be involved in a new Watson-in-medicine business unit. The Armonk, N.Y.-based computing giant is making two acquisitions, too, buying Cleveland’s Explorys, an analytics company that has access to 50 million medical records from U.S. patients, and Dallas’ Phytel, a healthcare services company that provides feedback to doctors and patients for follow-up care. Deal prices were not disclosed.

It is also announcing some big partnerships:

• Apple will work to integrate Watson-based apps into its HealthKit and ResearchKit tool systems for developers, which allow the collection of personal health data and the use of such data in clinical trials.

• Johnson & Johnson, which is one of the largest makers of knee and hip implants, will use Watson to create a personal concierge service to prepare patients for knee surgery and to help them deal with its after effects.

• Medtronic, the maker of implantable heart devices and diabetes products, will use Watson to create an “internet of things” around its medical gadgets, collecting data both for patients’ personal use and, once it’s anonymized, for understanding how well the devices are working. Initially, the focus is on diabetes.

IBM’s pitch is that it will be able to create a new middle layer in the health care system – linking the old electronic records systems, some of which have components dating back to the 1970s, with a new, cloud-based architecture, because of its deep breadth of experience.

And there is no doubt that there is a need for data science that can parse the explosion of information that will soon be created by every patient. Already, there is too much information for the human brain. “If you’re an oncologist there are 170,000 clinical trials going on in the world every year,” says Steve Gold, VP, IBM Watson.

The question is how ready Watson is to take on the challenge. IBM isn’t the only one that sees opportunity here. The billionaire Patrick Soon-Shiong is aiming to create a system to do many of the same things with his NantHealth startup. Flatiron Health, a hot startup in New York, is creating analytics for cancer. The existing health IT giants, Cerner and Epic, both certainly have their eyes on trying to capture some of this new, interconnected market, lest it make them obsolete.

So far, Watson has been a black box when it comes to healthcare. IBM has announced collaborations with Anthem, the health insurer, and medical centers including M.D. Anderson, Memorial Sloan-Kettering Cancer Center, and The Cleveland Clinic. There are lots of positive anecdotal reports, but so far the major published paper from Watson is a computer science paper published by the Baylor College of Medicine that identified proteins that could be useful drug targets.

“I think that ultimately somebody’s going to figure out how to integrate all these sources of data, analyze them, sort the signal to noise, and when someone can do that, it will improve the health care system,” says Robert Wachter, the author of The Digital Doctor: Hope, Hype, and Harm at the Dawn of Medicine’s Computer Age and associate chair of medicine at UCSF.

“Does this do that tomorrow? No. But do we need to create the infrastructure to do that? Yes. And are they probably the best-positioned company with the best track record to do this? I think so.”

–Sarah Hedgecock contributed reporting to this story.

[This is a global "game changer", since what I predicted over a decade ago has now actually happened. It is "news" but not a "surprise". IBM has long targeted the traditional "health care" market, but with genomics it was Google Genomics, Amazon Genomics and IBM cloud-genomics that prepared to change, by means of Information Technology, the $2 Trillion (USA) market where "IT matters for your health". The IBM announcement includes Apple (informally), but all the others, plus global IT companies like Samsung, Sony, Panasonic, BGI, Siemens, SAP (etc.), are also in the ring. Information Technology, however, is not even the hardest challenge (see my 2008 YouTube; Information Theory is the bottleneck). As for Information Technology, it appears that "multicore" (beyond 128) is not the way to go - "cloud computing" is the name of the game today. However, as IBM Research at Almaden (Silicon Valley) points out, the "techie challenge" is far deeper; it is the non-von-Neumann computing architecture (with their prototype SyNAPSE chip, with over a million neurons and a vast number of connections among them, to learn e.g. pattern recognition by neural net algorithms - with power consumption an order of magnitude smaller than what a smart phone battery provides). Science-wise (minus the chip), such a neuronal network model was built a generation ago. As the above long lecture by Eric Schadt shows, however, the gap between the "medical establishment" and the "genome informatics specialists" is visibly stunning. - andras_at_pellionisz_dot_com]


An 'evolutionary relic' of the genome causes cancer

Pseudogenes, a sub-class of long non-coding RNA (lncRNA) that developed from the genome's 20,000 protein-coding genes but lost the ability to produce proteins, have long been considered nothing more than genomic "junk." Yet the retention of these 20,000 mysterious remnants during evolution has suggested that they may in fact possess biological functions and contribute to the development of disease.

Now, a team led by investigators in the Cancer Research Institute at Beth Israel Deaconess Medical Center (BIDMC) has provided some of the first evidence that one of these non-coding "evolutionary relics" actually has a role in causing cancer.

In a new study in the journal Cell, publishing online today, the scientists report that independent of any other mutations, abnormal amounts of the BRAF pseudogene led to the development of an aggressive lymphoma-like disease in a mouse model, a discovery that suggests that pseudogenes may play a primary role in a variety of diseases. Importantly, the new discovery also suggests that with the addition of this vast "dark matter" the functional genome could be tremendously larger than previously thought - triple or quadruple its current known size.

"Our mouse model of the BRAF pseudogene developed cancer as rapidly and aggressively as it would if you were to express the protein-coding BRAF oncogene," explains senior author Pier Paolo Pandolfi, MD, PhD, Director of the Cancer Center and co-founder of the Institute for RNA Medicine (iRM) at BIDMC and George C. Reisman Professor of Medicine at Harvard Medical School. "It's remarkable that this very aggressive phenotype, resembling human diffuse large B-cell lymphoma, was driven by a piece of so-called 'junk RNA.' As attention turns to precision medicine and the tremendous promise of targeted cancer therapies, all of this vast non-coding material needs to be taken into account. In the past, we have found non-coding RNA to be overexpressed, or misexpressed, but because no one knew what to do with this information it was swept under the carpet. Now we can see that it plays a vital role. We have to study this material, we have to sequence it and we have to take advantage of the tremendous opportunity that it offers for cancer therapy."

The new discovery hinges on the concept of competing endogenous RNAs (ceRNA), a functional capability for pseudogenes first described by Pandolfi almost five years ago when his laboratory discovered that pseudogenes and other noncoding RNAs could act as "decoys" to divert and sequester tiny pieces of RNA known as microRNAs away from their protein-coding counterparts to regulate gene expression.

"Our discovery of these 'decoys' revealed a novel new role for messenger RNA, demonstrating that beyond serving as a genetic intermediary in the protein-making process, messenger RNAs could actually regulate expression of one another through this sophisticated new ceRNA 'language,'" says Pandolfi. The team demonstrated in cell culture experiments that when microRNAs were hindered in fulfilling their regulatory function by these microRNA decoys there could be severe consequences, including making cancer cells more aggressive.

In this new paper, the authors wanted to determine if this same ceRNA "cross talk" took place in a living organism—and if it would result in similar consequences.

"We conducted a proof-of-principle experiment using the BRAF pseudogene," explains first author Florian Karreth, PhD, who conducted this work as a postdoctoral fellow in the Pandolfi laboratory. "We investigated whether this pseudogene exerts critical functions in the context of a whole organism and whether its disruption contributes to the development of disease." The investigators focused on the BRAF pseudogene because of its potential ability to regulate the levels of the BRAF protein, a well-known proto-oncogene linked to numerous types of cancer. In addition, says Karreth, the BRAF pseudogene is known to exist in both humans and mice.

The investigators began by testing the BRAF pseudogene in tissue culture. Their findings demonstrated that when overexpressed, the pseudogene did indeed operate as a microRNA decoy that increased the amounts of the BRAF protein. This, in turn, stimulated the MAP-kinase signaling cascade, a pathway through which the BRAF protein controls cell proliferation, differentiation and survival and which is commonly found to be hyperactive in cancer.

When the team went on to create a mouse model in which the BRAF pseudogene was overexpressed they found that the mice developed an aggressive lymphoma-like cancer. "This cancer of B-lymphocytes manifested primarily in the spleens of the animals but also infiltrated other organs including the kidneys and liver," explains Karreth. "We were particularly surprised by the development of such a dramatic phenotype in response to BRAF pseudogene overexpression alone since the development of full-blown cancer usually requires two or more mutational events."

Similar to their findings in their cell culture experiments, the investigators found that the mice overexpressing the BRAF pseudogene displayed higher levels of the BRAF protein and hyperactivation of the MAP kinase pathway, which suggests that this axis is indeed critical to cancer development. They confirmed this by inhibiting the MAP kinase pathway with a drug that dramatically reduced the ability of cancer cells to infiltrate the liver in transplantation experiments.

The Pandolfi team further validated the microRNA decoy function of the BRAF pseudogene by creating two additional transgenic mice, one overexpressing the front half of the BRAF pseudogene, the other overexpressing the back half. Both of these mouse models developed the same lymphoma phenotype as the mice overexpressing the full-length pseudogene, a result which the authors describe as "absolutely astonishing."

"We never expected that portions of the BRAF pseudogene could elicit a phenotype and when both front and back halves induced lymphomas, we were certain the BRAF pseudogene was functioning as a microRNA decoy," says Karreth.

The investigators also found that the BRAF pseudogene is overexpressed in human B-cell lymphomas and that the genomic region containing the BRAF pseudogene is commonly amplified in a variety of human cancers, indicating that the findings in the mouse are of relevance to human cancer development. Moreover, say the authors, silencing of the BRAF pseudogene in human cancer cell lines that expressed higher levels led to reduced cell proliferation, a finding that highlights the importance of the pseudogene in these cancers and suggests that a therapy that reduces BRAF pseudogene levels may be beneficial to cancer patients.

"While we have been busy focusing on the genome's 20,000 coding genes, we have neglected perhaps as many as 100,000 noncoding genetic units," says Pandolfi. "Our new findings not only tell us that we need to characterize the role of all of these non-coding pseudogenes in cancer, but, more urgently, suggest that we need to increase our understanding of the non-coding 'junk' of the genome and incorporate this information into our personalized medicine assays. The game has to start now—we have to sequence and analyze the genome and the RNA transcripts from the non-coding space."

[The game had started at least by 2002 (13 years ago), when FractoGene was submitted, but it is ready now with key IP (8,280,641 in force, with Trade Secrets to improve Best Methods as of the last CIP in 2007 - that is, 8 years ago). andras_at_pellionisz_dot_com]

[In what is the equivalent of the "Flat Earth Society" in the "Junk DNA upholding blogspace", the grave concern about their untenable dogma is quite revealing. While unable to identify the proper DOI there, the question is raised whether the press release represents the views of the authors. For those behind the paywall, here is a verbatim paragraph from the paper [AJP]:]

"Pseudogenes were considered genomic junk for decades, but their retention during evolution argues that they may possess important functions and that their deregulation could contribute to the development of disease. Indeed, several lines of evidence have associated pseudogenes with cellular transformation (Poliseno, 2012). Our study shows that aberrant expression of a pseudogene causes cancer, thus vastly expanding the number of genes that may be involved in this disease. Moreover, our work emphasizes the functional importance of the non-coding dimension of the transcriptome and should stimulate further studies of the role of pseudogenes in the development of disease."


Time Magazine Cover Issue - Closing the Cancer Gap

[We are beyond "the point of no return". As is widely known, potent (and expensive) cancer therapies might be next to ineffective for one person whose cancer is medically characterized as the same as another person's (for whom the same therapy could be dramatically effective). The emerging "precision medicine" in cancer has already reached "the point of no return". The Time Magazine cover story does not qualify as "good news or bad news" its box stating "Less than 5% of the 1.6 million Americans diagnosed with cancer each year can take advantage of genetic testing" - to me it clearly indicates that 5% is already "a point of no return". Granted that reimbursement for genomic testing by some insurance companies is "a struggle" and the 5% figure is unquestionably low, the wide dissemination e.g. by Time Magazine (also with its title) shows that there is no other way to go, and the question is a matter of realization by the public that "science delivers" - of course with proper time/money allocation. The news above (on non-coding "pseudogenes" - held by dogmatics way too long as "junk DNA for the purpose of doing nothing", Ohno, 1972) shows that this lid, too, is blown away. andras_at_pellionisz_dot_com]


We have run out of money - time to start thinking!

Dr. Harold Varmus to Step Down as NCI Director

A Letter to the NCI Community

March 4, 2015

To NCI staff, grantees, and advisors:

I am writing to let you know that I sent a letter today to President Obama, informing him that I plan to leave the Directorship of the National Cancer Institute at the end of this month.

I take this step with a mixture of regret and anticipation. Regret, because I will miss this job and my working relationships with so many dedicated and talented people. Anticipation, because I look forward to new opportunities to pursue scientific work in the city, New York, that I continue to call home.

The nearly five years in which I have served as NCI Director have not been easy ones for managing this large enterprise—one that offers so much hope for so many. We have endured losses in real as well as adjusted dollars; survived the threats and reality of government shutdowns; and have not yet recovered all the funds that sequestration has taken away. This experience has been especially vivid to those of us who have lived in better times, when NIH was the beneficiary of strong budgetary growth. As Mae West famously said, "I’ve been rich and I’ve been poor, and rich is better."

While penury is never a good thing, I have sought its silver linings. My efforts to cope with budgetary limits have been guided by Lord Rutherford’s appeal to his British laboratory group during a period of fiscal restraint a century ago: "…we’ve run out of money, it is time to start thinking." Rather than simply hold on to survive our financial crisis without significant change, I have tried with essential help from my senior colleagues to reshape some of our many parts and functions. In this way, I have tried to take advantage of some amazing new opportunities to improve the understanding, prevention, diagnosis, and treatment of cancers, despite fiscal duress.

This is not the place for a detailed account of what we have achieved over the past five years. But a brief list of some satisfying accomplishments serves as a reminder that good things can be done despite the financial shortfalls that have kept us from doing more:

The NCI has established two new Centers: one for Global Health, to organize and expand a long tradition of studying cancer in many other countries; and another, for Cancer Genomics, to realize the promise of understanding and controlling cancer as a disorder of the genome.

Our clinical trials programs (now called the National Clinical Trials Network [NCTN] and the NCI Community Oncology Research Program [NCORP]) have been reconfigured to achieve greater efficiencies, adapt to the advent of targeted drugs and immunotherapies, and enhance the contributions of community cancer centers.

Research under a large NCI contract program in Frederick, Maryland, has been redefined as the Frederick National Laboratory for Cancer Research (FNLCR), with more external advice, a large new initiative to study tumors driven by mutant RAS genes, and greater clarity about FNLCR’s role as a supporter of biomedical research.

In efforts to provide greater stability for investigators in these difficult times, we have established a new seven year Outstanding Investigator Award; are discussing new awards to accelerate graduate and post-doctoral training; and are planning to provide individual support for so-called "staff scientists" at extramural institutions.

To strengthen the NCI-designated cancer centers, we are awarding more supplements to the centers’ budgets to encourage work in high priority areas; helping centers to share resources; and working with the center directors to develop more equitable funding plans.

The NCI has attempted to improve the grant-making process in various ways at a time when success rates for applicants have reached all-time lows:

We have engaged our scientists to identify inadequately studied but important questions about cancer—so-called Provocative Questions—and have provided funds for many well-regarded applications to address them.

We have pioneered the use of a descriptive account of an applicant’s past accomplishments, moving away from mere listings of publications, to allow a fairer appraisal of past contributions to science.

Our program leaders now make more nuanced decisions about funding many individual grants, considering a wide range of highly rated applications, not simply those with scores above an arbitrary pay-line.

And we have maintained NCI’s numbers of research project grants, despite the limits on our budget, while continuing to emphasize the importance of balancing unsolicited applications to do basic cancer research against an increasing call for targeted programs to deliver practical applications.

Of course, it is still too early to judge the long-term consequences of most of these actions. But we do know that many good things have happened in cancer research over the past five years as a result of existing investments:

Our understanding of cancer biology has matured dramatically with the near-completion of The Cancer Genome Atlas and with results from other programs that depend on genomics and basic science, including work with model systems.

Many new targeted therapies have been tested in clinical trials, and several have been approved for general use.

Remarkable clinical successes against several kinds of cancers have been reported with immunological tools—natural and synthetic antibodies, checkpoint inhibitors, and chimeric T cell receptors.

More widespread use of a highly effective vaccine against human papilloma viruses (HPV) and the several cancers they cause has been encouraged by further studies and by an important report from the President’s Cancer Panel.

Radiographic screening for lung cancers in heavy smokers—validated by a large-scale trial just after I arrived at the NCI—has now been endorsed for wide-spread use and for reimbursement by Medicare and other insurers.

New computational methods, such as cloud computing and improved inter-operability, are advancing the dream of integrating vast amounts of molecular data on many cancers into the daily care of such cancers.

Some of these advances are now essential features of the President’s recently announced Precision Medicine initiative that will focus initially on cancer.

Such accomplishments have been possible only because the NCI has been able to recruit and retain exceptional people during my years here; I am grateful to all of you. I am also grateful to the many selfless individuals who have made our advisory groups stronger than ever and to the cancer research advocates who regularly remind me—as well as Congress and the public—about the importance of our work to human welfare.

So what is next?

In my remaining few weeks in this position, I will continue to do the NCI Director’s job with customary energy, despite my inevitable status as a "lame duck." I will also schedule a Town Hall meeting to review some of the things that have happened during my tenure here—revisiting the ambitions I announced when I accepted the job and answering questions.

As I just learned today, the White House has approved the appointment of my chief deputy and close friend, Doug Lowy, to serve as Acting Director of the NCI, beginning on April 1st. This gives me enormous pleasure, because Doug—along with Jim Doroshow, the NCI’s Deputy Director for Clinical and Translational Research—made many of NCI’s recent accomplishments possible; is a distinguished scientist, who was recently honored by the President with a National Medal for Technology and Innovation for his work on human papilloma virus vaccines; and is a remarkably congenial person to work with. The NCI will be in excellent hands.

Finally, when I return to New York City full time on April 1st, I will establish a modestly sized research laboratory in the Meyer Cancer Center at the Weill-Cornell Medical College and serve as a senior advisor to the Dean. In addition, I plan to assist the recently founded New York Genome Center as it develops its research and service functions and helps regional institutions introduce genomics into cancer care.

While I look forward to these new adventures and to leading a life concentrated in one place, I know I will miss many of the people, authorities, and ideas that make the NCI Directorship such a stimulating and rewarding position.

With deep respect and gratitude to the entire NCI community,

Harold Varmus

Posted: March 4, 2015

----

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4199368/

Genome Res. 2014 Oct; 24(10): 1559–1571.

doi: 10.1101/gr.164871.113

PMCID: PMC4199368

Systems consequences of amplicon formation in human breast cancer

Koichiro Inaki,1,2,9 Francesca Menghi,1,2,9 Xing Yi Woo,1,9 Joel P. Wagner,1,2,3 Pierre-Étienne Jacques,4,5 Yi Fang Lee,1 Phung Trang Shreckengast,2 Wendy WeiJia Soon,1 Ankit Malhotra,2 Audrey S.M. Teo,1 Axel M. Hillmer,1 Alexis Jiaying Khng,1 Xiaoan Ruan,6 Swee Hoe Ong,4 Denis Bertrand,4 Niranjan Nagarajan,4 R. Krishna Murthy Karuturi,4,7 Alfredo Hidalgo Miranda,8 and Edison T. Liu (corresponding author)1,2,7

1Cancer Therapeutics and Stratified Oncology, Genome Institute of Singapore, Genome, Singapore 138672, Singapore;

2The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06030, USA;

3Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA;

4Computational and Systems Biology, Genome Institute of Singapore, Genome, Singapore 138672, Singapore;

5Université de Sherbrooke, Sherbrooke, Québec, J1K 2R1, Canada;

6Genome Technology and Biology, Genome Institute of Singapore, Genome, Singapore 138672, Singapore;

7The Jackson Laboratory, Bar Harbor, Maine 04609, USA;

8National Institute of Genomic Medicine, Periferico Sur 4124, Mexico City 01900, Mexico

Corresponding author.

9These authors contributed equally to this work.

Although in earlier studies the major focus was to find specific driver oncogenes in amplicons and tumor suppressor genes in common regions of loss (primarily using loss of heterozygosity mapping), progressively there emerged an understanding that more than one driver oncogene may be present in any amplicon. Moreover, each amplicon or region of copy number loss alters the expression of many adjacent genes, some with proven conjoint cancer effects (Zhang et al. 2009; Curtis et al. 2012). Thus, any cancer is likely to be a composite of hundreds to thousands of gene changes that contribute to the cancer state. Although specific point mutations contribute to adaptive evolutionary processes, recent genomic analyses from controlled evolutionary experiments in model systems suggest that copy number changes through segmental duplications and rearrangements may play a more prominent role (Chang et al. 2013; Fares et al. 2013).

--

Mutations in noncoding DNA also cause cancer

New discovery could lead to novel field of study within cancer research.

October 12, 2014


An international group of cancer researchers have completed the first ever systematic study of noncoding DNA. They found that mutations in the noncoding DNA can, despite previous beliefs, cause cancer.

Until now, scientists have only investigated 1.5 per cent of the total human DNA. This is the part of the DNA which consists of genes. The remaining 98.5 per cent of the DNA is called noncoding DNA and resides outside of the genes.

The study, just published in Nature Genetics, shows that the majority of cancer patients have mutations in both their genes and the areas outside the genes.

The discovery could lead to a completely new field of study within cancer research and prevention.

"In the long term this may lead to better diagnoses and treatments," says co-author postdoc Anders Jacobsen from the University of Copenhagen at the department of Computational and RNA Biology.

Over the past 10 years scientists have found more and more abnormalities in DNA which lead to cancer.

Colleague is excited

Professor and Head of the Department of Genomic Medicine at Rigshospitalet Finn Cilius Nielsen did not contribute to the study, but has read it and is very excited.

He says the study shows the importance of looking into the noncoding regions of our DNA.

“It's interesting and points to the fact that we could discover clinically relevant information from the noncoding regions," says Nielsen. "Studies like this one could come up with some vital explanations for the causes of cancer," says Nielsen.

Examined 20 different cancer types

The scientists were looking at DNA mutations in 800 cancer patients with more than 20 different types of cancer.

They compared DNA from the patients' tumours with DNA from healthy tissue from the same patients. By doing so they were able to identify the differences between healthy and sick cells and the reason why the tumour had grown.

The scientists were interested in the noncoding regions of the DNA. These regions do not translate into protein as genes do -- instead, they have a different, biochemical task. They regulate how much of a particular gene is expressed. That is, if the gene is to be “on” or “off”.

“For the first time we have been able to see mutations in the noncoding DNA and how these can be the direct cause of cancer,” says Jacobsen.

Mutation gives cancer eternal life

Several mutations connected to the development of cancer were discovered by the scientists. They found that mutations in the front area of the gene which controls the length of telomeres can trigger cancer.

Telomeres decide how many times a cell can divide, and every time a cell divides the telomeres become shorter.

This means that at some stage the telomeres are so short that the cells can no longer divide.

However, mutations in the region before the gene TERT make the gene hyperactive. The length of the telomeres is then extended much more than what is considered normal, and a mutation like this will make the cell keep on dividing -- eventually forming a tumour.

“This mutation in the noncoding part of the DNA basically gives the cancer cells eternal life," says Jacobsen. "It was exciting that our research proved to have such a concrete result."

The scientists found that this mutation was the most frequent occurrence of cancer-causing mutations outside the gene.

More studies in the future

Jacobsen is convinced there will be many more studies looking at the noncoding DNA in the future.

"Our study shows that there's something here which needs to be looked at. With more studies, we can get a much better insight into what happens in cells when cancer occurs,” says Jacobsen. “We can learn a lot about he different cancers and their causes from this. In the long run we hope to develope new treatments.”

Nielsen agrees that there is a need for further studies in the area.

"We need more studies of this kind. I think it'll happen naturally. Within the next 10 to 15 years we'll be able to do complete genome sequencing quickly and cheaply, and then we'll naturally look at mutations in the entire genome -- rather than just in the genes," he says.

--------------

Read the original story in Danish on Videnskab.dk

[Some of us have been thinking, moreover using high-performance computers, for quite some time, aiming at the "NP-hard" problem of fractal pattern recognition. The first (double) disruption was to replace the mistaken dogmas of "Junk DNA" and the "Central Dogma". "Genes failed us" - the very concept of "oncogenes" seemed to obscure the obvious: the genes that can potentially become "misregulated" by fractal defects are not limited to the 571 "oncogenes" found so far, but may include ALL genes, along with the vast seas of intergenic "non-coding" (not-Junk) DNA. Any qualified informatics specialist or physicist would be mesmerized to witness a PERSON trying to figure out nuclear particles, either in fission or fusion, without computing (once the "axiom" that the atom would not split was invalidated by its splitting). How many hundreds of millions will have to face a uniquely miserable death until a global effort is directed at the informatics and computing challenges of "genome misregulation" (a.k.a. cancer)? - andras_at_pellionisz_dot_com]


The Genome (both DNA and RNA) is replete with repeats. These are facts. The question is the mathematics (fractals) that is best suited to interpret self-similar repetitions

Isidore Rigoutsos (Greek-American mathematician) surprised the world in 2006 by showing that the DNA (coding or not) is replete with "pyknons" (repetitions). Pointing out their astounding feature of "self-similarity", Pellionisz interpreted Rigoutsos' "pyknons" as evidence that genome function must be understood in terms of fractals. In a study first shown at Cold Spring Harbor (2009), Pellionisz demonstrated, for the smallest genome of a free-living organism (Mycoplasma genitalium), that the distribution of self-similar repetitions follows the Zipf-Mandelbrot Parabolic Fractal Distribution curve (see Figure here). Within two weeks, Erez Lieberman, Eric Lander (and others) put the Hilbert-fractal globule on the cover of Science.
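As a back-of-the-envelope illustration of what such a rank-frequency analysis involves (a minimal sketch, not the pipeline of the 2009 study), the code below counts k-mer repeats in a toy sequence and fits a Zipf-Mandelbrot law f(r) ~ C/(r+q)^b by least squares in log-log space; the k-mer length, the random toy sequence and the grid search over q are illustrative assumptions only.

# Minimal sketch (illustrative assumptions: k-mer counting stands in for pyknon
# detection; the toy sequence is random). Fits f(r) ~ C / (r + q)^b to the
# rank-frequency curve of repeat counts.
from collections import Counter
import numpy as np

def kmer_counts(seq, k=8):
    """Count occurrences of every k-mer in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def fit_zipf_mandelbrot(freqs, q_grid=np.linspace(0.0, 50.0, 201)):
    """Fit f(r) = C / (r + q)^b; returns (C, q, b) minimizing log-space residuals."""
    f = np.sort(np.array(freqs, dtype=float))[::-1]   # frequencies, descending
    r = np.arange(1, len(f) + 1)                      # ranks 1..N
    best = None
    for q in q_grid:
        x = np.log(r + q)
        A = np.vstack([np.ones_like(x), -x]).T        # log f = log C - b * log(r + q)
        (logC, b), res, *_ = np.linalg.lstsq(A, np.log(f), rcond=None)
        rss = float(res[0]) if res.size else np.inf
        if best is None or rss < best[0]:
            best = (rss, np.exp(logC), q, b)
    return best[1:]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy_genome = "".join(rng.choice(list("ACGT"), size=100_000))  # placeholder sequence
    counts = [c for c in kmer_counts(toy_genome).values() if c > 1]
    C, q, b = fit_zipf_mandelbrot(counts)
    print(f"Zipf-Mandelbrot fit: C={C:.1f}, q={q:.1f}, b={b:.3f}")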

Now, about 40 co-authors (including the RNA pioneer John Mattick), with Rigoutsos as last author, have published in PNAS a paper available in full here.

Just a glance at their Fig. 7 (above) will instantly convince anyone that microRNAs (culprits of genome regulation, with dual valence) manifest "self-similar repetitions". [You may wonder what happens next, andras_at_pellionisz_dot_com]


On the Fractal Design in Human Brain and Nervous Tissue - Losa recognizes FractoGene

... the FractoGene “cause and effect” concept conceived that “fractal genome governs fractal growth of organelles, organs and organisms” (Pellionisz, A.J. (2012) The Decade of FractoGene: From Discovery to Utility-Proofs of Concept Open Genome-Based Clinical Applications. International Journal of Systemics, Cybernetics and Informatics, 17-28). The Principle of this recursive genome function (PRGF) breaks through the double lock of central dogma and junk DNA barriers (Pellionisz, A. (1989) Neural Geometry: Towards a Fractal Model of Neurons. Cambridge University Press, Cambridge). Decades of computer modeling of neurons and neuronal networks suggested that the amount of information necessary to build just a tiny fraction of the human body, i.e. just the cerebellum of the nervous system, was a task for which the 1.3% of the information that the genome could contain [as "genes"] was just totally insufficient (Pellionisz, A. (2008) The Principle of Recursive Genome Function. Cerebellum, 7, 348-359, http://dx.doi.org/10.1007/s12311-008-0035-y).

... Among the main fractal peculiarities worth noticing is the process of iteration, whose powerful dynamics allows specific generators to be properly iterated at different scales (small and large) without an a priori choice, by linking efficient genetic programming in order to achieve the formation of viable biological forms and living objects Di Ieva, A., Grizzi, F., Jelinek, H., Pellionisz, A.J. and Losa, G.A. (2013) Fractals in the Neurosciences, Part I: General Principles and Basic Neurosciences. The Neuroscientist. PMID: 24362815

How to cite this paper: Losa, G.A. (2014) On the Fractal Design in Human Brain and Nervous Tissue. Applied Mathematics, 5, 1725-1732. http://dx.doi.org/10.4236/am.2014.512165

[Recognition of FractoGene by Gabriele Losa (and co-publishing in 2014) is significant, since Dr. Losa in Switzerland pioneered, in a four-volume meeting-book series prior to and during the Human Genome Project, an excellent compilation of book chapters both on the fractality of the genome and, separately, on the fractality of organisms. In fact, some contributions contained pointers to both fractalities. However, just about the time "to connect the dots", the Human Genome Project, with its historically mistaken focus on "genes" (motivated by the personal enthusiasm of Jim Watson, such that by mapping all human genes the "schizophrenia gene" should also be found), put the fractal pioneering by Dr. Losa on a back-burner. It took another decade until FractoGene (2002) "connected the dots": the "cause and effect" of a fractal genome governing fractal growth of organelles, organs and organisms could break through the double lock of the central dogma and junk DNA barriers that unfortunately still prevailed through the Losa Books (1-4). Outside that double straitjacket the enormous utility is now free to roam. "Google Alert" pointed to this Losa paper with delay - Dr. Pellionisz respectfully requests that .pdf reprints of publications pertinent to FractoGene be sent ASAP to andras_at_pellionisz_dot_com for proper contemporary compilation and cross-reference. Indeed, as heralded in the Google Tech Talk YouTube (2008), time is ripe for a postmodern meeting (with Proceedings). Those interested should contact Dr. Pellionisz]


CRACKING THE CODE OF HUMAN LIFE: The Birth of BioInformatics & Computational Genomics

[The March 16, 2015 issue of Pharmaceutical Intelligence, with the Introduction by Dr. Larry H. Bernstein, puts together an earlier assessment of the disruptive fractal approach to genomics with the new hope of genome editing. "Fractal defects" appear in an entirely new light with genome editing becoming a reality. Pharmaceutical Intelligence excerpts are edited by AJP; hyperlinks and the central email address corrected; andras_at_pellionisz_dot_com]

http://pharmaceuticalintelligence.com/contributors-biographies/members-of-the-board/larry-bernstein/

CRACKING THE CODE OF HUMAN LIFE: The Birth of BioInformatics & Computational Genomics

[About Dr. Larry H. Bernstein] - I retired from a five-year position as Chief of the Division of Clinical Pathology (Laboratory Medicine) at New York Methodist Hospital-Weill Cornell Affiliate, Park Slope, Brooklyn in 2008, followed by an interim consultancy at Norwalk Hospital in 2010. I then became engaged with a medical informatics project called “Second Opinion” with Gil David and Ronald Coifman, Emeritus Professor and Chairman of the Department of Mathematics in the Program in Applied Mathematics at Yale. I went to Prof. Coifman with a large database of 30,000 hemograms, which are the most commonly ordered test in medicine because of the elucidation of red cell, white cell and platelet populations in the blood. The problem boiled down to the level of noise that exists in such data, and to developing a primary evidence-based classification that technology did not support until the first decade of the 21st century.

Part II B: Computational Genomics

1. Three-Dimensional Folding and Functional Organization Principles of The Drosophila Genome

Sexton T, Yaffe E, Kenigsberg E, Bantignies F, …, Cavalli G. Institut de Génétique Humaine, Montpellier GenomiX, and Weizmann Institute; France and Israel. Cell 2012; 148(3): 458-472.

http://dx.doi.org/10.1016/j.cell.2012.01.010/

http://www.cell.com/retrieve/pii/S0092867412000165

http://www.ncbi.nlm.nih.gov/pubmed/22265598

Chromosomes are the physical realization of genetic information and thus form the basis for its readout and propagation. The entire genome is linearly partitioned into well-demarcated physical domains that overlap extensively with active and repressive epigenetic marks.

Chromosomal contacts are hierarchically organized between domains. Global modeling of contact density and clustering of domains show that inactive domains are condensed and confined to their chromosomal territories, whereas active domains reach out of the territory to form remote intra- and interchromosomal contacts.

Moreover, we systematically identify specific long-range intrachromosomal contacts between Polycomb-repressed domains.

Together, these observations allow for quantitative prediction of the Drosophila chromosomal contact map, laying the foundation for detailed studies of chromosome structure and function in a genetically tractable system.

2A. Architecture Reveals Genome’s Secrets

Three-dimensional genome maps - Human chromosome

Genome sequencing projects have provided rich troves of information about stretches of DNA that regulate gene expression, as well as how different genetic sequences contribute to health and disease. But these studies miss a key element of the genome - its spatial organization -which has long been recognized as an important regulator of gene expression.

Regulatory elements often lie thousands of base pairs away from their target genes, and recent technological advances are allowing scientists to begin examining how distant chromosome locations interact inside a nucleus.

The creation and function of 3-D genome organization, some say, is the next frontier of genetics.

Mapping and sequencing may be completely separate processes. For example, it’s possible to determine the location of a gene - to “map” the gene - without sequencing it. Thus, a map may tell you nothing about the sequence of the genome, and a sequence may tell you nothing about the map. But the landmarks on a map are DNA sequences, and mapping is the cousin of sequencing. A map of a sequence might look like this:

On this map, GCC is one landmark; CCCC is another: here the sequence itself serves as a landmark on the map. In general, particularly for humans and other species with large genomes, creating a reasonably comprehensive genome map is quicker and cheaper than sequencing the entire genome, because mapping involves less information to collect and organize than sequencing does.
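To make the "sequence as landmark" idea concrete, here is a minimal sketch that locates the two landmarks named above along a sequence; the toy sequence and the choice of landmarks are illustrative assumptions, not taken from any cited map.

# Minimal sketch: "mapping" as locating landmark subsequences (GCC and CCCC,
# the two landmarks named above) along a sequence. Toy sequence is illustrative.
def landmark_map(seq, landmarks=("GCC", "CCCC")):
    """Return each landmark with the 0-based positions where it occurs."""
    positions = {}
    for mark in landmarks:
        hits, start = [], seq.find(mark)
        while start != -1:
            hits.append(start)
            start = seq.find(mark, start + 1)
        positions[mark] = hits
    return positions

if __name__ == "__main__":
    toy = "ATGGCCTTACCCCGATGCCAA"
    print(landmark_map(toy))   # {'GCC': [3, 16], 'CCCC': [9]}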

Completed in 2003, the Human Genome Project (HGP) was a 13-year project. The goals were:

* identify all the approximately 20,000-25,000 genes in human DNA,

* determine the sequences of the 3 billion chemical base pairs that make up human DNA,

* store this information in databases,

* improve tools for data analysis,

* transfer related technologies to the private sector, and

* address the ethical, legal, and social issues (ELSI) that may arise from the project.

Though the HGP is finished, analyses of the data will continue for many years. By licensing technologies to private companies and awarding grants for innovative research, the project catalyzed the multibillion-dollar U.S. biotechnology industry and fostered the development of new medical applications. When genes are expressed, their sequences are first converted into messenger RNA transcripts, which can be isolated in the form of complementary DNAs (cDNAs). A small portion of each cDNA sequence is all that is needed to develop unique gene markers, known as sequence tagged sites or STSs, which can be detected using the polymerase chain reaction (PCR). To construct a transcript map, cDNA sequences from a master catalog of human genes were distributed to mapping laboratories in North America, Europe, and Japan. These cDNAs were converted to STSs and their physical locations on chromosomes determined on one of two radiation hybrid (RH) panels or a yeast artificial chromosome (YAC) library containing human genomic DNA. This mapping data was integrated relative to the human genetic map and then cross-referenced to cytogenetic band maps of the chromosomes. (Further details are available in the accompanying article in the 25 October issue of SCIENCE).

Tremendous progress has been made in the mapping of human genes, a major milestone in the Human Genome Project. Apart from its utility in advancing our understanding of the genetic basis of disease, it provides a framework and focus for accelerated sequencing efforts by highlighting key landmarks (gene-rich regions) of the chromosomes. The construction of this map has been possible through the cooperative efforts of an international consortium of scientists who provide equal, full and unrestricted access to the data for the advancement of biology and human health.

There are two types of maps: genetic linkage map and physical map. The genetic linkage map shows the arrangement of genes and genetic markers along the chromosomes as calculated by the frequency with which they are inherited together. The physical map is representation of the chromosomes, providing the physical distance between landmarks on the chromosome, ideally measured in nucleotide bases. Physical maps can be divided into three general types: chromosomal or cytogenetic maps, radiation hybrid (RH) maps, and sequence maps.

2B. Genome-nuclear lamina interactions and gene regulation.

Kind J, van Steensel B. Division of Gene Regulation, Netherlands Cancer Institute, Amsterdam, The Netherlands.

The nuclear lamina, a filamentous protein network that coats the inner nuclear membrane, has long been thought to interact with specific genomic loci and regulate their expression. Molecular mapping studies have now identified large genomic domains that are in contact with the lamina.

Genes in these domains are typically repressed, and artificial tethering experiments indicate that the lamina can actively contribute to this repression.

Furthermore, the lamina indirectly controls gene expression in the nuclear interior by sequestration of certain transcription factors.

Mol Cell. 2010; 38(4):603-13. http://dx.doi.org/10.1016/j.molcel.2010.03.016

Molecular maps of the reorganization of genome-nuclear lamina interactions during differentiation

Peric-Hupkes D, Meuleman W, Pagie L, Bruggeman SW, Solovei I, …., van Steensel B. Division of Gene Regulation, Netherlands Cancer Institute, Amsterdam, The Netherlands.

To visualize three-dimensional organization of chromosomes within the nucleus, we generated high-resolution maps of genome-nuclear lamina interactions during subsequent differentiation of mouse embryonic stem cells via lineage-committed neural precursor cells into terminally differentiated astrocytes. A basal chromosome architecture present in embryonic stem cells is cumulatively altered at hundreds of sites during lineage commitment and subsequent terminal differentiation. This remodeling involves both individual transcription units and multigene regions and affects many genes that determine cellular identity, genes that move away from the lamina are concomitantly activated; others, remain inactive yet become unlocked for activation in a next differentiation step, lamina-genome interactions are widely involved in the control of gene expression programs during lineage commitment and terminal differentiation.

Molecular Maps of the Reorganization of Genome-Nuclear Lamina Interactions during Differentiation

Molecular Cell 2010; 38(4): 603-613. http://dx.doi.org/10.1016/j.molcel.2010.03.016


Authors: Daan Peric-Hupkes, Wouter Meuleman, Ludo Pagie, Sophia W.M. Bruggeman, et al.

Various cell types share a core architecture of genome-nuclear lamina interactions. During differentiation, hundreds of genes change their lamina interactions. Changes in lamina interactions reflect cell identity. Release from the lamina may unlock some genes for activation

Fractal “globule”

About 10 years ago - just as the human genome project was completing its first draft sequence - Dekker pioneered a new technique, called chromosome conformation capture (3C), that allowed researchers to get a glimpse of how chromosomes are arranged relative to each other in the nucleus. The technique relies on the physical cross-linking of chromosomal regions that lie in close proximity to one another. The regions are then sequenced to identify which regions have been cross-linked. In 2009, using a high-throughput version of this basic method, called Hi-C, Dekker and his collaborators discovered that the human genome appears to adopt a “fractal globule” conformation - a manner of crumpling without knotting.
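For readers who want to see the downstream bookkeeping of such an experiment in code, the sketch below is a minimal, hypothetical illustration (not the published Hi-C pipeline): it assumes read pairs already mapped to positions on one chromosome, bins them into a 1 Mb contact matrix, and computes the average contact count versus genomic separation, the quantity whose roughly s^-1 decay Lieberman-Aiden et al. reported as consistent with a fractal globule.

# Minimal sketch (assumptions: read pairs already mapped to coordinates on one
# chromosome; 1 Mb bins). Only the bookkeeping step is shown: binning
# cross-linked pairs into a symmetric contact matrix.
import numpy as np

def contact_matrix(pairs, chrom_length, bin_size=1_000_000):
    """pairs: iterable of (pos1, pos2) in bp; returns a symmetric bin-by-bin matrix."""
    n_bins = chrom_length // bin_size + 1
    m = np.zeros((n_bins, n_bins), dtype=np.int64)
    for p1, p2 in pairs:
        i, j = p1 // bin_size, p2 // bin_size
        m[i, j] += 1
        if i != j:
            m[j, i] += 1
    return m

def expected_by_distance(m):
    """Average contact count at each bin separation (contact probability vs. distance)."""
    n = m.shape[0]
    return np.array([np.mean(np.diagonal(m, offset=d)) for d in range(n)])

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    chrom_len = 50_000_000
    # toy pairs with a distance-decaying contact probability (illustrative only)
    p1 = rng.integers(0, chrom_len, size=20_000)
    sep = (rng.pareto(1.0, size=20_000) * 100_000).astype(int)
    p2 = np.clip(p1 + sep, 0, chrom_len - 1)
    m = contact_matrix(zip(p1, p2), chrom_len)
    print(m.shape, expected_by_distance(m)[:5])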

In the last 3 years, Job Dekker and others have advanced the technology even further, allowing them to paint a more refined picture of how the genome folds—and how this influences gene expression and disease states. Dekker’s 2009 findings were a breakthrough in modeling genome folding, but the resolution—about 1 million base pairs—was too crude to allow scientists to really understand how genes interacted with specific regulatory elements. The researchers report two striking findings.

First, the human genome is organized into two separate compartments, keeping

* active genes separate and accessible

* while sequestering unused DNA in a denser storage compartment.

* Chromosomes snake in and out of the two compartments repeatedly

* as their DNA alternates between active, gene-rich and inactive, gene-poor stretches.

Second, at a finer scale, the genome adopts an unusual organization known in mathematics as a “fractal.” The specific architecture the scientists found, called

* a “fractal globule,” enables the cell to pack DNA incredibly tightly – the information density in the nucleus is trillions of times higher than on a computer chip – while avoiding the knots and tangles that might interfere with the cell’s ability to read its own genome. Moreover, the DNA can easily unfold and refold during

* gene activation,

* gene repression, and

* cell replication.

Dekker and his colleagues discovered, for example, that chromosomes can be divided into folding domains—megabase-long segments within which

genes and regulatory elements associate more often with one another than with other chromosome sections.

The DNA forms loops within the domains that bring a gene into close proximity with a specific regulatory element at a distant location along the chromosome. Another group, that of molecular biologist Bing Ren at the University of California, San Diego, published a similar finding in the same issue of Nature. Dekker thinks the discovery of [folding] domains will be one of the most fundamental [genetics] discoveries of the last 10 years. The big questions now are

* how these domains are formed, and

* what determines which elements are looped into proximity.

“By breaking the genome into millions of pieces, we created a spatial map showing how close different parts are to one another,” says co-first author Nynke van Berkum, a postdoctoral researcher at UMass Medical School in Dekker‘s laboratory. “We made a fantastic three-dimensional jigsaw puzzle and then, with a computer, solved the puzzle.”

Lieberman-Aiden, van Berkum, Lander, and Dekker’s co-authors are Bryan R. Lajoie of UMMS; Louise Williams, Ido Amit, and Andreas Gnirke of the Broad Institute; Maxim Imakaev and Leonid A. Mirny of MIT; Tobias Ragoczy, Agnes Telling, and Mark Groudine of the Fred Hutchinson Cancer Research Center and the University of Washington; Peter J. Sabo, Michael O. Dorschner, Richard Sandstrom, M.A. Bender, and John Stamatoyannopoulos of the University of Washington; and Bradley Bernstein of the Broad Institute and Harvard Medical School.

2C. Three-dimensional structure of the human genome

Lieberman-Aiden et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 2009; DOI: 10.1126/science.1181369.

Harvard University (2009, October 11). 3-D Structure Of Human Genome: Fractal Globule Architecture Packs Two Meters Of DNA Into Each Cell. ScienceDaily. Retrieved February 2, 2013, from http://www.sciencedaily.com/releases/2009/10/091008142957

The team used a new technology called Hi-C to answer the thorny question of how each of our cells stows some three billion base pairs of DNA while maintaining access to functionally crucial segments. The paper comes from a team led by scientists at Harvard University, the Broad Institute of Harvard and MIT, University of Massachusetts Medical School, and the Massachusetts Institute of Technology. “We’ve long known that on a small scale, DNA is a double helix,” says co-first author Erez Lieberman-Aiden, a graduate student in the Harvard-MIT Division of Health Science and Technology and a researcher at Harvard’s School of Engineering and Applied Sciences and in the laboratory of Eric Lander at the Broad Institute. “But if the double helix didn’t fold further, the genome in each cell would be two meters long. Scientists have not really understood how the double helix folds to fit into the nucleus of a human cell, which is only about a hundredth of a millimeter in diameter. This new approach enabled us to probe exactly that question.”

The mapping technique that Aiden and his colleagues have come up with bridges a crucial gap in knowledge—between what goes on at the smallest levels of genetics (the double helix of DNA and the base pairs) and the largest levels (the way DNA is gathered up into the 23 chromosomes that contain much of the human genome). The intermediate level, on the order of thousands or millions of base pairs, has remained murky. As the genome is so closely wound, base pairs in one end can be close to others at another end in ways that are not obvious merely by knowing the sequence of base pairs. Borrowing from work that was started in the 1990s, Aiden and others have been able to figure out which base pairs have wound up next to one another. From there, they can begin to reconstruct the genome—in three dimensions.

Even as the multi-dimensional mapping techniques remain in their early stages, their importance in basic biological research is becoming ever more apparent. “The three-dimensional genome is a powerful thing to know,” Aiden says. “A central mystery of biology is the question of how different cells perform different functions—despite the fact that they share the same genome.” How does a liver cell, for example, “know” to perform its liver duties when it contains the same genome as a cell in the eye? As Aiden and others reconstruct the trail of letters into a three-dimensional entity, they have begun to see that “the way the genome is folded determines which genes were

2D. “Mr. President; The Genome is Fractal !”

Eric Lander (Science Adviser to the President and Director of the Broad Institute) et al. delivered the message on the cover of Science Magazine (Oct. 9, 2009); interest in this had been generated by the International HoloGenomics Society at a September meeting [Pellionisz, Sept. 16, 2009, in Cold Spring Harbor]

First, it may seem to be trivial to rectify the statement in “About cover” of Science Magazine by AAAS.

The statement “the Hilbert curve is a one-dimensional fractal trajectory” needs mathematical clarification.

The mathematical concept of a Hilbert space, named after David Hilbert, generalizes the notion of Euclidean space. It extends the methods of vector algebra and calculus from the two-dimensional Euclidean plane and three-dimensional space to spaces with any finite or infinite number of dimensions. A Hilbert space is an abstract vector space possessing the structure of an inner product that allows length and angle to be measured. Furthermore, Hilbert spaces must be complete, a property that stipulates the existence of enough limits in the space to allow the techniques of calculus to be used. A Hilbert curve (also known as a Hilbert space-filling curve) is a continuous fractal space-filling curve first described by the German mathematician David Hilbert in 1891,[1] as a variant of the space-filling curves discovered by Giuseppe Peano in 1890.[2] For multidimensional databases, Hilbert order has been proposed to be used instead of Z order because it has better locality-preserving behavior.

Representation as Lindenmayer system

The Hilbert Curve can be expressed by a rewrite system (L-system).
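A minimal sketch of that rewrite system follows, using the standard textbook rules (axiom A; A -> +BF-AFA-FB+; B -> -AF+BFB+FA-, where F is a unit step forward and +/- are quarter turns). The interpretation code is illustrative and is not taken from any of the cited papers.

# Minimal sketch: generate the Hilbert curve from its Lindenmayer (L-system)
# rewrite rules and a turtle-style interpretation of the expanded string.
def hilbert_lsystem(order):
    """Expand the L-system string for a Hilbert curve of the given order."""
    rules = {"A": "+BF-AFA-FB+", "B": "-AF+BFB+FA-"}
    s = "A"
    for _ in range(order):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s

def hilbert_points(order):
    """Interpret the expanded string as unit steps on a grid; returns visited cells."""
    x, y = 0, 0
    dx, dy = 1, 0                       # start heading east
    pts = [(x, y)]
    for ch in hilbert_lsystem(order):
        if ch == "+":                   # turn left 90 degrees
            dx, dy = -dy, dx
        elif ch == "-":                 # turn right 90 degrees
            dx, dy = dy, -dx
        elif ch == "F":                 # step forward one cell
            x, y = x + dx, y + dy
            pts.append((x, y))
    return pts

if __name__ == "__main__":
    pts = hilbert_points(3)             # order-3 curve visits an 8x8 grid
    assert len(set(pts)) == 64          # space-filling: every cell visited exactly once
    print(pts[:5])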

While the paper itself does not make this statement, the new Editorship of the AAAS Magazine might be even further ahead had the previous Editorship not rejected (without review) a Manuscript by 20+ Founders of the (formerly) International PostGenetics Society in December, 2006 - [only an Abstract by Pellionisz could be published at his Symposium in his native Budapest, 2006, AJP].

Second, it may not be sufficiently clear for the reader that the reasonable requirement for the DNA polymerase to crawl along a “knot-free” (or “low knot”) structure does not need fractals. A “knot-free” structure could be spooled by an ordinary “knitting globule” (such that the DNA polymerase does not bump into a “knot” when duplicating the strand; just like someone knitting can go through the entire thread without encountering an annoying knot): Just to be “knot-free” you don’t need fractals. Note, however, that

* the “strand” can be accessed only at its beginning – it is impossible to e.g. to pluck a segment from deep inside the “globulus”.

This is where certain fractals provide a major advantage – that could be the “Eureka” moment for many readers. [Below, citing a heavily spammed email address instead of the secured andras_at_pellionisz_dot_com, the "Heureka explanation" borrows from here - AJP] For instance,

* the mentioned Hilbert-curve is not only “knot free” -

* but provides an easy access to “linearly remote” segments of the strand.

* if, for instance, the Hilbert curve starts from the lower right corner and ends at the lower left corner,

* the path shows how easily what would be the mid-point can be accessed,

* even though that mid-point is maximally remote when measured along the zig-zagged path itself (rather than by Euclidean distance).

Likewise, even the end of the Hilbert curve is about equally easy to access from the beginning – easier, in fact, than a point about 2/3 of the way down the path. The Hilbert curve provides easy access between two points within the “spooled thread”; a point at about 1/5 of the overall length and another at about 3/5 are also in a “close neighborhood”.
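A small numeric check of this locality claim can be made with the standard conversion from an index along the Hilbert curve to grid coordinates. The sketch below is illustrative only (the grid size is an arbitrary choice, and it is not taken from the Science paper): it prints how spatially close two positions are for several separations measured along the curve.

```python
# Sketch: positions that are "linearly remote" along a Hilbert curve can still be
# spatial neighbors. d2xy() is the standard index-to-coordinate conversion.
import math

def d2xy(n: int, d: int):
    """Convert index d (0 .. n*n-1) along a Hilbert curve over an n x n grid to (x, y)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                     # rotate/reflect the quadrant if needed
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

n = 64                                  # 64 x 64 grid: 4096 positions along the curve (arbitrary)
total = n * n
for frac_a, frac_b in [(0.0, 0.5), (0.2, 0.6), (0.0, 1.0)]:
    a, b = int(frac_a * (total - 1)), int(frac_b * (total - 1))
    (xa, ya), (xb, yb) = d2xy(n, a), d2xy(n, b)
    print(f"separation along the curve {b - a:4d}  ->  spatial distance {math.hypot(xb - xa, yb - ya):.1f}")
```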

This may be the “Eureka-moment” for some readers, to realize that

* the strand of “the Double Helix” requires quite a finesse to fold into the densest possible globules (the chromosomes) in a clever way,

* such that various segments can be easily accessed – moreover, in a way that distances between various segments are minimized.

This marvellous fractal structure is illustrated by the 3D rendering of the Hilbert curve. Once you have observed such a fractal structure, you will never again think of a chromosome as a “brillo mess”, will you? It dawns on you that the genome is orders of magnitude more finessed than we ever thought.

Those embarking on a somewhat more complex review of some historical aspects of the power of fractals may wish to consult the oeuvre of Mandelbrot (also, to celebrate his 85th birthday). For the more sophisticated readers, even the fairly simple Hilbert curve (a representative of the Peano class) becomes even more stunningly brilliant than just some “see-through density”. Those who are familiar with the classic “Traveling Salesman Problem” know that finding “the shortest path along which each of n given locations can be visited once, and only once” requires fairly sophisticated algorithms (and a tremendous amount of computation once n exceeds 10 or so). Some readers will be amazed, therefore, that for n = 9 the underlying Hilbert curve helps to provide an empirical solution.
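One well-known way to exploit this is the space-filling-curve heuristic for the Traveling Salesman Problem: visiting the locations in the order of their index along a Hilbert curve yields a reasonable, though not optimal, tour. The sketch below is only an illustration of that generic heuristic (nine random points and the grid resolution are arbitrary choices), not a reconstruction of the empirical solution mentioned above.

```python
# Sketch of the space-filling-curve heuristic for the Traveling Salesman Problem:
# sort the stops by their index along a Hilbert curve, so consecutive stops tend
# to be spatially close (an approximate tour, not the optimal one).
import random

def xy2d(n: int, x: int, y: int) -> int:
    """Convert grid coordinates (x, y) into the index along a Hilbert curve over an n x n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                     # rotate/reflect the quadrant
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

random.seed(0)
n = 256                                 # grid resolution (a power of two), arbitrary choice
cities = [(random.randrange(n), random.randrange(n)) for _ in range(9)]   # n = 9 locations
tour = sorted(cities, key=lambda c: xy2d(n, c[0], c[1]))
print("Approximate tour (Hilbert order):", tour)
```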

refer to [Andras J. Pellionisz, andras_at_pellionisz_dot_com]

Briefly, the significance of the above realization - that the (recursive) fractal Hilbert curve is intimately connected to the (recursive) solution of the Traveling Salesman Problem, a core concept of Artificial Neural Networks - can be summarized as follows.

Accomplished physicist John Hopfield (already a member of the National Academy of Sciences) aroused great excitement in 1982 with his (recursive) design of artificial neural networks and learning algorithms which were able to find reasonable solutions to combinatorial problems such as the Traveling Salesman Problem. (Book review by Clark Jeffries, 1991; see also J. Anderson, R. Rosenfeld, and A. Pellionisz (eds.), Neurocomputing 2: Directions for Research, MIT Press, Cambridge, MA, 1990):

“Perceptrons were modeled chiefly with neural connections in a ‘forward’ direction: A -> B -> C -> D. The analysis of networks with strong backward coupling proved intractable. All our interesting results arise as consequences of the strong back-coupling” (Hopfield, 1982).
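The "strong back-coupling" is easy to demonstrate in a few lines. The sketch below is a generic toy Hopfield network used for pattern recall - not the Traveling Salesman optimization discussed above, and not code from any of the cited works; the pattern size, stored patterns and update count are arbitrary.

```python
# Toy Hopfield network: every unit feeds back to every other (strong back-coupling).
# Hebbian weights store two patterns; asynchronous updates let a noisy input relax
# to the nearest stored pattern. (Illustrative only; sizes are arbitrary.)
import numpy as np

def train(patterns: np.ndarray) -> np.ndarray:
    """Hebbian outer-product rule; patterns are rows of +1/-1 values."""
    n = patterns.shape[1]
    w = patterns.T @ patterns / n
    np.fill_diagonal(w, 0.0)            # no self-connections
    return w

def recall(w: np.ndarray, state: np.ndarray, steps: int = 100, seed: int = 0) -> np.ndarray:
    """Asynchronous updates: each step sets one unit to the sign of its input field."""
    rng = np.random.default_rng(seed)
    s = state.copy()
    for _ in range(steps):
        i = rng.integers(len(s))
        s[i] = 1 if w[i] @ s >= 0 else -1
    return s

stored = np.array([[1, 1, 1, 1, -1, -1, -1, -1],
                   [1, -1, 1, -1, 1, -1, 1, -1]])
w = train(stored)
noisy = np.array([1, 1, 1, -1, -1, -1, -1, -1])    # first pattern with one bit flipped
print("recalled:", recall(w, noisy))               # relaxes back to the first stored pattern
```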

The Principle of Recursive Genome Function [Pellionisz, 2008, in a peer-reviewed science article, also disseminated in the Google Tech Talk YouTube "Is IT Ready for the Dreaded DNA Data Deluge"] surpassed the obsolete axioms that blocked, for half a century, the entry of recursive algorithms into the interpretation of the structure and function of the (Holo)Genome. This breakthrough, by uniting the two largely separate fields of Neural Networks and Genome Informatics, is particularly important for

* those who focused on Biological (actually occurring) Neural Networks - rather than abstract algorithms that may not, or because of their core axioms simply could not, represent neural networks under the governance of DNA information. (A generic toy sketch of such recursion follows below.)
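As a purely generic illustration of what "recursive" means in this context - emphatically not the FractoGene algorithm itself, which is not disclosed in this text - the toy sketch below applies one branching rule to its own output to grow a self-similar arbor; the angles, shrink factor and depth are arbitrary choices.

```python
# Toy illustration of recursion (not the FractoGene algorithm): one branching rule,
# applied to its own output, generates a self-similar "arbor".
import math

def grow(x: float, y: float, angle: float, length: float, depth: int, segments: list):
    """Recursively grow one branch segment and two shorter daughter branches."""
    if depth == 0 or length < 0.01:
        return
    x2 = x + length * math.cos(angle)
    y2 = y + length * math.sin(angle)
    segments.append(((x, y), (x2, y2)))
    # the same rule is re-applied to its own output: recursion, not a linear read-out
    grow(x2, y2, angle + 0.5, length * 0.7, depth - 1, segments)
    grow(x2, y2, angle - 0.5, length * 0.7, depth - 1, segments)

segments = []
grow(0.0, 0.0, math.pi / 2, 1.0, depth=8, segments=segments)
print(f"{len(segments)} segments generated by a single recursive rule")   # 2**8 - 1 = 255
```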

3A. The FractoGene Decade

from Inception in 2002 to Proofs of Concept and Impending Clinical Applications by 2012

[Below, Pharmaceutical Intelligence lists the yearly milestones of FractoGene. The document that also contains all hyperlinks is here http://www.junkdna.com/the_fractogene_decade.pdf ]

Junk DNA Revisited (SF Gate, 2002)

The Future of Life, 50th Anniversary of DNA (Monterey, 2003)

Mandelbrot and Pellionisz (Stanford, 2004)

Morphogenesis, Physiology and Biophysics (Simons, Pellionisz 2005)

PostGenetics; Genetics beyond Genes (Budapest, 2006)

ENCODE-conclusion (Collins, 2007)

The Principle of Recursive Genome Function (paper, YouTube, 2008)

Cold Spring Harbor presentation of FractoGene (Cold Spring Harbor, 2009)

Mr. President, the Genome is Fractal! (2009)

HolGenTech, Inc. Founded (2010)

Pellionisz on the Board of Advisers in the USA and India (2011)

ENCODE – final admission (2012)

Recursive Genome Function is Clogged by Fractal Defects in Hilbert-Curve (2012)

Geometric Unification of Neuroscience and Genomics (2012)

US Patent Office issues FractoGene 8,280,641 to Pellionisz (2012)

http://www.junkdna.com/the_fractogene_decade.pdf

http://www.scribd.com/doc/116159052/The-Decade-of-FractoGene-From-Discovery-to-Utility-Proofs-of-Concept-Open-Genome-Based-Clinical-Applications

http://fractogene.com/full_genome/morphogenesis.html

[Below, Pharmaceutical Intelligence provides some excerpts from a 2002 article in SF-Gate (the electronic version of San Francisco Chronicle). This is a very lucid overview of the beginnings at 2002 - AJP]

When the human genome was first sequenced in June 2000, there were two pretty big surprises. The first was that humans have only about 30,000-40,000 identifiable genes, not the 100,000 or more many researchers were expecting. The lower - and more humbling - number means humans have just one-third more genes than a common species of worm.

The second stunner was how much human genetic material — more than 90 percent — is made up of what scientists were calling “junk DNA.”

The term was coined to describe similar but not completely identical repetitive sequences of amino acids (the same substances that make genes), which appeared to have no function or purpose. The main theory at the time was that these apparently non-working sections of DNA were just evolutionary leftovers, much like our earlobes.

If biophysicist Andras Pellionisz is correct, genetic science may be on the verge of yielding its third — and by far biggest — surprise.

With a doctorate in physics, Pellionisz is the holder of Ph.D.’s in computer sciences and experimental biology from the prestigious Budapest Technical University and the Hungarian National Academy of Sciences. A biophysicist by training, the 59-year-old is a former research associate professor of physiology and biophysics at New York University, author of numerous papers in respected scientific journals and textbooks, a past winner of the prestigious Humboldt Prize for scientific research, a former consultant to NASA and holder of a patent on the world’s first artificial cerebellum, a technology that has already been integrated into research on advanced avionics systems. Because of his background, the Hungarian-born brain researcher might also become one of the first people to successfully launch a new company by using the Internet to gather momentum for a novel scientific idea.

The genes we know about today, Pellionisz says, can be thought of as something similar to machines that make bricks (proteins, in the case of genes), with certain junk-DNA sections providing a blueprint for the different ways those proteins are assembled. The notion that at least certain parts of junk DNA might have a purpose is gaining ground; many researchers, for example, now refer to those sections with a far less derogatory term: introns.

In a provisional patent application filed July 31, Pellionisz claims to have unlocked a key to the hidden role junk DNA plays in growth — and in life itself. His patent application covers all attempts to count, measure and compare the fractal properties of introns for diagnostic and therapeutic purposes.

[The patent with priority date of 2002 is now USPTO-issued patent 8,280,641, in force over the US market till late March 2026. The utility for "diagnostic and therapeutic purposes" has just gained a tremendous new market as "genome editing" unfolds. "Fractal defects" in the genome that produce "fractal defects" of the organism (perhaps most importantly, cancer) can not only be matched to the therapeutic agents (chemos) with the highest probability of being effective (80% of chemos are NOT effective for the genome of any particular individual); beyond this vast market, editing out the fractal defects that initiate the derailment of fractal genome regulation holds a key to the ultimate "inner sanctum" of providing genomic cures based on mathematical understanding - AJP]

3B. The Hidden Fractal Language of Intron DNA

[Excerpts from San Francisco Chronicle, 2002 continued] -To fully understand Pellionisz’ idea, one must first know what a fractal is.

Fractals are a way that nature organizes matter. Fractal patterns can be found in anything that has a nonsmooth surface (unlike a billiard ball), such as coastal seashores, the branches of a tree or the contours of a neuron (a nerve cell in the brain). Some, but not all, fractals are self-similar and stop repeating their patterns at some stage; the branches of a tree, for example, can get only so small. Because they are geometric, meaning they have a shape, fractals can be described in mathematical terms. It’s similar to the way a circle can be described by using a number to represent its radius (the distance from its center to its outer edge). When that number is known, it’s possible to draw the circle it represents without ever having seen it before.

Although the math is much more complicated, the same is true of fractals. If one has the formula for a given fractal, it’s possible to use that formula to construct, or reconstruct, an image of whatever structure it represents, no matter how complicated.
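The "formula reconstructs the structure" idea can be made tangible with an iterated function system. The sketch below uses Barnsley's classic fern coefficients - a textbook example, nothing genomic - to regenerate a complex, self-similar shape from a handful of numbers.

```python
# Sketch of "formula -> structure": an iterated function system (IFS).
# Four affine maps (Barnsley's classic fern coefficients, nothing genomic)
# regenerate the whole self-similar shape point by point.
import random

# each rule: (a, b, c, d, e, f, probability), applied as (x, y) -> (a*x + b*y + e, c*x + d*y + f)
RULES = [
    ( 0.00,  0.00,  0.00, 0.16, 0.0, 0.00, 0.01),
    ( 0.85,  0.04, -0.04, 0.85, 0.0, 1.60, 0.85),
    ( 0.20, -0.26,  0.23, 0.22, 0.0, 1.60, 0.07),
    (-0.15,  0.28,  0.26, 0.24, 0.0, 0.44, 0.07),
]

def generate(n_points: int = 50_000, seed: int = 1):
    """Play the 'chaos game': repeatedly apply a randomly chosen affine map."""
    random.seed(seed)
    x, y = 0.0, 0.0
    points = []
    for _ in range(n_points):
        a, b, c, d, e, f, _p = random.choices(RULES, weights=[r[6] for r in RULES])[0]
        x, y = a * x + b * y + e, c * x + d * y + f
        points.append((x, y))
    return points

pts = generate()
print(f"{len(pts)} points of the fern reconstructed from just 4 affine maps (28 numbers)")
```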

The mysteriously repetitive but not identical strands of genetic material are in reality building instructions organized in a special type of pattern known as a fractal. It’s this pattern of fractal instructions, he says, that tells genes what they must do in order to form living tissue, everything from the wings of a fly to the entire body of a full-grown human.

In a move sure to alienate some scientists, Pellionisz has chosen the unorthodox route of making his initial disclosures online on his own Web site. He picked that strategy, he says, because it is the fastest way he can document his claims and find scientific collaborators and investors. Most mainstream scientists usually blanch at such approaches, preferring more traditionally credible methods, such as publishing articles in peer-reviewed journals.

[The San Francisco Chronicle could not possess the domain expertise to know that the double disruption (overturning both underlying axioms of Genomics, the JunkDNA and Central Dogmas) not only made it impossible in 2002 to publish against the prevailing bias of "peer review", but even in 2006 Science Magazine rejected (without review, a violation of its bylaws...) a manuscript co-authored with 20+ leading scientists worldwide. The enormous utility in the scientific breakthrough compelled the scientist-inventor, while seeking the proper class of entrepreneurs, to file swiftly with the USPTO and to spend well over a million dollars of his own money (to become the sole inventor and "clean as a whistle" owner) in the struggle to see the patent through - approved after over ten years of wrangling, the USPTO finally throwing in the towel a week after ENCODE-II killed the Old School dogmas. Meanwhile, both the mathematical theory and the software-enabling algorithms had to go beyond the "best methods" of the time of the last CIP of the patent (2007 - now available as "trade secrets"), and once priority was secured, peer-reviewed publication could resume. It is noteworthy that the scientist-inventor had published well over 100 peer-reviewed papers before his doubly disruptive FractoGene, and that it took NASA 10 years to use an earlier issued patent of Pellionisz to improve the avionics of F15 fighter jets. - AJP]

Basically, Pellionisz’ idea is that a fractal set of building instructions in the DNA plays a similar role in organizing life itself. Decode the way that language works, he says, and in theory it could be reverse engineered. Just as knowing the radius of a circle lets one create that circle, the more complicated fractal-based formula would allow us to understand how nature creates a heart or simpler structures, such as disease-fighting antibodies. At a minimum, we’d get a far better understanding of how nature gets that job done.

The complicated quality of the idea is helping encourage new collaborations across the boundaries that sometimes separate the increasingly intertwined disciplines of biology, mathematics and computer sciences.

Hal Plotkin, Special to SF Gate, Thursday, November 21, 2002.

3C. The Human Genome: A Multifractal Analysis

The human genome: a multifractal analysis. Moreno PA, Vélez PE, Martínez E, et al.

BMC Genomics 2011, 12:506. http://www.biomedcentral.com/1471-2164/12/506

Background: Several studies have shown that genomes can be studied via a multifractal formalism. Recently, we used a multifractal approach to study the genetic information content of the Caenorhabditis elegans genome. Here we investigate the possibility that the human genome shows a similar behavior to that observed in the nematode.

Results: We report here multifractality in the human genome sequence. This behavior correlates strongly with the presence of Alu elements and, to a lesser extent, with CpG islands and (G+C) content.

In contrast, no or only a weak relationship was found for LINE, MIR, MER and LTR elements, and for DNA regions poor in genetic information.

Gene function, clusters of orthologous genes, metabolic pathways and exons tended to increase in frequency with the range of multifractality, and large gene families were located in genomic regions of varied multifractality.

Additionally, a multifractal map and classification for human chromosomes are proposed.

Conclusions: We propose a descriptive non-linear model for the structure of the human genome. This model reveals a multifractal regionalization in which many regions far from equilibrium coexist, and this non-linear organization has significant molecular and medical genetic implications for understanding the role of Alu elements in genome stability and in the structure of the human genome.

Given the role of Alu sequences in gene regulation, genetic diseases, human genetic diversity, adaptation and phylogenetic analyses, these quantifications are especially useful.
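For orientation, one common way such multifractal measures are obtained - a generic sketch, not the authors' actual pipeline - is to map the sequence to points with the chaos-game representation and then estimate the generalized dimensions D_q by box counting. The sequence below is a random stand-in rather than a real genome (for which, per the study, D_q would vary with q), and a serious analysis would regress over several box sizes instead of using a single one.

```python
# Rough sketch (not the authors' pipeline): chaos-game representation (CGR) of a
# DNA sequence plus box counting to estimate generalized dimensions D_q.
# A D_q that changes with q is the signature of multifractality.
import math
import random
from collections import Counter

CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr(seq):
    """Chaos-game representation: each base pulls the point halfway toward its corner."""
    x, y = 0.5, 0.5
    pts = []
    for base in seq:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        pts.append((x, y))
    return pts

def d_q(pts, q, boxes=64):
    """Generalized dimension D_q from a single box size (a crude one-scale estimate)."""
    eps = 1.0 / boxes
    counts = Counter((int(x / eps), int(y / eps)) for x, y in pts)
    p = [c / len(pts) for c in counts.values()]
    if q == 1:                                   # information dimension (the q -> 1 limit)
        return sum(pi * math.log(pi) for pi in p) / math.log(eps)
    return (1.0 / (q - 1)) * math.log(sum(pi ** q for pi in p)) / math.log(eps)

random.seed(2)
seq = "".join(random.choice("ACGT") for _ in range(200_000))   # random stand-in, not a genome
pts = cgr(seq)
for q in (-2, 0, 2):
    print(f"D_{q} ~ {d_q(pts, q):.3f}")          # flat (all near 2) for this random stand-in
```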


Future of genomic medicine depends on sharing information: Eric Lander

Feb 26, 2015, 01:34 AM, By Special Correspondent

[Eric Lander goes to Bangalore (Tata Auditorium) early March]

Eric S. Lander, one of the principal leaders of the Human Genome Project that mapped the entire human genetic code in 2003, said on Wednesday that the “real genome project” is about studying huge samples of genomic data to identify disease genes.

While phenomenal technological advances had helped reduce the cost of genome sequencing by a million-fold over the last decade, allowing researchers to map thousands of human genomes, the future of genomic medicine depended on “sharing information” between organisations and countries — including India — Professor Lander said.

In order for therapy to emerge from genetic research, “health systems around the world need to turn into learning systems” that share information, said Prof Lander, delivering a lecture on “The Human Genome and Beyond: A 35 year Journey of Genomic Medicine” as part of the three-city Cell Press-TNQ Distinguished Lectureship Series.

Prof. Lander envisaged a “DNA library” where genes can be cross-referenced to detect “spelling differences” and disease genes. The goal before the scientific community now was to find targets for therapeutic intervention, he said, to a packed auditorium comprising a large number of medical students. There was much to be learnt in the course of clinical care, said Prof. Lander, founding director of the Broad Institute of MIT and Harvard University.

While the “breathless hype” created around the Human Genome Project suggested that it would cure all disease in a couple of years, he said much progress had indeed been made over the last decade with the discovery of several genes responsible for diabetes, schizophrenia and heart attacks.

Prof. Lander will be speaking next on Friday at the JN Tata Auditorium in Bengaluru as part of the lectureship series.

[For Pellionisz, in his 2012 Bangalore-Hyderabad-Trivandrum lectureship series the "Fractal Approach" was an "easy sale in India" - where the culture is replete with self-similar repetitions:

[Pellionisz' lectureship series in India, selling fractals, 2012]

Pellionisz initiated FractoGene in 2002 as a US patent application not because he is a scientist driven by money (see about a hundred academic publications aimed at geometrizing neuroscience). "Fractal Genome Grows Fractal Organism" was in 2002 a "double lucid heresy" (reversing both mistaken dogmas of old-school genomics, the "Junk DNA" misnomer and the "Central Dogma"). Not only would no peer review accept it (even in 2006, prior to the release of ENCODE-I, Science rejected without review a manuscript submitted with dozens of world-class co-authors); in fact, after he published the seminal concept of fractal recursion applied to the genome in 1989 in a Cambridge University Press book (the Proceedings of a Neural Networks meeting on whose Program Committee Pellionisz served), his ongoing NIH grant was discontinued and his application to a new NIH program promoting informatics was not accepted (see the "acknowledgement" in the 1989 paper). Now the utility is a US patent in force (8,280,641), academically followed by Lander putting the Hilbert fractal on the cover of Science Magazine (2009). In the "Global $2 Trillion Trilemma" (see the essay below), India can contribute huge numbers of human genomes (both control and cancerous), along with much less regulated personal data and much more economical genome-based chemo-matching. Pellionisz put forward this plan in the Proceedings of his award-winning lecture tour in India. Francis Collins toured Bangalore at about the same time, and now Eric Lander has a chance to bring the international collaboration to success with Ratan Tata. The video of Eric's pitch (taped in New Delhi) answers the reporter's question - "what is the single biggest thing (towards a breakthrough in understanding the genomic underpinnings of e.g. cancer)?" - in an interesting manner: "The diagram of a cell".

With due respect, a fractal diagram of a (Purkinje) cell, generated by the fractal recursive genome function, is already available, and India is keenly aware of the powerful architecture of self-similar repetitions (fractals) both by a presentation and Proceedings.

[Samples from presentation in lecture-tour of Pellionisz in India, 2012]

Eric Lander also visited Bangalore and Chennai, and concluded with the prediction that 'India Will Lead the Genetic Revolution':

The New Indian Express

By Papiya Bhattacharya

BENGALURU: India will lead the genetic revolution, said Broad Institute of MIT and Harvard core member Prof Eric Lander, while delivering the last of his lectures in the Cell Press-TNQ India Distinguished Lectureship Series 2015 here on Friday.

“India is a country of a billion people. It has a special role to play because of its huge diversity of environment, people, their exposure to these environments and a large percentage of consanguinity. All these factors can be put to good use to study the existence and function of human genes for India,” he said.

Lander is one of the leaders of the Human Genome Project. He and his colleagues are known for sequencing the human genome in 2000, and they have a standing interest in applying genomics to understand the molecular basis of human physiology and disease.

Lander has a PhD in Mathematics from Oxford University, where he was a Rhodes scholar. He later turned to biology and genetics.

His mathematical talent came in handy when he turned to interpret the human genome and its sequence.

On Friday, he spoke on the history of genetics, from its birth in 1911 through 1980, and on how he and his collaborators spent $3 billion to sequence the human genome.

“Now the job is to find the genes responsible for diseases so that drugs can target those genes and the proteins they make and help in treating diseases,” he said.

The future belongs to precision medicine, where all medical decisions, medicines and products will be tailored to suit the individual needs of the patient's body and genome, he added.


Genetic Geometry Takes Shape

By: Ivan Amato

February 25, 2015

The nuclei from a half-million human cells could all fit inside a single poppy seed. Yet within each and every nucleus resides genomic machinery that is incredibly vast, at least from a molecular point of view. It has billions of parts, many used to activate and silence genes — an arrangement that allows individual cells to specialize as brain cells, heart cells and some 200 other different cell types. What’s more, each cell’s genome is atwitter with millions of mobile pieces that swarm throughout the nucleus and latch on here and there to tweak the genetic program. Every so often, the genomic machine replicates itself.

At the heart of the human genome’s Lilliputian machinery is the two meters’ worth of DNA that it takes to embody a person’s 3 billion genetic letters, or nucleotides. Stretch out all of the genomes in all of your body’s trillions of cells, says Tom Misteli, the head of the cell biology of genomes group at the National Cancer Institute in Bethesda, Md., and it would make 50 round trips to the sun. Since 1953, when James Watson and Francis Crick revealed the structure of DNA, researchers have made spectacular progress in spelling out these genetic letters. But this information-storage view reveals almost nothing about what makes specific genes turn on or off at different times, in different tissue types, at different moments in a person’s day or life.

To figure out these processes, we must understand how those genetic letters collectively spiral about, coil, pinch off into loops, aggregate into domains and globules, and otherwise assume a nucleus-wide architecture. “The beauty of DNA made people forget about the genome’s larger-scale structure,” said Job Dekker, a molecular biologist at the University of Massachusetts Medical School in Worcester who has built some of the most consequential tools for unveiling genomic geometry. “Now we are going back to studying the structure of the genome because we realize that the three-dimensional architecture of DNA will tell us how cells actually use the information. Everything in the genome only makes sense in 3-D.”

Genome archaeologists like Dekker have invented and deployed molecular excavation techniques for uncovering the genome’s architecture with the hope of finally discerning how all of that structure helps to orchestrate life on Earth. For the past decade or so, they have been exposing a nested hierarchy of structural motifs in genomes that are every bit as elemental to the identity and activity of each cell as the double helix.

A Better Genetic Microscope

A close investigation of the genomic machine has been a long time in coming. The early British microscopist Robert Hooke coined the word cell as a result of his mid-17th-century observations of a thin section of cork. The small compartments he saw reminded him of monks’ living quarters — their cells. By 1710, Antonie van Leeuwenhoek had spied tiny compartments within cells, though it was Robert Brown, of Brownian motion fame, who coined the word nucleus to describe these compartments in the early 1830s. A half-century later, in 1888, the German anatomist Heinrich Wilhelm Gottfried von Waldeyer-Hartz peered through his microscope and decided to use the word chromosome — meaning “color body” — for the tiny, dye-absorbing threads that he and others could see inside nuclei with the best microscopes of their day.

During the 20th century, biologists found that the DNA in chromosomes, rather than their protein components, is the molecular incarnation of genetic information. The sum total of the DNA contained in the 23 pairs of chromosomes is the genome. But how these chromosomes fit together largely remained a mystery.

Then in the early 1990s, Katherine Cullen and a team at Vanderbilt University developed a method to artificially fuse pieces of DNA that are nearby in the nucleus — a seminal feat that made it possible to analyze the ultrafolded structure of DNA merely by reading the DNA sequence. This approach has been improved over the years. One of its latest iterations, called Hi-C, makes it possible to map the folding of entire genomes.

The first step in a Hi-C experiment is to treat a sample of millions of cells with formaldehyde, which has the chemical effect of cross-linking strands of DNA wherever two strands happen to be close together. Those two nearby bits might be some distance away along the same chromosome that has bent back onto itself, or they may be on separate but adjacent chromosomes.

Next, researchers mince the genomes, harvest the millions of cross-linked snippets, and sequence the DNA of each snippet. The sequenced snippets are like close-up photos of the DNA-DNA contacts in the 3-D genome. Researchers map these snippets onto existing genome-wide sequence data to create a listing of the genome’s contact points. The results of this matching exercise are astoundingly data-rich maps — they look like quilts of nested, color-coded squares of different sizes — that specify the likelihood of any two segments of a chromosome (or even two segments of an entire genome) to be physically close to one another in the nucleus.
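The "matching exercise" can be pictured with a toy example: each cross-linked pair of genomic coordinates increments one cell of a binned, symmetric matrix. The sketch below is illustrative only - the coordinates are made up, and real Hi-C pipelines add read mapping, filtering and matrix balancing that are omitted here.

```python
# Toy sketch of building a contact map: each pair of cross-linked genomic
# coordinates increments one cell of a binned, symmetric matrix of counts.
import numpy as np

def contact_map(pairs, chrom_length, resolution):
    """Bin (pos_i, pos_j) contact pairs into a symmetric matrix of counts."""
    n_bins = chrom_length // resolution + 1
    m = np.zeros((n_bins, n_bins), dtype=np.int64)
    for pos_i, pos_j in pairs:
        bi, bj = pos_i // resolution, pos_j // resolution
        m[bi, bj] += 1
        if bi != bj:
            m[bj, bi] += 1              # keep the map symmetric
    return m

# hypothetical example: a 1 Mb region binned at 100 kb resolution, three observed contacts
pairs = [(120_000, 480_000), (130_000, 470_000), (900_000, 910_000)]
m = contact_map(pairs, chrom_length=1_000_000, resolution=100_000)
print(m.shape, int(m.sum()))            # (11, 11) and 5 counts in total
```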

So far, most Hi-C data depict an average contact map using contact hits pooled from all of the cells in the sample. But researchers have begun to push the technique so that they can harvest the data from single cells. The emerging capability could lead to the most accurate 3-D renderings yet of chromosomes and genomes inside nuclei.

In addition, Erez Lieberman Aiden, the director of the Baylor College of Medicine Center for Genome Architecture, and his colleagues have recently cataloged DNA-DNA contacts in intact nuclei, rather than in DNA that previously had to be extracted from nuclei, a step that adds uncertainty to the data. The higher-resolution contact maps enable the researchers to discern genomic structural features on the scale of 1,000 genetic letters — a resolution about 1,000 times finer than before. It is like looking right under the hood of a car instead of squinting at the engine from a few blocks away. The researchers published their views of nine cell types, including cancer cells in both humans and mice, in the December 18, 2014, issue of Cell.

The Power of Loops

Using sophisticated algorithms to analyze the hundreds of millions — in some cases, billions — of contact points in these cells, Aiden and his colleagues could see that these genomes pinch off into some 10,000 loops. Cell biologists have known about genomic loops for decades, but were not previously able to examine them with the level of molecular resolution and detail that is possible now. These loops, whose fluid shapes Dekker likens to “snakes all curled up,” reveal previously unseen ways that the genome’s large-scale architecture might influence how specific genes turn on and off, said Miriam Huntley, a doctoral student at Harvard University and a co-author of the Cell article.

In the different cell types, the loops begin and end at different specific chromosomal locations, so each cell line’s genome appears to have a unique population of loops. And that differentiation could provide a structural basis to help explain how cells with the same overall genome nonetheless can differentiate into hundreds of different cell types. “The 3-D architecture is associated with which program the cell runs,” Aiden said.

What do these loops do? Misteli imagines them “swaying in the breeze” inside the fluid interior of the nucleus. As they approach and recede from one another, other proteins might swoop in and stabilize the transient loop structure. At that point, a particular type of protein called a transcription activator can kick-start the molecular process by which a gene gets turned on.

Misteli muses that each cell type — a liver cell or a brain cell, for example — could have a signature network of these transient loop-loop interactions. Loop structures could determine which genes get activated and which get silenced.

Yet the researchers are careful to note that they’ve only found associations between structure and function — it’s still too early to know for sure if one causes the other, and the direction in which the causal arrow points.

As they mined their data on inter-loop interactions, Aiden, Huntley and their colleagues were also able to discern a half-dozen larger structural features in the genome called subcompartments. Aiden refers to them as “spatial neighborhoods in the nucleus” — the nucleic equivalent of New York City’s midtown or Greenwich Village. And just as people gravitate toward one neighborhood or another, different stretches of chromosomes carry a kind of molecular zip code for certain subcompartments and tend to slither toward them.

These molecular zip codes are written in chromatin, the mix of DNA and protein that makes up chromosomes. Chromatin is built when DNA winds around millions of spool-like protein structures called nucleosomes. (This winding is why two meters of DNA can cram inside nuclei with diameters just one-three-hundred-thousandth as wide.)

A large cast of biomolecular players finesses different swaths of this contorted chromatin into more closed or open shapes. Roving parts of the genomic machine can better access the open sections, and so have a better chance of turning on the genes located there.

The increasingly detailed hierarchical picture of the genome that researchers like Dekker, Misteli, Aiden and their colleagues have been building goes something like this: Nucleotides assemble into the famous DNA double helix. The helix winds onto nucleosomes to form chromatin, which winds and winds in its turn into formations similar to what you get when you keep twisting the two ends of a string. Amid all of this, the chromatin pinches off here and there into thousands of loops. These loops, both on the same chromosome and on different ones, engage one another in subcompartments.

As researchers gradually gain more insight into the genome’s hierarchy of structures, they will get closer to figuring out how this macromolecular wonder works in all of its vastness and mechanistic detail. The National Institutes of Health has launched a five-year, $120 million program called 4D Nucleome that is sure to build momentum in the nuclear-architecture research community, and a similar initiative is being launched in Europe. The goal of the NIH program, as described on its website, is “to understand the principles behind the three-dimensional organization of the nucleus in space and time (the fourth dimension), the role nuclear organization plays in gene expression and cellular function, and how changes in the nuclear organization affect normal development as well as various diseases.”

Or, as Dekker says, “It will finally allow us to see the living genome in action, and that would ultimately tell us how it actually works.”

[By the completion of the Human Genome Project in 2001, and especially after the shock of finding the next year (2002) that the mouse has essentially the same tiny set of "genes", thinkers had to seek principles of genome function. This was not easy, since the celebrated principle of genome STRUCTURE (the Double Helix, 1953) biased thinking towards a linear (though twisted) "thread". Nothing can take away the significance of the discovery of the double-stranded structure, since it is the basis of how the genome propagates itself. Nonetheless, the structure (and its propagation) says essentially nothing about how the genome functions; how the genome governs the growth of living organisms. (Transcription is serial, but different kinds of proteins are produced in parallel even within a single cell; moreover, the regulation of protein production is obviously interactive in a parallel manner.) The above journalistic reminder takes us back to 2002, when Job Dekker (and co-workers) discovered and developed an experimental method (3C) to measure the frequency of interaction between any two genomic loci. The parallel function of the genome was thereby experimentally established. Along a separate line of thinking, Pellionisz showed from 1989 that the single cell of a Purkinje neuron develops branchlets in a parallel fashion (just as any tree grows branchlets and leaves in parallel, certainly not serially one after the other). Moreover, the growth of the cell proved to be fractal, requiring the Principle of Recursive Genome Function (Pellionisz, 2008). It was the brilliance of Eric Lander - handed a copy of the manuscript of "The Principle of Recursive Genome Function", dedicated to him, in 2007 - that connected the two lines of thought, by means of the spectacular improvement of Dekker's 3C experimental technique into "Hi-C" by Erez Lieberman. The importance of the principle of "structural closeness" in a massively parallel function is elaborated here. The resulting Science cover article (Lieberman, Mirny, Lander, Dekker et al., 2009) experimentally clinched the "fractal globule" of DNA (theoretically predicted by Grosberg et al., 1988, 1993). Already in 2002 (Pellionisz), the "FractoGene" utility IP was secured that "genomic fractals are in a cause-effect relationship with fractal growth of organisms" (8,280,641) - a finding corroborated in the case of cancer by Mirny et al. (2011; see assorted further independent experimental evidence linking fractal defects of the genome to cancer, autism, schizophrenia and autoimmune diseases in Pellionisz 2012). The correlation of genomic variants with cancer therapies is now an exploding area of activity (see Foundation Medicine, with Founding Adviser Eric Lander, and Roche having invested $1 Bn into FMI). Nobody claims (any more) any objection against "The Principle of Recursive Genome Function", and "the fractal approach" is now almost taken for granted in the "New School Genomics" (based on fractal/chaotic nonlinear dynamics, with FractoGene just in "patent trolling mode" estimated at $500 M) - with an exclusive license value, heralded back in 2002, far surpassing this conservative valuation. Andras_at_Pellionisz_dot_com]


The $2 Trillion Trilemma of Global Precision Medicine

As shown below, BGI of China just bought the San Diego-based Irys System to try to cope with some of the analytics of the "Dreaded DNA Data Tsunami". Also, Switzerland-based Roche, which acquired Silicon Valley's Genentech for $44 Bn years ago, has now bought into Boston-based Foundation Medicine for a $Bn. All this infiltration of the $2 Trillion US Health Care ("Sick Care", rather) comes at the time (see news items below) when the USA officially launched its "Precision Medicine" programs. Similar to the Government/Private Sector duel of the Human Genome Project (led by Francis Collins/Craig Venter), Venter's initiative to sequence 1 million humans towards "precision medicine" has now been announced (see news below), to be closely followed by the competitive US Government Initiative at $215 M in the 2016 budget.

The point is made here that the $2 Trillion traditional Sick Care service of the USA simply can not be transformed into the newfangled "Genome-based Precision Medicine" - unless it is done globally. The trilemma is that neither the USA, nor Asia, nor Europe doing it alone is economically feasible.

As the Battelle Report elaborated (see coverage in this column), the $3Bn Human Genome Project (concluding in 2001) generated about $1 Trillion business in the USA alone.

Motivated by these earlier and present numbers (and the identical leaders), let's ponder the expected figures of what will most likely be a several-decade-long "Global Precision Medicine Program" (with cancer in focus).

First, one of the most often cited guesstimates in genomics is that the present numbers amount to "one thousand dollar sequencing and a million dollar analysis" per human genome. On this basis, the two competitive US initiatives will together run well over $2 Trillion: the DNA sequencing alone might run up a $1 Bn bill in EACH US-based initiative (1 M x $1,000 = $1 Bn), and the "million dollar analysis" adds roughly 1 M x $1 M = $1 Trillion per initiative. "Precision Medicine" thus appears to be a very noble goal - but not very good mathematics against a US Government budget proposal of $215 M for next year - even if that budget item is approved by Congress.

The $2 Trillion ticket appears more interesting in a global sense. China has lately announced that it intends "to shop around in the USA for about $2 Trillion worth". Sony has expressed interest in San Diego-based Illumina. Tata Consultancy Services is exploring ways of cooperating with the USA on the (colossal amount of) software needed for e.g. fractal genome analytics. Investments from Europe (Roche in pharmaceuticals, Siemens in medical instrumentation) round out the global picture. The present USA "Sick Care", a vastly lucrative $2 Trillion-a-year for-profit business, simply represents way too much inertia to be reformed toward "Precision Medicine" by small-scale initiatives (in the range of a couple of hundred million dollars). The US faces the trilemma of either going it alone (extremely unlikely to succeed in a reasonable time-frame), letting either Asia or Europe forge ahead with the US just following the trend, or figuring out the best ways of global cooperation, also in terms of economy.

Obviously the best resolution of the trilemma is a choreographed cooperation - especially since such a transition has already taken place in the disruption from land-line phone systems to smart mobile phone systems, and some lessons can be used directly. China and India simply skipped the development of their land-line phone systems and went directly to the superior technology (with one billion cell phones in use in India). Also, in China hospitals are often too far apart - necessitating a "Precision Therapy" technology that is largely IT-based.

As with the earlier disruption (in phone service), some key innovations will make a crucial difference - for instance, the innovation of locating the exact coordinates of the cell phone user, which makes it possible to serve him/her with "precision service" (whenever location is crucial).

Likewise, the Information Theory and Technology of Genome Interpretation is presently most advanced in the USA, and it is already the most desired essential component of "Precision Therapy". By far the most important challenge (as with DNA sequencing) is to lower the "one million dollar interpretation" price-tag, Moore's-Law style.

Clouds and awesome personal computers (disguised as "smart phones") will not listen to anything but (software-enabling) algorithms.

This is what FractoGene genome interpretation - a double disruption, overturning the two most fundamental (but wrong) axioms of Genomics - accomplished: "Fractal genome governs growth of fractal organisms".

Implementing the "FractoGene Operator" is a new industry, in the footsteps of advanced geometry of nonlinear dynamics.

Is it something entirely novel? Not at all. Those who figured out how "fractal laws govern the fractal fluctuation of stock prices" used the software-enabling algorithms and made fortunes.

andras_at_pellionisz_dot_com


BGI Pushing for Analytics - Research Documents Rapid Detection of Structural Variation in a Human Genome Using BioNano's Irys System

SAN DIEGO and SHENZHEN, China, Feb. 9, 2015 /PRNewswire/ -- BioNano Genomics, Inc., the leader in genome mapping, and BGI, the world's largest genomics organization, highlight the publication of a peer-reviewed research article and its accompanying data* in GigaScience. This article describes the rapid detection of structural variation in a human genome using the high-throughput, cost-effective genome mapping technology of the Irys® System. Structural variations are known to play an important role in human genetic diversity and disease susceptibility. However, comprehensive, efficient and unbiased discovery of structural variations has previously not been possible through next generation sequencing (NGS) and DNA arrays with their inherent technology limitations.

This study showed that the Irys System was able to detect more than 600 structural variations larger than 1kb in a single human genome. Approximately 30 percent of detected structural variations affected coding regions, responsible for making proteins. Proteins participate in virtually every process within cells, suggesting that these structural variations may have a deep impact on human health. The Irys System also accurately mapped the sequence of a virus that had integrated into the genome. The ability to provide this type of information may help inform how virus sequence integration can lead to diseases such as cancer.

"We found that BioNano's Irys System helps overcome the technological issues that have severely limited our understanding of the human genome," said Xun Xu, deputy director at BGI. "In a matter of days and with fewer than three IrysChip®, we were able to collect enough data for de novo assembly of a human genome and perform comprehensive structural variation detection without additional technologies or multiple library preparations. BioNano has since improved throughput of the Irys System enabling enough data for human genome de novo assembly to be collected in one day on a single IrysChip."

Genome maps built using the Irys System reveal biologically and clinically significant order and orientation of functionally relevant components in complex genomes. This includes genes, promoters, regulatory elements, the length and location of long areas of repeats, as well as viral integration sites.

"The Irys System provides a single, cost-effective technology platform solution to assemble a comprehensive view of a genome and discover and investigate structural variations," said Han Cao, Ph.D., founder and chief scientific officer of BioNano Genomics. "The Irys System enables de novo assembly of genomes containing complex, highly variable regions and accurate detection of all types of structural variation, both balanced and imbalanced, within complex heterogeneous samples."

The Irys System has previously been used to map the 4.7-Mb highly variable human major histocompatibility complex (MHC) region and to enable a de novo assembly of a 2.1-Mb region in the highly complex genome of Aegilops tauschii, one of three progenitor genomes that make up today's wheat.

BGI acquired the Irys System in 2014 to enable comprehensive exploration of structural variation in the human genome and to provide vastly improved assemblies for various organisms that have very complex genomic structure, including those organisms where no reference exists. Together with other available platforms, BGI aims to provide researchers with the most comprehensive information and comprehensive interpretation.

The article is one of the first articles that are part of GigaScience's series Optical Mapping: New Applications, Advances, and Challenges (http://www.gigasciencejournal.com/series/OpticalMapping), and is available through this link: http://www.gigasciencejournal.com/content/3/1/34.

*The data for this study, as part of the journal's mission of making published research reproducible and data reusable, are available in the Journal's linked database, GigaDB, at http://dx.doi.org/10.5524/100097

[The Francis Collins-based US Government and the Craig Venter-based US Private Sector are not in a mere two-way duel over their sequencing and analysis of 1 million people each. BGI - especially when the wholly purchased sequencing technology of Complete Genomics is fully absorbed (made cheaper, faster, better) - quite conceivably could, with its centralized system combining the advantages of both government subsidy and global entrepreneurship, actually beat the two leading US efforts. Don't forget that Switzerland-based Roche, having acquired Genentech and now Foundation Medicine, makes the horse-race at least a foursome. The Shenzhen/San Diego setup of BGI/BioNano Genomics is rather interesting at the outset, not only because it makes the sprint truly global, but also because if only 30 percent of the found structural variants (no longer SNPs, but stretches larger than 1 kb) are in coding regions, it means that 70 percent of detected "structural variants" are in the non-coding (in the Old School, "Junk") parts of the fractal genome. The "Chinese Solution" for penetrating the vastly lucrative US (cancer) hospital market is also interesting: "they just buy it" - earlier BGI bought the Silicon Valley jewel Complete Genomics to save it from a bankruptcy caused by the glut of the "dreaded DNA data deluge", and in 2014 BGI "just bought the Irys System" (why bother with licensing or infringement?). Incidentally, as calculated below, the true cost of the Tsunami (after the 2008 Data Deluge) is estimated at $2 Trillion - exactly the Chinese budget for shopping around for US technologies and businesses. andras_at_pellionisz_dot_com]
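How a figure like "30 percent of structural variants affect coding regions" is obtained can be illustrated with a few made-up intervals: intersect each detected variant with the annotated coding regions and count the overlaps. The sketch below is purely illustrative; none of the intervals or percentages come from the GigaScience study.

```python
# Illustration only (made-up intervals): what fraction of detected structural
# variants overlap annotated coding regions - the remainder lie in non-coding DNA.
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end

# hypothetical structural variants (start, end) and coding exons on one chromosome
variants = [(5_000, 7_500), (40_000, 42_000), (100_000, 103_000), (250_000, 251_200)]
coding = [(6_000, 6_800), (150_000, 152_000), (250_500, 250_900)]

in_coding = sum(any(overlaps(vs, ve, cs, ce) for cs, ce in coding) for vs, ve in variants)
print(f"{in_coding}/{len(variants)} structural variants overlap coding regions "
      f"({100 * in_coding / len(variants):.0f}%); the rest fall in non-coding DNA")
```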


Round II of "Government vs Private Sector" - or "Is Our Understanding of Genome Regulation Ready for the Dreaded DNA Data Tsunami?"

[News items over the last two weeks - Venter's Private Sector initiative and the US Government's promise of the same goal (to sequence the genomes of 1 million people) - inevitably trigger strong memories of earlier, markedly similar parallel events. In addition, I warned in my 2008 Google Tech Talk YouTube "Is IT Ready for the Dreaded DNA Data Deluge" that data gathering, in itself, not only falls short of "science" (it is an industry), but, if the supply of data is not matched with demand, might result in an unsustainable business model for DNA sequencing companies. The last seven years have proven that billions of dollars of valuation of "sequencing companies" were lost due to the glut (oversupply) of DNA data without matching analysis; Complete Genomics (a US investment, a crown jewel of Silicon Valley) had to be sold to China for a mere $117 M. Data gathering is a necessary, but in itself not a sufficient, ingredient of science. Perhaps the bottom line is best expressed by Altshuler: "No amount of genome sequencing would ever lead to a new medicine directly." The bottleneck is our understanding of genome regulation; Andras_at_Pellionisz_dot_com.]

--

Who was next to President Obama at the perhaps critical get-together (2011)?

[Almost three years before President Obama stood shoulder-to-shoulder with a cancer patient (see above, Ms. Elana Simon), Obama had the chance to have next to him another cancer patient (see below, Steve Jobs). The iconic leader of the world's most valuable company (Apple) claimed in his memoirs that perhaps he would be the first cancer patient to be cured by (repeated) genome sequencing and rough preliminary analysis - or the last one to die, since sequencing of his genome came too late for him, and too early for science. The Silicon Valley IT Giants (as labeled by "Financetwitter") could have decided at that February 2011 dinner in the home of John Doerr to launch Calico, Google Genomics and the sequencing (and analysis?) of one million humans. It is unclear whether this decision was debated at that dinner, or mentioned at all. (Please let me know, andras_at_pellionisz_dot_com). We all wish that Ms. Elana Simon will not necessarily be the "first" whom genome sequencing and precision medicine will help, but she will certainly be among the hundreds of millions who will benefit from this effort. Since just sequencing a genome costs at present $1,000, it is clear that the "sequencing part" of the project (both at the government and at the private sector) is going to run into billions of dollars. It is very common these days to quote "one thousand dollar sequencing and one million dollar analytics"; at such rates the two competing projects together should be planned at a Grand Total well over Two Trillion Dollars - unless a theoretical (software-enabling, algorithmic) understanding of fractal recursive genome function crushes that perhaps untenable further two-trillion-dollar debt down to a sustainable expenditure. Earlier (see the 2008 YouTube), a similar projection was made that unless the dreaded DNA data deluge is matched by appropriate analytics, the billions of dollars invested into sequencing technologies would produce an oversupply of data - and billions of dollars of investment would be lost, or sold to China (for $117 M).]

U.S. proposes effort to analyze DNA from 1 million people

Reuters

BY TONI CLARKE AND SHARON BEGLEY

WASHINGTON, Fri Jan 30, 2015 12:22pm EST (Reuters) - The United States has proposed analyzing genetic information from more than 1 million American volunteers as part of a new initiative to understand human disease and develop medicines targeted to an individual's genetic make-up.

At the heart of the "precision medicine" initiative, announced on Friday by President Barack Obama, is the creation of a pool of people - healthy and ill, men and women, old and young - who would be studied to learn how genetic variants affect health and disease.

Officials hope genetic data from several hundred thousand participants in ongoing genetic studies would be used and other volunteers recruited to reach the 1 million total.

"Precision medicine gives us one of the greatest opportunities for new medical breakthroughs we've ever seen," Obama said, promising that it would "lay a foundation for a new era of life-saving discoveries."

The near-term goal is to create more and better treatments for cancer, Dr. Francis Collins, director of the National Institutes of Health (NIH), told reporters on a conference call on Thursday. Longer term, he said, the project would provide information on how to individualize treatment for a range of diseases.

The initial focus on cancer, he said, reflects the lethality of the disease and the significant advances against cancer that precision medicine has already made, though more work is needed.

The president proposed $215 million in his 2016 budget for the initiative. Of that, $130 million would go to the NIH to fund the research cohort and $70 million to NIH's National Cancer Institute to intensify efforts to identify molecular drivers of cancer and apply that knowledge to drug development.

A further $10 million would go to the Food and Drug Administration to develop databases on which to build an appropriate regulatory structure; $5 million would go to the Office of the National Coordinator for Health Information Technology to develop privacy standards and ensure the secure exchange of data.

The effort may raise alarm bells for privacy rights advocates who have questioned the government's ability to guarantee that DNA information is kept anonymous.

Obama promised that "privacy will be built in from day one."

SEQUENCING 1 MILLION GENOMES

The funding is not nearly enough to sequence 1 million genomes from scratch. Whole-genome sequencing, though plummeting in price, still costs about $1,000 per genome, Collins said, meaning this component alone would cost $1 billion.

Instead, he said, the national cohort would be assembled both from new volunteers interested in "an opportunity to take part in something historic," and existing cohorts that are already linking genomic data to medical outcomes.

The most ambitious of these is the Million Veteran Program, launched in 2011 by the Department of Veterans Affairs. Aimed at making genomic discoveries and bringing personalized medicine to veterans, it has enrolled more than 300,000 veterans and determined DNA sequences of about 200,000.

The VA was a pioneer in electronic health records, which it will use to link the genotypes to vets' medical histories.

Academic centers have, with NIH funding, also amassed thousands of genomes and linked them to the risk of disease and other health outcomes. The Electronic Medical Records and Genomics Network, announced by NIH in 2007, aims to combine DNA information on more than 300,000 people and look for connections to diseases as varied as autism, appendicitis, cataracts, diabetes and dementia.

In 2014, Regeneron Pharmaceuticals Inc launched a collaboration with Pennsylvania-based Geisinger Health System to sequence the DNA of 100,000 Geisinger patients and, using their anonymous medical records, look for correlations between genes and disease. The company is sequencing 50,000 samples per year, spokeswoman Hala Mirza said.

"NAIVE ASSUMPTION"

Perhaps the most audacious effort is by the non-profit Human Longevity Inc, headed by Craig Venter. In 2013 it launched a project to sequence 1 million genomes by 2020. Privately funded, it will be made available to pharmaceutical companies such as Roche Holding AG.

"We're happy to work with them to help move the science," Venter said in an interview, referring to the administration's initiative.

But because of regulations surrounding medical privacy, he said, "we can't just mingle databases. It sounds like a naive assumption" if the White House expects existing cohorts to merge into its 1 million-genomes project.

Venter raced the government-funded Human Genome Project to a draw in 2000, sequencing the entire human genome using private funding in less time than it took the public effort.

Collins conceded that mingling the databases would be a challenge but insisted it is doable.

"It is something that can be achieved but obviously there is a lot that needs to be done," he said.

Collating, analyzing and applying the data to develop drugs will require changes to how products are reviewed and approved by health regulators.

Dr. Margaret Hamburg, the FDA's commissioner, said precision medicine "presents a set of new issues for us at FDA." The agency is discussing new ways to approach the review process for personalized medicines and tests, she added.

(Reporting by Toni Clarke in Washington; Editing by Cynthia Osterman and Leslie Adler)

--

J. Craig Venter, Ph.D., Co-Founder and CEO, Human Longevity, Inc. (HLI) Participates in White House Precision Medicine Event

Prepared Statement by J. Craig Venter, Ph.D.

LA JOLLA, Calif., Jan. 30, 2015 /PRNewswire/ -- It is gratifying to see that the Obama Administration realizes the great power and potential for genomic science as a means to better understand human biology, and to aid in disease prevention and treatment. I was honored to participate in today's White House event outlining a potential new, government-funded precision medicine program.

Since the 1980s my teams have been focused on advancing the science of genomics—from the first sequenced genome of a free living organism, the first complete human genome, microbiome and synthetic cell— to better all our lives.

We founded HLI in 2013 with the goal of revolutionizing healthcare and medicine by systematically harnessing genomics data to address disease. Our comprehensive database is already in place with thousands of complete human genomes, microbiomes and phenotypic information together with accompanying clinical records, and is enabling the pharmaceutical industry, academics, physicians and patients to use these data to advance understanding about disease and wellness, and to apply them for personalized care.

We envisioned a new era in medicine when we founded HLI in which millions of lives will be improved through genomics and comprehensive phenotype data.

Now, through sequencing and analyzing thousands of genomes with private funds – with the goal of reaching 1 million genomes by 2020 – we believe that we can get a holistic understanding of human biology and the individual.

It is encouraging that the US government is discussing taking a role in a genomic-enabled future, especially funding the Food and Drug Administration (FDA) to develop high-quality, curated databases and develop additional genomic expertise. We agree, though, that there are still significant issues that must be addressed in any government-funded and led precision medicine program. Issues surrounding who will have access to the data, privacy and patient medical/genomic records are some of the most pressing.

We look forward to continuing the dialogue with the Administration, FDA and other stakeholders as this is an important initiative in which government must work hand in hand with the commercial sector and academia.

Additional Background on Human Longevity, Inc.

HLI, a privately held company headquartered in San Diego, CA was founded in 2013 by pioneers in the fields of genomics and stem cell therapy. Using advances in genomic sequencing, the human microbiome, proteomics, informatics, computing, and cell therapy technologies, HLI is building the world's largest and most comprehensive database of human genomic and phenotype data.

The company is also building advanced health centers – called HLI Health Hubs – which will be the embodiment of our philosophies of genomic science-based longevity care – where we will apply this learning and deliver it to the general public for the greatest benefit. Individuals and families will be seen in welcoming environments for one-stop, advanced evaluations (advanced genotype and phenotype analysis including whole body MRI, wireless digital monitoring, etc.). Our first prototype center is slated to open in July 2015 in San Diego, California.

--

Obama gives East Room rollout to Precision Medicine Initiative

http://news.sciencemag.org/biology/2015/01/obama-gives-east-room-rollout-precision-medicine-initiative

By Jocelyn Kaiser 30 January 2015 4:15 pm 2 Comments

President Barack Obama this morning unveiled the Precision Medicine Initiative he’ll include in his 2016 budget request to a White House East Room audience packed with federal science leaders, academic researchers, patient and research advocacy groups, congressional guests, and drug industry executives. By and large, they seemed to cheer his plan to find ways to use genomics and other molecular information to tailor patient care.

After poking fun at his own knowledge of science—a model of chromosomes made from pink swim noodles “was helpful to me,” he said—Obama explained what precision medicine is: “delivering the right treatments, at the right time, every time to the right person.” Such an approach “gives us one of the greatest opportunities for new medical breakthroughs that we have ever seen,” he added. He went on to describe the $215 million initiative, which includes new support for cancer genomics and molecularly targeted drug trials at the National Cancer Institute (NCI), and a plan to study links among genes, health, and environment in 1 million Americans by pooling participants in existing cohort studies.

“So if we have a big data set—a big pool of people that’s varied—then that allows us to really map out not only the genome of one person, but now we can start seeing connections and patterns and correlations that helps us refine exactly what it is that we’re trying to do with respect to treatment,” the president explained in his 20-minute speech, flanked by a red-and-blue model of the DNA double helix.

In the room were various patients, from Elana Simon, a young survivor of a rare liver cancer who has helped sequence her cancer type, who introduced the president; to towering former basketball great Kareem Abdul-Jabbar, who apparently takes targeted therapy for his leukemia; and cystic fibrosis patient William Elder, a 27-year-old medical student and guest at the State of the Union address who takes a new drug aimed at the genetic flaw underlying his form of the disease.

Representative Diana DeGette (D–CO), who has been working on 21st Century Cures, a plan to speed drug development, and Senator Lamar Alexander (R–TN), who has similar aims, were also present.

Sitting in the front row were the two lieutenants who will carry out the bulk of the precision medicine plan: National Institutes of Health (NIH) Director Francis Collins and NCI Director Harold Varmus. Another attendee was Craig Venter, who led a private effort to sequence the human genome in the late 1990s that competed with a public effort led by Collins. (Fifteen years ago, Venter sat in the same room with Collins when President Bill Clinton announced the first rough draft of the human genome.) Venter is now CEO of a company called Human Longevity Inc. that aims to sequence 1 million participants’ genomes by 2020—a new private competitor to Collins’s federal cohort study, perhaps.

Many other genome-medical biobank projects at academic health centers and companies are clamoring to be part of the 1 million–person cohort study. NIH will begin to explore which studies to include at an 11 to 12 February meeting (agenda here) that will also examine issues ranging from data privacy to using electronic medical records.

Amid all the hoopla, one prominent human geneticist in the audience offered a cautionary note. David Altshuler, who recently left the Broad Institute for Vertex Pharmaceuticals in Boston, which makes Elder’s cystic fibrosis drug, warns that although the new 1 million American cohort study may uncover new possible drug targets, it will be 10 to 15 years before any such discoveries lead to a successful drug.

“This is the first step,” Altshuler says. “No amount of genome sequencing would ever lead to a new medicine directly.”

---

Pellionisz' 2008 Google Tech YouTube

Forget the genome, Australian scientists crack the 'methylome' for an aggressive type of breast cancer

Sidney Morning Herald

February 3rd, 2015

http://www.smh.com.au/technology/sci-tech/forget-the-genome-australian-scientists-crack-the-methylome-for-an-aggressive-type-of-breast-cancer-20150202-1342al.html

Decoding the letters of the human genome revolutionised scientists' understanding of the role of genetic mutations in many diseases, including about one in every five cancers.

Now a team of Australian scientists have gone a step further, inventing a way to decipher another layer of information that garnishes genes, called methyl groups, which may explain the cause of many more cancers.

Methyl groups hang off sections of DNA like Christmas lights and act like a switch, affecting how genes are expressed in different cell types. Collectively called the methylome, they can also switch off tumour suppressor genes and switch on cancer promoting genes.

Susan Clark from the Garvan Institute of Medical Research and her team have for the first time translated the methylome of breast cancer, finding distinct patterns associated with different types of breast cancer.

They have also found a way to classify women with the worst type of breast cancer, triple-negative, into two groups; those with a highly aggressive form and those with a lower-risk variety with a longer survival time. At present there is no reliable way to divide triple-negative cancers, which do not respond to targeted treatment, into these sub-groups.

With further testing, methylation signatures may be used as predictive biomarkers that doctors use to prescribe more appropriate treatments for women diagnosed with breast cancer in the future.
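The article does not spell out the statistical method, but the general idea of splitting triple-negative tumours into a higher-risk and a lower-risk group by methylation signature can be sketched in a few lines of Python. Everything below (sample names, methylation values, the naive mean-based cutoff) is invented for illustration; the Garvan study used genome-wide methylome sequencing and survival analysis.

# Illustrative only: group tumours by average methylation over a handful of
# hypothetical signature regions; a real analysis clusters genome-wide profiles
# and tests the groups against patient survival.

samples = {
    "tumour_A": [0.82, 0.91, 0.76, 0.88],  # methylation level (0-1) per region
    "tumour_B": [0.15, 0.22, 0.31, 0.18],
    "tumour_C": [0.79, 0.85, 0.80, 0.90],
    "tumour_D": [0.20, 0.10, 0.25, 0.30],
}

def mean(values):
    return sum(values) / len(values)

scores = {name: mean(levels) for name, levels in samples.items()}
cutoff = mean(list(scores.values()))  # naive split into two groups

for name, score in sorted(scores.items()):
    group = "signature-high" if score >= cutoff else "signature-low"
    print(f"{name}: mean methylation {score:.2f} -> {group}")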

Professor Clark's team are the first in the world to sequence large chunks of the methylome from samples of cancer tissue that had been archived for up to two decades.

Using historical samples meant they could trace which methylation patterns were linked to patient survival times.

Cancer specialist Paul Mainwaring, who was not involved in the research, said Professor Clark's new technique to decode the entire methylome will have significant implications for cancer research in general.

"The power of this technology is that it's allowing us to get a much sharper view on how cancer starts, progresses, metastasizes, behaves and a new avenue of treatment," said Dr Mainwaring from ICON Cancer Care in Brisbane.

"We'll still be talking about this paper in 20 years," he said.

While specific faults in a person's DNA sequence have been shown to increase their risk of certain cancers – the BRCA 2 mutation which significantly increases a woman's chance of developing breast tumours – in about two-thirds of cancers there are no changes to the DNA code.

In many of these cases scientists are finding changes to the genome that do not affect the underlying code, principally through DNA methylation.

"Every cancer has some sort of mutational profile, but there are multiple layers of where those abnormalities can occur. This is a giving us the ability to read one of those layers," he said.

Dr Mainwaring said the exciting part about identifying methylation patterns was that they are potentially reversible.

"It's the bit of the genome we may be able to influence most, certain regions can be changed either by diet, exercise or drugs," he said.

Professor Clark and team's research was funded by the National Breast Cancer Foundation and has been published in the leading scientific journal Nature Communications.


Houston, We've Got a Problem!

[FractoGene, 2002 yielded fractal defect mining, consistent with repeats algorithmically described as pyknon-s by Rigoutsos, 2006 disseminated in Google Tech Talk Youtube 2008, a year before the Hilbert-fractal of genome folding appeared on Science Cover in 2009]

Paraphrasing the infamous alarm so well pictured in "Apollo 13" of the US Space Program, one is urged to cry out now: "USA Genome Project, we've got a problem!"

One thing is amiss: there is no "Command Center" to call with the increasingly obvious alarm that even Craig Venter articulated years ago, that "our concepts of genome regulation are frighteningly unsophisticated". The Old School of genomics, with the fairy tale of 1.3% Genes and 98.7% Junk, and with the bad joke of Crick's Central Dogma falsely arbitrating that "protein to DNA recursion can never happen", has now totally unraveled. Yet the "New School of Hologenomics", based on the advanced mathematics of non-linear dynamics, is only budding after hardly more than its first decade (hear double-degree biomathematician Eric Schadt).

Whom to alert? Though even very small countries (see the Estonian Genome Project, the Latvian Genome Project, etc.) have their "National Genome Project", the USA-led international project that produced the $3Bn sequencing of a single genome expired a decade and a half ago. Some consider the NIH-led "ENCODE" its continuation (2003-2007, prompted e.g. by my personal debate with Dr. Francis Collins at the 50th Anniversary of the Double Helix, arguing the importance of settling the very disturbing result that only about 20 thousand genes were found, while according to my 2002 FractoGene 98.7% of the human genome was NOT JUNK). ENCODE-II (2007-2012) was even less of a "continuation": it essentially reinforced the surprise that "the human genome is pervasively transcribed", and attached a suspiciously arbitrary-looking number (80%) to the "functional" parts of the genome (the exons and introns of genes plus vast seas of intergenic non-directly-coding DNA). However, neither the original US-led Human Genome Project nor ENCODE I-II addressed the basic question of algorithmic interpretation of (recursive) genome function.

In the absence of any overarching "USA Genome Project" (NHGRI, DoE, NSF, DARPA etc. compete for taxpayer dollars, thus by definition their activities are scattered), whom to alert, for instance, that "microexons" (see two articles below) not only await a definition, but that the usage is often self-contradictory? For instance, one paper lists "microexons" 1 nt "long". Since an "exon" is defined as a protein-coding sequence (of triplets of A, C, T, G in an open reading frame), nothing shorter than 3 nt can be called a "microexon". Since a single base cannot code for a protein (an amino acid, rather), the referred single nucleotide could well be part of an "intron". The mathematically dual valence of exons, introns and intergenic non-coding DNA was exposed in a Springer textbook, but the advanced mathematics of, e.g., the significance of dual valence (and fractal eigenstates) is not easily digestible for non-mathematically-minded workers. This is most unfortunate, since after the "genome disease" (a.k.a. cancers), autism has now established the case that these major diseases are so complex, involving myriads of coding and non-coding DNA structural variants, that the recent Newsweek cover applies: "You can not cure a disease that you do not understand". By now it is totally clear that neither cancer nor autism can be cured, or even understood, without an algorithmic (mathematical) approach to genome regulation. It is commendable, therefore, that one of the leading "agencies" is not an "agency" of the government sector at all, but the charitable Simons Foundation (headed by the most accomplished mathematician, Jim Simons, who made $Billions with his stock-market algorithms). Mathematics is also not much of a problem for world-leading Information Technology companies (e.g. my Google YouTube points out near its end that even the Internet is fractal). Thus, Google Genomics, Amazon Web Services and IBM in the USA, SAP or Siemens in Germany, and Samsung, Sony or even TATA in Asia are the entities likely to heed (and lucratively profit from) this "alert". One challenge is that cross-domain expertise (genomics AND informatics) is required, which is presently still a somewhat unusual combination - but advisorship is available. Andras_at_Pellionisz_dot_com
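To make the definitional point above concrete, here is a minimal, purely illustrative Python check (not from any cited paper): it encodes the argument that a protein-coding "microexon" must contribute whole codons, so anything shorter than 3 nt cannot be a coding exon, and lengths that are not multiples of 3 shift the downstream reading frame.

# Illustrative sketch of the coding-triplet argument made in the note above.

def classify_putative_microexon(length_nt):
    """Classify a reported 'microexon' length by the codon (triplet) argument."""
    if length_nt < 3:
        # Shorter than one codon: cannot encode an amino acid on its own,
        # so by the strict definition it is not a (coding) exon at all.
        return "shorter than one codon - not a coding exon"
    if length_nt % 3 == 0:
        return "frame-preserving microexon candidate (whole codons)"
    # Not a multiple of 3: splicing it in shifts the downstream reading frame
    # unless compensated elsewhere.
    return "frame-shifting if spliced in"

for n in (1, 3, 6, 27, 28):
    print(n, "nt:", classify_putative_microexon(n))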


Small snippets of genes may have big effects in autism

Kate Yandell

22 January 2015

Small pieces of DNA within genes, dubbed ‘microexons,’ are abnormally regulated in people with autism, suggests a study of postmortem brains published 18 December in Cell [1]. These sequences, some as short as three nucleotides, moderate interactions between key proteins during development.

“The fact that we see frequent misregulation in autism is telling us that these microexons likely play an important role in the development of the disorder,” says lead researcher Benjamin Blencowe, professor of molecular genetics at the University of Toronto.

Genes are made up of DNA sequences called exons, separated by swaths of noncoding DNA. These exons are mixed and matched to form different versions of a protein. This process, called alternative splicing, is thought to be abnormal in autism.

Many sequencing studies tend to skip over microexons because they are not recorded in reference sequences. Although researchers have known about microexons for decades, they were unsure whether the small segments had any widespread purpose.

The new study confirms microexons’ importance, suggesting that these tiny sequences can have big effects on brain development.

“It’s really a new landscape of regulation that’s associated with a disorder,” says Blencowe. “We have a big challenge ahead of us to start dissecting the function of these microexons in more detail.”

Blencowe and his team developed a tool that flags short segments of RNA flanked by sequences that signal splice sites. They used the tool to identify microexons in RNA sequences from various cell types and species throughout development.
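The article does not describe how the Toronto tool works internally; as a purely illustrative sketch of the general idea ("short segments flanked by sequences that signal splice sites"), the toy Python scan below looks for short stretches bounded by an upstream intronic AG (3' splice site) and a downstream intronic GT (5' splice site). The sequence and length limits are invented; a real pipeline would score full splice-site motifs and use RNA-seq junction reads.

import re

def find_microexon_candidates(genomic_seq, min_len=3, max_len=27):
    """Toy scan: short segments preceded by an intronic 'AG' and followed by 'GT'."""
    pattern = re.compile(r"AG([ACGT]{%d,%d}?)(?=GT)" % (min_len, max_len))
    for match in pattern.finditer(genomic_seq.upper()):
        yield match.start(1), match.group(1)

toy_sequence = "ccccAGGCAGCAGTggggAGTTTGTaaaa"  # hypothetical sequence
for pos, segment in find_microexon_candidates(toy_sequence):
    print(f"candidate microexon at position {pos}: {segment} ({len(segment)} nt)")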

In the brain, microexons are highly conserved across people, mice, frogs, zebrafish and other vertebrates. Alternatively spliced microexons are more likely to be present in neurons than in other cell types, suggesting that they have an important, evolutionarily conserved role in neurons.

Irregular splicing:

The researchers analyzed patterns of microexon splicing in the postmortem brains of 12 people with autism and 12 controls between 15 and 60 years of age.

Nearly one-third of alternatively spliced microexons are present at abnormal levels in autism brains compared with control brains, they found. By contrast, only 5 percent of exons longer than 27 nucleotides are differentially spliced in autism brains.

Genes with microexons that are misregulated in autism tend to be involved in the formation of neurons and the function of synapses — the junctions between neurons. Both of these processes are implicated in autism.

Microexons are particularly likely to be misregulated in autism-linked genes, such as SHANK2 and ANK2. What’s more, the expression of a gene called nSR100, which regulates splicing of microexons, is lower in the brains of people with autism than in those of controls.

One future goal is to determine the biology underlying these differences, says Daniel Geschwind, director of the University of California, Los Angeles Center for Autism Research and Treatment. nSR100 belongs to a module of genes that includes transcription factors — which regulate the expression of other genes — and those that modify chromatin, which helps package DNA into the nucleus. Many of these genes have known links to autism.

To look at microexon splicing throughout development, Blencowe and his team sequenced RNA from mouse embryonic stem cells as they differentiated into neurons. Microexon levels tend to spike after the cells finish dividing, hinting at a role in the late stages of neuronal maturation.

Studying microexon regulation at various stages of normal development in people is another logical next step, says Lilia Iakoucheva, assistant professor of psychiatry at the University of California, San Diego, who was not involved in the study. “Then, of course, we can study gene expression in autism brains and then talk about what’s regulated correctly and what’s misregulated.”

As a complement to the postmortem data, the researchers could also look at how microexons are regulated in developing neurons derived from people with autism, says Chaolin Zhang, assistant professor of systems biology at Columbia University in New York, who was not involved in the study.

“We should not underestimate the potential of more detailed characterization of these splicing variants,” he says. “They really expand the genome and [its] complexity in an exponential way.”

Yang Li, a postdoctoral fellow at Stanford University in California also applauds the attention to the microexons. “There’s still not enough recognition that different [forms of proteins] can have very different functions,” he says. “This is especially true in the brain.”

In an independent study published in December in Genome Research, Li and his colleagues reported that microexons in the brain tend to encode amino acids in locations that are likely to affect protein-protein interactions [2]. They also found that the autism-linked RBFOX gene family regulates microexon splicing in the brain.

“I definitely think that microexons are important because of how conserved they are in terms of brain function,” says Li. “But I don’t know if they cause autism.”

News and Opinion articles on SFARI.org are editorially independent of the Simons Foundation.

References:

1. Irimia M. et al. Cell 159, 1511-1523 (2014) PubMed

2. Li Y.I. et al. Genome Res. 25, 1-13 (2015) PubMed


Autism genomes add to disorder's mystery

By GEOFFREY MOHAN

Los Angeles Times,

January 26, 2015

Less than a third of siblings with autism shared the same DNA mutations in genes associated with the disorder, according to a new study that is the largest whole-genome sequencing for autism to date.

Canadian researchers sequenced whole genomes from 170 siblings with autism spectrum disorder and both their parents. They found that these sibling pairs shared the same autism-relevant gene variations only about 31% of the time, according to the study published online Monday in the journal Nature Medicine.

More than a third of the mutations believed to be relevant to autism arose in a seemingly random way, the study also found.

“It isn’t really autism; it’s autisms,” said the study’s lead investigator, Dr. Stephen W. Scherer, head of the Center for Applied Genomics, Genetics and Genome Biology at the Hospital for Sick Children in Toronto. In some cases, he added, “it’s like lightning striking twice in the same family.”

The results are part of 1,000 whole genomes that are being made available to researchers via a massive Google database that autism advocates hope will grow to 10 times that size by next year.

The effort, spearheaded by the research and advocacy group Autism Speaks, has been somewhat controversial from the start, with some questioning whether results from the relatively costly and time-consuming process will be too complicated or obscure to yield significant breakthroughs.

Indeed, researchers associated with the effort acknowledged that much of their data remain a mysterious ocean of jumbled, deleted or inserted DNA code, much of which is not located on areas of the genome that program the proteins that directly affect biological functions.

“You might expect that you’d see some commonalities in the mutations between kids in the same family, but that’s actually not the case here,” said Rob Ring, chief science officer of Autism Speaks. “We’re not really sure what might explain that at this time.”

Said Scherer: “We’ve really just scratched the surface of this data.”

That’s where Google’s cloud-based data capabilities will come in, according to Ring and Scherer. Making these whole genomes – potentially 10,000 of them – available to any researcher could yield unexpected connections and order in data that are the equivalent of more than 13 years of streaming high-definition television programming.
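As a rough sanity check on that comparison (the per-genome size and streaming bitrate below are assumptions, not figures from the article), the arithmetic is simple: at roughly 100 GB of raw data per whole genome, 10,000 genomes amount to about a petabyte, which at a typical HD streaming rate is indeed more than a decade of continuous video.

# Back-of-envelope check of the "years of HD streaming" comparison.
# Assumed figures (not from the article): ~100 GB per whole genome, ~5 Mbit/s HD stream.

GENOMES = 10_000
GIGABYTES_PER_GENOME = 100          # assumption
HD_STREAM_BITS_PER_SECOND = 5e6     # assumption

total_bytes = GENOMES * GIGABYTES_PER_GENOME * 1e9
seconds_of_streaming = total_bytes * 8 / HD_STREAM_BITS_PER_SECOND
years_of_streaming = seconds_of_streaming / (3600 * 24 * 365)

print(f"total data: ~{total_bytes / 1e15:.1f} PB")
print(f"equivalent continuous HD streaming: ~{years_of_streaming:.0f} years")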

Even the more limited data from several hundred genomes sequenced in the study proved difficult to handle. “We couldn’t transfer it over the Internet,” said Scherer. “We had to buy hard drives and Fed-Ex them.”

Autism Speaks hopes the database will attract researchers from varied fields, including those outside of genetics.

“It may be a genetic code as it rolls off of sequencers, but it’s just data and numbers,” Ring said.

Other sequencing studies have examined more children diagnosed with autism, but involved single siblings with the diagnosis and have focused on a narrower part of the genome – a little more than 1% of the genome that codes the proteins that carry out biological processes.

The Canadian study is the largest of so-called "multiplex" families with more than one child diagnosed with the disorder.

The researchers had examined a smaller batch of 32 family genomes in 2013, uncovering damaging variations of four genes not previously correlated to autism spectrum disorder. That study also identified mutations in 17 other known or suspected autism genes. The small variations in DNA coding it found accounted for about 19% of the autism cases, the study found. The current study found autism-relevant mutations in 36 of the 85 families studied. Those mutations were shared by siblings in only 11 of those 36 families, and 10 of those were inherited.

Advocates for whole-genome sequencing argue that their approach picks up all kinds and sizes of mutations, including much smaller additions and deletions of code, than are detected in other forms of sequencing. The study noted that more than 95% of one particular category of coding variation would have been missed by narrower approaches.

The cost and time involved in whole genome sequencing are rapidly declining, while cloud-based computing opens up massive computational power that could potentially make sense of the vast database, advocates say.


Critics have argued that turning up more small oddities may not necessarily be helpful, given that many are so rare that it will be hard to make any statistical sense of them. Even some of the strongest “autism gene” candidates are associated with only a small fraction of autism cases, they note.

Still, genomics is increasingly examining the potential roles of vast stretches of DNA that do not directly code proteins, or that lie outside of genes. Those areas can affect how genes are expressed and how they interact with the environment.

Autism Speaks has committed $50 million to the whole-genome sequencing effort so far, Ring said. The portal to the 1,000 genomes should be in place by the second quarter this year, he said.


Hundreds of Millions Sought for Personalized Medicine Initiative

Jan 26, 2015

US President Barack Obama will seek hundreds of millions of dollars to fund the new personalized medicine initiative he announced in his State of the Union address last week, the New York Times reports.

Such a program would bring about "a new era of medicine  —  one that delivers the right treatment at the right time," Obama said in his speech.

According to the Times, this initiative may have broad, bipartisan support. "This is an incredible area of promise," says Senator Bill Cassidy (R-La.), who is also a gastroenterologist.

The funds would go to both the National Institutes of Health to support biomedical research and to the Food and Drug Administration to regulate diagnostic tests.

Ralph Snyderman, the former chancellor for health affairs at Duke University, tells the Times that he is excited by the prospect of the initiative. "Personalized medicine has the potential to transform our health care system, which consumes almost $3 trillion a year, 80 percent of it for preventable diseases," Snyderman says.

Though new treatments are expensive, Snyderman says personalized therapies will save money, as they will only be given to people for whom they'll work.

[The purpose of a "State of the Union Address" by US Presidents is to seek maximally broad-based political support. Thus, most everybody gets a little of the thinly spread promises. However, any "Initiative" has to be 1) worked out by experts, and 2) pushed through (often requiring years) the legislative system of Congress. According to the above, it is questionable how much effect, and when, such an "initiative" might have e.g. on the NIH (with a yearly budget already around thirty billion dollars, "hundreds of millions" might barely make a dent). The Statement might be very useful to stimulate task #1 (working out, by domain experts, the most cost-effective plan). In this regard, in the multiple capacity of a) someone whose NIH grant-continuation was cut in 1989 when the colossal disruption by Genomics became a "perceived threat on the establishment" (see acknowledgement in Pellionisz, 1989), b) someone who already contributed to government blueprints (see "Decade of the Brain Initiative"), and c) someone who worked out the mathematical (geometrical) algorithmic approach to the unification of neuroscience and genomics (Pellionisz et al, 2013), this worker would add two further improvements that the US government could plan for - if influencing the "$3 trillion health care system" with "hundreds of millions of dollars" is meant as a real catalyst. First, with the new involvement of the government in the health care insurance system, some "catalyzer monies" could be well spent to shape the US health insurance system in the direction of Germany (see news below), France, UK, Canada (where, instead of a for-profit "sick care system", health care is a non-profit government service). Second (as the news below also clearly indicates), "personalized medicine" will happen through massive involvement of Information Technology giants (SAP in Germany; Google Genomics, Amazon Web Services, IBM etc. in the USA). These monstrous companies, however, typically have a rather hard time embracing "paradigm-shifts" (see the classic best-seller by Christensen, "The Innovator's Dilemma"). Indeed, there is a new crop of "personalized medicine start-ups" in the USA (most notably Foundation Medicine in Boston, already a post-IPO $Bn business). Government incentives on the scale of "hundreds of millions of dollars" could boost the (existing) "SBIR programs" seeking innovative IT-based solutions for personalized medicine. This is all the more important since, judging from past history, informatics falls much more into the forte of NSF, DOE, DARPA (etc.) than of the mostly still "old schooler"-dominated NIH. This opinion could be based on the Memoirs of Mandelbrot, which recall the opportunity "to mathematize biology". The now late Mandelbrot deliberately declined the offer (though it came along with ample funding), since in his opinion "biologists were not ready for advanced mathematics" (an opinion he upheld till his passing; The Fractalist, 2012). This worker would like to note here that there is also a third, much superior opportunity, to be elaborated elsewhere. Andras_at_Pellionisz_dot_com.]


SAP Teams with ASCO to Fight Cancer

[SAP of Germany uses Big Data to fight cancer together with USA - Video]

SAP is teaming with the American Society of Clinical Oncology (ASCO) to develop CancerLinQ, a big data solution that will transform care for cancer patients. The collaboration brings data and expertise from ASCO, a non-profit physician group with over 35,000 members worldwide, onto SAP HANA. CancerLinQ will give doctors new insights in seconds when they are deciding on personalized treatment plans with patients.

[In the USA, Health Care ("Sick Care", rather) is well known to be a for-profit business. Thus, it is in the best interest of both hospital systems and Pharma to try as many chemos on a single patient as possible. Since 80% of chemos do NOT work for any particular individual, there is a lot of "repeat customer mode" for "sick care" to experiment on humans. This is fortunately not true for countries like Germany, France, UK, Canada (even China...), where Health Care is NOT a for-profit business but a government-paid public service. For the government budget it is extremely important for such countries to minimize ineffective expenditure - and e.g. in Germany, which is rich enough to afford expensive cancer medication but smart and motivated enough to use "Big Data" (genome-matching) to personalize cancer medicine, both SAP and Siemens are already engaged in "genome-matched chemo-personalization". In the USA, at least 3 major IT companies (Google Genomics, Amazon/Illumina, IBM/New York Genome Center) are already engaged in genome analytics - and e.g. Boston-based Foundation Medicine is already a post-IPO business beyond $Bn valuation. Now the USA is facing increasingly potent, and much more motivated, competition from Germany, Japan (Riken/Sony), Korea (Samsung) and even China (BGI). While an earlier trend used to be to travel to the USA for the best medical care, these days some cancer patients leave the USA for Germany for more personalized medicine. A key to the best matching is THE ALGORITHM - andras_at_pellionisz_dot_com]
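As a minimal sketch of what "genome-matched chemo-personalization" means in data terms (purely illustrative; the gene-variant-drug pairs below are placeholders, not clinical associations), the core operation is a lookup of a patient's tumour variants against a curated knowledge base of variant-to-therapy links:

# Purely illustrative matching of tumour variants to candidate therapies.
# The knowledge-base entries and patient variants below are placeholders.

knowledge_base = {
    ("GENE_X", "p.V600E"): ["drug_alpha"],
    ("GENE_Y", "amplification"): ["drug_beta", "drug_gamma"],
    ("GENE_Z", "exon19del"): ["drug_delta"],
}

def match_therapies(patient_variants):
    """Return candidate drugs keyed by the patient variants that support them."""
    candidates = {}
    for variant in patient_variants:
        for drug in knowledge_base.get(variant, []):
            candidates.setdefault(drug, []).append(variant)
    return candidates

patient = [("GENE_X", "p.V600E"), ("GENE_Q", "p.R273H")]  # hypothetical tumour profile
for drug, evidence in match_therapies(patient).items():
    print(f"{drug}: supported by {evidence}")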


Human Longevity, Genentech Ink Deal to Sequence Thousands of Genomes

Jan 14, 2015 | a GenomeWeb staff reporter

NEW YORK (GenomeWeb) – Human Longevity today announced it has signed a multi-year agreement with Genentech to conduct whole genome sequencing and analysis on tens of thousands of patient samples provided by the drug developer.

Human Longevity will sequence the genomes at 30x coverage with the Illumina HiSeq X Ten machines in its genomic sequencing center, the firm said in a statement.

"We are excited to be working with Genentech so that patient samples can be analyzed according to more precise genetic categories," Human Longevity CEO Craig Venter said in a statement. "The application of our capabilities to discover new diagnostics and targeted therapies is one of the most relevant today."

Genentech Senior VP James Sabry also said that the partnership would advance the firm's drug discovery program.

All sample and patient data elements will be de-identified to protect privacy, the firms added.

Financial details of the agreement were not disclosed.

Human Longevity continues to sign deals giving it more genomes to sequence as it builds its human genotype and phenotype database. Earlier this week, the firm announced it had signed a deal to sequence genomes for the oncology testing firm Personal Genome Diagnostics. In November 2014, the firm signed a deal to gain access to the Twins UK registry and sequence samples from it.

Last week, Genentech signed a deal with 23andMe to sequence the genomes of 3,000 people in the Parkinson's disease community.

[Craig Venter churns it up, again! The announcement is somewhat uncharacteristically understated. The title does not mention that there is no independent "Genentech" (it is a subsidiary of Roche), and glosses over the brilliance of how Craig's latest move towards the private sector put not just Roche, but also Illumina, Amazon and Google into a fiercely competitive mode - serving the interest of science (Craig Venter's style...). Venter rather recently appeared to compete against Google (by snatching Franz Och). As we know, Craig answered the rhetorical question "what's the difference between Celera and God?" with "we had computers". IBM wanted to do it for him for free - but he built the largest computer system instead. Now Illumina could either remain "the King" by providing sequencers - or, with a monopoly on algorithms, could in addition catapult either Amazon Web Services or its competitors (Google and/or IBM). The world will never be the same - andras_at_pellionisz_dot_com]


UCSC Receives $1M Grant from Simons Foundation to Create Human Genetic Variation Map

Jan 13, 2015 | a GenomeWeb staff reporter

NEW YORK (GenomeWeb) – Researchers at the University of California Santa Cruz's Genomics Institute have received a grant for up to $1 million from the Simons Foundation that will support a one-year pilot project to create a comprehensive map of human genetic variation for biomedical research.

Co-leading the project is David Haussler, a professor of biomolecular engineering and director of the Genomics Institute at UC Santa Cruz, and Benedict Paten, a research scientist at the Genomics Institute.

They'll work with scientists at the Broad Institute, Memorial Sloan Kettering Cancer Center, UC San Francisco, Oxford University, the Wellcome Trust Sanger Institute, and the European Bioinformatics Institute to develop algorithms and formulate the best mathematical approaches for constructing a new graph-based human reference genome structure that will better account for and reflect the different kinds of variation that occur across populations. They'll test algorithms developed as part of the project on tricky parts of the genome within the first six months of the pilot, Paten said in a statement.

The researchers will use a dataset of more than 300 complete and ethnically diverse human genomes sequenced by researchers at the Broad Institute to construct the reference structure and they'll also leverage work done to create a standard data model for the structure by members of the reference variation task team, a subgroup of the data working arm of the Global Alliance for Genomics and Health that Paten co-leads.

The project aims to overcome the limitations of the current model for analyzing human genomic data, which relies on mapping newly sequenced data to a single set of arbitrarily chosen reference sequences resulting in biases and mapping ambiguities. "One exemplary human genome cannot represent humanity as a whole, and the scientific community has not been able to agree on a single precise method to refer to and represent human genome variants," Haussler said in a statement. "There is a great deal we still don't know about human genetic variation because of these problems."

Paten added that the proliferation of different genomic databases within the biomedical research community has resulted in hundreds of specialized coordinate systems and nomenclatures for describing human genetic variation. This poses problems for tools such as the widely used UCSC Genome Browser which was developed and is maintained by UCSC researchers. "For now, all our browser staff can do is to serve the data from these disparate sources in their native, mutually incompatible formats," Paten said in a statement. "This lack of comprehensive integration, coupled with the over-simplicity of the reference model, seriously impedes progress in the science of genomics and its use in medicine."

The diversity of genomes in the Broad's dataset, Paten continued, offers a rich data resource that will be used "to define a comprehensive reference genome structure that can be truly representative of human variation." The plan is eventually to expand the graph-structure to include many more genomes, he said.
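The article describes the goal but not the data structure itself; the following minimal Python sketch (toy sequences, not the actual GA4GH data model) illustrates what a graph-based reference looks like: nodes carry sequence fragments, edges connect compatible neighbours, and each haplotype is a path through the graph, so variants live inside the reference instead of being expressed as differences from one linear sequence.

# Toy sequence-variation graph: a SNP represented as a "bubble" of two paths.

class VariationGraph:
    def __init__(self):
        self.nodes = {}      # node id -> sequence fragment
        self.edges = set()   # (from_node, to_node)

    def add_node(self, node_id, sequence):
        self.nodes[node_id] = sequence

    def add_edge(self, source, target):
        self.edges.add((source, target))

    def path_sequence(self, path):
        """Concatenate node sequences along a path (one haplotype)."""
        return "".join(self.nodes[node_id] for node_id in path)

graph = VariationGraph()
graph.add_node(1, "ACGT")   # shared left flank
graph.add_node(2, "A")      # reference allele
graph.add_node(3, "G")      # alternate allele
graph.add_node(4, "TTCA")   # shared right flank
for edge in [(1, 2), (1, 3), (2, 4), (3, 4)]:
    graph.add_edge(*edge)

print("reference haplotype:", graph.path_sequence([1, 2, 4]))  # ACGTATTCA
print("alternate haplotype:", graph.path_sequence([1, 3, 4]))  # ACGTGTTCA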

The researchers expect to have a draft variation map available by the end of the year. Paten and Haussler have also outlined the follow-up activities needed to extend the pilot project and fully realize their vision for the new map.

The new map will make it easier to detect and analyze both simple and complex variants that contribute to conditions with a genetic component such as autism and diabetes. It will also be a valuable tool for understanding recent human evolution, according to the researchers.

[The news talks about "algorithms" and "maps" (of genomic variations). Given that Jim Simons is a most brilliant mathematician (with autism in the family), it is more likely that he invested this sum, relatively minor on his scale, towards having more "algorithms" rather than just "maps" around. "Pathways" and "maps" already abound - both mathematicians and computers are yearning for software-enabling ALGORITHMS to distinguish the genomic variants responsible for normal human diversity from pathological genomic variants. It is almost self-evident that some variants are "self-similar" - thus one of the many (?) algorithmic approaches might be a measure of self-similarity (fractality). andras_at_pellionisz_dot_com]
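To make "a measure of self-similarity (fractality)" concrete, here is one possible, deliberately simple box-counting sketch over a set of variant positions along a chromosome segment (the positions are randomly generated here and the statistics are naive): if the number of occupied boxes scales as a power law with box size, the slope of the log-log fit estimates a box-counting dimension of the variant distribution.

import math
import random

# Toy box-counting estimate over variant positions in a 1 Mb segment.
# Positions are random here; a real analysis would use called variants.

random.seed(0)
segment_length = 1_000_000
variant_positions = random.sample(range(segment_length), 2_000)

def occupied_boxes(positions, box_size):
    """Number of boxes of width box_size containing at least one variant."""
    return len({p // box_size for p in positions})

box_sizes = [1_000, 2_000, 5_000, 10_000, 20_000, 50_000]
log_inv_size = [math.log(1.0 / s) for s in box_sizes]
log_counts = [math.log(occupied_boxes(variant_positions, s)) for s in box_sizes]

# Least-squares slope of log N(s) versus log(1/s) ~ box-counting dimension.
mean_x = sum(log_inv_size) / len(log_inv_size)
mean_y = sum(log_counts) / len(log_counts)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(log_inv_size, log_counts))
         / sum((x - mean_x) ** 2 for x in log_inv_size))

print(f"estimated box-counting dimension: {slope:.2f}")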


Silencing long noncoding RNAs with genome-editing tools (with full .pdf)

Methods Mol Biol. 2015;1239:241-50. doi: 10.1007/978-1-4939-1862-1_13.

Gutschner T.

Abstract

Long noncoding RNAs (lncRNAs) are a functional and structural diverse class of cellular transcripts that comprise the largest fraction of the human transcriptome. However, detailed functional analysis lags behind their rapid discovery. This might be partially due to the lack of loss-of-function approaches that efficiently reduce the expression of these transcripts. Here, I describe a method that allows a specific and efficient targeting of the highly abundant lncRNA MALAT1 in human (lung) cancer cells. The method relies on the site-specific integration of RNA-destabilizing elements mediated by Zinc Finger Nucleases (ZFNs).

See full .pdf of Chapter 13 here

[Genome Editing, an effort that has long been brewing and broke through with full force by 2015, calls for a crucially important "heads up". In earlier times, efforts towards an effective modification of the genome used to be labelled "Gene Surgery". Thus, some readers may be under the impression that the classic misunderstanding (that the "genome is your destiny and there is no way to change it") needs perhaps only slight cosmetics; gene(s) (the protein-coding, though not contiguous but fractally scattered, parts of the genome) could, in theory, be altered. This recent paper should totally dispel any such misunderstanding. First, the paper is not even about the "genes" and "non-coding DNA" of the genome - it provides an experimentally verifiable method to alter the function of RNA (mistakenly believed to be "function-less"), more particularly of Long noncoding RNAs (lncRNAs). The effort would be totally misspent if lncRNAs were without important function in genome regulation - critical to cancer(s), in this case lung cancer, one of the most dreadful and rampant diseases. The first words of the abstract, however, clinch that lncRNAs are "a functional and structural diverse class of cellular transcripts that comprise the largest fraction of the human transcriptome". The Fractal Approach (FractoGene), since its inception (concept in 1989 and utility in 2002), has long been kept at bay (in order to delay a humanly and materially very expensive total paradigm-shift as long as possible) by the rhetorical question "what is the importance of a mathematical (algorithmic) theory of fractal recursive genome function?". For some time the answer was "to find fractal defects in the genome that are in a cause-and-effect relationship with, e.g., cancer development by misregulation". While in itself that reason has been totally justified (as a recent Newsweek cover issue on cancer very properly stated, "You can not cure a disease that you do not understand" - with some equations scribbled underneath the graphics), with Genome Editing, which (also) matured over the "wilderness of genomics" (from the 1953 Double Helix to the end of ENCODE-2 in 2012), the enormous importance of "fractal defect mining" leading to "genome editing" can be made plain even to those in elementary schools. Before "spelling checkers" and "word processors", anybody could write (as this columnist, for whom English is the sixth language...) perhaps important strings of letters, but occasionally laden with typos. In natural languages such errors are not nearly as important as, e.g., in "computer languages" (codes, rather). Anybody who ever wrote a line of code knows all too well that freshly written code (even if it is "interpreted", not "compiled") should, for best results, undergo the dovetailing processes of "syntax checking" and subsequent "debugging". (A recursive computer code may produce an infinitely repeating "uncontrolled cycle" if the "stop" symbol is missing or in error. While it is common sense that "wash, rinse, repeat" is "meaningful enough", coders itch to add "after repeating the cycle n times, do not run it for the n+1st time".) This trivial illustration may not be superfluous, since it also brings up the question that has come back into fashion after 20 years: how similar, or how profoundly different, are natural languages from the code of recursive genome function?
Pending a serious probe into this question (for which the NIH newly allocated $28 million), one might want to read & cite (beyond the existing 28 citations) the full .pdf of the 2008 peer-reviewed paper on "The Principle of Recursive Genome Function" - andras_at_pellionisz_dot_com]
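In programming terms, the "wash, rinse, repeat" point above is simply the requirement that a recursion carry an explicit stopping condition; a minimal, genome-agnostic illustration in Python:

# A recursion without a halting test would cycle until the interpreter's
# recursion limit; the explicit test below is the "stop symbol".

def wash_rinse_repeat(cycles_left):
    if cycles_left == 0:   # the stop condition: without it, no termination
        return
    print("wash, rinse")
    wash_rinse_repeat(cycles_left - 1)

wash_rinse_repeat(3)       # runs the cycle exactly 3 times, then stops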


Key words Cancer, CRISPR, Genome engineering, Homologous recombination, MALAT1, LncRNA, Single cell analysis, TALEN, Zinc finger nuclease

1 Introduction

LncRNAs represent a novel and exciting class of transcripts usually defined by their size (>200 nucleotides) and the lack of an open reading frame of significant length (<100 amino acids). Several studies link the expression of these transcripts to human diseases, e.g., cancer [1]. Functional analyses using RNA interference-mediated knockdown approaches are a common strategy to infer a gene’s cellular role. However, these widely used approaches have multiple limitations [2] and might have limited efficiency for lncRNA research due to the intracellular localization (nuclear) and secondary structure of a large fraction of lncRNA molecules.

To overcome these limitations, a novel gene targeting method was developed to reduce the expression of the lncRNA MALAT1 in human A549 lung cancer cells [3]. MALAT1 is a ~8 kb long, highly abundant, nuclear transcript which was originally discovered in a screen for lung cancer metastasis associated genes [4, 5]. The targeting method relies on the site-specific integration of a selection marker (here: GFP) and RNA-destabilizing elements or transcriptional stop signals, e.g., poly(A) signals, into the promoter region of the MALAT1 gene. The integration is mediated by ZFNs that specifically introduce a DNA double-strand break (DSB) [6]. The induced DNA damage activates the cellular repair pathways, namely, Nonhomologous end joining (NHEJ) or Homologous Recombination (HR). By providing an appropriate template (donor plasmid) the HR pathway can be used to repair the DSB and to integrate exogenous DNA sequences (Fig. 1). Application of this method to human lung cancer cells yielded a stable, specific and more than 1,000-fold reduction of MALAT1 expression, and functional analysis established MALAT1 as an active regulator of lung cancer metastasis [7]. Importantly, the method's concept is of broad applicability and allows targeting of protein-coding genes as well as other lncRNAs using any kind of recently developed genome targeting tools, e.g., ZFNs, TALENs, or the CRISPR/Cas9 system.

2 Materials

Store all components according to manufacturer's recommendations. Use ultrapure water for nucleic acid analysis. ZFNs are commercially available from Sigma-Aldrich. Alternative methods were described that allow homemade generation of ZFNs [8, 9] or fast assembly of TALENs [10]. CRISPR/Cas9 plasmids are available from Addgene.

[Fig. 1: ZFN targeting schematic; labels: ZF, Fok I, ZF]

2.1 Cloning

1. Plasmid containing a selection marker of choice, e.g., Green fluorescent protein (GFP) followed by a poly(A) signal, e.g., bovine growth hormone (bGH) poly(A) signal.

2. Genomic DNA from cell line(s) subjected to modifications.

3. Genomic DNA isolation kit.

4. Proofreading DNA Polymerase.

5. Cloning primer for homology arms with appropriate restriction sites.

6. Agarose and agarose gel chamber.

7. Gel purification kit.

8. Restriction enzymes needed for cloning of homology arms.

9. PCR purification kit.

10. T4 DNA Ligase.

11. Competent bacteria.

12. LB-Medium: 5 g/L yeast extract, 10 g/L Tryptone, 10 g/L NaCl.

13. LB-Agar plates: LB-Medium with 15 g/L Agar.

14. Antibiotics, e.g., Ampicillin, Kanamycin.

15. Plasmid DNA preparation kits.

2.2 Cell Culture and Transfection

1. Cell line of choice.

2. Appropriate complete cell culture medium for cell line of interest containing supplements, serum, and antibiotics where appropriate.

3. Transfection reagent of choice.

4. Cell culture plates (96-well, 24-well, 6-well, 10 and 15 cm).

5. 0.05 or 0.25 % Trypsin-EDTA.

6. Phosphate-buffered saline (PBS).

7. 12×75 mm tube with cell strainer cap.

8. Conical centrifuge tubes.

2.3 Single Cell Analysis

1. Cell sorter.

2. Power SYBR Green Cells-to-CT Kit (Life Technologies, Carlsbad, CA, USA).

3. qPCR primer for reference and target gene.

4. DirectPCR lysis reagent (Peqlab, Wilmington, DE, USA) or mammalian genomic DNA MiniPrep Kit.

5. Integration-PCR primer spanning the nuclease cleavage site.

6. DNA-Polymerase of choice suitable for genotyping PCR.

7. PCR strip tubes or 96-well PCR plates and adhesive films.

8. Thermocycler.

3 Methods

3.1 Cloning of a Donor Plasmid

The targeting approach requires cloning of a donor plasmid (Subheading 3.1) and its transfection into cells together with ZFNs (or any other gene editing tool) (Subheading 3.2). After cell expansion, cells need to be enriched using Fluorescence Activated Cell Sorting (FACS) (Subheading 3.3). FACS is also used to distribute single cells into 96-wells for clonal growth. Finally, cell clones are analyzed for site-specific integration events and target gene expression levels (Subheading 3.4). See Fig. 2 for a protocol workflow. Design and cloning of gene-specific ZFNs or other gene-editing tools is highly user-specific and will not be covered here.

1. Use proofreading DNA polymerases and genomic DNA to PCR amplify about 800 nt long left and right homology arms (see Note 1).

2. Run PCR program for 30 cycles and with an elongation time of 1 min per 1 kb.

3. Load PCR products on an agarose gel (1 % w/v) and let run at 5–8 V/cm.

4. Purify PCR products using a Gel Extraction kit according to manufacturer’s recommendations. Elute in 30 μL pre-warmed water (50–60°C). Measure concentration of PCR products.

5. Use about 400 ng of PCR product and incubate for 1 h at 37 °C with appropriate restriction enzymes.

6. Purify PCR products using a PCR purification kit according to manufacturer’s recommendations. Elute in 20 μL pre-warmed water (50–60 °C) and determine concentrations.

7. In parallel, prepare the donor plasmid accordingly by digesting and purifying the plasmid with the same reagents and protocols.

8. Clone the first homology arm into the donor plasmid by ligating the PCR product and the prepared plasmid using T4 DNA ligase. Use a 3:1 molar ratio (PCR product:plasmid) for optimal ligation efficiency.

9. Transform competent E. coli, e.g., by heat shock (42 °C for 30–45 s, then on ice for 2 min).

10. Streak E. coli on LB plates containing appropriate antibiotics.

11. Incubate plates for 12–16 h at 37 °C.

12. Pick single colonies and inoculate 2.5–5 mL LB-Medium containing antibiotics.

13. Grow colonies for 8–12 h and isolate plasmid DNA using a Mini-Prep kit.

14. Sequence-verify your clone harboring the first homology arm.

Fig. 2 Workflow for lncRNA knockout: cloning of ZFNs and donor plasmid; transfection of ZFN and donor plasmid; expansion of cells (10 d); 1st FACS to enrich for GFP+ cells; expansion of cells (10 d); 2nd FACS, single cell sort of GFP+ cells; expansion of single cell clones (14–21 d); transfer of clones to 24-well plates; expansion of single cell clones (5–10 d); transfer to 96-well and 6-well plates; genotyping or expression analysis (1–2 d) and expansion and storage of clones (5–10 d); identification of KO clones; functional analysis of KO clones. Single, homozygous clones can be obtained within 6–8 weeks after ZFN and donor plasmid transfection

15. Continue cloning the second homology arm into the plasmid obtained above. Repeat steps 7–14 accordingly.


16. Use 20–40 μL of starting culture used for Mini-Prep and inoculate 25–35 mL LB-Medium containing antibiotics.

17. Perform Plasmid DNA isolation using a Midi-Prep kit.

3.2 Transfection of ZFNs and Donor Plasmid

The optimal transfection protocol highly depends on the cell line that is subjected to manipulations. Transfection conditions should thus be optimized in advance. The protocol introduced here was successfully applied to human A549 lung cancer cells.

1. Seed cells (2–3 × 10^5 per 6-well) in 2 mL cell culture medium (+10 % FBS, no antibiotics) (see Note 2).

2. The next day, prepare plasmid mix by combining 3 μg donor plasmid and 0.5 μg of ZFN plasmid each (1 μg ZFN plasmids in total) (see Note 3).

3. Combine plasmid mix (4 μg) with 8 μL Turbofect transfection reagent (Thermo Scientific) in serum-/antibiotics-free cell culture medium (final volume = 200 μL). Mix briefly.

4. Incubate for 15 min at room temperature.

5. Add transfection mix dropwise to cells and shake plate back and forth for equal distribution.

6. Incubate cells for 4–6 h with transfection mix.

7. Remove medium and add fresh, complete growth medium to cells.

8. Cells might be evaluated for GFP expression prior to further processing.

3.3 Cell Sorting

1. Expand cells for 10 days after donor and ZFN plasmid transfection.

2. Remove medium, wash cells once with PBS and add Trypsin–EDTA.

3. Incubate cells at 37 °C and allow for detachment (5–15 min).

4. Resuspend cells in complete cell culture medium and transfer into conical centrifuge tube.

5. Spin down cells at 500×g for 5 min.

6. Completely remove cell culture medium and resuspend cell pellet in 2–4 mL PBS/FBS (1 % v/v) by pipetting up and down (see Note 4).

7. Pipet cells into BD Falcon 12×75 mm Tubes using the cell strainer cap to filter the cell suspension.

8. Perform steps 2–7 with GFP-negative wild-type cells.

9. Put cells on ice and continue with cell sorting.

10. Use GFP-negative cells to adjust instrument settings and set threshold for GFP-selection.

11. Perform cell sorting to enrich for GFP-positive cells. Sort cells into 1.5 mL reaction tubes containing 50–100 μL complete cell culture medium (see Note 5).

12. Spin down cells in a tabletop centrifuge (800×g, 5 min) and remove supernatant.

13. Resuspend cells in complete growth medium and seed into appropriate cell culture plates (see Note 6).

14. Expand cells for about 10 days to obtain at least one confluent 10 cm plate for further processing.

15. Add 200 μL complete growth medium per well into 96-well plate. Prepare 5–10 plates per cell line/construct/ZFN (see Note 7).

16. Prepare cells and adjust instrument settings as described in steps 2–10.

17. Sort GFP-positive cells into 96-well plates. GFP-negative wild-type cells might be sorted as well to obtain appropriate negative control clones for subsequent biological experiments.

18. Incubate cells at 37 °C. Add 100 μL complete medium after 5–7 days (see Note 8).

3.4 Cell Clone Analysis

1. About 7–10 days after sorting inspect 96-well plates and mark wells that contain cells.

2. Replace cell culture medium in respective wells by carefully removing the old medium using a 200 μL pipet and sterile tips.

3. Continuously inspect 96-wells and mark wells that contain cells.

4. About 14–21 days after cell sorting first single cell clones might be ready for transfer into 24-well plates: Remove medium, wash once with PBS and add about 40 μL Trypsin–EDTA per 96-well. After incubation at 37 °C inspect cells for complete detachment. Resuspend cell clones in about 150 μL complete medium and transfer into 24-wells containing additional 500 μL complete growth medium.

5. After another 5–10 days, cells in 24-well plates might be confluent and are assigned an identification number. Then, cell clones are simultaneously transferred to 96-well and 6-well plates: Remove medium, wash once with PBS and add about 100 μL Trypsin–EDTA per 24-well. After incubation at 37 °C inspect cells for complete detachment. Resuspend cell clones in about 400 μL complete medium and transfer 100 μL into a 96-well and 400 μL into a 6-well containing additional 2 mL complete growth medium.

Fig. 3 Genotyping of cell clones by Integration-PCR. Primers cover the ZFN cleavage site. Monoallelic and biallelic integration events can be detected due to the different product sizes (gel lanes: 1 kb DNA ladder; wildtype, heterozygous and homozygous clones; "Integration" vs. "No Integration" bands). In this example, 1 out of 12 clones harbored a biallelic integration of the selection marker after the selection process and thus showed a strong reduction in lncRNA expression (not shown)
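As a simple numerical illustration of the genotyping logic in Fig. 3 (all lengths below are invented, not taken from the chapter): the primer pair flanks the ZFN cleavage site, so an allele carrying the integrated cassette yields a product longer by the cassette size, and the per-clone band pattern distinguishes wildtype, monoallelic and biallelic integration.

# Illustrative Integration-PCR band sizes for a diploid clone (lengths invented).

WILDTYPE_PRODUCT = 600   # bp spanned by the primers on an unmodified allele
CASSETTE_SIZE = 1_700    # bp of integrated selection marker + poly(A) signal

def expected_bands(alleles_with_integration, total_alleles=2):
    """Expected gel bands for a clone with the given number of modified alleles."""
    bands = set()
    if alleles_with_integration < total_alleles:
        bands.add(WILDTYPE_PRODUCT)
    if alleles_with_integration > 0:
        bands.add(WILDTYPE_PRODUCT + CASSETTE_SIZE)
    return sorted(bands)

for n_modified, label in [(0, "wildtype"), (1, "heterozygous"), (2, "homozygous")]:
    print(f"{label}: bands at {expected_bands(n_modified)} bp")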

6. The next day, cells in 96-wells are subjected to gene expression or genotyping analysis using the Power SYBR Green Cells-to-Ct kit (Life Technologies) or the DirectPCR lysis reagent (Peqlab) or GenElute mammalian genomic DNA MiniPrep Kit (Sigma-Aldrich) according to manufacturer's recommendations, respectively.

7. For genotyping analysis an Integration-PCR is performed using primer pairs that span the ZFN cleavage site (see Note 9). A site-directed integration will lead to a longer PCR product (Fig. 3) (see Note 10).

8. Corresponding positive, homozygous clones in the 6-well plates are further expanded and transferred to 10 cm plates (see Note 11).

9. Single cell clones might be frozen and stored in liquid nitrogen.

4 Notes

1. Homology arms should be cloned from the same cell line that will be used for genome editing due to potential single nucleotide polymorphisms (SNPs). Homologous recombination strongly depends on perfect homology and can be impaired by SNPs.

2. The cell line(s) used for ZFN-mediated integration of exogenous DNA must possess a certain homologous recombination rate. Several cell lines might be tested, if no integration events are detected.

3. Although not absolutely required, linearization of the donor plasmid might increase integration rates. Please note that linearized plasmids are less stable and thus a modified transfection protocol might be used. In this case, ZFN plasmids might be transfected prior to the donor plasmid to allow ZFN protein expression.

4. Careful pipetting should be performed to prevent disruption of cells while obtaining a single cell suspension, which is critical for subsequent single cell sorting. Addition of EDTA (1 mM final conc.) to the PBS/1 % FBS solution might be beneficial to prevent cell aggregation.

5. A total of 1–3 % of GFP-positive cells can be anticipated, but this rate might vary and depends on multiple parameters. Depending on the instrument and exact settings, up to 4 × 10^5 cells can be sorted into one 1.5 mL reaction tube.

6. Antibiotics should be added to the cell culture medium after cell sorting to avoid contaminations.

7. The cell lines’ capability to grow as a single cell colony should be tested beforehand. If a cell sorter (e.g., BD Bioscience FACS Aria II) is used, optimal sorting conditions should be determined in advance. Roughly, 10–40 single cell colonies can be expected per 96-well plate.

8. Some cell lines might show improved single cell growth if conditioned medium or medium with a higher serum concentration is used (max. 20 % v/v). If conditioned medium is used, sterile-filter it before applying it to single cells to avoid contamination.

9. Alternatively, a Junction-PCR can be performed for genotyping. Here, one primer anneals to a sequence region outside the homology arms and the second primer specifically binds to the newly integrated (exogenous) sequence, e.g., the selection marker (here: GFP).

10. Different amounts of donor plasmid should be tested, if high rates of random, nonspecific donor plasmid integrations are observed, i.e., GFP-positive cells that lack a site-specific integration of the donor plasmid. Also, an efficient counter-selection strategy could be applied, e.g., cloning the herpes simplex virus thymidine kinase gene outside the homology arms. Nonspecific integration and expression of this suicide gene confers sensitivity towards ganciclovir [11].

11. In theory, targeted integration on both chromosomes is necessary to obtain an efficient gene knockdown. However, cancer cells might show diverse degrees of gene amplifications and deletions. Also, epigenetically silenced or imprinted genes as well as genes localized on the X or Y chromosomes represent exceptions to the rule. Thus, a single, site-specific integration might already lead to an efficient silencing. On the other hand, multiple integration events must occur simultaneously in human polyploid cells (e.g., hepatocytes, heart muscle cells, megakaryocytes) or in amplified chromosome regions to significantly impair target gene expression.

Acknowledgement

The author wishes to acknowledge the support of his colleagues at the German Cancer Research Center (DKFZ) Heidelberg who helped to establish this method and to set up the protocol. A special thanks goes to Matthias Groß and Dr. Monika Hämmerle for critical reading of the manuscript. T.G. is supported by an Odyssey Postdoctoral Fellowship sponsored by the Odyssey Program and the CFP Foundation at The University of Texas MD Anderson Cancer Center.

References

1. Gutschner T, Diederichs S (2012) The hallmarks of cancer: a long non-coding RNA point of view. RNA Biol 9(6):703–719. doi:10.4161/rna.20481

2. Jackson AL, Linsley PS (2010) Recognizing and avoiding siRNA off-target effects for target identification and therapeutic application. Nat Rev Drug Discov 9(1):57–67. doi:10.1038/nrd3010

3. Gutschner T, Baas M, Diederichs S (2011) Noncoding RNA gene silencing through genomic integration of RNA destabilizing elements using zinc finger nucleases. Genome Res 21(11):1944–1954. doi:10.1101/gr.122358.111

4. Gutschner T, Hammerle M, Diederichs S (2013) MALAT1—a paradigm for long noncoding RNA function in cancer. J Mol Med 91(7):791–801. doi:10.1007/s00109-013-1028-y

5. Ji P, Diederichs S, Wang W, Boing S, Metzger R, Schneider PM, Tidow N, Brandt B, Buerger H, Bulk E, Thomas M, Berdel WE, Serve H, Muller-Tidow C (2003) MALAT-1, a novel noncoding RNA, and thymosin beta4 predict metastasis and survival in early-stage non-small cell lung cancer. Oncogene 22(39):8031–8041. doi:10.1038/sj.onc.1206928

6. Miller JC, Holmes MC, Wang J, Guschin DY, Lee YL, Rupniewski I, Beausejour CM, Waite AJ, Wang NS, Kim KA, Gregory PD, Pabo CO, Rebar EJ (2007) An improved zinc-finger nuclease architecture for highly specific genome editing. Nat Biotechnol 25(7):778–785. doi:10.1038/nbt1319

7. Gutschner T, Hammerle M, Eissmann M, Hsu J, Kim Y, Hung G, Revenko A, Arun G, Stentrup M, Gross M, Zornig M, MacLeod AR, Spector DL, Diederichs S (2013) The noncoding RNA MALAT1 is a critical regulator of the metastasis phenotype of lung cancer cells. Cancer Res 73(3):1180–1189. doi:10.1158/0008-5472.CAN-12-2850

8. Fu F, Voytas DF (2013) Zinc Finger Database (ZiFDB) v2.0: a comprehensive database of C(2)H(2) zinc fingers and engineered zinc finger arrays. Nucleic Acids Res 41(Database issue):D452–D455. doi:10.1093/nar/gks1167

9. Sander JD, Dahlborg EJ, Goodwin MJ, Cade L, Zhang F, Cifuentes D, Curtin SJ, Blackburn JS, Thibodeau-Beganny S, Qi Y, Pierick CJ, Hoffman E, Maeder ML, Khayter C, Reyon D, Dobbs D, Langenau DM, Stupar RM, Giraldez AJ, Voytas DF, Peterson RT, Yeh JR, Joung JK (2011) Selection-free zinc-finger-nuclease engineering by context-dependent assembly (CoDA). Nat Methods 8(1):67–69. doi:10.1038/nmeth.1542

10. Cermak T, Doyle EL, Christian M, Wang L, Zhang Y, Schmidt C, Baller JA, Somia NV, Bogdanove AJ, Voytas DF (2011) Efficient design and assembly of custom TALEN and other TAL effector-based constructs for DNA targeting. Nucleic Acids Res 39(12):e82. doi:10.1093/nar/gkr218

11. Moolten FL, Wells JM (1990) Curability of tumors bearing herpes thymidine kinase genes transferred by retroviral vectors. J Natl Cancer Inst 82(4):297–300


Who Owns the Biggest Biotech Discovery of the Century? There’s a bitter fight over the patents for CRISPR, a breakthrough new form of DNA editing.

Control over genome editing could be worth billions. [Yes, there is already ample independent experimental evidence that "fractal defects" of the genome are linked to cancers, schizophrenia, autism, auto-immune diseases, etc. Of course, one first needs to find such "fractal defects" (see US patent in force 8,280,641), so that one knows what to edit out. FractoGene is a result of the "geometrization of genomics". Since mathematization of biology is rarely well received by non-mathematically-minded biologists, the result of understanding the sensorimotor coordination function of cerebellar neural nets broke through not in biology (AJP was in fact denied continuation of his grant support because the actual mathematics contradicted the "Central Dogma" - though Francis Crick later confessed that he knew neither mathematics nor what the word "Dogma" actually meant; it just "sounded good"). Since one of the most successful fighter jets in history, the F15 (Israel shot down all enemy aircraft without losing a single F15), could in fact be landed "on one wing" by a superb Israeli pilot, the patent version of Pellionisz' "Tensor Network Theory" led to automation by NASA, such that the landing could be performed by any lesser pilot, purely by automation. Geometrization of the function of the cerebellar neural net immediately yielded the Alexander von Humboldt Prize from Germany (on a 6-month lecture tour in Germany the concepts were widely disseminated, and the inventor faced the trilemma of switching his professorship at New York University to one in Germany or his native Hungary, or returning to Silicon Valley; today's decisions also include BRICS countries, as the USA is without a streamlined "Genome Program" - genomics is scattered from NIH to NSF and DARPA, DoE and even Homeland Defense). For NASA, it took a decade from the blueprint to a successful implementation. Indeed, intellectual property, especially when university and/or government parties are involved in invention and/or assignment, can be mind-boggling; by the time Dr. Pellionisz turned to developing the advanced geometry of recursive genome function, he steered clear of any such cumbersome involvement. This, of course, meant that since the inventor financed the entire development "out of pocket", he could not pay for "accelerated issuance" of his patent. It took more than a full decade for the USPTO to understand and to issue patent 8,280,641 (though, in retrospect, it may appear "yeah, sure" to some now - but the patent is in force till late March of 2026). There is a single inventor, and the patent is personal property (assigned to none other than the inventor). Now, some agencies need all the help they can get in these hard times explaining the $100 million project of "cataloging cancer mutations" (the number is not infinite, given the finite amount of information compressed into the genome, but it is certainly astronomical, and it makes no sense either scientifically or economically to waste taxpayers' money on "big data" projects that result mostly in prolonged suffering). At least three leading "cloud computing companies" are already set up for hunting "fractal defects" - with myriads of "wet labs" to hone "genome editing" to clean up genomic glitches. Help is available, given appropriate arrangements - andras_at_pellionisz_dot_com]

By Antonio Regalado on December 4, 2014

Last month in Silicon Valley, biologists Jennifer Doudna and Emmanuelle Charpentier showed up in black gowns to receive the $3 million Breakthrough Prize, a glitzy award put on by Internet billionaires including Mark Zuckerberg. They’d won for developing CRISPR-Cas9, a “powerful and general technology” for editing genomes that’s been hailed as a biotechnology breakthrough.

Not dressing up that night was Feng Zhang (see 35 Innovators Under 35, 2013), a researcher in Cambridge at the MIT-Harvard Broad Institute. But earlier this year Zhang claimed his own reward. In April, he won a broad U.S. patent on CRISPR-Cas9 that could give him and his research center control over just about every important commercial use of the technology.

How did the high-profile prize for CRISPR and the patent on it end up in different hands? That’s a question now at the center of a seething debate over who invented what, and when, that involves three heavily financed startup companies, a half-dozen universities, and thousands of pages of legal documents.

“The intellectual property in this space is pretty complex, to put it nicely,” says Rodger Novak, a former pharmaceutical industry executive who is now CEO of CRISPR Therapeutics, a startup in Basel, Switzerland, that was cofounded by Charpentier. “Everyone knows there are conflicting claims.”

At stake are rights to an invention that may be the most important new genetic engineering technique since the beginning of the biotechnology age in the 1970s. The CRISPR system, dubbed a “search and replace function” for DNA, lets scientists easily disable genes or change their function by replacing DNA letters. During the last few months, scientists have shown that it’s possible to use CRISPR to rid mice of muscular dystrophy, cure them of a rare liver disease, make human cells immune to HIV, and genetically modify monkeys (see “Genome Surgery” and “10 Breakthrough Technologies 2014: Genome Editing”).

No CRISPR drug yet exists. But if CRISPR turns out to be as important as scientists hope, commercial control over the underlying technology could be worth billions.

The control of the patents is crucial to several startups that together quickly raised more than $80 million to turn CRISPR into cures for devastating diseases. They include Editas Medicine and Intellia Therapeutics, both of Cambridge, Massachusetts. Companies expect that clinical trials could begin in as little as three years.

Zhang cofounded Editas Medicine, and this week the startup announced that it had licensed his patent from the Broad Institute. But Editas doesn’t have CRISPR sewn up. That’s because Doudna, a structural biologist at the University of California, Berkeley, was a cofounder of Editas, too. And since Zhang’s patent came out, she’s broken off with the company, and her intellectual property—in the form of her own pending patent—has been licensed to Intellia, a competing startup unveiled only last month. Making matters still more complicated, Charpentier sold her own rights in the same patent application to CRISPR Therapeutics.

In an e-mail, Doudna said she no longer has any involvement with Editas. “I am not part of the company’s team at this point,” she said. Doudna declined to answer further questions, citing the patent dispute.

Few researchers are now willing to discuss the patent fight. Lawsuits are certain and they worry anything they say will be used against them. “The technology has brought a lot of excitement, and there is a lot of pressure, too. What are we going to do? What kind of company do we want?” Charpentier says. “It all sounds very confusing for an outsider, and it’s also quite confusing as an insider.”

Academic labs aren’t waiting for the patent claims to get sorted out. Instead, they are racing to assemble very large engineering teams to perfect and improve the genome-editing technique. On the Boston campus of Harvard’s medical school, for instance, George Church, a specialist in genomics technology, says he now has 30 people in his lab working on it.

Because of all the new research, Zhang says, the importance of any patent, including his own, isn’t entirely clear. “It’s one important piece, but I don’t really pay attention to patents,” he says. “What the final form of this technology is that changes people’s lives may be very different.”

The new gene-editing system was unearthed in bacteria—organisms that use it as a way to identify, and then carve up, the DNA of invading viruses. That work stretched across a decade. Then, in June 2012, a small team led by Doudna and Charpentier published a key paper showing how to turn that natural machinery into a “programmable” editing tool, to cut any DNA strand, at least in a test tube.

The next step was clear—scientists needed to see if the editing magic could work on the genomes of human cells, too. In January 2013, the laboratories of Harvard’s Church and Broad’s Zhang were first to publish papers showing that the answer was yes. Doudna published her own results a few weeks later.

Everyone by then realized that CRISPR might become an immensely flexible way to rewrite DNA, and possibly to treat rare metabolic problems and genetic diseases as diverse as hemophilia and the neurodegenerative disease Huntington’s.

Venture capital groups quickly began trying to recruit the key scientists behind CRISPR, tie up the patents, and form startups. Charpentier threw in with CRISPR Therapeutics in Europe. Doudna had already started a small company, Caribou Biosciences, but in 2013 she joined Zhang and Church as a cofounder of Editas. With $43 million from leading venture funds Third Rock Ventures (see “50 Smartest Companies: Third Rock Ventures”), Polaris Partners, and Flagship Ventures, Editas looked like the dream team of gene-editing startups.

In April of this year, Zhang and the Broad won the first of several sweeping patents that cover using CRISPR in eukaryotes—or any species whose cells contain a nucleus (see “Broad Institute Gets Patent on Revolutionary Gene-Editing Method”). That meant that they’d won the rights to use CRISPR in mice, pigs, cattle, humans—in essence, in every creature other than bacteria.

The patent came as a shock to some. That was because Broad had paid extra to get it reviewed very quickly, in less than six months, and few knew it was coming. Along with the patent came more than 1,000 pages of documents. According to Zhang, Doudna’s predictions in her own earlier patent application that her discovery would work in humans were “mere conjecture” and, instead, he was the first to show it, in a separate and “surprising” act of invention.

The patent documents have caused consternation. The scientific literature shows that several scientists managed to get CRISPR to work in human cells. In fact, its easy reproducibility in different organisms is the technology’s most exciting hallmark. That would suggest that, in patent terms, it was “obvious” that CRISPR would work in human cells, and that Zhang’s invention might not be worthy of its own patent.


What’s more, there’s scientific credit at stake. In order to show he was “first to invent” the use of CRISPR-Cas in human cells, Zhang supplied snapshots of lab notebooks that he says show he had the system up and running in early 2012, even before Doudna and Charpentier published their results or filed their own patent application. That timeline would mean he hit on the CRISPR-Cas editing system independently. In an interview, Zhang affirmed he’d made the discoveries on his own. Asked what he’d learned from Doudna and Charpentier’s paper, he said “not much.”

Not everyone is convinced. “All I can say is that we did it in my lab with Jennifer Doudna,” says Charpentier, now a professor at the Helmholtz Centre for Infection Research and Hannover Medical School in Germany. “Everything here is very exaggerated because this is one of those unique cases of a technology that people can really pick up easily, and it’s changing researchers’ lives. Things are happening fast, maybe a bit too fast.”

This isn’t the end of the patent fight. Although Broad moved very swiftly, lawyers for Doudna and Charpentier are expected to mount an interference proceeding in the U.S.—that is, a winner-takes-all legal process in which one inventor can take over another’s patent. Who wins will depend on which scientist can produce lab notebooks, e-mails, or documents with the earliest dates.

“I am very confident that the future will clarify the situation,” says Charpentier. “And I would like to believe the story is going to end up well.”


NIH grants aim to decipher the language of gene regulation

Bethesda, Md., Jan. 5, 2015 - The National Institutes of Health has awarded grants of more than $28 million aimed at deciphering the language of how and when genes are turned on and off. These awards emanate from the recently launched Genomics of Gene Regulation (GGR) program of the National Human Genome Research Institute (NHGRI), part of NIH.

"There is a growing realization that the ways genes are regulated to work together can be important for understanding disease," said Mike Pazin, Ph.D., a program director in the Functional Analysis Program in NHGRI's Division of Genome Sciences. "The GGR program aims to develop new ways for understanding how the genes and switches in the genome fit together as networks. Such knowledge is important for defining the role of genomic differences in human health and disease."

With these new grants, researchers will study gene networks and pathways in different systems in the body, such as skin, immune cells and lung. The resulting insights into the mechanisms controlling gene expression may ultimately lead to new avenues for developing treatments for diseases affected by faulty gene regulation, such as cancer, diabetes and Parkinson's disease.

Over the past decade, numerous studies have suggested that genomic regions outside of protein-coding regions harbor variants that play a role in disease. Such regions likely contain gene-control elements that are altered by these variants, which increase the risk for a disease.

"Knowing the interconnections of these regulatory elements is critical for understanding the genomic basis of disease," Dr. Pazin said. "We do not have a good way to predict whether particular regulatory elements are turning genes off or activating them, or whether these elements make genes responsive to a condition, such as infection. We expect these new projects will develop better methods to answer these types of questions using genomic data."

[There is an interesting new scenario. This columnist (AJP; andras_at_pellionisz_dot_com) has devoted close to half a century of very hard work to developing an advanced geometrical understanding of the function of neural and genomic systems, as they arise from their well-known and beloved structure. Geometrization (mathematization) of biology, however, is rather poorly received (when Mandelbrot was offered the lead, with very significant resources, he declined since "biologists were not ready"; Benoit upheld this impression throughout his life, as shown in his Memoirs).]


End of cancer-genome project prompts rethink: Geneticists debate whether focus should shift from sequencing genomes to analysing function.

Nature, 2015 January 5.

A mammoth US effort to genetically profile 10,000 tumours has officially come to an end. Started in 2006 as a US$100-million pilot, The Cancer Genome Atlas (TCGA) is now the biggest component of the International Cancer Genome Consortium, a collaboration of scientists from 16 nations that has discovered nearly 10 million cancer-related mutations.

The question is what to do next. Some researchers want to continue the focus on sequencing; others would rather expand their work to explore how the mutations that have been identified influence the development and progression of cancer.

“TCGA should be completed and declared a victory,” says Bruce Stillman, president of Cold Spring Harbor Laboratory in New York. “There will always be new mutations found that are associated with a particular cancer. The question is: what is the cost–benefit ratio?”

Stillman was an early advocate for the project, even as some researchers feared that it would drain funds away from individual grants. Initially a three-year project, it was extended for five more years. In 2009, it received an additional $100 million from the US National Institutes of Health plus $175 million from stimulus funding that was intended to spur the US economy during the global economic recession.

The project initially struggled. At the time, the sequencing technology worked only on fresh tissue that had been frozen rapidly. Yet most clinical biopsies are fixed in paraffin and stained for examination by pathologists. Finding and paying for fresh tissue samples became the programme’s largest expense, says Louis Staudt, director of the Office for Cancer Genomics at the National Cancer Institute (NCI) in Bethesda, Maryland.

Also a problem was the complexity of the data. Although a few ‘drivers’ stood out as likely contributors to the development of cancer, most of the mutations formed a bewildering hodgepodge of genetic oddities, with little commonality between tumours. Tests of drugs that targeted the drivers soon revealed another problem: cancers are often quick to become resistant, typically by activating different genes to bypass whatever cellular process is blocked by the treatment.

Despite those difficulties, nearly every aspect of cancer research has benefited from TCGA, says Bert Vogelstein, a cancer geneticist at Johns Hopkins University in Baltimore, Maryland. The data have yielded new ways to classify tumours and pointed to previously unrecognized drug targets and carcinogens. But some researchers think that sequencing still has a lot to offer. In January, a statistical analysis of the mutation data for 21 cancers showed that sequencing still has the potential to find clinically useful mutations (M. S. Lawrence et al. Nature 505, 495–501; 2014).

On 2 December, Staudt announced that once TCGA is completed, the NCI will continue to intensively sequence tumours in three cancers: ovarian, colorectal and lung adenocarcinoma. It then plans to evaluate the fruits of this extra effort before deciding whether to add back more cancers.

Expanded scope

But this time around, the studies will be able to incorporate detailed clinical information about the patient’s health, treatment history and response to therapies. Because researchers can now use paraffin-embedded samples, they can tap into data from past clinical trials, and study how mutations affect a patient’s prognosis and response to treatment. Staudt says that the NCI will be announcing a call for proposals to sequence samples taken during clinical trials using the methods and analysis pipelines established by the TCGA.

The rest of the International Cancer Genome Consortium, slated to release early plans for a second wave of projects in February, will probably take a similar tack, says co-founder Tom Hudson, president of the Ontario Institute for Cancer Research in Toronto, Canada. A focus on finding sequences that make a tumour responsive to therapy has already been embraced by government funders in several countries eager to rein in health-care costs, he says. “Cancer therapies are very expensive. It’s a priority for us to address which patients would respond to an expensive drug.”

The NCI is also backing the creation of a repository for data not only from its own projects, but also from international efforts. This is intended to bring data access and analysis tools to a wider swathe of researchers, says Staudt. At present, the cancer genomics data constitute about 20 petabytes (10^15 bytes), and are so large and unwieldy that only institutions with significant computing power can access them. Even then, it can take four months just to download them.
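As a rough, back-of-the-envelope check on those figures (an illustrative calculation only, not a statement about any particular institution's network), the sustained transfer rate implied by moving 20 petabytes in about four months works out to roughly 15 gigabits per second:

```python
# Back-of-the-envelope: sustained bandwidth needed to move ~20 PB in ~4 months.
petabytes = 20
bytes_total = petabytes * 10**15     # decimal petabytes (10^15 bytes)
seconds = 4 * 30 * 24 * 3600         # ~4 months, approximated as 120 days

bytes_per_second = bytes_total / seconds
gigabits_per_second = bytes_per_second * 8 / 10**9
print(f"~{gigabits_per_second:.1f} Gbit/s sustained")   # roughly 15 Gbit/s
```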

Stimulus funding cannot be counted on to fuel these plans, acknowledges Staudt. But cheaper sequencing and the ability to use biobanked biopsies should bring down the cost, he says. “Genomics is at the centre of much of what we do in cancer research,” he says. “Now we can ask questions in a more directed way.”

Nature 517, 128–129 (08 January 2015) doi:10.1038/517128a


Variation in cancer risk among tissues can be explained by the number of stem cell divisions

Cristian Tomasetti1,*, Bert Vogelstein2,*

Science 2 January 2015:

Vol. 347 no. 6217 pp. 78-81

DOI: 10.1126/science.1260825

- Author Affiliations

1Division of Biostatistics and Bioinformatics, Department of Oncology, Sidney Kimmel Cancer Center, Johns Hopkins University School of Medicine and Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 550 North Broadway, Baltimore, MD 21205, USA.

2Ludwig Center for Cancer Genetics and Therapeutics and Howard Hughes Medical Institute, Johns Hopkins Kimmel Cancer Center, 1650 Orleans Street, Baltimore, MD 21205, USA.

*Corresponding author. E-mail: ctomasetti@jhu.edu (C.T.); vogelbe@jhmi.edu (B.V.)

ABSTRACT

Some tissue types give rise to human cancers millions of times more often than other tissue types. Although this has been recognized for more than a century, it has never been explained. Here, we show that the lifetime risk of cancers of many different types is strongly correlated (0.81) with the total number of divisions of the normal self-renewing cells maintaining that tissue’s homeostasis. These results suggest that only a third of the variation in cancer risk among tissues is attributable to environmental factors or inherited predispositions. The majority is due to “bad luck,” that is, random mutations arising during DNA replication in normal, noncancerous stem cells. This is important not only for understanding the disease but also for designing strategies to limit the mortality it causes.

EDITOR'S SUMMARY

Crunching the numbers to explain cancer

Why do some tissues give rise to cancer in humans a million times more frequently than others? Tomasetti and Vogelstein conclude that these differences can be explained by the number of stem cell divisions. By plotting the lifetime incidence of various cancers against the estimated number of normal stem cell divisions in the corresponding tissues over a lifetime, they found a strong correlation extending over five orders of magnitude. This suggests that random errors occurring during DNA replication in normal stem cells are a major contributing factor in cancer development. Remarkably, this “bad luck” component explains a far greater number of cancers than do hereditary and environmental factors.
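For readers who want to see the shape of the calculation, here is a minimal sketch of the kind of log-log correlation the paper reports. The four data rows are hypothetical placeholders spanning several orders of magnitude, not values from Tomasetti and Vogelstein's dataset; the snippet only illustrates the method of correlating log10(total lifetime stem-cell divisions) with log10(lifetime cancer risk).

```python
import math

# Hypothetical (placeholder) tissues: (total lifetime stem-cell divisions, lifetime risk).
# These are NOT the published values; they only illustrate the log-log correlation.
tissues = {
    "tissue_A": (1e12, 5e-2),
    "tissue_B": (1e10, 1e-3),
    "tissue_C": (1e8, 5e-5),
    "tissue_D": (1e7, 1e-5),
}

xs = [math.log10(divisions) for divisions, _ in tissues.values()]
ys = [math.log10(risk) for _, risk in tissues.values()]

def pearson(x, y):
    """Plain Pearson correlation coefficient, no external libraries needed."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Prints a correlation close to 1 for this toy data.
print(f"log-log Pearson correlation: {pearson(xs, ys):.2f}")
```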

Cancer’s Random Assault

By DENISE GRADY, JAN. 5, 2015

New York Times

It may sound flippant to say that many cases of cancer are caused by bad luck, but that is what two scientists suggested in an article published last week in the journal Science. The bad luck comes in the form of random genetic mistakes, or mutations, that happen when healthy cells divide.

Random mutations may account for two-thirds of the risk of getting many types of cancer, leaving the usual suspects — heredity and environmental factors — to account for only one-third, say the authors, Cristian Tomasetti and Dr. Bert Vogelstein, of Johns Hopkins University School of Medicine. “We do think this is a fundamental mechanism, and this is the first time there’s been a measure of it,” said Dr. Tomasetti, an applied mathematician.

Though the researchers suspected that chance had a role, they were surprised at how big it turned out to be.

“This was definitely beyond my expectations,” Dr. Tomasetti said. “It’s about double what I would have thought.”

The finding may be good news to some people, bad news to others, he added.

Smoking greatly increases the risk of lung cancer, but for other cancers, the causes are not clear. And yet many patients wonder if they did something to bring the disease on themselves, or if they could have done something to prevent it.

“For the average cancer patient, I think this is good news,” Dr. Tomasetti said. “Knowing that over all, a lot of it is just bad luck, I think in a sense it’s comforting.”

Among people who do not have cancer, Dr. Tomasetti said he expected there to be two camps.

“There are those who would like to control every single thing happening in their lives, and for those, this may be very scary,” he said. “ ‘There is a big component of cancer I can just do nothing about.’

“For the other part of the population, it’s actually good news. ‘I’m happy. I can of course do all I know that’s important to not increase my risk of cancer, like a good diet, exercise, avoiding smoking, but on the other side, I don’t want to stress out about every single thing or every action I take in my life, or everything I touch or eat.’ ”

Dr. Vogelstein said the question of causation had haunted him for decades, since he was an intern and his first patient was a 4-year-old girl with leukemia. Her parents were distraught and wanted to know what had caused the disease. He had no answer, but time and time again heard the same question from patients and their families, particularly parents of children with cancer.

“They think they passed on a bad gene or gave them the wrong foods or exposed them to paint in the garage,” he said. “And it’s just wrong. It gave them a lot of guilt.”

Dr. Tomasetti and Dr. Vogelstein said the finding that so many cases of cancer occur from random genetic accidents means that it may not be possible to prevent them, and that there should be more of an emphasis on developing better tests to find cancers early enough to cure them.

“Cancer leaves signals of its presence, so we just have to basically get smarter about how to find them,” Dr. Tomasetti said.

Their conclusion comes from a statistical model they developed using data in the medical literature on rates of cell division in 31 types of tissue. They looked specifically at stem cells, which are a small, specialized population in each organ or tissue that divide to provide replacements for cells that wear out.

Dividing cells must make copies of their DNA, and errors in the process can set off the uncontrolled growth that leads to cancer.

The researchers wondered if higher rates of stem-cell division might increase the risk of cancer simply by providing more chances for mistakes.

Dr. Vogelstein said research of this type became possible only in recent years, because of advances in the understanding of stem-cell biology.


The analysis did not include breast or prostate cancers, because there was not enough data on rates of stem-cell division in those tissues.

A starting point for their research was an observation made more than 100 years ago but never really explained: Some tissues are far more cancer-prone than others. In the large intestine, for instance, the lifetime cancer risk is 4.8 percent — 24 times higher than in the small intestine, where it is 0.2 percent.

The scientists found that the large intestine has many more stem cells than the small intestine, and that they divide more often: 73 times a year, compared with 24 times. In many other tissues, rates of stem cell division also correlated strongly with cancer risk.
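Using only the numbers quoted in the article, a two-line calculation shows why division rate alone cannot carry the whole difference, which is why the comparison rests on total lifetime stem-cell divisions (division rate times stem-cell number):

```python
# Figures quoted above: lifetime cancer risk 4.8% (large intestine) vs 0.2% (small
# intestine); stem-cell division rates of 73 vs 24 per year.
risk_ratio = 4.8 / 0.2            # 24-fold higher lifetime risk in the large intestine
rate_ratio = 73 / 24              # only ~3-fold faster stem-cell division
print(f"risk ratio ~{risk_ratio:.0f}x, division-rate ratio ~{rate_ratio:.1f}x")
# The remaining gap reflects the larger stem-cell pool of the large intestine,
# i.e., the correlation is with total lifetime divisions, not division rate alone.
```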

Some cancers, including certain lung and skin cancers, are more common than would be expected just from their rates of stem-cell division — which matches up with the known importance of environmental factors like smoking and sun exposure in those diseases. Others more common than expected were linked to cancer-causing genes.

To help explain the findings, Dr. Tomasetti cited the risks of a car accident. In general, the longer the trip, the higher the odds of a crash. Environmental factors like bad weather can add to the basic risk, and so can defects in the car.

“This is a good picture of how I see cancer,” he said. “It’s really the combination of inherited factors, environment and chance. At the base, there is the chance of mutations, to which we add, either because of things we inherited or the environment, our lifestyle.”

Dr. Kenneth Offit, chief of the clinical genetics service at Memorial Sloan Kettering Cancer Center in Manhattan, called the article “an elegant biological explanation of the complex pattern of cancers observed in different human tissues.”


Finding the simple patterns in a complex world (Barnsley: "Cancers are fractals")

An ANU mathematician has developed a new way to uncover simple patterns that might underlie apparently complex systems, such as clouds, cracks in materials or the movement of the stockmarket.

The method, named fractal Fourier analysis, is based on a new branch of mathematics called fractal geometry.

The method could help scientists better understand the complicated signals that the body gives out, such as nerve impulses or brain waves.

"It opens up a whole new way of analysing signals," said Professor Michael Barnsley, who presented his work at the New Directions in Fractal Geometry conference at ANU.

"Fractal Geometry is a new branch of mathematics that describes the world as it is, rather than acting as though it's made of straight lines and spheres. There are very few straight lines and circles in nature. The shapes you find in nature are rough."

The new analysis method is closely related to conventional Fourier analysis, which is integral to modern image handling and audio signal processing.

"Fractal Fourier analysis provides a method to break complicated signals up into a set of well understood building blocks, in a similar way to how conventional Fourier analysis breaks signals up into a set of smooth sine waves," Professor Barnsley said.

Professor Barnsley's work draws on the work of Karl Weierstrass from the late 19th century, who discovered a family of mathematical functions that were continuous but could not be differentiated.
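As a concrete taste of that function family (a minimal numerical sketch, not Professor Barnsley's fractal Fourier method), the classical Weierstrass sum W(x) = Σ aⁿ cos(bⁿπx), with 0 < a < 1 and ab large enough, is continuous everywhere yet nowhere differentiable; finite-difference slopes of a truncated version tend to grow without bound as the step shrinks:

```python
import math

def weierstrass(x, a=0.5, b=13, n_terms=25):
    """Truncated Weierstrass sum. With a=0.5 and b=13 (odd, ab > 1 + 3*pi/2) the
    infinite series is continuous everywhere but differentiable nowhere."""
    return sum(a**n * math.cos(b**n * math.pi * x) for n in range(n_terms))

# The largest difference quotient over a grid of points tends to grow as the
# step h shrinks - a numerical hint that the limit function has no derivative.
for h in (1e-2, 1e-4, 1e-6):
    max_slope = max(
        abs(weierstrass(x + h) - weierstrass(x)) / h
        for x in (i / 50 for i in range(50))
    )
    print(f"h = {h:g}: max |difference quotient| ~ {max_slope:.3g}")
```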

"There are terrific advances to be made by breaking loose from the thrall of continuity and differentiability," Professor Barnsley said.

"The body is full of repeating branch structures – the breathing system, the blood supply system, the arrangement of skin cells, even cancer is a fractal."

[Michael Barnsley - with the founder of the field, Benoit Mandelbrot, gone - is a paramount leader of both the mathematics of fractals and its applications. Though the hitherto most lucrative application (fractal prediction of the obviously non-differentiable stock-price curves) was not led by either of them (see Elliott Wave Theory), chances are that the required mathematical/algorithmic/software development will call for such significant investment that "cloud computing companies" might spearhead or even monopolize the industry of FractoGene. Cloud computing provides the capital, the infrastructure and the built-in capacity to enforce royalties for algorithms run on myriads of their servers. 2015 is likely to be the year when the horse-race fully unfolds - andras_at_pellionisz_dot_com ]
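To give a flavor of what such fractal-analysis software does computationally - a generic, textbook illustration, emphatically not the patented FractoGene algorithms - here is a minimal box-counting estimate of fractal dimension, checked against a chaos-game Sierpinski set whose dimension is known to be log 3 / log 2 ≈ 1.585:

```python
import math
import random

def box_counting_dimension(points, sizes=(1/2, 1/4, 1/8, 1/16, 1/32)):
    """Generic box-counting estimate for a point set inside the unit square:
    fit log(number of occupied boxes) against log(1/box size) by least squares."""
    xs, ys = [], []
    for s in sizes:
        occupied = {(int(px / s), int(py / s)) for px, py in points}
        xs.append(math.log(1 / s))
        ys.append(math.log(len(occupied)))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)

# Sanity check on a chaos-game Sierpinski set (expected dimension ~ 1.585).
random.seed(0)
corners = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]
p, points = (0.1, 0.1), []
for _ in range(20000):
    cx, cy = random.choice(corners)
    p = ((p[0] + cx) / 2, (p[1] + cy) / 2)
    points.append(p)
print(f"box-counting dimension ~ {box_counting_dimension(points):.2f}")  # expect ~1.58
```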


A fractal geometric model of prostate carcinoma and classes of equivalence

[There is no need to read the poster - or the paper in print. Just looking at the Broccoli Romanesca (and the similarly widespread Hilbert fractal) will remind everyone by 2015 that "fractal genome grows fractal organisms" (FractoGene). What other concept grasps the essence of Recursive Genome Function? - Pellionisz_dot_com]