Genomics
Genomics is the study of an organism's genome, i.e. the entire complement of its genetic material (genes plus non-coding sequences).
From: Biomass and Bioenergy, 2010
Related terms:
Proteomics
Biodiversity
DNA
Ecology
Enzyme
Micro-Organism
Plant and Nanoparticle Interface at the Molecular Level
Gausiya Bashri, ... Sheo M. Prasad, in Nanomaterials in Plants, Algae, and Microorganisms, 2018
15.5 Effect of NPs on Genomics
Genomics deals with changes at the gene or DNA level in any organism. It plays an important role in the study of stress physiology, helping to elucidate mechanisms of toxicity through high-throughput methods such as cDNA microarrays or quantitative real-time polymerase chain reaction (Xu et al., 2011). As discussed earlier, NPs negatively affect plants, and Table 15.2 gives a brief account of nanotoxicity at the genomic level in plants. Gene expression analyses are performed to test the changes at the genetic level underlying any morphological and/or physiological pathways and the modes of action of any compound (Ankley et al., 2006; Dietz and Herth, 2011). Gene expression analysis helps to identify genes that confer sensitivity or tolerance to particular stresses, which can support the development of transgenic plants for those stresses (Merrick and Bruno, 2004; Thomas et al., 2011). NPs alter gene expression even at very low doses, which may help in investigating the cellular impact of chronic NP-associated toxicity (Poma and Di Giorgio, 2008; Poynton and Vulpe, 2009). Therefore, genomic analysis is very useful for correlating morphological/physiological changes with changes at the genetic level and also reveals the mechanism of NP toxicity. Genomic analyses indicate that NP-mediated toxicity is associated with disruption of basic processes such as electron transport chain signaling, which ultimately impairs the cell cycle of the organism, as shown in Fig. 15.3.
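As a toy illustration of how up- and downregulation are quantified in such expression analyses, a log2 fold change can be computed per gene; the gene names and expression values below are hypothetical, not taken from the studies cited:

```python
import math

def log2_fold_changes(control, treated):
    """Per-gene log2 fold changes between two expression profiles.

    `control` and `treated` map gene names to normalized expression values.
    Positive results indicate upregulation under treatment, negative
    results downregulation."""
    changes = {}
    for gene in control:
        if gene in treated and control[gene] > 0 and treated[gene] > 0:
            changes[gene] = math.log2(treated[gene] / control[gene])
    return changes

# Hypothetical expression values (arbitrary units), for illustration only.
control = {"SOD": 10.0, "CAT": 8.0, "PR1": 2.0}
treated = {"SOD": 40.0, "CAT": 8.0, "PR1": 0.5}
fc = log2_fold_changes(control, treated)
# fc["SOD"] -> 2.0 (upregulated); fc["PR1"] -> -2.0 (downregulated)
```

A cutoff on the absolute fold change (often combined with a significance test) is then used to call a gene up- or downregulated.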
Table 15.2. Genome Analysis of Nanoparticle-Mediated Toxicity
Nanomaterial | Size/Dose | Model System | Effect on the Cellular System | References
Ag NPs | 60 nm | Vicia faba | Impairment in mitosis causing chromosomal aberration and micronucleus induction, resulting in genotoxicity | Patlolla et al. (2012)
Ag NPs | 10 nm | Triticum aestivum | Enhanced oxidative stress resulting in accumulation of oxidized glutathione (GSSG); metallothionein-encoding gene expression enhanced | Dimkpa et al. (2013)
Ag NPs | 20 nm | Arabidopsis | Enhanced expression of genes associated with sulfur assimilation, glutathione S-transferase, and glutathione reductase | Nair and Chung (2014a)
Ag NPs | 20 nm | Arabidopsis | Variable expression of genes, some upregulated and others downregulated; downregulation of DNA mismatch repair protein (MSH) and upregulation of proliferating cell nuclear antigen (PCNA) | Nair and Chung (2014b)
Ag NPs | 20 nm | Arabidopsis | Up- and downregulation of genes associated with protein family domain (PFAM) and InterPro proteins | Kohan-Baghkheirati and Geisler-Lee (2015)
Ag NPs | 45–47 nm | Arabidopsis | Accumulation of Cu/Zn superoxide dismutase (CSD2), cell-division-cycle kinase 2 (CDC2), protochlorophyllide oxidoreductase (POR), and fructose-1,6-bisphosphate aldolase (FBA); enhanced expression of genes for indoleacetic acid protein 8 (IAA8), 9-cis-epoxycarotenoid dioxygenase (NCED3), and dehydration-responsive RD22; decline in expression of aminocyclopropane-1-carboxylic acid (ACC) derivatives, ACC synthase 7 (ACS7), and ACC oxidase 2 (ACO2) | Syu et al. (2014)
Ag NPs, ZnO, and TiO2 | – | Medicago | Shift in expression profiles of genes associated with biological pathways such as nitrogen metabolism, nodulation, metal homeostasis, and stress responses | Chen et al. (2015)
Ag–silica hybrid complex | 30 nm | Arabidopsis | Upregulation of pathogenesis-related (PR) genes, implicated in systemic acquired resistance (SAR) | Chu et al. (2012)
Al2O3 | Not specified | Nicotiana tabacum | Enhanced expression of miRNAs involved in mediating stress responses | Burklew et al. (2012)
CuO | Not specified | Brassica juncea | Upregulation of genes for CuZn superoxide dismutase (CuZnSOD); no significant change in expression of catalase (CAT) and ascorbate peroxidase (APX) genes | Nair and Chung (2015)
Multiwalled carbon nanotubes (MWCNTs) | 20 nm | Nicotiana tabacum | Increased expression of genes associated with aquaporins, cell division, and cell wall extension | Khodakovskaya et al. (2012)
MWCNTs | 15–40 nm | Glycine max, Hordeum vulgare, Zea mays | Increased expression of genes encoding water channel proteins | Lahiani et al. (2013)
Single-walled carbon nanotubes (SWCNTs) | 1–2 nm | Arabidopsis, Oryza sativa | Enhanced cell aggregation and chromatin condensation; negative impact on protoplasts leading to oxidative stress | Shen et al. (2010)
SWCNTs | 1–2 nm | Z. mays | Variable expression of genes: genes associated with seminal root growth and epigenetic modification enzymes were enhanced, while root hair genes were reduced | Yan et al. (2013)
SWCNTs | 50–100 nm | Hordeum vulgare, Z. mays, O. sativa, G. max, Panicum virgatum, Solanum lycopersicum, N. tabacum | Genes associated with stress responses, cellular responses, and metabolic processes variably expressed | Lahiani et al. (2015)
TiO2 | 25 nm | Nicotiana tabacum | Expression profiles of miRNAs regulating important tolerance-related genes were affected | Frazier et al. (2014)
TiO2 | Not specified | Arabidopsis | Differential expression of genes involved in DNA metabolism, hormone metabolism, tetrapyrrole synthesis, and photosynthesis | Tumburu et al. (2015)
Graphene oxide (GO) | 40–50 nm | Arabidopsis | Altered expression of genes involved in development of abiotic stress tolerance; induction of oxidative stress | Wang et al. (2014)
CeO2, In2O3 | 10–30 nm, 20–70 nm | Arabidopsis | Alteration in expression of major stress-responsive genes, i.e., glutathione biosynthesis and sulfur assimilation genes | Ma et al. (2013)
ZnO | 20 nm | Arabidopsis | Differential expression of genes: stress-associated genes upregulated, while genes associated with cell organization and biogenesis and DNA/RNA metabolism downregulated | Landa et al. (2015)
ZnO, CeO2 | 7–8 nm | G. max | Damage to DNA and mutation; expression of new bands | López-Moreno et al. (2010)
ZnO, TiO2 | <100 nm, <150 nm | A. thaliana | Upregulation of genes associated with responses to biotic and abiotic stress; downregulation of genes involved in translation, nucleosome assembly, and processes involving microtubules and cell organization and biogenesis | Landa et al. (2012)
Ag, TiO2 | 10–80 nm, 10–40 nm | A. thaliana | Repression of transcription factors related to pathogenesis and phosphate starvation | García-Sánchez et al. (2015)
Ag | 20 nm | A. thaliana | Upregulation of genes related to the oxidative stress response (vacuolar cation/proton exchanger, superoxide dismutase, cytochrome P450-dependent oxidase, peroxidase); downregulation of genes related to pathogenesis and hormonal stimuli | Kaveh et al. (2013)
Ag, TiO2, ZnO, quantum dots | 20 nm, 5 nm, 20 nm, 6–10 nm | Chlamydomonas reinhardtii | Decline in expression of genes associated with photosynthesis; enhancement of transcripts encoding cell wall and flagella components | Simon et al. (2013)

Figure 15.3. Cellular uptake and cyto-/genotoxicity mediated by nanoparticles.
Modified from Magdolenova, Z., Collins, A., Kumar, A., Dhawan, A., Stone, V., Dusinska, M., 2013. Mechanisms of genotoxicity. A review of in vitro and in vivo studies with engineered nanoparticles. Nanotoxicology, 1–46. http://dx.doi.org/10.3109/17435390.2013.773464.
Issues, Challenges, Scientific Bottlenecks and Perspectives
Denis Faure, Dominique Joly, in Insight on Environmental Genomics, 2016
1.2.9 Genomic observatories
Genomic observatories (GOs) are first-rate research facilities that produce genomic-level biodiversity observations that are contextualized, localized in territories, and compliant with international data-acquisition standards. There are currently 15 of them, representing marine and continental ecosystems in which genomic data acquisition is a long-term activity. These facilities aim to quantify the biotic interactions of ecosystems and to build models of biodiversity that predict the quality and distribution of ecosystem services. They are spread around the globe: two in the Asia-Pacific area, including a French one in Polynesia (http://usr3278.univ-perp.fr/moorea/?lang=en); eight in Europe, including the Rothamsted site used by the TerraGenome program and two French marine stations, in Roscoff and Banyuls, involved in the aforementioned Oceanomics and Idealg programs; two in the Arctic and Antarctic polar zones; and three in the USA. They form a network that represents the "pulse of the planet" and whose main goal is to promote sustainable development through a better understanding of the interactions between humans and their environment. The approach consists of applying cutting-edge genomics technologies to monitor the stream of genetic variation in human and natural ecosystems. Genetic data are systematically related to biophysical and socioeconomic data (metadata), which enables the integration of all the information into predictive models. Such models aim at mapping the quality and distribution of biodiversity, as well as the ecosystem services it provides, according to various scenarios of future change and human activity. These observatories also play a major role in providing training, technical support, resources and guidelines in the form of a learning portal.
This internet resource is available to new sites or organizations that wish to perform genomic observation, and especially to new facilities in developing countries, where levels of biodiversity vulnerability are often high.
Genetics is Involved in Everything, but not Everything is Genetic
R.G. Ramos, K. Olden, in Encyclopedia of Environmental Health, 2011
Initiatives to Study the Role of Genomics in Human Disease
Genomics is defined as 'the study of the complete genetic material, including both the structure and function of genes, of an organism.' In 1990, the NIH, in collaboration with the Department of Energy and international partners, launched an effort to sequence the human genome, known as the Human Genome Project (HGP). The goal of the HGP was to provide new knowledge and research tools that would further the understanding of the genetic contribution to human disease. In 1998, the National Institute of Environmental Health Sciences (NIEHS), one of the institutes within the NIH, established the Environmental Genome Project (EGP). The EGP's primary mission was to investigate the role of genetic polymorphisms in susceptibility to human illness induced by environmental exposures. Such studies improve the understanding of the interaction between genetic susceptibility and environmental exposures in human disease incidence and prevalence. This project supported research activities at both the NIH and the university level. In 2005, using the EGP as a model, the US Department of Health & Human Services launched the Genes, Environment and Health Initiative (GEI). The GEI is a collaborative effort between the NIEHS and the National Human Genome Research Institute (NHGRI), another institute at the NIH. This initiative consists of two components: the Exposure Biology Program and the Genetics Program. The Exposure Biology Program is currently led by the NIEHS and focuses on developing tools capable of measuring biomarkers that would delineate the relationship between the disease phenotype and exposure to chemicals, poor diet, reduced or no physical activity, psychosocial stress, and addictive substances. The Genetics Program, led by the NHGRI, will further the development of scientific tools (i.e., bioinformatics, high-throughput sequencing, and so forth) that permit the analysis of genetic variation between individuals with specific phenotypes.
In December 2005, the Cancer Genome Atlas pilot project was announced. This project, being led by the National Cancer Institute (NCI), seeks to improve the understanding of the molecular mechanisms involved in cancer, including development and metastasis, and reflects the NIH roadmap that prioritizes translational research. Ultimately, the knowledge gained from the Cancer Genome Atlas pilot project will translate into new strategies for the prevention, diagnosis, and treatment of cancer.
At the public health level, the Centers for Disease Control and Prevention (CDC) recently celebrated the 10th anniversary of public health genomics at the CDC. The office is now known as the National Office for Public Health Genomics (NOPHG), and its primary mission is to 'integrate genomics into public health research, policy, and programs.' To this end, the NOPHG has prioritized integrating genomics into national CDC surveys, promoting increased utilization of family history in assessing the risk of disease, and developing evidence-based recommendations for the use of genetic tests to improve population health. At the National Institute for Occupational Safety and Health (NIOSH), the CDC's branch for occupational health research, individual variability in worker response to occupational hazards is now an emerging and exciting field of gene–environment research. Thus, it is becoming obvious that the powerful tool of genomics has redefined the public health approach to the study, prevention, and control of environmentally and occupationally related diseases. Furthermore, the need for trained public health professionals has resulted in the establishment of public health genomics and human genetics programs at graduate schools of public health in the United States.
On the global front, the International HapMap Project and the Public Population Project in Genomics (P3G) are initiatives that promote the establishment of international consortia. The former seeks to identify the distribution of haplotypes (sets of DNA variants that tend to be inherited together) within and between populations, so that other researchers can link the risk of specific diseases to specific genotypes. The latter will allow the biomedical research community to unravel the complex genetic and environmental interactions responsible for most common diseases and thus deliver improved disease prevention and treatment.
Several examples of gene–environment interactions have been reported recently. In the next section, examples of diseases that are known to have a significant contribution from both genetics and the environment in their development will be discussed. It is clear from the examples cited that the effects that mutated genes can have on the phenotype of the organism vary in both severity and time of onset, suggesting that genes are not the sole determinants and that nongenetic factors also play a role in the expression of the phenotype. Irrespective of the disease, it is obvious that gene–environment research will continue to serve the reduction of human suffering and the improvement of quality of life.
Drinking Water Treatment and Distribution
Charles P. Gerba, Ian L. Pepper, in Environmental Microbiology (Third Edition), 2015
28.3.3 Microbial Community Structure
Genomic-based molecular methods have greatly increased our ability to understand microbial community dynamics in drinking water distribution systems. Recent studies indicate that these communities are complex and influenced by the source water (ground vs. surface), the chemical properties of the water, the treatment, and the type of disinfectant residual. Studies have shown that Alphaproteobacteria, Betaproteobacteria or Gammaproteobacteria predominate (Hwang et al., 2012). The abundance of different groups of bacteria has been found to vary between distribution systems that maintain a free chlorine residual and those that use chloramines (Gomez-Alvarez et al., 2012). Such changes in community structure can be significant for the protection of community health, since disinfection type has been found to change the abundance of opportunistic pathogens (Tables 28.7 and 28.8).
Table 28.7. Distribution of Opportunistic Pathogens in Distributions with Different Disinfection Residuals
Organism | Free Chlorine (% of total) | Chloramine (% of total)
Mycobacterium | 1.29 | 19.65
Legionella | 0.31 | 0.09
Amoeba | 0.03 | >0.0001
Table 28.8. Distribution of Members of Bacteria Domain Determined via Taxonomic Identifications of Annotated Proteins at the Class Level
Class | Free Chlorine (%)a | Chloramine (%)
Actinobacteria | 6.2 | 27.8
Cytophaga | 0 | 2.3
Flavobacteria | 0 | 2.3
Sphingobacteria | 0 | 2.0
Chlamydiae | 0.4 | 0.1
Chlorobia | 1.4 | 0
Chloroflexi | 1.3 | 0
Gloeobacteria | 1.3 | 0
Cyanobacteria | 9.0 | 0
Bacilli | 4.1 | 0
Clostridia | 5.6 | 0
Planctomycetacia | 1.3 | 0
Alphaproteobacteria | 35.1 | 22.5
Betaproteobacteria | 6.2 | 24.1
Deltaproteobacteria | 11.9 | 10.5
Gammaproteobacteria | 0.5 | 0.1
Other classes representing <1% | 15.4 | 8.4
Total | 100 | 100
a. Each number = % of total sequences in each group.
Source: Gomez-Alvarez et al. (2012).
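The percentage columns in tables such as 28.7 and 28.8 are simple relative abundances of sequence counts. A minimal sketch using hypothetical read counts (not the actual data behind the tables):

```python
def class_percentages(read_counts):
    """Convert per-class sequence counts into percentages of the total,
    as reported in community-composition tables."""
    total = sum(read_counts.values())
    return {name: round(100.0 * n / total, 1) for name, n in read_counts.items()}

# Hypothetical read counts, for illustration only.
counts = {"Alphaproteobacteria": 702, "Betaproteobacteria": 124,
          "Actinobacteria": 124, "Other": 1050}
pct = class_percentages(counts)
# pct["Alphaproteobacteria"] -> 35.1
```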
Technological Revolutions: Possibilities and Limitations
Denis Faure, Dominique Joly, in Insight on Environmental Genomics, 2016
2.2 Genomic DNA sequencing from a single cell: "single cell genomics"
Genomic DNA sequencing from a single cell, "single cell genomics" (SCG), is an innovative use of NGS.
SCG is achieved through several steps:
1) cell isolation by dilution, micromanipulation or automated sorting by flow cytometry;
2) lysis of the cells;
3) WGA (Whole Genome Amplification)-based amplification of the DNA, mostly using the DNA polymerases Phi29 or Bst with the "multiple displacement amplification" (MDA) method;
4) NGS sequencing.
This original method still suffers from some defects and biases. One of them is linked to difficulties in lysis, i.e. in disintegrating the cell wall of the organism. Another is the bias induced by MDA amplification, which generates DNA chimeras. Some of these biases can be reduced by decontaminating the reagents and by miniaturizing, thereby reducing reaction volumes from nanoliters (10−9 liters) to picoliters (10−12 liters).
Completion rates of the resulting genomes are variable (0 to 100%) and depend on many factors, such as the amount of DNA available after cell lysis, contamination by other DNA sequences and the complexity of the genome. The SCG approach is expected to enable the identification of rare or still unknown organisms from a single cell. This would give access to non-cultivable organisms; to the characterization of infra-specific biodiversity, documenting genomic variability, or of the inter-specific variability within communities of organisms; and even to the study of their evolution and adaptation by detecting mutations within populations of organisms. Other fields of research that the SCG approach opens up include studies on epigenomes, embryology, organogenesis and neurobiology, as well as medical and clinical studies on humans related, for example, to research on cancer processes [WAN 15].
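The dependence of completion on sequencing depth can be illustrated with the classic Lander-Waterman estimate (not discussed in this chapter; offered here only as a back-of-the-envelope approximation): with mean coverage c, the expected fraction of a genome sequenced at least once is about 1 − exp(−c). A minimal sketch:

```python
import math

def expected_completeness(coverage):
    """Lander-Waterman estimate: with mean sequencing depth `coverage`,
    the expected fraction of the genome covered at least once is
    approximately 1 - exp(-coverage)."""
    return 1.0 - math.exp(-coverage)

# At 1x mean depth roughly 63% of the genome is sampled; at 5x, about 99.3%.
one_x = expected_completeness(1)
five_x = expected_completeness(5)
```

Real single-cell assemblies fall well short of this idealized curve because MDA coverage is highly uneven, which is one reason completion rates vary so widely.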
Public driven and public perceptible innovation of environmental sector
Attila Gere, ... Howard Moskowitz, in Innovation Strategies in Environmental Science, 2020
8 Conclusions
Mind genomics and text mining are complementary tools. Text mining looks at the way people emit information when communicating; it is the big data of human behavior. To understand behavior, rules must be created in an ad hoc way; only then do the data make sense. Text mining provides a great deal of information, but it requires heavy-duty computation to extract the patterns. Metaphorically, text mining can be seen as a concentrating effort applied to a sea of information, increasing the concentration and usability of the information so that the originally dilute input becomes more structured and usable, with patterns waiting to emerge. However, the patterns must be recognized by the researcher, who makes the interpretation.
Mind genomics can be likened to working with highly concentrated materials, in which the objective is to run simple experiments to extract vital information. One does not need to pore through masses of data. Rather, as few as 50 respondents in a rapid, 4-hour effort from start to finish allow the researcher to understand the mind-sets and, in turn, to create a PVI. What mind genomics lacks in scope of materials, it more than compensates for in insight, data usability, archival information, and future applications beyond the scientific experiment itself.
Mind genomics and text mining work on different inputs with different analytic tools and produce outputs substantially different in nature and scope. They are complementary to each other, rather than competitors. They should be considered companions that help each other to achieve the task of understanding, predicting, and using. We have shown the beginning of this, using text mining, based on millions of data points, to inform the inputs of mind genomics, the experiment.
Foreword
Farooq Azam, in Biogeochemistry of Marine Dissolved Organic Matter (Second Edition), 2015
Genomic predictions have provided powerful constraints: they tell us which molecular interactions among DOM molecules and microbes are possible. However, predicting the biogeochemical dynamics of the complex DOM pool also requires ecophysiological and biochemical studies of DOM-bacteria interactions to determine what DOM transformations actually take place, at what rates, by what biochemical mechanisms, subject to what regulatory forces, and in what ecosystem context. This is indeed a tall order, but the problem is critical to solve because of the central role of DOM-bacteria interactions in predicting the carbon cycle of the future ocean. Ocean acidification and warming are likely to affect the nature and rates of microbial production and transformation of DOM, with potential influence on the carbon balance between the ocean and the atmosphere. Understanding what renders some of the DOM semi-labile or refractory will also require such mechanistic studies. This important research on dissolved-phase carbon cycling and sequestration requires new methods, model systems, and concepts (e.g., the Microbial Carbon Pump; Jiao et al., 2010) addressing in situ dynamics and interactions among microbes and DOM molecules.
Pyrethroids
M.K. Ross, in Encyclopedia of Environmental Health, 2011
Modern 'Omic' Approaches to Study Pyrethroids
Genomic and toxicogenomic studies on pyrethroids have focused on the underlying metabolic differences between insecticide-resistant and insecticide-sensitive strains of mosquitoes (A. gambiae). Few 'omic' studies have focused on the effects of pyrethroids in mammalian species, although some have examined the neurotoxicological impact of these compounds. For example, the type II pyrethroid cyfluthrin was shown to both up- and downregulate several genes in primary human fetal astrocytes. The affected genes were shown to encode chaperones, transcription factors, transporters, and signal transducers. A recent study indicated that pyrethroids induced changes in gene expression in the rat frontal cortex that influence branching morphogenesis, suggesting that these compounds may act as developmental neurotoxicants that affect normal neuronal morphology. In other studies, exposure of mice to pyrethroids (deltamethrin and permethrin) elevated the amount of dopamine transporter (DAT) in the brain. It has been suggested that increased levels of DAT may contribute to Parkinson disease due to the upregulation of a protein that acts as a gateway for toxicants into dopaminergic neurons. This finding may have significance for the hypothesis that pesticides contribute to Parkinson disease in humans.
Functional Genomics and Molecular Analysis of a Subtropical Harmful Algal Bloom Species, Karenia brevis
T.I. McLean, M. Pirooznia, in Encyclopedia of Environmental Health, 2011
Functional genomics is an attempt not only to identify the complement of genes that a particular organism contains in its genome, but also to understand how the expression of those genes is regulated and how the gene products interact to produce the biology associated with an organism. Genomics-based studies initially focused on biomedically important model organisms, but they are now being applied to a myriad of organisms for a multitude of reasons. The study of dinoflagellates is now benefiting from these techniques and analyses. This article describes some new findings surrounding the molecular nature of a particular dinoflagellate, Karenia brevis, which is endemic to the Gulf of Mexico and causes near-yearly harmful algal blooms along one or more coastlines of the Gulf. Four independent expressed sequence tag (EST) libraries have been constructed for this dinoflagellate, creating a large genomic resource for investigating facets of this organism's biology such as why and how it forms blooms, how it produces toxins, how it regulates its growth, and how it interacts with its environment. Some preliminary work is highlighted that shows how the organism may be sensing its environment and regulating its gene expression and growth accordingly.
Pharmaceuticals
Bryan G. Reuben, in Encyclopedia of Physical Science and Technology (Third Edition), 2003
VIII.B Genomics and Proteomics
Genomics is the branch of biotechnology dealing with the structure of DNA. It is possible today to sequence the DNA of different organisms, and that of the TB microorganism, among others, has already been fully sequenced. The human genome was elucidated in 2000. So far, about 100 genes involved with disease have been identified, most notably those responsible for Huntington's chorea and sickle cell anemia. It was estimated that, by 2000, the pharmaceutical industry would have invested nearly $5 billion in genomics.
Knowledge of the DNA sequence enables one to predict the protein that a cell might synthesize. This should produce an abundance of well-defined disease-related targets for drug therapy. It was thought at one time that, since genes encode proteins, a cell's protein state would in effect be a projection of its genetic state, but this hope proved unfounded. Correlation between gene structure and protein production is low, showing that the presence and abundance of specific proteins cannot be predicted from the presence of the genes that encode them.
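The prediction of a protein from a DNA sequence rests on the genetic code: each three-base codon maps to an amino acid. A minimal sketch, using an abbreviated codon table for illustration only:

```python
# Abbreviated codon table for illustration; the full genetic code has 64 codons.
CODON_TABLE = {
    "ATG": "M", "GCT": "A", "GAA": "E",
    "TAA": "*", "TAG": "*", "TGA": "*",  # stop codons
}

def translate(dna):
    """Translate a coding DNA sequence, codon by codon, from the first
    ATG start codon until a stop codon is reached."""
    start = dna.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X = codon not in this table
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

print(translate("ATGGCTGAATGA"))  # -> MAE
```

This one-way mapping is precisely why the reverse inference discussed above fails: the sequence tells you what protein could be made, not whether, when, or how abundantly it actually is.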
The failure of gene structure to predict protein production was a setback to the biotechnology industry. The new field of proteomics developed to address protein-level analysis. This includes the localization of proteins within the cell, the other proteins with which a given protein forms short- or long-lived complexes, and modifications such as phosphorylation and glycosylation, which are central to the function of the protein but are not coded genetically. There promises to be an explosion of knowledge at the protein level on the same scale as has been seen at the genetic level.
Some of these applications are intellectually and commercially exciting. For example, tamoxifen (110) is the most widely prescribed anticancer drug, with 2.8 million prescriptions per year. However, only 40% of women with breast cancer respond to it. Presumably, the other women have an enzyme that destroys it before it reaches the tumor. If this enzyme could be blocked, tamoxifen could be more widely used. Alternatively, it might be possible to find which people do not produce enzymes to metabolize certain drug products and would therefore suffer adverse reactions when exposed to them. If that section of the population could be excluded, certain drugs not permitted at present could become usable.
There are many obstacles to the third chemotherapeutic revolution. Innovation is not a linear process, and the science simply may not work. It may work but take too long and be too expensive for the people who fund it. Political pressure on the drug companies to set lower prices may mean that resources are not available. On almost any scenario, there is a mismatch between what the drug companies are hoping to deliver and what the paymasters (governments, insurance companies and so on) are prepared to pay for. How this conflict will be resolved is still in doubt.
Clinical genome sequencing (GS) requires additional elements that go well beyond the technology.
• GS can be applied appropriately in many clinical situations, but it is important to define how and when this test should be used, and to establish physician/patient communication and support, quality laboratory processes, analytical validity, ongoing proficiency testing, specific bioinformatics analyses and filters, and interpretation and reporting processes specific to that clinical application.
• There may be multiple appropriate approaches, depending on the clinical question(s) being assessed.
Implementing clinical GS in a clinical laboratory is labor intensive and requires a robust laboratory and bioinformatics process, but can be implemented successfully to improve patient outcomes.
This field is evolving extremely rapidly and will undoubtedly create opportunities for new professions in the area of bioinformatics and genetic analyses.
Advantages include the ability to rapidly query many genomic regions and to detect a wide range of variants. Genomic approaches can be used in a targeted or hypothesis-free fashion, and data can be banked for future study as data sets grow and knowledge evolves.
Disadvantages of genomic sequencing
Genomic sequencing can result in a potentially overwhelming amount of information, and tools must be used to help best manage derived data.
Genomic sequencing will inevitably result in both false-negative and false-positive data, and analysis must take into account these and related issues.
Comparison: exomes versus genomes
• While genome sequencing may eventually replace exome sequencing, exome sequencing may require fewer resources and offer faster results with higher coverage.
• Genome sequencing includes many potentially important regions outside the exome, and can detect certain types of variants (such as structural variants) that may be missed by exome sequencing.
Analyzing individual and multiple data sets for causal mutation discovery
Phenotypically similar unrelated probands
• Unrelated probands with clinically highly similar phenotypes can be used to identify common genes or pathways carrying potentially pathogenic variants.
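One simple way to operationalize this idea is to intersect the sets of variant-bearing genes across probands, keeping only genes hit in every affected individual; all gene names below are hypothetical:

```python
def shared_candidate_genes(per_proband_genes):
    """Given, for each unrelated proband, the set of genes carrying
    potentially pathogenic variants, return the genes hit in every proband."""
    shared = set(per_proband_genes[0])
    for genes in per_proband_genes[1:]:
        shared &= set(genes)
    return shared

# Hypothetical variant-bearing gene sets for three probands.
probands = [
    {"GENE_A", "GENE_B", "GENE_C"},
    {"GENE_B", "GENE_C", "GENE_D"},
    {"GENE_B", "GENE_E"},
]
print(shared_candidate_genes(probands))  # -> {'GENE_B'}
```

In practice the intersection is taken at the pathway level as well, since different probands may carry variants in different genes of the same pathway.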
Familial studies
• Analysis of small or large families through genomic sequencing can help prioritize causal variants, even in cases where the inheritance pattern is unclear.
Using databases of population variation
• Filtering through publicly available and private databases of population variation is a key step in ranking potentially causal variants.
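In practice this filtering often amounts to discarding variants whose population allele frequency exceeds a cutoff. A minimal sketch with hypothetical records, where `population_af` stands in for a database lookup:

```python
def filter_rare_variants(variants, max_af=0.001):
    """Keep variants whose population allele frequency (AF) is below the
    cutoff; common variants are unlikely to cause a rare Mendelian disorder."""
    return [v for v in variants if v["population_af"] < max_af]

# Hypothetical variant records, for illustration only.
variants = [
    {"gene": "GENE_A", "population_af": 0.15},    # common -> filtered out
    {"gene": "GENE_B", "population_af": 0.0002},  # rare -> retained
]
kept = filter_rare_variants(variants)
```

The cutoff itself depends on the assumed disease prevalence and inheritance model, so it is a tunable parameter rather than a fixed rule.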
Incorporation of pathway-related data
• Knowledge of relevant biological networks may suggest candidate genes; a pathway-related approach can thus complement variant-level analysis.
Recognizing and managing artifacts
• The large amount of data derived through genomic sequencing will inevitably include artifacts, and the ability to recognize them improves analysis.
Functional interpretation of variants
• Predictive models of variant effect are an important part of genomic analysis.
• Ultimately, to understand disease processes and basic biological mechanisms, predictive functional models cannot substitute for bench-based investigations.
Combinatorial approaches
• The above considerations and approaches can and should be used in combination to expedite and strengthen analysis.
Genomic sequencing in the clinical realm
Clinical utility: Translating genomic knowledge to more general health care situations
• The nature of clinical medicine includes critical time pressure, and there is a strong incentive to generate, analyze, and interpret genomic sequencing data quickly and accurately.
•
Preplanning for the management of incidental medical information derived through clinical sequencing will benefit genomicists, health care workers, and patients.
Consequences of genomic sequencing
The increased use of genomic sequencing is changing how we view the clinical spectrum and severity of many constitutional disorders, and will affect penetrance estimates in numerous conditions, as well as how we view inheritance patterns.
Genetic counseling and ethical issues
Though not the focus of this chapter, the use of genomic sequencing in clinical contexts raises many critical issues related to genetic counseling and bioethics.
Conclusions and future directions
Genomic sequencing data will be most powerful when considered in the context of other biomedical and health parameters.
Genomicists must strive to ensure that the benefits of new genomic sequencing methodologies are not limited to certain individuals in specific parts of the world.
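The population-frequency filtering step described above can be sketched in a few lines. This is a minimal, hypothetical illustration: the gene names, variants, frequencies, and the 0.1% threshold are all invented, and real pipelines draw on large population databases and far richer annotations.

```python
# Hypothetical sketch: rank candidate variants by filtering against
# population allele-frequency data. All records below are invented examples.

MAX_POP_FREQ = 0.001  # variants common in the population rarely cause rare disease

candidates = [
    {"gene": "GENE_A", "variant": "c.123A>G", "pop_freq": 0.15},    # common -> filtered out
    {"gene": "GENE_B", "variant": "c.456del", "pop_freq": 0.0002},  # rare -> retained
    {"gene": "GENE_C", "variant": "c.789C>T", "pop_freq": None},    # absent from databases -> retained
]

def is_candidate(v, max_freq=MAX_POP_FREQ):
    """Retain variants that are rare in (or absent from) population databases."""
    return v["pop_freq"] is None or v["pop_freq"] <= max_freq

# Variants absent from the databases sort ahead of merely rare ones.
ranked = sorted(
    (v for v in candidates if is_candidate(v)),
    key=lambda v: (v["pop_freq"] is not None, v["pop_freq"] or 0.0),
)

for v in ranked:
    print(v["gene"], v["variant"], v["pop_freq"])
```

Absence from population data is treated here as evidence of rarity; real ranking schemes weigh many additional annotations (predicted effect, conservation, inheritance model) alongside frequency.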
Fungal Genomics
Dee A. Carter, ... Raul E. Vera, in Applied Mycology and Biotechnology, 2004
7 CONCLUSION
Screening genome sequencing data can provide researchers with a relatively quick and cost-effective method of isolating and characterising highly polymorphic genetic markers. The number of fungal genome projects is small compared with those for other organisms: of the 379 genome project websites currently available, only sixteen (~4%) include fungal projects, and these cover only twelve fungal species. The continuing interest in fungal genetics, the importance of fungi as crop, animal and human pathogens, their relatively small genomes compared with those of other eukaryotes, and the gradually diminishing cost of genome sequencing mean this number can be expected to grow steadily in the near future. Molecular markers are indispensable for characterising and identifying fungal species and strains, so their development and application will be of use to all researchers concerned with fungal biology and genetics.
Carbohydrate Chains: Enzymatic and Chemical Synthesis
T.J. Tolbert, C-H Wong, in Encyclopedia of Biological Chemistry (Second Edition), 2013
Recombinant expression of glycosyltransferases
Genomic sequencing efforts have made the DNA sequences of many mammalian and bacterial glycosyltransferases freely available. This has enabled efforts to recombinantly overexpress glycosyltransferases for use in the synthesis of carbohydrate chains. The use of bacterial and yeast expression systems has allowed several mammalian glycosyltransferases to be produced on relatively large scale, and it has also been recognized that many bacterial glycosyltransferases, which are often easier to express, can be utilized to produce mammalian-type carbohydrate structures. These efforts are ongoing and are continually increasing the range of carbohydrate structures that can be formed using glycosyltransferases.
Organellar and Metabolic Processes
Laure Michelet, ... D. Lemaire, in The Chlamydomonas Sourcebook, 2009
a. The GRX family
Genome sequencing helped to identify a multiplicity of glutaredoxins (Lemaire, 2004; Rouhier et al., 2004b). In Arabidopsis, thirty-one genes coding for three different types of glutaredoxins have been identified: six CPYC-type, four CGFS-type, and twenty-one CC-type (Figure 11.9). As already observed for the thioredoxin multigenic family, Chlamydomonas contains fewer glutaredoxin isoforms than Arabidopsis; here, moreover, not all types are represented. Indeed, Chlamydomonas contains six glutaredoxins, distributed into only two types: two CPYC-type and four CGFS-type glutaredoxins (Figure 11.9 and Table 11.1).
Figure 11.9. Glutaredoxins in Chlamydomonas and Arabidopsis. Phylogenetic tree of the GRX family in Chlamydomonas and Arabidopsis. The unrooted tree was constructed with Clustal X, and gaps were excluded. Accession numbers for Chlamydomonas glutaredoxins are given in Table 11.1.
Modern Methods in Natural Products Chemistry
Lew Mander, in Comprehensive Natural Products II, 2010
Genome sequencing projects have provided a wealth of information on gene identities in many prokaryotic and eukaryotic organisms. The daunting challenge for proteomic research is now to assign the molecular, cellular, and physiological functions of the full complement of proteins encoded by the genome. Rising to this challenge, a powerful new chemical proteomic strategy, termed activity-based protein profiling ('ABPP'), has been systematically explored by Sieber and coauthors (Chapter 9.17) to characterize enzyme function and regulation directly in native biological systems. The principal concept of this technology is the covalent active-site labeling of distinct enzyme classes by functionalized small molecules, which are used as chemical probes.
Judgment and Decision Making in Genome Sequencing
William M.P. Klein, ... Erin Turbitt, in Clinical Genome Sequencing, 2019
Abstract
Genome sequencing involves a host of decisions on the part of the provider and client, including whether to be tested, when and how to receive sequencing results, whether to inform biological relatives, and whether to take protective actions to reduce risk. In this chapter, we consider the relevance of research in the decision sciences to decisions in the genome sequencing context. We review the role of cognitive factors (e.g., use of heuristics, low numeracy), affective factors (e.g., incidental emotions, affective forecasting), and motivational factors (e.g., regarding oneself positively, adhering to social norms) in judgment and decision-making. In so doing, we consider how knowledge of these factors might inform practice toward the end of improving preference-based decision-making regarding the results of genome sequencing.
MOLECULAR CHARACTERIZATION
Aharon Razin, Paul Renbaum, in Molecular and Diagnostic Procedures in Mycoplasmology, 1995
Genomic DNA Sequencing
Genomic sequencing can be used to analyze the methylation status of all cytosine residues. In the original method for genomic sequencing, 5-methylcytosine residues resist the C reaction and therefore appear as blanks on a sequencing gel. Because this method detects each 5-methylcytosine residue in the DNA sequence, it complements restriction enzyme analysis, which is limited to sites for which methylation-sensitive restriction enzymes exist. Following the conventional reactions of the chemical sequencing method, the DNA fragments are subjected to electrophoresis through an 80 × 40-cm 8% acrylamide gel. The gel is electrotransferred to a GeneScreen nylon membrane, and the membrane is hybridized with a labeled sequence-specific probe. A missing band on the autoradiogram at a cytosine position reflects the presence of 5-methylcytosine at this position in the sequence. The disadvantage of this procedure is that it does not allow distinction between unmethylated and partially methylated cytosine residues in a population of DNA molecules. This drawback has been overcome by a modified method in which cytosine residues are deaminated to uracil, whereas 5-methylcytosine remains nonreactive. The sequence under investigation is then amplified by polymerase chain reaction (PCR) with two sets of strand-specific primers to yield a pair of fragments, one from each strand, in which all uracil and thymine residues have been amplified as thymine and only 5-methylcytosine residues have been amplified as cytosine. The PCR products can be sequenced directly to provide a strand-specific average sequence for the population of molecules, or can be cloned and sequenced to provide methylation maps of single DNA molecules. A detailed description of the methods can be found in Saluz and Jost.
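The logic of the deamination-based method above can be sketched in a few lines: after conversion, a cytosine that still reads as 'C' implies 5-methylcytosine, while one that reads as 'T' was unmethylated and deaminated. This is a toy illustration with invented sequences, not the laboratory protocol itself.

```python
# Sketch of bisulfite-style methylation calling: compare a converted read
# to the reference strand at every reference cytosine position.
# Sequences are invented examples.

def methylation_calls(reference, converted_read):
    """Return (position, methylated?) for every C in the reference strand."""
    calls = []
    for i, (ref_base, read_base) in enumerate(zip(reference, converted_read)):
        if ref_base == "C":
            # C retained after conversion -> 5-methylcytosine;
            # C read as T -> unmethylated (deaminated to uracil, amplified as T)
            calls.append((i, read_base == "C"))
    return calls

ref  = "ACGTCCGA"
read = "ATGTCTGA"  # Cs at indices 1 and 5 deaminated; C at index 4 retained

print(methylation_calls(ref, read))  # [(1, False), (4, True), (5, False)]
```

Cloning and sequencing many such reads, as the text describes, turns these per-molecule calls into methylation maps of single DNA molecules.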
The Human Genome Project
The Human Genome Project differs from any previous biological or medical project in size and cost. Its ambitious goal is the deciphering (sequencing) of all 3 billion building blocks of our genetic make-up, the DNA, by 2005, the identification of all genes encoded in that DNA, and the understanding of the role of these genes in health and disease. Knowledge of these genes and their function is crucial for basic biological research as well as for improved prevention, diagnostics, and therapy of disease. It is the prerequisite for the targeted design of pharmaceuticals and for novel approaches such as gene therapy. This knowledge offers hope to millions of affected people and also holds immense economic potential.
The Human Genome Project was started in the USA in 1990, with James Watson, the codiscoverer of the DNA structure, as its first co-ordinator. Given the estimated cost of US $3 billion and the immense amount of work involved, it was clear from the beginning that the Human Genome Project had to include many countries. The Human Genome Organization (HUGO), an independent international organization of genome scientists, was established to co-ordinate the effort. The US Human Genome Project started off with considerable public funding (US $87 million in 1990) and was soon followed by the UK and France. Between 1991 and 1996 France contributed a comprehensive genetic map of the human genome; it is noteworthy that this work was predominantly financed by private money from a patients' association. The British Wellcome Trust for its part set up the world's largest sequencing facility, the Sanger Centre. Smaller initiatives later emerged, for instance in Japan and Canada; no such activity, however, was seen in Germany until 1995. By 2001 a nearly complete 'working draft' of the human genome had been presented by the publicly funded Human Genome Project, including a significant German contribution. The complete sequence will be available in public databases somewhat ahead of schedule, probably by 2003. From 1998, competition from private companies, notably Craig Venter's Celera, sped up the deciphering of the human genetic code tremendously. Not everyone, however, considers knowledge of our comprehensive genetic make-up a benefit for mankind: profound ethical issues are raised by the possibilities inherent in this knowledge. The founders of the project realized this, and about 3 percent of the budget was therefore dedicated to exploring the ethical, legal, and social implications of the Human Genome Project.
In most countries participating in the Human Genome Project, an extensive discussion took place about the opportunities and risks attached to these novel technologies. In Germany a parliamentary committee was established to elucidate the topic. But the discussion was much more polarized in Germany than in other countries.
Human Genome Project: Japanese Perspective
T. Gojobori, in International Encyclopedia of the Social & Behavioral Sciences, 2001
The Japanese human genome project began in 1988 as a response to the progress of corresponding activities in the United States and Europe. At the outset the project was confronted by criticism from various sources: there was a strong belief amongst certain groups that this kind of project was not true research but mere routine work. Moreover, difficulties arose from the scarcity of technicians skilled in this area who could support the genome researchers, and from the reluctance of government and private institutions to invest large grants in a single project. Japan overcame these difficulties by redefining the human genome project: its aim was redirected to include not only sequencing the genome but also functional analysis of genes and elucidation of tertiary protein structures. Although this redefinition successfully eased various criticisms, the main focus of the project was diffused to a considerable extent, leading to ambiguity about the Japanese contribution to the international human genome sequencing effort. A successful outlook for the Japanese human genome project lies in placing more emphasis on the promotion of functional genomics, comparative genomics, analysis of genomic diversity, and the development of DNA chips or microarrays.
Measurement
Jules J. Berman Ph.D., M.D., in Principles of Big Data, 2013
Gene Counting
The Human Genome Project is a massive bioinformatics project in which multiple laboratories helped to sequence the 3 billion base pair haploid human genome (see Glossary item, Human Genome Project). The project began its work in 1990, a draft human genome was prepared in 2000, and a completed genome was finished in 2003, marking the start of the so-called postgenomics era. There are about 2 million species of proteins synthesized by human cells. If every protein had its own private gene containing its specific genetic code, then there would be about 2 million protein-coding genes contained in the human genome. As it turns out, this estimate is completely erroneous. Analysis of the human genome indicates that there are somewhere between 20,000 and 150,000 protein-coding genes. The majority of estimates come in at the low end (about 25,000 genes). Why are the current estimates so much lower than the number of proteins and why is there such a large variation in the lower and upper estimates (20,000 to 150,000)?
Counting is difficult when you do not fully understand the object that you are counting. The reason that you are counting objects is to learn more about them, but you cannot always count an object accurately until you have already learned a great deal about it. Perceived this way, counting is a bootstrapping problem. In the case of proteins, a small number of genes can account for a much larger number of protein species, because proteins can be assembled from combinations of genes, and the final form of a unique protein can be modified by so-called post-translational events (folding variations, chemical modifications, sequence shortening, clustering by fragments, etc.). The methods used to count protein-coding genes can vary [80]. One technique might look for sequences that mark the beginning and the end of a coding sequence, whereas another method might look for segments containing base triplets that correspond to amino acid codons. The former method might count genes that code for cellular components other than proteins, and the latter might miss fragments whose triplet sequences do not match known protein sequences [81]. Improved counting methods are being developed to replace the older methods, but a final number evades our grasp.
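The first counting strategy mentioned above, looking for sequences that mark the beginning and end of a coding sequence, can be caricatured as a simple open-reading-frame scan. The sketch below ignores introns, the reverse strand, and everything else that makes real gene finding hard; the test sequence and minimum length are invented for illustration.

```python
# Toy ORF counter: a "gene" is an ATG start codon followed, in frame,
# by an in-frame stop codon (TAA, TAG, or TGA).

STOP_CODONS = {"TAA", "TAG", "TGA"}

def count_orfs(seq, min_codons=2):
    """Count non-overlapping ATG...stop stretches on the forward strand."""
    count = 0
    i = 0
    while i <= len(seq) - 3:
        if seq[i:i+3] == "ATG":
            # walk codon by codon until a stop codon appears in frame
            j = i + 3
            while j <= len(seq) - 3 and seq[j:j+3] not in STOP_CODONS:
                j += 3
            if j <= len(seq) - 3 and (j - i) // 3 >= min_codons:
                count += 1
                i = j + 3  # resume scanning after the stop codon
                continue
        i += 1
    return count

print(count_orfs("ATGAAATAGCCATGGGCTGCTAA"))  # 2 short ORFs in this toy sequence
```

Even this caricature shows why such methods overcount and undercount at once: any chance ATG...stop stretch is counted as a gene, while genuine genes split by introns would be missed entirely.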
The take-home lesson is that the most sophisticated and successful Big Data projects can be stumped by the simple act of counting.
Alcoholism: Genetic Aspects
K.E. Browman, J.C. Crabbe, in International Encyclopedia of the Social & Behavioral Sciences, 2001
2.2 Quantitative Trait Loci (QTL) mapping strategies
The Human Genome Project has also led to genome mapping and DNA sequencing in a variety of other organisms including the laboratory mouse. Late twentieth-century developments in the physical mapping of the mouse make positional cloning of genes involved in various behaviors more likely. However, most behaviors (including responses to alcohol) are influenced by multiple genes. Behaviors, or complex traits, influenced by a number of genes are often termed quantitative traits. Within a population, a quantitative trait is not all-or-none, but differs in the degree to which individuals possess it. A section of DNA thought to harbor a gene that contributes to a quantitative trait is termed a quantitative trait locus (QTL). QTL mapping identifies the regions of the genome that contain genes affecting the quantitative trait, such as an alcohol response. Once a QTL has been located, the gene can eventually be isolated and its function studied in more detail. Thus, QTL analysis provides a means of locating and measuring the effects of a single gene on alcohol sensitivity.
In tests of sensitivity to convulsions following alcohol withdrawal, QTLs have been found on mouse chromosomes 1, 2, and 11. The QTL on chromosome 11 is near a cluster of GABAA receptor subunit genes. A number of subunits are needed to make a GABAA receptor, and the ability of a drug to act on the receptor seems to be subunit dependent. A polymorphism in the protein-coding sequence for Gabrg2 (coding for the γ2 subunit of the GABAA receptor) has been identified. This polymorphism is genetically correlated with duration of loss of righting reflex and a measure of motor incoordination following alcohol administration.
The use of QTL analysis has allowed us to begin identifying the specific genes involved in alcohol-related traits. Because each QTL initially includes dozens of genes, not all of which have yet been identified, much more work will be required before each QTL can be reduced to a single responsible gene. For the time being, one important aspect of QTL mapping in mice is that identification of a QTL in mice points directly to a specific location on a human chromosome in about 80 percent of cases. Thus, the animal mapping work can be directly linked to the human work in studies such as the COGA described in Sect. 1.1, which is in essence a human QTL mapping project. By using transgenic animal models (mice in which there has been a deliberate modification of the genome), such as null mutants, QTLs can be further investigated.
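At its core, QTL mapping asks, marker by marker, whether genotype predicts the quantitative trait. The sketch below uses invented genotypes and trait scores for eight hypothetical mice; real analyses involve many animals and markers, interval mapping, and proper linkage statistics rather than a bare difference of group means.

```python
# Toy QTL scan: at each marker, compare mean phenotype between the two
# genotype groups. All data are invented for illustration.

from statistics import mean

# genotype at each marker (0 or 1, e.g. which progenitor strain's allele), per mouse
genotypes = {
    "marker_chr1":  [0, 0, 1, 1, 0, 1, 0, 1],
    "marker_chr11": [0, 0, 0, 0, 1, 1, 1, 1],  # near the hypothetical QTL
}
phenotype = [2.0, 2.2, 2.1, 1.9, 5.0, 5.2, 4.8, 5.1]  # e.g. withdrawal severity score

def group_difference(genos, pheno):
    """Absolute difference in mean phenotype between the two genotype groups."""
    g0 = [p for g, p in zip(genos, pheno) if g == 0]
    g1 = [p for g, p in zip(genos, pheno) if g == 1]
    return abs(mean(g1) - mean(g0))

effects = {m: group_difference(g, phenotype) for m, g in genotypes.items()}
best = max(effects, key=effects.get)
print(best)  # marker_chr11 tracks the phenotype; marker_chr1 does not
```

A marker whose genotype splits the animals into clearly different phenotype groups, like the chromosome 11 marker here, flags a genomic region harboring a gene that contributes to the trait.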
Y-chromosomes and Evolution
A. Ruiz-Linares, in International Encyclopedia of the Social & Behavioral Sciences, 2001
6 Conclusion
Developments resulting from the Human Genome Project have recently catapulted the use of Y-chromosome markers into the forefront of the study of human population origins and diversification. The large number of markers currently available together with novel highly efficient technologies enable analyses of an unprecedented resolution and scale. Evolutionary analyses are facilitated by the fact that slowly evolving markers allow the unambiguous assessment of the evolutionary relationship between Y-chromosomes. Rapidly evolving markers can refine analyses within specific lineages or populations. These studies illuminate not only questions related to the origin of our species and its early diversification but also allow the probing of more recent demographic events. The synthesis of genetic data with information obtained from sources including geology, paleoanthropology, archaeology, and historical demography is allowing a refined reconstruction of human evolution stretching from our origins as a species all the way to the exploration of quite recent historical events.
Cytomics: From Cell States to Predictive Medicine
G. Valet, ... A. Kriete, in Computational Systems Biology, 2006
A Single-cell image analysis
One of the most important outcomes of the Human Genome Project is the realization that there is considerably more biocomplexity in the genome and the proteome than previously appreciated (Herbert 2004). Not only are there many splice variants of each gene system, but some proteins can function in entirely different ways (in different cells and in different locations of the same cell), lending additional importance to the single-cell analysis of laser scanning cytometry and confocal microscopy. These differences would be lost in the mass spectroscopy of heterogeneous cell populations. Hence, cytomics approaches may be critical to the understanding of cellular and tissue functions.
Fluorescence microscopy represents a powerful technology for stoichiometric single-cell-based analysis in smears or tissue sections. Whereas in the past the major goal of microscopy and imaging was to produce high-quality images of cells, in recent years an increasing demand for quantitative and reproducible microscopic analysis has arisen. This demand came largely from the drug discovery companies, but also from clinical laboratories. Slide-based cytometry is an appropriate approach for fulfilling this demand (Tarnok and Gerstner 2002). Laser scanning cytometry (Gerstner et al. 2002; Tarnok and Gerstner 2002; Megason et al. 2003) was the first of this type of instrument to become commercially available, but today several different instruments are on the market (Jager et al. 2003; Molnar et al. 2003; Schilb et al. 2004).
These types of instruments are built around scanning fluorescence microscopes that are equipped with either a laser (Tarnok and Gerstner 2002; Schilb et al. 2004) or a mercury arc lamp as the light source (Bajaj et al. 2000; Molnar et al. 2003). The generated images are processed by appropriate software algorithms to produce data similar to flow cytometry. Slide-based cytometry systems are intended to be high-throughput instruments, although at present they have a lower throughput than flow cytometers. These instruments allow multicolor measurements of high complexity (Gerstner et al. 2002; Ecker and Steiner 2004) comparable to or exceeding that of flow cytometers.
A substantial advantage over flow cytometry is that cells in adherent cell cultures and tissues can be analyzed without prior disintegration (Smolle et al. 2002; Kriete et al. 2003; Ecker et al. 2004; Gerstner et al. 2004). In addition, due to the fixed position of the cells on the slide or in the culture chamber, cells can be relocated several times and reanalyzed; even restaining and subsequent reanalysis of each individual cell is feasible. Because a high information density on the morphological and molecular pattern of single cells can be acquired by slide-based cytometry, it is an ideal technology for cytomics.
Although at present not realized, the information density per cell can be increased further by implementing technologies such as spectral imaging (Ecker et al. 2004), confocal cytometry (Pawley 1995), fluorescence resonance energy transfer (FRET) (Jares-Erijman and Jovin 2003; Ecker et al. 2004; Peter and Ameer-Beg 2004), near-infrared Raman spectroscopy (Crow et al. 2004), fluorescence lifetime imaging (FLIM) (Murata et al. 2000; Peter and Ameer-Beg 2004), optical coherence tomography (Boppart et al. 1998), spectroscopic optical coherence tomography (Xu et al. 2004), and second harmonic imaging (Campagnola et al. 2003). All of these technologies mark the progress in optical bio-imaging.
In the future, developments in imaging resulting from a family of concepts that allows image acquisition far beyond the resolution limit (down to the nm range) are expected. These include multiphoton excitation (Manconi et al. 2003), ultrasensitive fluorescence microscopes (Hesse et al. 2004), stimulated emission depletion (STED) microscopy (Hell 2003), spectral distance microscopy (Esa et al. 2000), atomic force microscopy (AFM) and scanning near-field optical microscopy (SNOM) (Rieti et al. 2004), and image restoration techniques (Holmes and Liu 1992). Using laser ablation in combination with imaging, even thick tissue specimens can be analyzed on a cell-by-cell basis (Tsai et al. 2003).
Ethical Dilemmas: Research and Treatment Priorities
M. Betzler, in International Encyclopedia of the Social & Behavioral Sciences, 2001
2.2 Ethical Dilemmas in Biotechnology
The far-reaching potential advances in gene therapy and genetic engineering for humans (The Human Genome Project), and the implications for humans of cloning, have given rise to ethical dilemmas scientists have to face. The following examples can illustrate how progress in genetic engineering generates dilemmas between conflicting obligations and/or conflicts due to risk assessment (Chadwick 1992).
The identification of human genes, for example, can lead to the following conflict: on the one hand, genetic knowledge enhances therapies for hereditary disease; on the other hand, it poses problems about the potentially exploitative use of resources and genetic information. The attempt to sequence the entire human genome raises the ethical question of whether the risks of exploitation outweigh the benefits of knowledge.
Genetic alterations passed on to future generations through so-called germline therapy raise further problems regarding consent while, at the same time, 'improving' human genetic potential for future generations. Do present generations have an obligation to relieve suffering by seeking treatments for genetic disease, or do future generations have a right to an unmodified genetic inheritance? Equally, the advantages of genetic screening can be outweighed by the costs of stigmatization on the basis of a person's genetic make-up: such individuals might find themselves unemployable or uninsurable. There is also the question of whether the very existence of genetic screening can exert pressure on individuals with regard to their reproductive decisions, thus impairing their autonomy. How can genetic public health be fostered without practicing eugenics? Arguments about the moral urgency of relieving suffering also conflict with arguments about human dignity that discredit the production of 'designer babies' (Annas and Elias 1992).
The incorporation of foreign genes into the genome of an organism is commonly discussed in connection with animals and plants. One dilemmatic issue concerns the interests of the host organism (particularly in the case of animals), the consequences for human health and for other species, and the risks of releasing genetically engineered organisms into the environment.
A related issue has been the matter of justification of treatment. Animal experimentation poses the question whether animals have moral status to the detriment of life-enhancing research results for humans. As subjects of genetic engineering, for example, farm animals have suffered from unintended deleterious effects, while research animals have suffered the consequences of being intentionally bred for propensity to develop debilitating diseases. A further important issue is related to the question whether an agent who had moral status can cease to have that status. The human cases of the brain dead, anencephalic infants, and those in a permanently vegetative state are cases in point. If so, then the case of xenotransplantation or the transplanting of animal organs into humans is affected. The interlocking questions of moral standing, justification of treatment, and loss of moral considerability can thus cause conflicts as to which consideration should be given more weight. Ethical dilemmas in research are thus a challenge to those within and those outside research, to debate whether research practices and their effects are right and just. For further treatment see Animal Rights in Research and Research Application; Bioethics: Examples from the Life Sciences; Euthanasia; Genetic Counseling: Historical, Ethical, and Practical Aspects; Reproductive Medicine: Ethical Aspects; Research Subjects, Informed and Implied Consent of.
Introduction to Human Genome Computing Via the World Wide Web
Lincoln D. Stein, in Guide to Human Genome Computing (Second Edition), 1998
3.5 GDB
GDB, the Genome Database, is the main repository for all published mapping information generated by the Human Genome Project. It is a species-specific database: only Homo sapiens maps are represented. The information stored in GDB includes:
▪ genetic maps
▪ physical maps (clone, STS and fluorescent in situ hybridization (FISH) based)
▪ cytogenetic maps
▪ physical mapping reagents (clones, STSs)
▪ polymorphism information
▪ citations
To access GDB, connect to its home page (Figure 1.15). GDB offers several different ways to search the maps:

Figure 1.15. The GDB home page provides access to the main repository for human genome mapping information.
▪ A simple search. This search, accessible from GDB's home page, allows you to perform an unstructured search of the database by keyword or the ID of the record. For example, a keyword search for 'insulin' retrieves a list of clones and STSs that have something to do either with the insulin gene or with diabetes mellitus.
▪ Structured searches. A variety of structured searches available via the link labeled 'Other Search Options' allow you to search the database in a more deliberate manner. You may search for maps containing a particular region of interest (defined cytogenetically, by chromosome, or by proximity to a known marker) or for individual map markers based on a particular attribute (e.g. map position and marker type). GDB also offers a 'Find a gene' interface that searches through the various aliases to find the gene that you are searching for.
Searches that recover individual map markers and clones will display them in a list of hypertext links similar to those displayed by Entrez and PDB. When you select an entry you will be shown a page similar to Figure 1.16. Links on the page lead to citation information, information on maps this reagent has been assigned to, and cross-references to the GenBank sequence for the marker or clone. GDB holds no primary sequence information, but the Web's ability to interconnect databases makes this almost unnoticeable.

Figure 1.16. GDB displays most entries using a text format like that shown here.
A more interesting interface appears when a search recovers a map. In this case, GDB launches a Java applet to display it. If multiple maps are retrieved by the search, the maps are aligned and displayed side by side (Figure 1.17). A variety of settings allows you to adjust the appearance of the map, as well as to turn certain maps on and off. Double clicking on any map element will display its GDB entry in a separate window.

Figure 1.17. GDB maps are displayed using an interactive Java applet.
Human Evolutionary Genetics
J.L. Mountain, in International Encyclopedia of the Social & Behavioral Sciences, 2001
7.1 Human Genome Project
The rapid development of human evolutionary genetics over the final decade of the twentieth century owes a great deal to the Human Genome Project. Much of the technology developed under the auspices of that project (e.g., improved speed and accuracy of DNA sequencing methods) is currently applied in the field. In 1991, a group of human evolutionary geneticists led by L. Cavalli-Sforza proposed that the Human Genome Project be extended to include consideration of variation among individuals. Almost immediately, the proposal encountered criticism from groups concerned that geneticists might exploit individuals contributing DNA samples. Nonetheless, a subsequent Human Genome Project plan did include consideration of variation across individuals as a major goal. New methods of detecting variation among individuals were developed. The result is hundreds of thousands of nucleotide sites known to be polymorphic in humans across the entire human genome. Very few, however, have been studied in a broad set of human populations.
Familial Studies: Genetic Inferences
A. Vetta, C. Capron, in International Encyclopedia of the Social & Behavioral Sciences, 2001
11 The Future?
McGuffin et al. (2001) envisage a new science of behavior genomics. We suspect that, as the unscientific nature of behavior genetic analysis becomes known, researchers will eschew heritability analysis. The HGP has made the identification of a genetic disorder easier: if, for example, a large number of individuals suffering from a disorder have a mutation at a locus compared with the normal type, this provides some evidence of the genetic nature of the disorder. Heritability analysis is useless, as it relates to a population and not an individual. To find remedies for genetic disorders, type I models are useful. Venter (2001, The Independent, February 12) succinctly summarizes our view when he says that the HGP indicates 'to me that we are not hard wired. The idea that there is a simple deterministic explanation—that is: we are the sum total of our genes—makes me as a scientist, want to laugh and cry.'
Future directions
Toward the perfect genome: updating the genome and annotation
A reference genome with a minimum of assembly and annotation mistakes is desired by scientists in the common carp research community. There are at least three aspects in which genome quality can be improved. First, accuracy at the single-base level needs further improvement. The accuracy of a reference genome matters for downstream analysis: errors in CDS regions can introduce frame shifts or premature stop codons, resulting in erroneous or truncated proteins, respectively. The standard set by the Human Genome Project allowed a maximum of one error per 10 kb of finished sequence (Lander et al., 2001), i.e., an error rate of at most 0.01%. However, owing to the sequencing errors of second- and third-generation reads, genomes assembled from Illumina or other newer sequencing platforms can exceed this 0.01% error rate, and genomes assembled from early 454 sequencing reads can have error rates as high as 1.07% (Gilles et al., 2011). The published common carp genome was sequenced and assembled largely from 454 sequencing reads generated in 2010. It is therefore reasonable to expect that the common carp genome has a higher base-error rate than genomes assembled from Illumina reads. A new set of genome sequencing reads generated with more accurate, newly developed sequencing technologies is desired to improve the genome assembly in the future. Second, the genome integration rate needs to be improved. Previously, we anchored approximately 875 Mb of the common carp genome onto the genetic map, accounting for only half of the whole genome. The low integration rate was largely due to the relatively low marker density of the genetic map, which retained only 4243 genetic markers. The tetraploid genome of the common carp also complicates genome integration with the genetic map.
The common carp underwent one more round of WGD than most other teleosts. A genetic marker may therefore have more than one best-aligned scaffold or contig, generating ambiguous mapping regions during integration that are usually discarded. An ultrahigh-density genetic map with more genetic markers would be the best solution to improve the integration rate for the common carp genome. Currently, an ultrahigh-density genetic map based on Carp 250K SNP array genotyping is under construction; we expect to produce a map with over 10,000 markers in the near future. Third, updated gene annotation is necessary for the common carp genome. A high-quality dataset of protein-coding genes is essential for functional genomics and evolutionary analyses. Two indicators can be used to estimate the quality of the annotated genes: (1) the proportion of homologous genes and (2) the gene length distribution among closely related species. It is estimated that approximately 70% of human genes have at least one obvious zebrafish ortholog, so it is reasonable to expect the homologous gene proportion between two teleosts to be over 70% (Howe et al., 2013). Indeed, the proportions of genes homologous to other model fish in many published fish genomes were over 80% (Tine et al., 2014; Wu et al., 2014), so a threshold of 80% is suitable for estimating the quality of annotated genes. As of June 2015, ten fish genomes were available in the Ensembl genome database (Flicek et al., 2014). A comparison of protein lengths among these fish revealed that the median protein length ranged from 350 to 430 aa and that the length distributions were similar. Hence, the protein length distribution is a second reliable indicator of annotation quality.
Although the homologous gene proportion of the V1.0 gene prediction in common carp is as high as 90.84% (47,795 out of 52,610), the median length of common carp proteins is just 303 aa, significantly shorter than that of other teleosts. Hence, it is necessary to improve the completeness of gene prediction for the common carp genome and release a V2.0 gene set.
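The error-rate thresholds discussed above can be made concrete with a quick back-of-the-envelope calculation. The sketch below is illustrative only: the ~1.7 Gb genome size is an assumed round figure, not the exact assembly size.

```python
def expected_errors(genome_size_bp, error_rate):
    """Expected number of erroneous bases at a given per-base error rate."""
    return genome_size_bp * error_rate

# Illustrative genome size (~1.7 Gb); the actual assembly size may differ.
GENOME_SIZE = 1_700_000_000

# Human Genome Project standard: at most 1 error per 10 kb, i.e., 0.01%.
hgp_errors = expected_errors(GENOME_SIZE, 1 / 10_000)

# Early 454-based assemblies: error rates of up to 1.07% (Gilles et al., 2011).
errors_454 = expected_errors(GENOME_SIZE, 0.0107)

print(f"HGP standard: <= {hgp_errors:,.0f} erroneous bases genome-wide")
print(f"454-era rate: up to {errors_454:,.0f} erroneous bases genome-wide")
```

Even at the seemingly low 0.01% threshold, a genome of this size would still tolerate on the order of 10^5 erroneous bases, which is why per-base accuracy remains a live concern.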
Construction of comprehensive genome databases
Abundant common carp genome data have been generated, including genome sequences, resequencing data, transcriptome sequencing of multiple strains (Xu et al., 2012), small RNA sequencing (Zhu et al., 2012), genetic maps (Zhao et al., 2013), QTL analyses (Laghari et al., 2015), SNP arrays (Xu et al., 2014b), and BAC-end sequences (Xu et al., 2011b). The genome database of the common carp (named Carpbase) has been developed to store all of the aforementioned datasets, and its web interface allows data retrieval, BLAST searches, and genome browsing (http://www.carpbase.org). However, collecting more data and adding new features are necessary to facilitate better use of the data by the common carp research community worldwide. First, we will add genome data from closely related teleosts, enabling comparative genomic analysis between the common carp and related species. Second, we will add commonly used bioinformatics tools to facilitate online analysis of common carp data. Third, newly generated genome data and the updated genome assembly, gene prediction, and annotation will be incorporated into the database.
Further develop genome resources from diverse populations
The common carp is one of the most successful fish species and exhibits great genetic diversity. There are hundreds of geographic populations and domesticated strains worldwide that have adapted to various wild environments with broad ecological spectra, as well as to distinct aquaculture settings in ponds, lakes, and cages. These populations have developed numerous distinct phenotypes in growth rate, temperature and hypoxia tolerance, body color, scale pattern, and body shape, attributable in part to the genetic diversity produced by multiple rounds of WGD events. Although we have accumulated relatively abundant genetic and genomic resources over the past decades for various genomic and genetic studies (especially the complete reference genome and millions of SNP loci of the common carp) (Xu et al., 2012, 2014a,b), we still see just "the tip of the iceberg." Numerous variant loci in the genome remain undiscovered, many of which are essential resources for genetic breeding and are associated with important traits. It is therefore necessary to focus on population genomics and the identification of genetic diversity. We need to collect sufficient samples representing the major wild populations of common carp, which harbor rich genetic diversity, as well as domesticated strains with distinctive traits. Whole-genome resequencing of these samples at 10- to 20-fold coverage will produce millions of genetic diversity markers, including SNPs and short INDELs, from which genetic variation maps can be constructed for particular populations and strains. However, variation at the scale of one or a few bases does not represent the majority of genetic diversity. A large number of variant loci span relatively long genomic regions [i.e., large INDELs, copy number variations (CNVs), and chromosome rearrangements]; these often underlie phenotypic variation and are important for genetic breeding (Li et al., 2014b).
However, resequencing strategies based on short reads cannot identify these loci. It is therefore necessary to adopt a pan-genome sequencing and assembly strategy: construct multiple sequencing libraries with various insert sizes, sequence each genome to 50- to 100-fold coverage, and assemble draft genomes for selected strains, ultimately generating additional reference genomes for those strains. Comparative studies among multiple assembled genomes will unveil genetic diversity more comprehensively. Abundant genome resources and tools from diverse populations will facilitate uncovering the genetic basis of important traits, developing new strains with better performance, and understanding genome evolution and history.
Unveiling the genetic basis of important traits for breeding applications
Dissection of the genetic basis of economically important traits of the common carp is one of the major goals of carp geneticists. Genetic mapping and QTL localization followed by positional cloning are the most common genetic approaches used to unveil the genetic basis of target traits and have been applied successfully to genetic studies of crops and farm animals over the past few decades; there are, however, few successful instances in aquaculture species, mainly due to the shortage of genetic markers and genome resources. Over the past few years, we have developed sufficient genetic markers and a reference genome for the common carp to overcome these limitations. The high-density SNP array and high-throughput genome resequencing will readily generate genome-wide genotyping data for whole-genome association and localization. In the next 5–10 years, the genetic bases of important traits of the common carp, including but not limited to traits involved in growth, meat quality, disease resistance, extreme habitat tolerance, reproduction, sex determination, color pattern, and scalation, are expected to be dissected by employing comprehensive approaches, including GWAS, transcriptome analysis, gene ontology and pathway analysis, epigenomics, and gene editing with CRISPR/Cas9 technology. These results will gradually facilitate highly efficient genetic breeding applications.
Next Generation Sequencing Data Analysis
Ranjeev Hari, Suhanya Parthasarathy, in Encyclopedia of Bioinformatics and Computational Biology, 2019
Read Mapping
Whenever a reference genome is available, reads are mapped or aligned to it prior to subsequent analysis steps, instead of following a de novo assembly strategy. The goal of mapping is to assign the vast number of reads back to the regions they likely originated from. Mapping typically involves aligning millions of short reads to the genome using fast algorithms. These algorithms operate in parallel and account for mutations such as polymorphisms, insertions, and deletions when producing the alignment. In well-known aligners such as BLAST, an individual query sequence is searched against a reference database using hash tables and seed-and-extend approaches. With NGS data, similar methods are adapted to scale to the alignment of millions of short query sequences against a single large reference genome. Advances in mapping algorithms using various other techniques have improved alignment speed while reducing memory and storage requirements. Well-known mapping software used with NGS data includes SOAP2 (Short Oligonucleotide Alignment Program), BWA (Burrows-Wheeler Aligner), NovoAlign, and Bowtie 2.
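The hash-table, seed-and-extend idea can be sketched in a few lines. The toy Python below is not how production aligners such as BWA or Bowtie 2 work internally (they use compressed indexes built on the Burrows-Wheeler transform), but it illustrates the two steps: look up a seed k-mer in a hash table of the reference, then extend the candidate position and count mismatches. The sequences and k-mer size are invented for illustration.

```python
from collections import defaultdict

def build_seed_index(reference, k=4):
    """Hash table mapping each k-mer seed to its positions in the reference."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def map_read(read, reference, index, k=4):
    """Seed with the read's first k-mer, then extend and count mismatches.
    Returns (mismatches, position) of the best candidate, or None."""
    best = None
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if best is None or mismatches < best[0]:
            best = (mismatches, pos)
    return best

ref = "ACGTACGTTAGCCGATTACA"
idx = build_seed_index(ref)
print(map_read("TAGCCGAT", ref, idx))  # (0, 8): perfect match at position 8
```

Real aligners add gapped extension, quality-aware scoring, and mapping-quality estimates on top of this basic scheme.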
The widely used format for storing read-to-genome mapping information is the SAM (Sequence Alignment Map) format, or its compressed binary form, BAM. While the BAM file is smaller and optimized for machine reading, the SAM file is human readable, albeit slower for computer operations. There are 11 mandatory fields in the SAM format specification. Commonly, the SAMtools software is used to read and manipulate both BAM and SAM files.
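As a small illustration of the format, the snippet below splits one alignment line into the 11 mandatory SAM fields. The field names follow the SAM specification; the example read and its values are hypothetical.

```python
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ",
              "CIGAR", "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

def parse_sam_line(line):
    """Split an alignment line into the 11 mandatory SAM fields
    (optional tags beyond field 11 are ignored in this sketch)."""
    cols = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, cols[:11]))
    # FLAG, POS, MAPQ, PNEXT, and TLEN are integers by specification.
    for f in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[f] = int(record[f])
    return record

# A hypothetical alignment line (tab-separated).
line = "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tTAGCCGAT\tIIIIIIII"
rec = parse_sam_line(line)
print(rec["RNAME"], rec["POS"], rec["CIGAR"])  # chr1 100 8M
```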
Typically, mapping of reads to the reference genome is followed by collecting mapping statistics. The summary statistic of main interest is the percentage of aligned reads, or mapping rate, which is often only 60%–75%. Beyond limitations intrinsic to NGS data and the technique that generated them, failure to map can be ascribed to challenging regions of the genome, such as repeat-rich regions, in which aligners cannot place reads unambiguously. Moreover, the short read lengths of most high-throughput NGS technologies limit each alignment to a small region, restricting coverage to the more tractable areas of the genome. Sequencing errors, algorithmic robustness, mutational load, and natural variation also contribute to the low mapping rate.
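The mapping rate itself is a simple ratio. A minimal sketch follows; the read counts are hypothetical, of the kind one might take from `samtools flagstat` output.

```python
def mapping_rate(mapped_reads, total_reads):
    """Percentage of reads aligned to the reference genome."""
    return 100.0 * mapped_reads / total_reads

# Hypothetical counts for a single sequencing run.
total, mapped = 10_000_000, 6_800_000
rate = mapping_rate(mapped, total)
print(f"Mapping rate: {rate:.1f}%")  # 68.0%, within the typical 60%-75% band
```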
The mapping file generated can be further inspected region by region using visualization tools such as genome viewers, which plot pileups (the stacked alignment of the reads). Visualizing mapped reads can be important, for instance, in diagnosing alignment problems in certain regions, detecting duplicates, and viewing variations. Commonly used genome browsers that can read SAM files include Integrative Genomics Viewer (IGV) and Tablet, while web-based browsers that achieve similar visualizations include JBrowse, NGB, and the UCSC Genome Browser.
Chromosome Organization
In Cell Biology (Third Edition), 2017
The Human Genome: Variations on a Theme
The human "reference genome" sequence does not come from a single person, but is instead an idealized assembly derived from the DNA of a number of people. Constructing an artificial reference genome is necessary, because although we might imagine that there is only one "human genome," data from sequencing many thousands of genomes have shown that there are dramatic variations in DNA content and sequence among individuals. Famously, analysis of some particularly variable regions of repetitive sequences forms the basis for DNA testing in criminology and paternity testing. Given the large number of genomes sequenced to date, it makes sense to talk of a "typical" genome and how this differs from the reference. Prepare to be amazed. A typical genome has 4 to 5 × 106 differences from the reference! The largest number of affected base pairs are in 2100 to 2500 "structural variants" (changes involving >50 bp). These include deletions, more than 120 LINE and more than 900 SINE insertions, and other changes not found in the reference genome. Overall, they encompass 20 × 106 bp and often occur in regions of repeated DNA sequence. Other variations occur in genes, with a typical genome having approximately 165 mutations that truncate proteins, approximately 11,000 mutations that change protein sequences, and a staggering 520,000 mutations in regions thought to be involved in regulating gene expression. Occasionally, these variations are linked to inherited human disease, and genome-wide association studies (GWAS) correlating sequence changes with human disease are a major ongoing focus of these sequencing efforts. At centromere regions of chromosomes, the content of repeated DNA sequences commonly varies by over 106 bp between different individuals. Overall, this rather staggering variability leads to the question, "What is a 'normal' human genome?"
Taxonomy of Prokaryotes
Rainer Borriss, ... Hans-Peter Klenk, in Methods in Microbiology, 2011
(b) Similarity search
Similarities between query and reference genomes are determined using long-established tools for nucleotide-based sequence similarity searches. Currently, NCBI-BLAST, WU-BLAST (Altschul et al., 1990), BLAT (Kent, 2002), BLASTZ (Schwartz et al., 2003), and MUMmer (Kurtz et al., 2004) are available on the web server. High-scoring segment pairs (HSPs) or maximally unique matches (MUMs) are determined by performing similarity searches for each combination of query genome and reference genome. Due to the asymmetric nature of heuristic similarity search strategies, the search is performed twice: first using the reference genome as 'subject sequence' and the query genome as 'query sequence', and second using the reference genome as 'query sequence' and the query genome as 'subject sequence'. The HSPs (or MUMs) are stored in a condensed format using CGVIZ (Henz et al., 2005), comprising the start and stop coordinates of the matches together with statistical data (e-value, score, alignment length, and percentage of identical characters for HSPs; alignment length for MUMs). The resulting data are sufficient for the distance calculation while conserving storage space (Auch et al., 2010a).
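As a rough illustration of why the stored coordinates suffice for a distance calculation, the toy function below computes one simple coverage-based distance from HSP start/stop coordinates alone. This is a simplified sketch, not the actual GBDP distance formulas of Auch et al. (2010a), and the coordinates and genome length are invented.

```python
def hsp_coverage_distance(hsps, genome_length):
    """Toy distance: 1 minus the fraction of the query genome covered by HSPs.
    hsps: list of (start, stop) match coordinates on the query genome.
    Overlapping HSPs are merged before summing the covered bases."""
    merged = []
    for start, stop in sorted(hsps):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], stop))
        else:
            merged.append((start, stop))
    covered = sum(stop - start for start, stop in merged)
    return 1.0 - covered / genome_length

# Hypothetical HSP coordinates on a 10 kb query genome.
hsps = [(0, 500), (400, 900), (2000, 2500)]
print(hsp_coverage_distance(hsps, 10_000))  # 0.86: 1400 of 10,000 bp covered
```

Identical genomes give a distance near 0, unrelated genomes a distance near 1, which is the general behavior genome-distance methods aim for.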
RNA Modification
Shunpei Okada, ... Tsutomu Suzuki, in Methods in Enzymology, 2015
5.2 Preparation of the Genome and Transcriptome References
For the human genome reference, download the "hg18.2bit" file from http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/. If using a genomic reference for another taxon, download it via http://hgdownload.cse.ucsc.edu/downloads.html. Create the BWA index from the genomic reference according to the BWA manual. Prepare an hg18.size file that lists chromosome names and sizes. For the transcriptome reference, prepare a gene coordinate-sorted file: a tab-delimited text file composed of the transcript ID, chromosome, strand, exonStarts, and exonEnds. We implemented this protocol with the hg18 knownGene track retrieved from the UCSC Genome Bioinformatics site; it can be downloaded from the UCSC Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables?command=start). Rename the file to "knowngenes.coord." The latest versions of the genomic and transcriptome sequence files should be used as references in your study.
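As an illustration of the expected layout of the gene coordinate-sorted file, the snippet below parses one tab-delimited record of the kind described above (transcript ID, chromosome, strand, exonStarts, exonEnds, with UCSC-style trailing-comma exon lists). The transcript ID and coordinates are hypothetical examples, not values from the protocol.

```python
# One record per line: transcript ID, chromosome, strand, exonStarts, exonEnds.
# UCSC-style exon lists are comma-separated with a trailing comma.
record = "uc001aaa.3\tchr1\t+\t11873,12612,13220,\t12227,12721,14409,"

def parse_coord_line(line):
    """Parse one line of the tab-delimited gene coordinate file."""
    tx_id, chrom, strand, starts, ends = line.rstrip("\n").split("\t")
    exon_starts = [int(x) for x in starts.split(",") if x]
    exon_ends = [int(x) for x in ends.split(",") if x]
    return {"id": tx_id, "chrom": chrom, "strand": strand,
            "exons": list(zip(exon_starts, exon_ends))}

tx = parse_coord_line(record)
print(tx["id"], tx["chrom"], len(tx["exons"]))  # uc001aaa.3 chr1 3
```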
Cladogenesis☆
M. Ponomarenko, ... N. Kolchanov, in Reference Module in Life Sciences, 2017
In Silico Cladogenesis
The sequencing of the reference genomes of biological species and the initiation of the sequencing of individual genomes represent a challenge to commercial plant and animal breeding: the polygeny of most of the commercially valuable traits requires their adequate computer-based measurement, documentation, systematization, and analysis in each individual.
In 2011, Alexey Doroshkov, Tatyana Pshenichnikova, and Dmitry Afonnikov developed a computer-based simulation tool for coordinated analysis of the morphological and genetic characteristics of leaf hairiness in wheat. Using this tool, the authors found a significant positive correlation between trichome length and trichome density: the longer the trichomes, the denser they are. Their in silico calculations also demonstrated significant correlations between quantitative measures of leaf hairiness and cultivation and growth conditions, as well as significant differences between wheat cultivars in these measures. Finally, they found abnormally long (probably heterotic) trichomes in the F1 generation. The segregation ratio of offspring classes with different leaf trichome patterns in crosses was 7:9, indicating a significant dominant contribution from two unknown genes to trichome development and pointing the way toward identification of these genes.
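A 7:9 segregation ratio is what a two-gene model predicts in an F2 dihybrid cross. The sketch below enumerates the 16 gamete combinations under an assumed complementary-action model (the trait requires a dominant allele at both loci, here labeled with the hypothetical symbols A and B), which yields the classic 9:7 split; which class is 9 and which is 7 depends only on how the phenotypes are scored.

```python
from collections import Counter
from itertools import product

# F2 of a dihybrid cross: each AaBb parent contributes one allele per locus.
gametes = ["".join(g) for g in product("Aa", "Bb")]          # AB, Ab, aB, ab
offspring = [m + p for m, p in product(gametes, repeat=2)]   # 16 combinations

def phenotype(genotype):
    """Assumed complementary gene action: trichomes develop only when a
    dominant allele is present at BOTH hypothetical loci A and B."""
    return "hairy" if "A" in genotype and "B" in genotype else "smooth"

counts = Counter(phenotype(g) for g in offspring)
print(counts)  # Counter({'hairy': 9, 'smooth': 7})
```

Observing this ratio in crosses is therefore consistent with two unlinked genes jointly controlling the trait, as the authors inferred.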
The computer-based phenomics of individuals is an integral part of the bridge between classic genetics with Mendelian characters and postgenomic genetics with the manifestation of polygenic characters from individual genomes.
Microbial Ecology in Extreme Acidic Environments
Elena González-Toril, Ángeles Aguilera, in Microbial Diversity in the Genomic Era, 2019
14.4.2 Transcriptomes
For species without a reference genome, mRNA sequencing technology can detect transcripts corresponding to existing genomic sequences and provide abundant information for a wide range of biological studies (Surget-Groba and Montoya-Burgos, 2010; Liu et al., 2014b; Hong et al., 2017). Transcriptomic studies of acidophilic eukaryotes are still scarce and mainly concern heavy metal resistance. The genus Dunaliella is particularly attractive for studies of adaptive mechanisms to extreme environmental conditions, since two of the most extremophilic eukaryotic species described to date, Dunaliella salina and Dunaliella acidophila, belong to it. Moreover, Dunaliella species are well known for their extraordinarily high tolerance to salinity, temperature, nutrient limitation, and irradiance (Ben-Amotz et al., 2009; García-Gomez et al., 2012; Hong et al., 2017). These facts strongly suggest that this microalga has acquired efficient adaptive mechanisms to cope with the stresses associated with these ecosystems.
Comparative transcriptomic approaches, using massive Illumina sequencing for a de novo transcriptome assembly, identified changes in D. acidophila in response to high cadmium concentrations and natural metal-rich water (Puente-Sánchez et al., 2014). The results strongly suggest a constitutive stress response in D. acidophila, present even when metal stress and possible natural chelators are absent from the growth media. Thus, a higher variety of transcripts related to general and oxidative stress is observed among the nondifferentially expressed transcripts than among the differentially expressed ones. Moreover, the abundances of these nondifferentially expressed transcripts were also significantly higher, reaching up to five times the levels found in the differentially expressed ones. It is worth noting that these high levels of transcripts related to general and oxidative stress were also observed in the control library, suggesting constitutively high expression of these defense mechanisms in D. acidophila, even after several years of growth in the absence of metal stress. This is particularly relevant for transcripts related to phytochelatin synthase, metallothioneins, and glutathione S-transferase, all of which are directly involved in heavy metal response and detoxification (Hirata et al., 2005; Nishikawa et al., 2006; Rea, 2012); their abundances were up to ten times higher than the average for the nondifferentially expressed transcripts found in this study. Similar results have been reported for C. acidophila, in which constitutively high levels of glutathione S-transferase have been found, suggesting that this microalga has developed a detoxification system to prevent cell destruction due to Cd toxicity (Nishikawa et al., 2006).
Furthermore, a significantly higher basal metabolic and energetic activity has been observed under natural metal-rich water growth conditions. It is tempting to speculate that this higher metabolism is linked to higher protein turnover. This assertion is further supported by the fact that, in the presence of natural metal-rich water, the D. acidophila transcriptome was significantly enriched in proteasomal catabolic process transcripts, revealing more active protein degradation than under the other conditions. In addition, the higher nitrogen metabolism observed in this transcriptomic library, involving amino acid metabolism and nitrogen compound transport, may also be linked, at least in part, to higher protein turnover via proteasome-mediated degradation. As heavy metals are known to alter protein functionality, such behavior could help this acidophile maintain the functional integrity of its proteome under these environmental conditions (Casiot et al., 2004; Jacobson et al., 2012; Halter et al., 2015).
The transcriptomic response to copper has also been assayed in C. acidophila (Olsson et al., 2015). Although copper is a trace metal that aerobic organisms need in small amounts, it is toxic at high concentrations. Local copper concentrations can reach 6 mM in some acidic environments, such as Río Tinto (SW Spain). To thrive under these conditions, C. acidophila has developed different molecular mechanisms of tolerance and resistance. For example, there were no alterations in transcripts coding for antioxidant defense despite the extremely high copper level assayed (0.5 mM). Also surprisingly, copper levels that are lethal to other species did not significantly reduce photosynthetic activity in C. acidophila.
The differentially expressed genes in this study included transcripts involved in sugar metabolism, signaling proteins, transporters, and membrane proteins, as well as both up- and downregulated genes of unknown function. A large portion of copper-responsive genes in Chlamydomonas reinhardtii were uncharacterized prior to the study by Castruita et al. (2011), demonstrating that copper homeostasis in green algae is not yet well understood. It was therefore not surprising that many genes in C. acidophila were annotated only as predicted proteins. The fact that many transcripts received no BLAST hit at all indicates that there are large differences between C. reinhardtii and C. acidophila, some of them being key factors that allow this species to live in an extreme environment.
A recent transcriptomic study of the daily photosynthetic performance of a natural E. mutabilis biofilm reveals that this photosynthetic protist undergoes large-scale transcriptomic reprogramming at midday due to dynamic photoinhibition and solar radiation stress (Puente-Sánchez et al., 2016b). In this particular case, photoinhibition is due to UV radiation rather than light intensity, as revealed by pulse amplitude modulated (PAM) fluorometry analysis. To minimize the negative effects of solar radiation, the transcriptomic results support the presence of a circadian rhythm in this euglenophyte that increases its opportunity to survive, with photosynthesis-related processes enhanced in the morning, DNA synthesis occurring in the evening, and cell division taking place at night. The daily transcription pattern is mainly altered in genes involved in Photosystem II stability and repair, UV-damaged DNA repair, nonphotochemical quenching, and oxidative stress, supporting the photoinhibition detected by PAM fluorometry at midday. This low-light acclimation has also been reported in Galdieria and Cyanidium, which often grow endolithically; autotrophic cell growth is therefore restricted to upper rock layers and/or to periods of high photosynthetically active irradiation. However, although Galdieria maintains a high photosynthetic rate even at high light irradiation (2000 μE/m2/s), photoinhibition occurs abruptly at about 250 μE/m2/s, indicating a pronounced sensitivity to high light levels (Puente-Sánchez et al., 2016b).
An improved version of the Atlantic cod genome and advancements in functional genomics: implications for the future of cod farming
Ole Kristian Tørresen, ... Sissel Jentoft, in Genomics in Aquaculture, 2016
An improved Atlantic cod genome—a valuable resource for biological inferences, fisheries management, and future cod aquaculture
The new and improved cod reference genome assembly and annotation will become an important resource for future research by facilitating the in-depth functional genomics fundamental to biological inference. Additionally, comprehensive analysis of whole-genome resequencing data from contemporary specimens and their historical counterparts will illuminate the genomic basis of natural selection and how Atlantic cod adapt to changing environments; this information will aid management programs and bring conservation genetics to the next level. Further, the new reference genome assembly and annotation will also be essential for addressing the genomic effects of artificial selection in breeding programs, enhancing the efficiency of selecting family material for improved growth, disease resistance, delayed sexual maturation, and other economically important traits for the aquaculture industry. The goal should be to facilitate breeding programs that utilize genome information from both wild and captive populations. For a viable future of emerging cod aquaculture, these approaches will be integral components of a successful breeding program for Atlantic cod.
2 The Present Situation of the Human Genome Project in Japan
As genome research advances internationally, there has been increasing recognition that genome research constitutes the necessary foundation for the future development of biology, medical research, and biotechnology. In particular, the necessity of establishing a large-scale genome-sequencing center has been accepted widely. However, the realization of this plan has been difficult because of budget limitations.
Since the basic law for science and technology development was passed in 1997, the budget limitations have to some extent disappeared under the national goal that Japan should become a country of creative science and technology. Because of this basic law, the government invested approximately US$200 million in the genome research project in the 1998 fiscal year alone. Presently, various JHGP activities are conducted under the initiative of governmental organizations and private industries. MONBUSHO successfully established the Center for Information Biology in NIG to support DDBJ, as well as the newly established Human Genome Research Center at Tokyo University. The Ministry of International Trade and Industry (MITI) and the MHW coordinated a business venture promoting genome-related enterprise, and the STA created the Genome Science Center in Yokohama. Moreover, large research grants were made to many genome research groups through various funding mechanisms. However, not all of these activities follow well-planned strategies of genome research.
The general focus of JHGP is the determination of large-scale genome sequences and the functional analysis of the genes discovered through sequence determination. JHGP is placing more emphasis on the areas of genomic structure, function, and information. As for genome sequence determination, Kitazato University, the Japan Marine Science and Technology Center, STA, and the Product Evaluation Center under MITI are pushing the genome sequencing of microorganisms, producing 122 Mb each year. In 1995, the Japanese Science and Technology Corporation initiated the human genome-sequencing project. Subgenomic segments, such as the HLA, immunoglobulin variable, Down's syndrome, and oncogene-related regions, have been determined, totaling about 15 Mb, by Kitazato University, the Cancer Institute, Keio University, and Tokai University.
In total, sequence production in Japan is about 5 Mb per year, one order of magnitude smaller than that of the Sanger Center in the United Kingdom or the sequencing center at Washington University in the United States. Therefore, JHGP has great expectations of the newly established Genome Science Center, which is planned to produce 30–50 Mb per year. The Kazusa Research Institute is producing several Mb of Arabidopsis genome sequence per year. Moreover, the genome project team supported by MAFF has a ten-year plan for sequencing the complete 400 Mb rice genome, producing 20 Mb per year for the initial several years. In summary, genome sequencing capacity in Japan has so far been 15–20 Mb per year. Judging from recent increases in the number of genome research groups focusing on bacteria, fungi, lower organisms, insects, and fish, and the enhancement of genome research on humans and the mouse, sequence production of 200 Mb per year is expected in the near future.
JHGP has a slight edge over other countries in the field of functional genomics. In particular, the Osaka University genome team was the first to analyze a cDNA expression profile of human genetic data (called 'body mapping'), and other genome teams have quickly developed full-length cDNA determination techniques. Although these activities were not directly connected to bioindustry as they are in the United States, MITI recently pushed the transfer of these technical advances to industry. Moreover, human genome diversity has become recognized as important for disease gene analysis and functional analysis. In particular, the so-called 'complex diseases', i.e., multicomponent diseases, are becoming important targets of the human genome diversity project (see Human Genome Diversity and its Value). This project will require a great deal of genetic data on the Japanese population, in particular a large amount of polymorphism data from several thousand people (see Race: Genetic Aspects).
Several preliminary research projects have already started in Japan. The cDNA project for the Caenorhabditis elegans (C. elegans) genome has been conducted by NIG in cooperation with the Sanger Center and Washington University. The B. subtilis and E. coli genome teams have already begun functional analysis of all genes in their respective organisms. Many researchers are participating in these genome projects by conducting bioinformatics research and constructing databases. Thus, JHGP has resumed its original goals in a more integrated and coordinated way.