Exp Neurobiol 2022; 31(1): 1-16
Published online February 28, 2022
© The Korean Society for Brain and Neural Sciences
1Department of Oral Pathology & Life Science in Dentistry, School of Dentistry, 2Dental Life Science Institute, 3Periodontal Disease Signaling Network Research Center, Pusan National University, Yangsan 50612, Korea
Correspondence to: *To whom correspondence should be addressed.
TEL: 82-51-510-8259, FAX: 82-51-510-8249
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Patients suffering from rare human diseases often endure a painful journey to obtain the definite molecular diagnosis that is a prerequisite for appropriate treatment. When a novel variant is isolated from a single patient, determining its pathogenicity to end such a “diagnostic odyssey” requires a multi-step process involving experts in diverse areas, including clinicians, bioinformaticians and research scientists. Recent efforts in building large-scale genomic databases and
Keywords: Rare diseases, Nervous system diseases, Invertebrates,
The number of genes linked to rare Mendelian genetic diseases has been estimated at around 4,000 out of a total of approximately 30,000 genes in the human genome [1-3]. This is clearly an underestimate of the genetic contribution to rare diseases, as an estimated 6,000 to 13,000 more genes remain to be identified for their roles in disease pathology. Of the approximately 6,100 unique rare diseases curated in Orphanet (www.orpha.net), those of genetic origin could make up as much as 72%. A recent report has estimated the point prevalence, which represents the population burden, at less than 1/1,000,000 for the majority of these rare diseases. Based upon this estimation, the overall population prevalence of rare diseases ranges from 3.5% to nearly 6%, affecting as many as 446 million people worldwide, with neurological illnesses as the most prevalent category. Although each disease is relatively rare in the general population, the psychological and social burden of these rare diseases cannot be ignored, considering the cost and effort required to find a definite molecular diagnosis, which can take 4.8 years or longer on average [7, 8]. Such a long journey, the so-called “diagnostic odyssey”, can be further complicated by the uncertain pathogenicity of genetic variations in individual patients. Our lack of knowledge about the functional characteristics of these variations can be devastating, given that it is virtually impossible to develop therapeutic options without prior knowledge of their physiological consequences.
The current effort to tackle rare human diseases has been facilitated by widely used genome sequencing technologies, including whole-genome and whole-exome sequencing. However, the success rate of this approach in identifying genetic causes is rather limited, reaching less than 30% among patients referred for diagnosis [9, 10]. Furthermore, the pathogenicity of individual genetic variants remains largely unresolved even in approximately 8% of cases with identified genes. While recent advances in
In this review, we provide a brief overview of invertebrate model organism (MO)-based approaches to overcome limitations of the current research efforts and to recapitulate the core values of two representative models,
Invertebrate MOs have helped us to broaden our understanding of biological phenomena across phyla for decades, mostly attributed to the significant degree of conservation observed in molecular mechanisms underlying major biological processes essential for cellular functions. For instance, discoveries in
Recent achievements in
Since its early contribution by Thomas H. Morgan in the 1910s [29, 30],
Experimental strategies to investigate rare human diseases using
In the post-genome era, discoveries of genetic variants in the human genome further emphasize the role of bioinformatical analyses in deciphering the pathogenicity of each variant associated with rare human diseases of a definitive genetic origin. In line with this idea, large-scale sequencing centers have been established to facilitate the effective identification of potentially pathogenic disease variants, applying whole-genome and whole-exome sequencing to individuals or cohorts with suspected Mendelian diseases. For instance, the Centers for Mendelian Genomics was established in 2012 with financial support from the NIH in the US. A recent report from this institution, in collaboration with investigators in 36 countries, included an analysis of over 18,000 samples representing approximately 1,050 Mendelian phenotypes, which led to the identification of nearly 1,000 genes associated with disorders, including 375 genes that had never been mapped to human diseases. Identification of candidate variants from such large-scale sequencing approaches often requires initial validation of their pathogenicity by bioinformatical and computational means. For this purpose, a number of
This step of pathogenicity prediction can be further aided by bioinformatical analyses of genetic and physical interaction databases, which provide valuable information about interactions between a gene or protein of interest and others previously associated with a specific Mendelian disease. Databases generated from large-scale proteomics screens in diverse species are included in this type of analysis, along with text-mining strategies for relevant publications based on MO studies. Recently, a few bioinformatical tools have been developed to optimize such analysis, including STRING (Search Tool for Recurring Instances of Neighboring Genes) and MIST (Molecular Interaction Search Tool). STRING provides a database of established and predicted genetic and protein interactions based upon systematic co-expression patterns, shared signals across multiple genomes, interaction knowledge inferred from different organisms and text-mining analysis of the scientific literature. Similarly, the MIST platform integrates previously identified genetic and protein interactions from humans and multiple MOs, including yeast, worm, fly, zebrafish, frog, rat and mouse, to predict interactions inferred from “interologs”, i.e. interactions between orthologous genes or proteins in different organisms. A more comprehensive analysis of both sequence- and interaction-based datasets could further facilitate research efforts to delineate the pathogenicity of a rare variant potentially causative of undiagnosed human disease phenotypes.
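The idea of inferring interactions through orthology can be illustrated with a minimal Python sketch. It is not the MIST or STRING implementation; the function name, the gene identifiers and the toy data are hypothetical and serve only to show how an interaction observed in one organism may be transferred to orthologous human gene pairs.

```python
from itertools import product


def infer_interologs(mo_interactions, orthologs):
    """Transfer MO interactions to human gene pairs via orthology.

    mo_interactions: iterable of (gene_a, gene_b) pairs observed in a MO.
    orthologs: dict mapping a MO gene to a set of human orthologs.
    Returns a set of alphabetically sorted human gene pairs.
    """
    predicted = set()
    for gene_a, gene_b in mo_interactions:
        human_a = orthologs.get(gene_a, set())
        human_b = orthologs.get(gene_b, set())
        # Every combination of orthologs is a candidate interaction.
        for ha, hb in product(human_a, human_b):
            predicted.add(tuple(sorted((ha, hb))))
    return predicted


# Hypothetical toy data: flyC has no human ortholog, so the
# flyB-flyC interaction cannot be transferred.
fly_interactions = [("flyA", "flyB"), ("flyB", "flyC")]
fly_to_human = {"flyA": {"HUMAN1"}, "flyB": {"HUMAN2", "HUMAN3"}}

predicted = infer_interologs(fly_interactions, fly_to_human)
```

In this toy case the single fly interaction with orthologs on both sides yields two candidate human pairs, which would still need independent experimental support.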
Aside from the aforementioned large-scale screening efforts, discoveries of pathogenic variants often stem from isolated cases first identified by clinicians in practice. A translational approach from such an individual discovery to MO-based research usually begins with the integration of curated information across organisms. This step requires an extensive literature and database search to see whether a specific gene or variant of interest has been previously implicated in human diseases. The components of this initial analysis include, but are not limited to, 1) the presence of prior reports of similar disease phenotypes, 2) the allele frequency of a specific variant in the general vs. disease population, 3) the availability of orthologs or homologs of implicated human genes in MOs, 4) their functional characteristics and expression profiles, 5) the site of variants in the gene structure, i.e. whether it is located in functional domains or not, and 6) the conservation of amino acids affected in variants across phyla. Manual curation and analysis of such diverse information is a near-impossible task, considering the sheer number of databases subject to analysis, and thus requires the development of efficient tools to integrate multiple human and MO databases. One such tool is MARRVEL (Model organism Aggregated Resources for Rare Variant ExpLoration) [45-47] (see also Fig. 1 for its integration in MO-based research networks described below). The current scope of MARRVEL encompasses information obtained from 115,000 control cohorts and 12.3 million variants, genotype-phenotype relationships described for 6.96 million cases, and over 20,000 Gene Ontology terms corresponding to nearly 236,000 pairs of human genes and MO orthologs or homologs (MARRVEL v2.0). The human genetic databases incorporated in the MARRVEL analysis include ExAC, Geno2MP (NHGRI/NHLBI University of Washington-Center for Mendelian Genomics (UW-CMG)), ClinVar, DGV, DECIPHER and OMIM (https://omim.org/).
In addition, multiple MO databases are incorporated in the process of information curation, including SGD, PomBase, WormBase, FlyBase, ZFIN, MGI and RGD. The MARRVEL platform allows users to identify orthologs of a human gene of interest in MOs using the DRSC Integrative Ortholog Prediction Tool (DIOPT). Following identification of available orthologs, it also provides comparison data at the protein level, specifically concerning amino acid sequence alignment, annotation of functional domains and the presence of conserved residues.
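As an illustration of the residue-level comparison described above, the following Python sketch checks whether the amino acid affected by a variant is conserved in a pre-computed multiple sequence alignment. It does not call MARRVEL or DIOPT; the function name, the species labels and the toy alignment are assumptions made for the example.

```python
def residue_conservation(aligned, human_id, human_pos):
    """Report conservation of one human residue across aligned orthologs.

    aligned: dict of species -> aligned sequence (equal length, '-' = gap).
    human_pos: 1-based position in the ungapped human protein.
    Returns (human_residue, residues_in_other_species, fraction_identical).
    """
    human_seq = aligned[human_id]
    # Map the ungapped human position to an alignment column.
    column = None
    seen = 0
    for index, residue in enumerate(human_seq):
        if residue != "-":
            seen += 1
            if seen == human_pos:
                column = index
                break
    if column is None:
        raise ValueError("position lies beyond the human sequence")

    reference = human_seq[column]
    others = {sp: seq[column] for sp, seq in aligned.items() if sp != human_id}
    identical = sum(1 for residue in others.values() if residue == reference)
    fraction = identical / len(others) if others else 0.0
    return reference, others, fraction


# Hypothetical toy alignment (not real protein sequences).
alignment = {
    "human": "MK-RLA",
    "fly":   "MKSRIA",
    "worm":  "M--RLG",
}

arg3 = residue_conservation(alignment, "human", 3)
leu4 = residue_conservation(alignment, "human", 4)
```

Here the third human residue is identical in both orthologs, whereas the fourth is shared with only one of them; a real analysis would also weigh conservative substitutions and domain annotation.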
While bioinformatical toolkits clearly help scientists to narrow the scope of putative targets of their research, some caution is needed in interpreting the data generated by these means. First of all, the possibility of generating false-positive and false-negative results from
Cost-ineffectiveness of conventional approaches to identify pathogenic variants as well as limitations of
The Undiagnosed Diseases Network (UDN, https://undiagnosed.hms.harvard.edu) was initiated from an NIH-funded project to establish a system that would integrate clinical and research activities in deciphering the molecular basis of rare human diseases [60, 61]. It consists of an interconnected network of centers and cores that brings together expert researchers in clinical medicine, bioinformatics and MO-based genetics. The network includes 12 clinical sites, a single coordinating center, sequencing and metabolomics cores, and three Model Organisms Screening Centers (MOSCs) along with a central biorepository. The three MOSCs comprise
The Rare Diseases Models and Mechanisms (RDMM) network (http://www.rare-diseases-catalyst-network.ca/) was established in 2014 to facilitate collaborations between clinicians and MO scientists in Canada. Unlike the UDN described above, its core structure is committee-based, designed to identify and catalyze connections between clinicians and scientists, and is supplemented with the Canadian RDMM Registry for data curation. The overall project flow starts with submission of a “Connection Application” concerning novel variants of interest to the advisory committee, which then evaluates the proposal based on the following aspects: 1) the quality of genetic data for the disease, 2) the severity of the reported disease and the need for medical intervention in affected patients, 3) the possibility of developing therapeutic options, 4) the population burden on either a specific or the general population and 5) the novelty of the biological pathway implicated in the disease of interest. Once the application is approved by the advisory committee, the next decision is made by the scientific advisory committee, which matches it to MO scientists enrolled in the Canadian RDMM Registry. If a match is found, experts in MO research are invited to submit a “Model Organism Proposal Application”, to be approved by the committee as well as by the clinician who submitted the “Connection Application”. Following approval of this proposal, the MO scientists receive a catalyst fund to initiate immediate collaborative research (Fig. 2).
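The project flow described above can be summarized as a simple sequence of stages. The Python sketch below is only a schematic reading of that description, not software used by the RDMM network; the stage names and the `CLOSED` state for applications that do not proceed are assumptions introduced for the example.

```python
from enum import Enum, auto


class Stage(Enum):
    CONNECTION_SUBMITTED = auto()     # clinician submits Connection Application
    CONNECTION_APPROVED = auto()      # advisory committee approves it
    MATCHED_TO_MO_SCIENTIST = auto()  # scientific advisory committee finds a match
    MO_PROPOSAL_SUBMITTED = auto()    # MO scientist submits a proposal
    CATALYST_FUNDED = auto()          # committee and clinician approve; fund released
    CLOSED = auto()                   # assumption: application does not proceed


NEXT_STAGE = {
    Stage.CONNECTION_SUBMITTED: Stage.CONNECTION_APPROVED,
    Stage.CONNECTION_APPROVED: Stage.MATCHED_TO_MO_SCIENTIST,
    Stage.MATCHED_TO_MO_SCIENTIST: Stage.MO_PROPOSAL_SUBMITTED,
    Stage.MO_PROPOSAL_SUBMITTED: Stage.CATALYST_FUNDED,
}
TERMINAL = {Stage.CATALYST_FUNDED, Stage.CLOSED}


def advance(stage, positive_outcome):
    """Move one step forward, or close the application on a negative outcome."""
    if stage in TERMINAL:
        raise ValueError("workflow has already ended")
    return NEXT_STAGE[stage] if positive_outcome else Stage.CLOSED


def run_workflow(outcomes):
    """Apply a sequence of step outcomes starting from submission."""
    stage = Stage.CONNECTION_SUBMITTED
    for positive in outcomes:
        if stage in TERMINAL:
            break
        stage = advance(stage, positive)
    return stage


funded = run_workflow([True, True, True, True])
unmatched = run_workflow([True, False])
```

Four consecutive positive outcomes lead to a funded catalyst project, whereas an application that is approved but not matched ends in the assumed `CLOSED` state.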
The scope of the RDMM network covers the vast majority of both the clinical and research communities in Canada. For instance, the RDMM aims to engage the clinical community dealing with rare diseases through established projects, including the FORGE Canada Consortium and the Treatable Intellectual Disability Endeavor protocol in British Columbia (TIDE BC). To the same end, the network also engages the MO research community by offering a genetic knowledgebase for MO-based research. As a part of this effort, a total of nearly 12,200 genes are classified into three categories: 1) tier 1 for genes whose functions were directly investigated with MOs in previous studies, 2) tier 2 for those awaiting immediate investigation with MOs and 3) tier 3 for genes with some degree of relevance inferred from previous studies based upon Gene Ontology terms. The RDMM network has received 135 Connection Applications, in addition to direct submissions of 116 understudied candidate genes suggested by FORGE, Care4Rare and UDN. Among these initial submissions, a total of 105 functional studies were funded, including 85 MO catalyst projects connecting clinicians and MO experts in mouse, fly, zebrafish, worm, yeast and some protozoa. The contribution of invertebrate MOs to recent achievements of the RDMM network is significant in that 17 out of 105 studies are built around
When conducting functional experiments to study rare human diseases in MOs, it is possible that the phenotypes described in MOs have no clear relationship to the human phenotypes reported from individual patients. This class of complications can be further divided into two different scenarios. First, the phenotypes observed in MOs may not be fully rescued by introduction of human cDNA. In this case, the human protein orthologous to its MO counterpart may not be functional or may cause lethality in the MO background. For instance, a human protein introduced for rescue may not form a functional complex with endogenous proteins, or may induce toxicity in a mode similar to GOF alleles. In addition, the WT and variant forms of human gene products may not differ from each other in restoring endogenous functions of an MO gene in the LOF background. In such cases, an alternative investigation in different MOs should be sought before the potential pathogenicity of the variants is ruled out.
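The possible outcomes of a rescue experiment can be laid out as a small decision function. This Python sketch is only an illustrative reading of the scenarios above; the function name, the boolean inputs and the wording of each outcome are assumptions, and the final branch (WT rescues but the variant does not) reflects the common interpretation of such assays rather than a rule stated in the text.

```python
def interpret_rescue(wt_rescues, variant_rescues, human_cdna_toxic=False):
    """Classify a humanized rescue experiment in an MO LOF background.

    wt_rescues: human WT cDNA restores the MO phenotype.
    variant_rescues: human variant cDNA restores the MO phenotype.
    human_cdna_toxic: expression of the human protein is itself
        toxic or lethal in the MO background.
    """
    if human_cdna_toxic:
        # Toxicity masks any difference between WT and variant.
        return "uninformative: human protein is toxic in this MO"
    if not wt_rescues:
        # Without WT rescue the variant cannot be compared to a baseline.
        return "uninformative: human WT does not rescue"
    if variant_rescues:
        # WT and variant behave alike; the assay cannot separate them.
        return "inconclusive: test in a different MO"
    # Assumption: failure of the variant alone suggests impaired function.
    return "variant fails to rescue: consistent with loss of function"


case_toxic = interpret_rescue(True, True, human_cdna_toxic=True)
case_no_baseline = interpret_rescue(False, False)
case_same = interpret_rescue(True, True)
case_lof = interpret_rescue(True, False)
```

Partial rescue, dominant-negative effects and GOF variants would each need additional branches and dedicated assays.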
It should also be noted that MOs carrying a variant of interest may display phenotypes that are largely unrelated to the human disease but can still be scored quantitatively. These phenotypes, often referred to as “phenologs”, should be carefully examined, as they may provide valuable insights into the basic biology of the human gene of interest despite a lack of phenotypic homology. The significance of a “phenolog” is mostly based upon the general principle of conservation observed in key cellular signaling pathways. Even with nearly interchangeable components of a signaling pathway, disrupted endogenous activity of these components may manifest as completely different scorable phenotypes in different species. For instance, mutations in
The pathophysiology of certain human diseases may require organ- or tissue-specific contexts for experimental studies. In that case, functional alterations need to be modeled in human- or vertebrate-specific organs, which limits the usefulness of invertebrate MOs for the intended studies. A zebrafish model can be an appropriate alternative, as it possesses virtually all of the same organs as humans. In addition, a human gene of interest may not have orthologs or homologs in invertebrate MOs. In such cases, these models are unlikely to provide direct functional evidence to validate the pathogenicity of human variants. Nevertheless, the data obtained from invertebrate MOs can still provide basic knowledge about the function of novel variants or genes of interest in the context of conserved core signaling pathways, thus acting as
As evidenced by recent success stories from network-based approaches such as UDN and RDMM, efficient modeling and molecular diagnosis of rare neurological disorders require well-coordinated collaborations among clinicians responsible for identifying patients carrying rare genetic variants, bioinformatics specialists analyzing multi-omics and literature-based datasets, and research scientists specialized in different MOs. Therefore, recruitment of sufficient numbers of experts in each field, as well as experienced research coordinators, should be ensured to facilitate a systematic analysis of currently available information, to design and employ solid experimental paradigms, and to collectively interpret the findings, in order to provide an appropriate molecular diagnosis for neurological disorders of an unknown nature.
With increasing amounts of genomic data available for analysis, inter-database matching has become a critical step in identifying patient pools with similar genotype-phenotype relationships. Moreover, the initial clinical identification of a rare human disease often comes from just a single patient carrying an uncharacterized variant. Therefore, it is necessary to recruit more cases with similar genotype-phenotype relationships to enhance the statistical power of subsequent analyses. Several tools or platforms have been recently established, including Matchmaker Exchange (MME, https://www.matchmakerexchange.org/) [80-82], which incorporates multiple databases such as GeneMatcher, Phenome Central, DECIPHER [51, 85], MyGene2, AGHA Patient Archive (https://mme.australiangenomics.org.au), PatientMatcher (https://github.com/Clinical-genolmics/patientmatcher/), seqr (https://seqr.broadinstitute.org/matchmaker/matchbox), RD-Connect GPAP (https://platform.rd-connect.eu/), and IRUD. So far, MME provides the most extensive coverage of inter-database comparisons for genotype-phenotype relationships, and has been employed in at least 40 reports of gene discoveries from nearly 190,000 participants or cases (https://www.matchmakerexchange.org/). With genomic analysis of patients with undiagnosed neurological phenotypes becoming a widely available diagnostic tool in clinics, accurate assessment of inter-database information will be an indispensable prerequisite for MO-based approaches before model-based functional studies are initiated.
While speedy and effective analyses can be performed in invertebrate MOs, one can still take advantage of vertebrate MOs, including zebrafish and mouse, to fill the gaps left behind. Interactive collaborative efforts with mouse model-based research groups such as the International Mouse Phenotyping Consortium (https://www.mousephenotype.org/) and the Knockout Mouse Phenotyping Program (KOMP2, https://commonfund.nih.gov/komp2) will further facilitate the molecular diagnosis of rare neurological disorders (Fig. 3). For context-specific modeling of human diseases, invertebrate MOs may not be the model of choice, as such studies need to re-create a patient-specific condition. This condition can instead be built with patient-derived induced pluripotent stem cells and the subsequent generation of context-specific cell types and organoids [89, 90] (Fig. 3).
Most of our effort in studying rare genetic diseases with a Mendelian basis has been restricted to potentially pathogenic variants within coding regions. However, with more frequent application of whole-genome sequencing technology, it is likely that we will discover a significant fraction of variants residing in non-coding regions, including microRNAs, long non-coding RNAs and repetitive DNA sequences. The degree of conservation between human and invertebrate MO genomes tends to be low in regulatory genomic regions. In addition, potentially pathogenic variants occurring in these regions of the human genome may cause noticeable changes in the expression profiles of other genes. Given this limited conservation, invertebrate MOs can instead be used to analyze how the altered gene expression observed in human patients affects disease phenotypes.
With the help of technical advances, the rate of discovery of novel genes or variants responsible for rare neurological diseases has continued to increase over the last few decades. Recent reports have estimated that there are nearly 6,000 to 13,000 more genes to be studied for their roles in Mendelian genetic disorders, many of whose functions likely affect the integrity of the human nervous system [4, 6]. Among these candidates, a significant fraction ranging from 4,000 to 10,000 is expected to be novel discoveries, thus still representing a formidable task ahead. With an increasing rate of genotype-based identification of novel variants linked to rare neurological diseases, concerted team efforts among clinicians, bioinformaticians and MO scientists will be in greater demand (Fig. 3). These expert communities, providing interactive support to one another, will serve as cornerstones in helping patients with undiagnosed rare neurological disorders to end their painstaking journey toward a molecular diagnosis and cures.
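The principle of recruiting additional cases with similar genotype-phenotype relationships can be sketched in a few lines of Python. This example does not use the Matchmaker Exchange API or any of the platforms listed above; the record layout, the gene labels, the similarity threshold and the use of a Jaccard index over phenotype terms are all assumptions chosen for illustration.

```python
def jaccard(terms_a, terms_b):
    """Jaccard similarity between two collections of phenotype terms."""
    set_a, set_b = set(terms_a), set(terms_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def find_matches(query, cases, min_similarity=0.3):
    """Return (case_id, similarity) for cases sharing the candidate gene.

    Only cases with the same gene and a phenotype overlap of at least
    min_similarity are kept, sorted from most to least similar.
    """
    matches = []
    for case in cases:
        if case["gene"] != query["gene"]:
            continue
        score = jaccard(query["phenotypes"], case["phenotypes"])
        if score >= min_similarity:
            matches.append((case["id"], score))
    return sorted(matches, key=lambda match: match[1], reverse=True)


# Hypothetical records; gene names and case identifiers are placeholders.
query = {"gene": "GENE_X",
         "phenotypes": {"hypotonia", "ataxia", "developmental delay"}}
cases = [
    {"id": "P1", "gene": "GENE_X", "phenotypes": {"hypotonia", "ataxia"}},
    {"id": "P2", "gene": "GENE_X", "phenotypes": {"seizures"}},
    {"id": "P3", "gene": "GENE_Y",
     "phenotypes": {"hypotonia", "ataxia", "developmental delay"}},
    {"id": "P4", "gene": "GENE_X",
     "phenotypes": {"hypotonia", "ataxia", "developmental delay"}},
]

matches = find_matches(query, cases)
```

In practice, phenotype terms would come from a controlled vocabulary and similarity would account for the term hierarchy, so exact-string overlap as used here is a deliberate simplification.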
This work was supported by the Financial Supporting Project of Long-term Overseas Dispatch of PNU's Tenure-track Faculty, 2019 to Ji-Hye Lee.
|Model organism studied||Human gene of interest||Neurological disease or phenotype associated||Year reported|
| || ||Hypotonia, ataxia, delayed development||2017|
| || ||Infantile developmental delay, ataxia||2017|
| || ||Severe neurodevelopmental regression, hypotonia, ataxia, seizures, abnormal motor behaviors||2018|
| || ||Neurooculocardio-genitourinary syndrome||2019|
| || ||SWI/SNF-related intellectual disability disorder||2020|
| || ||Hypotonia, dystonia, ataxia, white matter abnormalities||2020|
| || ||Epileptic encephalopathy, hypotonia, general developmental delay||2020|
| || ||Glial loss (Schwann cell loss)||2020|
| || ||General developmental delay, neurologic deficits||2021|
| || ||Craniofacial and vertebral abnormalities, neurological deficits||2021|
| || ||Intellectual disability, seizures, behavioral abnormalities||2021|
| || ||Neurodevelopmental delay, early childhood epilepsy||2021|
The list of