GENO is an OWL model of genotypes, their more fundamental sequence components, and links to related biological and experimental entities. At present many parts of the model are exploratory and set to undergo refactoring. In addition, many classes and properties have GENO URIs but are placeholders for classes that will be imported from an external ontology (e.g. SO, ChEBI, OBI, etc.). Furthermore, ongoing work will implement a model of genotype-to-phenotype associations. This will support description of asserted and inferred relationships between genotypes, phenotypes, and environments, and the evidence/provenance behind these associations.
Documentation is under development as well, and for now a slide deck is available at http://www.slideshare.net/mhb120/brush-icbo-2013
Used to annotate axioms that define identity criteria for instances of a class.
is_identity_criteria
proabalistic_quantifier
Used to flag terms that are created for organizational purposes, e.g. to support groupings useful for defining GENO-based data models.
mixin
gene symbol
begin
end
location
The reference is the resource that the position value is anchored to. For example, a contig or chromosome in a genome assembly.
reference
is part of
has part
A relation used to link sequence entities (sequences, features, qualified features, and collections thereof) to their 'attributes'.
Used in lieu of RO/BFO has_quality as this relation is defined to apply to independent continuant bearers, whereas sequence entities are generically dependent continuants.
http://purl.obolibrary.org/obo/so_has_quality
has_sequence_attribute
A relation between a material information bearer or material genetic sequence bearer and generically dependent continuant that carries information or sequence content that the bearer encodes
materializes
Shortcut relation expanding to bearer_of some (concretizes some . . . ), linking a material information bearer or sequence macromolecule to some ICE or GDC sequence.
bears_concretization_of
is_genotype_of
A relationship that holds between a biological entity and some level of genetic variation present in its genome.
This relation aims to be equally as broad/inclusive as RO:0002200 ! has_phenotype.
The biological entity can be an organism, a group of organisms that share a common genotype, or organism-derived entities such as cell lines or biospecimens. The genotype can be any of the various flavors of genotypes/allelotypes defined in GENO (intrinsic genotype, extrinsic genotype, effective genotype), or any genetic variation component of a genotype including variant alleles or sequence alterations.
has_genotype
An antisymmetric, irreflexive (normally transitive) relation between a whole and a distinct part (source: SIO)
No proper part relation anymore in RO/BFO?
http://semanticscience.org/resource/SIO_000053
has_proper_part
A relationship between an entity that carries a sequence (e.g. a sequence feature or collection), and the sequence it bears.
has_sequence_component
has_state
VMC:state
'Sequence' in the context of GENO is an abstract entity representing an ordered collection of monomeric units as carried in a biological macromolecule.
has_sequence
A geno:intrinsic genotype 'specifies' a SO:genome.
A geno:karyotype 'specifies' a geno:karyotype feature collection.
A relationship between an information content entity representing a specification, and the entity it specifies.
obsolete_specifies
Created subproperties 'approximates_sequence' and 'resolves to sequence'. Genotypes and other sequence variant artifacts are not always expected to completely specify a sequence, but rather provide some approximation based on available knowledge. The 'resolves_to_sequence' property can be used when the sequence variant artifact is able to completely resolve a sequence, and the 'approximates_sequence' property can be used when it does not.
obsolete_approximates_sequence
Created subproperties 'approximates_sequence' and 'resolves to sequence'. Genotypes and other sequence variant artifacts are not always expected to completely specify a sequence, but rather provide some approximation based on available knowledge. The 'resolves_to_sequence' property can be used when the sequence variant artifact is able to completely resolve a sequence, and the 'approximates_sequence' property can be used when it does not.
obsolete_resolves_to_sequence
An asymmetric, irreflexive (normally transitive) relation between a part and its distinct whole.
http://semanticscience.org/resource/SIO_000093
is_proper_part_of
is_sequence_of
is_subject_of
obsolete_is_specified_by
shortcut relation used to link a phenotype directly to a genotype of an organism
is_phenotype_of_organism_with_genotype
is_phenotype_with_genotype
phenotype_has_genotype
Might expand to something like:
phenotype and (is_phenotype_of some (organism and (has_part some ('material genome' and (is_subject_of some (genome and (is_specified_by some genotype)))))))
obsolete_is_phenotype_of_genotype
A relation to link variant loci, phenotypes, or disease to the type of inheritance process they are involved in, based on how the genetic interactions between alleles at the causative locus determine the pattern of inheritance of a specific phenotype/disease from one generation to the next.
Exploratory/temporary property, as we formalize our phenotypic inheritance model.
obsolete_participates_in_inheritance_process
A relation between a sequence entity (i.e. a sequence, feature, or qualified feature) and a part of this entity that is variant in terms of its sequence, position, or expression.
has_variant_part
is_variant_part_of
A relation between a sequence entity (i.e. a sequence, feature, or qualified feature) and a part of this entity that is not variant.
has_reference_sequence_part
has_reference_part
is_reference_part_of
The allele instance <fgf8a^ti282a> is_allele_of the gene class 'danio rerio fgf8a'.
Note that the allele <fgf8a^ti282a> may not be an instance of the danio rerio fgf8a gene class, given that we adopt the SO definition of genes as 'producing a functional product'. If the <fgf8a^ti282a> allele is nonfunctional or null, it is an allele_of the danio rerio fgf8a gene class, but not an instance (rdf:type) of this class. It is, however, an instance of the 'danio rerio fgf8a gene allele' class, since being a 'gene allele' as defined in GENO requires only occupying the genomic position of a gene, not necessarily producing a functional product.
A relation linking an instance of a variable feature (aka an allele) to a class of genomic feature it is an instance of (typically a gene class).
Domain = allele
Range = punned gene class
In owl models where alleles are instances of gene classes, this relation links an owl:Individual to an owl:Class, and thus 'puns' the gene class IRI.
is_sequence_variant_of
is_allele_of
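The punning described above can be illustrated with a minimal sketch that uses plain Python tuples in place of an RDF library. All IRIs here (the `ex:` prefix, `ex:fgf8a_gene`, `ex:fgf8a_ti282a`, `ex:fgf8a_gene_allele`) are hypothetical stand-ins, and the `punned_classes` helper is an assumption made for illustration, not part of GENO or any OWL toolkit.

```python
# Hypothetical CURIEs, chosen only to mirror the fgf8a example in the text.
RDF_TYPE = "rdf:type"
OWL_CLASS = "owl:Class"
IS_ALLELE_OF = "geno:is_allele_of"

triples = [
    # The gene is declared as a class ...
    ("ex:fgf8a_gene", RDF_TYPE, OWL_CLASS),
    # ... the allele is an instance of the (hypothetical) gene allele class ...
    ("ex:fgf8a_ti282a", RDF_TYPE, "ex:fgf8a_gene_allele"),
    # ... and is_allele_of points from the allele instance to the gene class IRI.
    ("ex:fgf8a_ti282a", IS_ALLELE_OF, "ex:fgf8a_gene"),
]


def punned_classes(triples):
    """Return IRIs declared as owl:Class that also appear as the object
    of a non-typing triple, i.e. are used like individuals."""
    declared = {s for s, p, o in triples if p == RDF_TYPE and o == OWL_CLASS}
    used_as_individual = {o for s, p, o in triples if p != RDF_TYPE}
    return declared & used_as_individual


punned = punned_classes(triples)
```

In this toy dataset only the gene class IRI is reported, because it is the only declared class that is also the target of an instance-level relation.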
A relation used to link a variant locus instance to the gene class it is a variant of (in terms of its sequence or expression level).
is_variant_instance_of
formerly grouped is_allele_of and is_expression_variant_of properties under feature to class property (now renamed has_affected_locus)
Domain = genomic feature instance
Range = punned gene class IRI
obsolete_is_genetic_variant_of
A relation linking a gene class to a sequence-variant or expression-variant of the gene.
has_variant_instance
formerly grouped has_allele and has_expression_variant properties under class to feature property (now renamed locus_affected_by)
Domain = punned gene class
Range = genomic feature
obsolete_has_genetic_variant
A relation linking a gene class to one of its sequence-variant alleles.
Domain = punned gene class
Range = allele
has_sequence_variant
has_allele
A relation between a gene targeting reagent (e.g. a morpholino or RNAi) and the class of gene it targets.
This is intended to be used as an instance-class relation, used for linking an instance of a gene targeting reagent to the class of gene whose instances it targets.
targets_gene
A relation that holds between an instance of a genotype or variant sequence feature or collection, and a genomic feature class (typically a gene) that is affected in its sequence or expression.
This is an organizational grouping class to collect all relations used to link instances of sequence features or qualified sequence features to genomic feature classes. For example, is_allele_of links a gene allele instance to its gene class (genes are represented as classes in our OWL model). Such links support phenotype propagation from alleles to genes for Monarch Initiative use cases. Use of these properties effectively puns gene class IRIs into owl:individuals in a given RDF dataset.
has_affected_feature
A relation between an expression-variant gene (ie integrated transgenes or knockdown reagent targeted genes), and the class of gene it represents.
Domain = expression variant feature.
Range = punned gene class
This relation links an expression-variant gene instance (targeted or transgenic) to the class of gene that it represents. For transient transgenes, the coding sequence need only contain as part an expressed region from a given gene to stand in an is_expression_variant_of relation to the gene class.
is_expression_variant_of
A relation between a genomic feature class (typically a gene class) and an instance of a sequence feature or qualified sequence feature that represents or affects some change in the sequence or expression of the genomic feature.
class_to_feature_relation
This is an organizational grouping class to collect all relations used to link genomic feature classes (typically genes) to instances of a sequence feature or qualified sequence feature. For example, linking a gene class IRI to an instance of an allele of that gene class. Such links support phenotype propagation from features/variants to genes (e.g. for Monarch Initiative use cases).
is_feature_affected_by
A relation between a gene class and a gene targeting reagent that targets it.
is_target_of
Domain = punned gene class
Range = gene knockdown reagent
is_gene_target_of
A relation linking a gene class to an expression variant of that gene.
Domain = punned gene class
Range = expression variant feature
has_expression_variant_instance
has_expression_variant
A relation between two sequence features at a given genomic locus that vary in their sequence or level of expression.
Decided there was no need for a contrasting is_expression_variant_with property, so removed it and this parent grouping property.
This property is most commonly used to relate two different alleles of a given gene. It is not a relation between an allele and the gene it is a variant of.
obsolete_is_variant_with
A relation between two instances of a given gene that vary in their level of expression as a result of external factors influencing expression (e.g. gene-knockdown reagents, epigenetic modification, alteration of endogenous gene-regulation pathways).
obsolete_is_expression_variant_with
A relation used to describe a context or conditions that define and/or identify an entity.
Used in Monarch Data to link associations to qualifying contexts (e.g. environments or developmental stages) where the association applies. For example, a qualifying environment represents a context where genotype-phenotype associations apply - where the environment is an identity criteria for the association.
Used in GENO to describe physical context of materialized sequence features that represent identifying criteria for instances of qualified sequence features.
has_qualifying_context
has_qualifier
a relation to link a single locus complement to its zygosity.
has_zygosity
A relationship between a reference locus/allele and the gene class it is an allele of.
is_reference_allele_of
Consider obsoleting: it is likely sufficient to use the parent has_sequence_attribute property, so a separate property to link to the staining intensity attribute is not really needed.
has_color_value
Used to link a gross chromosomal sequence feature (chromosome part) to a color value quality that inheres in the sequence feature in virtue of the staining pattern of the chromosomal DNA in which the sequence is materialized.
has_staining_intensity
Used to link a gene targeting reagent such as a morpholino, to an instance of a reagent targeted gene variant.
relation between a molecular agent and its molecular target
is_targeted_by
1. Used to specify derivation of transgene components from a gene class, or an engineered construct instance.
2. Used to specify the genetic background/strain of origin of an allele (i.e. that an allele was originally isolated from a specific background strain, and propagated into new genetic backgrounds).
3. Used to indicate derivation of a variant mouse genotype from an ES cell line used in generating the modified mice (IMPC)
Relationship between a sequence feature and a distinct, non-overlapping feature from which it derives part or all of its sequence.
sequence_derives_from
A relationship between a variant allele and the gene class it is an allele of.
is_variant_allele_of
Relationship between a sex-qualified genotype and intrinsic genotype, created specifically to support propagation of phenotypes asserted on the former to the latter for Monarch Initiative use cases.
has_sex_agnostic_part
A relation between a mutant allele (ie rare variant present in less than 1% of a population, or an experimentally-altered variant such as a knocked-out gene in a model organism), and the gene it is a variant of.
is_mutant_allele_of
A relationship between a polymorphic allele and the gene class it is an allele of.
is_polymorphic_allele_of
A relationship between a wild-type allele and the gene class it is an allele of.
is_wild_type_allele_of
An organizational class to hold relations of parthood between sequences/features.
has_sequence_part
is_sequence_part_of
Relationship between an intrinsic genotype and a sex-qualified genotype, created specifically to support propagation of phenotypes asserted on the latter to the former for Monarch Initiative use cases.
is_sex_agnostic_part_of
A relation between two sequence features at a particular genomic locus that vary in their sequence (in whole or in part).
This property is most commonly used to relate two different alleles of a given gene (e.g. a wt and mutant instance of the BRCA2 gene). It is not a relation between an allele and the class-level gene it is a variant of (for this use is_allele_of)
varies_with
organizational property to hold imports from faldo.
faldo properties
A relation linking a qualified sequence feature to its component sequence feature.
has_sequence_feature_component
In GENO we define three levels of sequence artifacts: (1) biological sequences, (2) sequence features, and (3) qualified sequence features. The identity criteria for a 'biological sequence' include only its inherent sequence (the ordered string of units that comprise it). The identity criteria for a 'sequence feature' include its sequence and position (where it resides, i.e. its location based on how it maps to a reference or standard). The identity criteria for a 'qualified sequence feature' include its component sequence feature (defined by its sequence and position), and the material context of its bearer in a cell or organism. This context can include direct epigenetic modification, or being targeted by gene knockdown reagents such as morpholinos or RNAi, or being transiently overexpressed from a transgenic construct in a cell or organism.
has_sequence_feature
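The three levels of identity criteria described above can be sketched with frozen Python dataclasses, whose generated equality compares exactly the listed fields. The class and field names (`residues`, `reference`, `begin`, `end`, `context`) and the example values are assumptions made for illustration; GENO itself expresses these criteria as OWL axioms, not as code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BiologicalSequence:
    # Identity: the ordered string of units only.
    residues: str


@dataclass(frozen=True)
class SequenceFeature:
    # Identity: sequence plus position on some reference.
    sequence: BiologicalSequence
    reference: str
    begin: int
    end: int


@dataclass(frozen=True)
class QualifiedSequenceFeature:
    # Identity: component feature plus the material context of its bearer.
    feature: SequenceFeature
    context: str


seq = BiologicalSequence("ACGT")
f1 = SequenceFeature(seq, "chr1", 100, 104)
f2 = SequenceFeature(seq, "chr2", 100, 104)
q1 = QualifiedSequenceFeature(f1, "untreated")
q2 = QualifiedSequenceFeature(f1, "morpholino-targeted")

same_sequence = f1.sequence == f2.sequence   # same string of units
same_feature = f1 == f2                      # different position, so distinct
same_qualified = q1 == q2                    # different context, so distinct
```

Under these assumptions, two features can share a sequence yet differ by position, and two qualified features can share a feature yet differ by context.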
has_inferred_phenotype
Property chain to propagate inferred phenotype associations 'up' a genotype partonomy in the direction of sequence alteration -> VL -> VSLC -> GVC -> genotype.
Property chain to propagate inferred phenotype associations 'down' a genotype partonomy from a sex-qualified intrinsic genotype to the components of a sex-agnostic intrinsic genotype.
Property chain to propagate inferred phenotype associations 'down' a genotype partonomy in the direction of genotype -> GVC -> VSLC -> VL -> sequence alteration.
Property chain to propagate inferred phenotype associations from an intrinsic genotype component (e.g. a (sequence-)variant locus instance) to a gene class.
Property chain to propagate inferred phenotype associations from a (sequence-)variant locus instance to a gene class (to support cases where the phenotype association is made at the level of the variant gene locus).
Property chain to propagate inferred phenotype associations from an extrinsic genotype component (e.g. an expression-variant gene instance) to a gene class.
Property chain to propagate inferred phenotype associations from an expression-variant gene instance to a gene class (to support cases where the phenotype association is made at the level of the expression-variant gene).
Property chain to propagate inferred phenotype associations 'down' a genotype partonomy just from a sex-qualified intrinsic genotype to the immediate sex-agnostic intrinsic genotype. (An additional property chain is needed to then propagate to the intrinsic genotype components)
Proposal for a property linking variants to smaller components that are regulatory, and therefore should not inherit phenotypes.
obsolete_has_regulatory_part
A relation linking a sequence_alteration to the gene it alters.
is_within_allele_of
obsolete_is_alteration_within
has_asserted_phenotype
Proposal for a property linking regulatory elements to larger features of which they are a part.
is_regulatory_part_of
A relation linking a sequence feature to its component Position that represents an identifying criteria for sequence feature instances.
For representing positional data, we advocate use of the FALDO model, which links to positional information through an instance of a Region class that represents the mapping of the feature onto some reference sequence. The positional_component property in GENO is meant primarily to formalize the identity criteria of sequence features and qualified sequence features, to illustrate the distinction between them.
obsolete_has_position_component
A relation between a nucleic acid or amino acid sequence or sequence feature, and one of its monomeric units (nucleotide or amino acid residues)
has_sequence_unit
A relation between two sequences or features that are considered variant with each other along their entire extents.
completely_varies_with
related_condition
Note that we currently do not have a property chain to propagate phenotypes to genes across sequence_derives_from relation (e.g. in cases where a Tg insertion derives expressed sequence from some gene)
The property chains below are defined as explicitly as possible, but many could be shortened if we used the inferred_to_cause_condition property to construct the property chains. Where this is the case, it is noted in the annotations on the property chains.
Below are the different kinds/paths of propagation we desire:
1. Propagation 'down' a genotype (from larger components to smaller ones)
2. Propagation 'up' a genotype (from smaller components to larger ones)
3. From sex-qualified genotypes down to the sex-agnostic genotype and its components (but not 'up' to a sex-qualified genotype).
4. From an effective genotype to its intrinsic and extrinsic components.
5. From genotype components to genes (note here that a separate chain is needed to propagate conditions asserted on a sequence alteration to the gene, because of the fact that the link to the gene is from the variant locus/allele).
6. (Exploratory). There are cases where we may also want inter-genotype propagation (i.e. propagation that extends beyond moving up or down a single genotype). For example, if a phenotype is asserted on a sex-qualified intrinsic genotype, we want it to infer down through its component sex-agnostic intrinsic genotype and then up to any effective genotypes of which this sex-agnostic intrinsic genotype is a part. Given the data in hand, however, the conditions for this will likely never occur, so probably ok not to implement a chain to support this.
Note that we do not want to propagate phenotypes up from sex-agnostic genotypes to sex-qualified ones (e.g. from shha<tbx392>/shha<tbx392> [AB] to shha<tbx392>/shha<tbx392> [AB](male)), because it may not be the case that a phenotype assessed without consideration of sex will apply on a sex-specific background. So we would not create a property chain to propagate inferred condition associations from sex-agnostic intrinsic genotypes and their parts to sex-qualified intrinsic genotypes and effective genotypes that contain them (such as: has_variant_part o has_sex_agnostic_part o has_variant_part o 'causes condition').
inferred_to_cause_condition
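As a rough illustration of the first kind of propagation listed above ('down' a genotype, from larger components to smaller ones), the sketch below pushes asserted conditions along has_variant_part edges until nothing new is inferred. The dictionary-based representation, the `propagate_down` helper, and the `ex:condition_1` identifier are assumptions for illustration only; in GENO this inference is expressed with OWL property chains and carried out by a reasoner.

```python
def propagate_down(has_variant_part, asserted):
    """Infer conditions on parts from conditions on wholes.

    has_variant_part: dict mapping a whole to a list of its variant parts.
    asserted: dict mapping an entity to the set of conditions asserted on it.
    Returns a dict mapping every reached entity to its inferred conditions.
    """
    inferred = {entity: set(conds) for entity, conds in asserted.items()}
    pending = list(inferred)
    while pending:
        whole = pending.pop()
        for part in has_variant_part.get(whole, []):
            known = inferred.setdefault(part, set())
            new = inferred[whole] - known
            if new:
                known |= new          # record newly inferred conditions
                pending.append(part)  # and keep propagating from this part
    return inferred


# Partonomy following the direction genotype -> GVC -> VSLC -> VL -> sequence alteration.
partonomy = {
    "genotype": ["GVC"],
    "GVC": ["VSLC"],
    "VSLC": ["VL"],
    "VL": ["sequence alteration"],
}
result = propagate_down(partonomy, {"genotype": {"ex:condition_1"}})
```

Propagation 'up' could be sketched the same way by passing the inverse (is_variant_part_of) edges. The restriction noted above, that nothing propagates up from sex-agnostic to sex-qualified genotypes, would then be enforced by leaving those edges out.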
This is a case of inter-genotype phenotype propagation, requiring propagation down one genotype and then up another. Given the data in hand, however, the conditions for this will likely never occur, so it is probably OK not to have this chain.
This property chain propagates a phenotype asserted on a sex-qualified intrinsic genotype, down to its sex-agnostic genotype part, and then up to a parent effective genotype that has it as a variant part. I think this is OK in all cases, so we can implement this as the one case where we can have inter-genotype pheno propagation. But as noted, there will likely be no data that actually meets criteria to use this chain, so we can probably leave it out.
Property chain to propagate inferred condition associations 'up' a genotype partonomy in the direction of sequence alteration -> VL -> VSLC -> GVC -> genotype.
Property chain to propagate inferred condition associations from an effective genotype through a sex-qualified intrinsic genotype, through a sex-agnostic intrinsic genotype, to the component variant parts of this sex-agnostic genotype.
Property chain to propagate inferred condition associations 'down' a genotype partonomy from a sex-qualified intrinsic genotype to the components of a sex-agnostic intrinsic genotype. This chain in particular is needed to get the conditions to move past the sex-agnostic genotype and down to its parts.
The following shorter chain would also suffice here:
is_variant_part_of o inferred_to_cause_condition
Property chain to propagate inferred condition associations 'down' a genotype partonomy in the direction of genotype -> GVC -> VSLC -> VL -> sequence alteration.
Property chain to propagate inferred condition associations from an effective genotype through a sex-qualified intrinsic genotype, through a sex-agnostic intrinsic genotype, through the component variant parts of this sex-agnostic genotype, and to the affected gene.
Property chain to propagate inferred condition associations 'down' a genotype partonomy from a sex-qualified intrinsic genotype to the components of a sex-agnostic intrinsic genotype. This chain in particular is needed to get the conditions to propagate to genes.
The shorter chain below would also suffice for this propagation:
has_allele o inferred_to_cause_condition
Property chain to propagate inferred condition associations from a sequence alteration through the variant locus to a gene class. (Separate chains are needed to propagate from the variant locus to the gene class, and another to propagate from a genotype, GVC, or VSLC to the gene class.)
NOTE that I don't need this property chain if I have a property chain to infer a has_affected_locus link from a sequence alteration to a gene when the link is asserted from the variant locus to the gene:
is_variant_part_of o has_affected_locus --> has_affected_locus
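The chain above can be read as relation composition: if a is_variant_part_of b and b has_affected_locus c, then a has_affected_locus c. A minimal sketch follows, with relations modelled as sets of (subject, object) pairs. The `compose` helper and the `ex:` identifiers are hypothetical, and this performs a single composition pass rather than the fixpoint an OWL reasoner would compute.

```python
def compose(first, second):
    """Return the pairs (a, c) such that (a, b) is in first and (b, c) is in second."""
    by_subject = {}
    for subj, obj in second:
        by_subject.setdefault(subj, set()).add(obj)
    return {(a, c) for a, b in first for c in by_subject.get(b, ())}


# Hypothetical instance data: the link to the gene is asserted on the variant locus.
is_variant_part_of = {("ex:seq_alteration_1", "ex:variant_locus_1")}
has_affected_locus = {("ex:variant_locus_1", "ex:gene_A")}

# is_variant_part_of o has_affected_locus --> has_affected_locus
inferred = compose(is_variant_part_of, has_affected_locus)
has_affected_locus_all = has_affected_locus | inferred
```

With the inferred link in place, the sequence alteration reaches the gene directly, which is why the longer condition-propagation chain would no longer be needed.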
Obsolete comment: Property chain to propagate inferred condition associations from an intrinsic genotype, GVC, or VSLC to a gene class. (A separate chain is needed to propagate from the variant locus to the gene class, and another to propagate from a sequence alteration to the gene class.)
The following, shorter chain, would also suffice here:
has_allele o inferred_to_cause_condition -> inferred_to_cause_condition
Property chain to propagate inferred condition associations from an intrinsic genotype, GVC, or VSLC to an affected gene class, or from an extrinsic genotype or component to an affected gene class.
The following, shorter chain, would also suffice here:
has_affected_locus o inferred_to_cause_condition -> inferred_to_cause_condition
Note that a separate chain is needed to propagate from the variant locus to the gene class, and another to propagate from a sequence alteration to the gene class (in cases where the link to gene is through the variant locus rather than the seq alteration).
Property chain to propagate inferred condition associations from a variant locus instance to a gene class (to support cases where the phenotype association is made directly at the level of the variant locus/allele).
Property chain to propagate inferred condition associations from an effective genotype through a sex-qualified intrinsic genotype to a sex-agnostic intrinsic genotype.
Property chain to propagate inferred condition associations 'down' a genotype partonomy just from a sex-qualified intrinsic genotype to the immediate sex-agnostic intrinsic genotype. (An additional property chain is needed to then propagate to the intrinsic genotype components)
inferred_to_contribute_to_condition
inferred_to_correlate_with_condition
LOINC:LA6668-3
pathogenic_for_condition
LOINC:LA26332-9
likely_pathogenic_for_condition
Relation between an entity and a condition (disease, phenotype) which it does not cause or contribute to.
non-causal_for_condition
LOINC:LA6675-8
benign_for_condition
LOINC:LA26334-5
likely_benign_for_condition
LOINC:LA26333-7
has_uncertain_significance_for_condition
A relation used to describe a process contextualizing the identity of an entity.
has_qualifying_process
A relation used to describe an environment contextualizing the identity of an entity.
has_qualifying_environment
is_candidate_variant_for
A relation linking a sequence feature to the location it occupies on some reference sequence.
occupies
has_location
Can be used to link a genomic feature to the chromosomal strand it resides on in the genome (+ or - strand, or both strands). Commonly used to link a gene to the strand it is transcribed from.
on strand
is_about is a (currently) primitive relation that relates an information artifact to an entity.
is about
Denotes is a primitive, instance-level, relation obtaining between an information content entity and some portion of reality. Denotation is what happens when someone creates an information content entity E in order to specifically refer to something. The only relation between E and the thing is that E can be used to 'pick out' the thing. This relation connects those two together. Freedictionary.com sense 3: To signify directly; refer to specifically
Consider if this is the best relation for linking genotypes to the genomic entities they specify. We could use the more generic 'is about', or define a new 'specifies' relation that holds between ICEs and something it specifies the nature or creation of.
denotes
A relation between a planned process and a continuant participating in that process that is not created during the process. The presence of the continuant during the process is explicitly specified in the plan specification which the process realizes the concretization of.
has_specified_input
A relation between a planned process and a continuant participating in that process. The presence of the continuant at the end of the process is explicitly specified in the objective specification which the process realizes the concretization of.
has_specified_output
a relation between a specifically dependent continuant (the dependent) and an independent continuant (the bearer), in which the dependent specifically depends on the bearer for its existence
inheres_in
a relation between an independent continuant (the bearer) and a specifically dependent continuant (the dependent), in which the dependent specifically depends on the bearer for its existence
bearer of
a relation between a continuant and a process, in which the continuant is somehow involved in the process
participates in
a relation between a process and a continuant, in which the continuant is somehow involved in the process
has participant
A journal article is an information artifact that inheres in some number of printed journals. For each copy of the printed journal there is some quality that carries the journal article, such as a pattern of ink. The quality (a specifically dependent continuant) concretizes the journal article (a generically dependent continuant), and both depend on that copy of the printed journal (an independent continuant).
A relationship between a specifically dependent continuant and a generically dependent continuant, in which the generically dependent continuant depends on some independent continuant in virtue of the fact that the specifically dependent continuant also depends on that same independent continuant. Multiple specifically dependent continuants can concretize the same generically dependent continuant.
concretizes
a relation between an independent continuant (the bearer) and a quality, in which the quality specifically depends on the bearer for its existence
has quality
has_role
a relation between an independent continuant (the bearer) and a disposition, in which the disposition specifically depends on the bearer for its existence
has disposition
derives from
starts during
ends during
x overlaps y if and only if there exists some z such that x has part z and z is part of y
overlaps
x is in taxon y if and only if y is an organism, and the relationship between x and y is one of: part of (reflexive), developmentally preceded by, derives from, secreted by, expressed.
in taxon
A relationship that holds between a biological entity and a phenotype. Here a phenotype is construed broadly as any kind of quality of an organism part, a collection of these qualities, or a change in quality or qualities (e.g. abnormally increased temperature). The subject of this relationship can be an organism (where the organism has the phenotype, i.e. the qualities inhere in parts of this organism), a genomic entity such as a gene or genotype (if modifications of the gene or the genotype causes the phenotype), or a condition such as a disease (such that if the condition inheres in an organism, then the organism has the phenotype).
has phenotype
phenotype of
temporally related to
p has direct input c iff c is a participant in p, c is present at the start of p, and the state of c is modified during p.
has input
p has output c iff c is a participant in p, c is present at the end of p, and c is not present at the beginning of p.
has output
is member of
Example 1: a collection of sequences such as a genome being comprised of separate sequences of chromosomes
Example 2: a collection of information entities such as a genotype being comprised of a background component and a variant component
has member is a mereological relation between a collection and an item.
has member
input of
output of
obsolete_formed as result of
Holds between molecular entities a and b when the execution of a activates or inhibits the activity of b
molecularly controls
x has subsequence y iff all of the sequence parts of y are sequence parts of x
has subsequence
is subsequence of
x overlaps the sequence of y if and only if x has a subsequence z and z is a subsequence of y.
http://biorxiv.org/content/early/2014/06/27/006650.abstract
overlaps sequence of
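The subsequence and overlap relations can be sketched over sequences modelled as sets of unit positions on one shared reference. That modelling choice, the function names, and the requirement that the shared subsequence be non-empty are assumptions made for illustration; they are not taken from RO or GENO.

```python
def is_subsequence_of(x, y):
    """True if every unit position of x is also a unit position of y."""
    return set(x) <= set(y)


def overlaps_sequence_of(x, y):
    """True if some non-empty z is a subsequence of both x and y.

    The largest such z is the intersection of the two position sets,
    so it is enough to check that the intersection is non-empty.
    """
    return bool(set(x) & set(y))


# Positions on a hypothetical shared reference, as half-open Python ranges.
gene = range(100, 200)
exon = range(120, 150)
neighbour = range(180, 260)
distant = range(300, 400)

exon_in_gene = is_subsequence_of(exon, gene)
gene_overlaps_neighbour = overlaps_sequence_of(gene, neighbour)
gene_overlaps_distant = overlaps_sequence_of(gene, distant)
```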
inverse of downstream of sequence of
is upstream of sequence of
x is downstream of the sequence of y iff either (1) x and y have sequence units, and all units of x are downstream of all units of y, or (2) x and y are sequence units, and x is either immediately downstream of y, or transitively downstream of y.
is downstream of sequence of
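Case (1) of the definition above can be sketched under the simplifying assumption that both sequences are non-empty collections of integer positions on the same reference and strand, with positions increasing in the downstream direction. The function names and coordinates are hypothetical, and strand handling and case (2) are deliberately left out.

```python
def is_downstream_of_sequence_of(x, y):
    """True if all unit positions of x lie downstream of all unit positions of y.

    Assumes x and y are non-empty and that larger positions are further downstream.
    """
    return min(x) > max(y)


def is_upstream_of_sequence_of(x, y):
    """Inverse of is_downstream_of_sequence_of."""
    return is_downstream_of_sequence_of(y, x)


promoter = range(50, 100)
coding_region = range(100, 400)

coding_is_downstream = is_downstream_of_sequence_of(coding_region, promoter)
promoter_is_upstream = is_upstream_of_sequence_of(promoter, coding_region)
```

Two sequences that share any position are neither upstream nor downstream of each other under this reading.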
Relation between a research artifact and an entity it is used to study, in virtue of its replicating or approximating features of the studied entity.
To Do: decide on scope of this relation - inclusive of computational models in domain, or only physical models? Restricted to linking biological systems and phenomena? Inclusive of only diseases in range, or broader?
Matthew Brush
The driving use case for this relation was to link a biological model system such as a cell line or model organism to a disease it is used to investigate, in virtue of the model system exhibiting features similar to those of the disease of interest.
is model of
The genetic variant 'NM_007294.3(BRCA1):c.110C>A (p.Thr37Lys)' causes or contributes to the disease 'familial breast-ovarian cancer'.
An environment of exposure to arsenic causes or contributes to the phenotype of patchy skin hyperpigmentation, and the disease 'skin cancer'.
A relationship between an entity (a genotype, genetic variation or environment) and a condition (a phenotype or disease) where the entity has some causal or contributing role that influences the condition.
Note that relationships of phenotypes to organisms/strains that bear them, or diseases they are manifest in, should continue to use RO:0002200 ! 'has phenotype' and RO:0002201 ! 'phenotype of'.
Genetic variations can span any level of granularity from a full genome or genotype to an individual gene or sequence alteration. These variations can be represented at the physical level (DNA/RNA macromolecules or their parts, as in the ChEBI ontology and Molecular Sequence Ontology) or at the abstract level (generically dependent continuant sequence features that are carried by these macromolecules, as in the Sequence Ontology and Genotype Ontology). The causal relations in this hierarchy can be used in linking either physical or abstract genetic variations to phenotypes or diseases they cause or contribute to.
Environments include natural environments or exposures, experimentally applied conditions, or clinical interventions.
causes or contributes to condition
A relationship between an entity (a genotype, genetic variation or environment) and a condition (a phenotype or disease) where the entity has a causal role for the condition.
causes condition
A relationship between an entity (a genotype, genetic variation or environment) and a condition (a phenotype or disease) where the entity has some contributing role in the manifestation of the condition.
contributes to condition
A relationship between an entity (a genotype, genetic variation or environment) and a condition (a phenotype or disease) where the entity influences the severity with which a condition manifests in an individual.
contributes to expressivity of condition
contributes to severity of condition
A relationship between an entity (a genotype, genetic variation or environment) and a condition (a phenotype or disease) where the entity influences the frequency of the condition in a population.
contributes to penetrance of condition
contributes to frequency of condition
A relationship between an entity (a genotype, genetic variation or environment) and a condition (a phenotype or disease) where the entity prevents or reduces the severity of a condition.
Genetic variations can span any level of granularity from a full genome or genotype to an individual gene or sequence alteration. These variations can be represented at the physical level (DNA/RNA macromolecules or their parts, as in the ChEBI ontology and Molecular Sequence Ontology) or at the abstract level (generically dependent continuant sequence features that are carried by these macromolecules, as in the Sequence Ontology and Genotype Ontology). The causal relations in this hierarchy can be used in linking either physical or abstract genetic variations to phenotypes or diseases they cause or contribute to.
Environments include natural environments or exposures, experimentally applied conditions, or clinical interventions.
is preventative for condition
A relationship between an entity and a condition (phenotype or disease) with which it exhibits a statistical dependence relationship.
correlated with condition
association has object
association has predicate
association has subject
The position value is the offset along the reference where this position is found. Thus only the position value in combination with the reference determines where a position is.
position
Property linking a sequence or sequence feature to an integer representing its length in terms of the number of units in the sequence.
has_extent
Shortcut relation linking a sequence feature directly to a string representing the 'state' of its sequence - i.e. the ordering of units that comprise it (e.g. 'atgcagctagctaccgtcgatcg').
has_sequence_string
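The relationship between a feature's sequence string and its extent can be sketched as follows. The `SequenceFeature` class is hypothetical; it only illustrates that, when the full string is known, the extent is the number of units in it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceFeature:
    """Hypothetical container mirroring has_sequence_string and has_extent."""
    sequence_string: str  # ordered units of the feature's sequence

    @property
    def extent(self) -> int:
        # Length of the feature, counted in sequence units.
        return len(self.sequence_string)


# Uses the example string given in the has_sequence_string definition.
feature = SequenceFeature("atgcagctagctaccgtcgatcg")
```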
ObsoleteDataProperty
The 'rank' quantifier in Bgee gene-anatomy associations, which indicates the importance/specificity of a gene's expression in a given anatomy relative to its expression in other anatomies.
Property to link an assertion or association with some value quantifying its relevance or ranking.
has_quantifier
The starting position of a sequence region in 0-based coordinates.
ClinGen Allele Model (http://dataexchange.clinicalgenome.org/allele/)
start_position
The ending position of a sequence region in 0-based coordinates.
ClinGen Allele Model (http://dataexchange.clinicalgenome.org/allele/)
end_position
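A region described by start_position and end_position on a reference can be sketched as below. The definitions say only that coordinates are 0-based; this sketch additionally assumes the end position is exclusive (a half-open interval), which the source does not state. The `Region` class and the `from_one_based_inclusive` helper are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical region anchored to a reference sequence."""
    reference: str       # e.g. a chromosome or contig identifier
    start_position: int  # 0-based start
    end_position: int    # 0-based end, ASSUMED exclusive

    def __post_init__(self):
        if self.start_position < 0 or self.end_position < self.start_position:
            raise ValueError("expected 0 <= start_position <= end_position")

    @property
    def extent(self) -> int:
        # Number of units covered under the half-open assumption.
        return self.end_position - self.start_position


def from_one_based_inclusive(reference: str, first: int, last: int) -> Region:
    # Convert a 1-based, inclusive range (as shown in many genome browsers)
    # to the 0-based, end-exclusive form assumed above.
    return Region(reference, first - 1, last)


# Illustrative coordinates only.
region = from_one_based_inclusive("chr7", 101, 110)
```

If the model being targeted instead uses an inclusive end position, the extent would be `end_position - start_position + 1` and the conversion helper would change accordingly.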
Property linking a biological sequence to a string representing the ordered units that comprise the sequence (e.g. 'atgcagctagctaccgtcgatcg').
has_string
Describes the number of features comprising a sequence feature complement.
has_feature_count
Both strands
A position that is exactly known.
Exact position
Positive strand
Superclass for the general concept of a position on a sequence. The sequence is designated with the reference predicate.
Placing the FALDO:Position class under GENO:genomic locus, as it represents a type of genomic location with the same start and end coordinates (i.e. a single position as opposed to a location spanning a longer region)
Position
1
1
A region describes a length of sequence with a start position and end position that represents a feature on a sequence, e.g. a gene.
From what I can tell, feature instances in data whose position is to be defined using FALDO are always mapped to a Region, and then the position of this Region is defined according to its location within some larger reference sequence. The exception may be feature instances that are explicitly part of the reference sequence on which their location is being defined (such that no 'mapping' to a reference is required). This suggests that, conceptually, we can think of a FALDO:Region as a subregion of a reference sequence that is mapped to from a feature of interest, in order to define its position with respect to that reference sequence.
Region
Negative strand
Part of the coordinate system denoting on which strand the feature can be found. If you do not yet know which strand the feature is on, you should tag the position with just this class. If you know more you should use one of the subclasses. This means a region described with a '.' in GFF3. A GFF3 unstranded position does not have this type in FALDO -- those are just a 'position'.
Stranded position
Julius Caesar
Verdi’s Requiem
the Second World War
your body mass index
BFO 2 Reference: In all areas of empirical inquiry we encounter general terms of two sorts. First are general terms which refer to universals or types: animal; tuberculosis; surgical procedure; disease. Second, are general terms used to refer to groups of entities which instantiate a given universal but do not correspond to the extension of any subuniversal of that universal because there is nothing intrinsic to the entities in question by virtue of which they – and only they – are counted as belonging to the given group. Examples are: animal purchased by the Emperor; tuberculosis diagnosed on a Wednesday; surgical procedure performed on a patient from Stockholm; person identified as candidate for clinical trial #2056-555; person who is signatory of Form 656-PPV; painting by Leonardo da Vinci. Such terms, which represent what are called ‘specializations’ in [81
Entity doesn't have a closure axiom because the subclasses don't necessarily exhaust all possibilities. For example Werner Ceusters' 'portions of reality' include 4 sorts, entities (as BFO construes them), universals, configurations, and relations. It is an open question as to whether entities as construed in BFO will at some point also include these other portions of reality. See, for example, 'How to track absolutely everything' at http://www.referent-tracking.com/_RTU/papers/CeustersICbookRevised.pdf
An entity is anything that exists or has existed or will exist. (axiom label in BFO2 Reference: [001-001])
entity
BFO 2 Reference: Continuant entities are entities which can be sliced to yield parts only along the spatial dimension, yielding for example the parts of your table which we call its legs, its top, its nails. ‘My desk stretches from the window to the door. It has spatial parts, and can be sliced (in space) in two. With respect to time, however, a thing is a continuant.’ [60, p. 240
Continuant doesn't have a closure axiom because the subclasses don't necessarily exhaust all possibilities. For example, in an expansion involving bringing in some of Ceusters' other portions of reality, questions are raised as to whether universals are continuants
A continuant is an entity that persists, endures, or continues to exist through time while maintaining its identity. (axiom label in BFO2 Reference: [008-002])
continuant
continuant
BFO 2 Reference: every occurrent that is not a temporal or spatiotemporal region is s-dependent on some independent continuant that is not a spatial region
BFO 2 Reference: s-dependence obtains between every process and its participants in the sense that, as a matter of necessity, this process could not have existed unless these or those participants existed also. A process may have a succession of participants at different phases of its unfolding. Thus there may be different players on the field at different times during the course of a football game; but the process which is the entire game s-depends_on all of these players nonetheless. Some temporal parts of this process will s-depend_on only some of the players.
Occurrent doesn't have a closure axiom because the subclasses don't necessarily exhaust all possibilities. An example would be the sum of a process and the process boundary of another process.
Simons uses different terminology for relations of occurrents to regions: Denote the spatio-temporal location of a given occurrent e by 'spn[e]' and call this region its span. We may say an occurrent is at its span, in any larger region, and covers any smaller region. Now suppose we have fixed a frame of reference so that we can speak not merely of spatio-temporal but also of spatial regions (places) and temporal regions (times). The spread of an occurrent, (relative to a frame of reference) is the space it exactly occupies, and its spell is likewise the time it exactly occupies. We write 'spr[e]' and `spl[e]' respectively for the spread and spell of e, omitting mention of the frame.
An occurrent is an entity that unfolds itself in time or it is the instantaneous boundary of such an entity (for example a beginning or an ending) or it is a temporal or spatiotemporal region which such an entity occupies_temporal_region or occupies_spatiotemporal_region. (axiom label in BFO2 Reference: [077-002])
occurrent
occurrent
a chair
a heart
a leg
a molecule
a spatial region
an atom
an orchestra.
an organism
the bottom right portion of a human torso
the interior of your mouth
b is an independent continuant = Def. b is a continuant which is such that there is no c and no t such that b s-depends_on c at t. (axiom label in BFO2 Reference: [017-002])
independent continuant
independent continuant
a process of cell-division
a beating of the heart
a process of meiosis
a process of sleeping
the course of a disease
the flight of a bird
the life of an organism
your process of aging.
p is a process = Def. p is an occurrent that has temporal proper parts and for some time t, p s-depends_on some material entity at t. (axiom label in BFO2 Reference: [083-003])
BFO 2 Reference: The realm of occurrents is less pervasively marked by the presence of natural units than is the case in the realm of independent continuants. Thus there is here no counterpart of ‘object’. In BFO 1.0 ‘process’ served as such a counterpart. In BFO 2.0 ‘process’ is, rather, the occurrent counterpart of ‘material entity’. Those natural – as contrasted with engineered, which here means: deliberately executed – units which do exist in the realm of occurrents are typically either parasitic on the existence of natural units on the continuant side, or they are fiat in nature. Thus we can count lives; we can count football games; we can count chemical reactions performed in experiments or in chemical manufacturing. We cannot count the processes taking place, for instance, in an episode of insect mating behavior. Even where natural units are identifiable, for example cycles in a cyclical process such as the beating of a heart or an organism’s sleep/wake cycle, the processes in question form a sequence with no discontinuities (temporal gaps) of the sort that we find for instance where billiard balls or zebrafish or planets are separated by clear spatial gaps. Lives of organisms are process units, but they too unfold in a continuous series from other, prior processes such as fertilization, and they unfold in turn in continuous series of post-life processes such as post-mortem decay. Clear examples of boundaries of processes are almost always of the fiat sort (midnight, a time of death as declared in an operating theater or on a death certificate, the initiation of a state of war)
process
process
an atom of element X has the disposition to decay to an atom of element Y
certain people have a predisposition to colon cancer
children are innately disposed to categorize objects in certain ways.
the cell wall is disposed to filter chemicals in endocytosis and exocytosis
BFO 2 Reference: Dispositions exist along a strength continuum. Weaker forms of disposition are realized in only a fraction of triggering cases. These forms occur in a significant number of cases of a similar type [89
b is a disposition means: b is a realizable entity & b’s bearer is some material entity & b is such that if it ceases to exist, then its bearer is physically changed, & b’s realization occurs when and because this bearer is in some special physical circumstances, & this realization occurs in virtue of the bearer’s physical make-up. (axiom label in BFO2 Reference: [062-002])
disposition
disposition
the disposition of this piece of metal to conduct electricity.
the disposition of your blood to coagulate
the function of your reproductive organs
the role of being a doctor
the role of this boundary to delineate where Utah and Colorado meet
To say that b is a realizable entity is to say that b is a specifically dependent continuant that inheres in some independent continuant which is not a spatial region and is of a type instances of which are realized in processes of a correlated type. (axiom label in BFO2 Reference: [058-002])
realizable entity
realizable entity
the ambient temperature of this portion of air
the color of a tomato
the length of the circumference of your waist
the mass of this piece of gold.
the shape of your nose
the shape of your nostril
a quality is a specifically dependent continuant that, in contrast to roles and dispositions, does not require any further process in order to be realized. (axiom label in BFO2 Reference: [055-001])
quality
quality
Reciprocal specifically dependent continuants: the function of this key to open this lock and the mutually dependent disposition of this lock: to be opened by this key
of one-sided specifically dependent continuants: the mass of this tomato
of relational dependent continuants (multiple bearers): John’s love for Mary, the ownership relation between John and this statue, the relation of authority between John and his subordinates.
the disposition of this fish to decay
the function of this heart: to pump blood
the mutual dependence of proton donors and acceptors in chemical reactions [79
the mutual dependence of the role predator and the role prey as played by two organisms in a given interaction
the pink color of a medium rare piece of grilled filet mignon at its center
the role of being a doctor
the shape of this hole.
the smell of this portion of mozzarella
b is a relational specifically dependent continuant = Def. b is a specifically dependent continuant and there are n > 1 independent continuants c1, … cn which are not spatial regions, are such that for all 1 ≤ i < j ≤ n, ci and cj share no common parts, and are such that for each 1 ≤ i ≤ n, b s-depends_on ci at every time t during the course of b’s existence (axiom label in BFO2 Reference: [131-004])
b is a specifically dependent continuant = Def. b is a continuant & there is some independent continuant c which is not a spatial region and which is such that b s-depends_on c at every time t during the course of b’s existence. (axiom label in BFO2 Reference: [050-003])
Specifically dependent continuant doesn't have a closure axiom because the subclasses don't necessarily exhaust all possibilities. We're not sure what else will develop here, but for example there are questions such as what are promises, obligations, etc.
specifically dependent continuant
specifically dependent continuant
John’s role of husband to Mary is dependent on Mary’s role of wife to John, and both are dependent on the object aggregate comprising John and Mary as member parts joined together through the relational quality of being married.
the priest role
the role of a boundary to demarcate two neighboring administrative territories
the role of a building in serving as a military target
the role of a stone in marking a property boundary
the role of subject in a clinical trial
the student role
BFO 2 Reference: One major family of examples of non-rigid universals involves roles, and ontologies developed for corresponding administrative purposes may consist entirely of representatives of entities of this sort. Thus ‘professor’, defined as follows (b instance_of professor at t =Def. there is some c, c instance_of professor role & c inheres_in b at t), denotes a non-rigid universal and so also do ‘nurse’, ‘student’, ‘colonel’, ‘taxpayer’, and so forth. (These terms are all, in the jargon of philosophy, phase sortals.) By using role terms in definitions, we can create a BFO conformant treatment of such entities drawing on the fact that, while an instance of professor may be simultaneously an instance of trade union member, no instance of the type professor role is also (at any time) an instance of the type trade union member role (any more than any instance of the type color is at any time an instance of the type length). If an ontology of employment positions should be defined in terms of roles following the above pattern, this enables the ontology to do justice to the fact that individuals instantiate the corresponding universals – professor, sergeant, nurse – only during certain phases in their lives.
b is a role means: b is a realizable entity and b exists because there is some single bearer that is in some special physical, social, or institutional set of circumstances in which this bearer does not have to be and b is not such that, if it ceases to exist, then the physical make-up of the bearer is thereby changed. (axiom label in BFO2 Reference: [061-001])
role
role
The entries in your database are patterns instantiated as quality instances in your hard drive. The database itself is an aggregate of such patterns. When you create the database you create a particular instance of the generically dependent continuant type database. Each entry in the database is an instance of the generically dependent continuant type IAO: information content entity.
the pdf file on your laptop, the pdf file that is a copy thereof on my laptop
the sequence of this protein molecule; the sequence that is a copy thereof in that protein molecule.
b is a generically dependent continuant = Def. b is a continuant that g-depends_on one or more other entities. (axiom label in BFO2 Reference: [074-001])
generically dependent continuant
generically dependent continuant
the function of a hammer to drive in nails
the function of a heart pacemaker to regulate the beating of a heart through electricity
the function of amylase in saliva to break down starch into sugar
BFO 2 Reference: In the past, we have distinguished two varieties of function, artifactual function and biological function. These are not asserted subtypes of BFO:function however, since the same function – for example: to pump, to transport – can exist both in artifacts and in biological entities. The asserted subtypes of function that would be needed in order to yield a separate monohierarchy are not artifactual function, biological function, etc., but rather transporting function, pumping function, etc.
A function is a disposition that exists in virtue of the bearer’s physical make-up and this physical make-up is something the bearer possesses because it came into being, either through evolution (in the case of natural biological entities) or through intentional design (in the case of artifacts), in order to realize processes of a certain sort. (axiom label in BFO2 Reference: [064-001])
function
a flame
a forest fire
a human being
a hurricane
a photon
a puff of smoke
a sea wave
a tornado
an aggregate of human beings.
an energy wave
an epidemic
the undetached arm of a human being
BFO 2 Reference: Material entities (continuants) can preserve their identity even while gaining and losing material parts. Continuants are contrasted with occurrents, which unfold themselves in successive temporal parts or phases [60
BFO 2 Reference: Object, Fiat Object Part and Object Aggregate are not intended to be exhaustive of Material Entity. Users are invited to propose new subcategories of Material Entity.
BFO 2 Reference: ‘Matter’ is intended to encompass both mass and energy (we will address the ontological treatment of portions of energy in a later version of BFO). A portion of matter is anything that includes elementary particles among its proper or improper parts: quarks and leptons, including electrons, as the smallest particles thus far discovered; baryons (including protons and neutrons) at a higher level of granularity; atoms and molecules at still higher levels, forming the cells, organs, organisms and other material entities studied by biologists, the portions of rock studied by geologists, the fossils studied by paleontologists, and so on. Material entities are three-dimensional entities (entities extended in three spatial dimensions), as contrasted with the processes in which they participate, which are four-dimensional entities (entities extended also along the dimension of time). According to the FMA, material entities may have immaterial entities as parts – including the entities identified below as sites; for example the interior (or ‘lumen’) of your small intestine is a part of your body. BFO 2.0 embodies a decision to follow the FMA here.
A material entity is an independent continuant that has some portion of matter as proper or improper continuant part. (axiom label in BFO2 Reference: [019-002])
material entity
material entity
Stub class to serve as root of hierarchy for imports of molecular entities from ChEBI ontology.
molecular entity
nucleic acid
A cultured cell population that represents a genetically stable and homogenous population of cultured cells that shares a common propagation history (i.e. has been successively passaged together in culture).
cell line
Stub class to serve as root of hierarchy for imports of cell types from CL or other cell terminologies.
cell
1. Stub class to serve as root of hierarchy for imports from an ontology of environment and experimental conditions.
2. Need to consider how to model environments in a way that covers ENVO and XCO content in a consistent and coherent way. A couple of classes under Exploratory Class are relevant here. Consider how we might approach environments/conditions using an EQ approach analogous to how phenotypes are defined (i.e. consider environments/conditions as qualities inhering in some entity).
In ENVO's alignment with the Basic Formal Ontology, this class is being considered as a subclass of a proposed BFO class "system". The relation "environed_by" is also under development. Roughly, a system which includes a material entity (at least partially) within its site and causally influences that entity may be considered to environ it. Following the completion of this alignment, this class' definition and the definitions of its subclasses will be revised.
environmental system
A technique is a planned process used to accomplish a specific activity or task.
technique
A stem cell line comprised of embryonic stem cells, totipotent cells cultured from an early embryo.
embryonic stem cell line
A cell line comprised of stem cells, relatively undifferentiated cells that retain the ability to divide and proliferate to provide progenitor cells that can differentiate into specialized cell types.
stem cell line
Example zebrafish intrinsic genotype:
Genotype = fgf8a<ti282a/+>; shha<tb392/tb392> (AB)
reference component (genomic background) = AB
variant component ('genomic variation complement') = fgf8a<ti282a/+>; shha<tb392/tb392>
. . . and within this variant component, there are two 'variant single locus complements' represented:
allele complement 1 = fgf8a<ti282a/+>
allele complement 2 = shha<tb392/tb392>
and within each of these 'variant single locus complements' there are one or more variant gene locus members:
in complement 1: fgf8a<ti282a>
in complement 2: shha<tb392>
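The decomposition in this example can be sketched as a small parser over the genotype label. This is an illustration only: the label grammar assumed here (complements separated by ';', alleles separated by '/', '+' for wild type, background in trailing parentheses) is inferred from this single example, and the names `LocusComplement` and `parse_genotype` are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumed label shape: "<variant component> (<background>)".
_GENOTYPE = re.compile(r"^(?P<variant>.*?)\s*\((?P<background>[^()]+)\)\s*$")
# Assumed complement shape: "gene<allele/allele>".
_COMPLEMENT = re.compile(r"^(?P<gene>[^<>\s]+)<(?P<alleles>[^<>]+)>$")


@dataclass(frozen=True)
class LocusComplement:
    """Hypothetical stand-in for a 'variant single locus complement'."""
    gene: str
    alleles: tuple  # allele symbols at this locus, e.g. ('ti282a', '+')

    def variant_members(self) -> list:
        # Distinct non-wild-type alleles, in order of first appearance.
        seen = []
        for allele in self.alleles:
            if allele != "+" and allele not in seen:
                seen.append(allele)
        return [f"{self.gene}<{allele}>" for allele in seen]


def parse_genotype(label: str):
    """Split a genotype label into (background, list of locus complements)."""
    match = _GENOTYPE.match(label)
    if match is None:
        raise ValueError(f"unrecognized genotype label: {label!r}")
    complements = []
    for chunk in match.group("variant").split(";"):
        comp = _COMPLEMENT.match(chunk.strip())
        if comp is None:
            raise ValueError(f"unrecognized locus complement: {chunk!r}")
        alleles = tuple(comp.group("alleles").split("/"))
        complements.append(LocusComplement(comp.group("gene"), alleles))
    return match.group("background"), complements


background, complements = parse_genotype("fgf8a<ti282a/+>; shha<tb392/tb392> (AB)")
```

For the label above, `background` is the reference component and each entry in `complements` corresponds to one allele complement, with `variant_members()` listing the variant gene locus members it contains.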
A genomic genotype that does not specify the sex determining chromosomal features of its bearer (i.e. does not indicate the background sex chromosome complement)
This modeling approach allows us to create separate genotype instances for data sources that report sex-specific phenotypes to ensure that sex-specific G2P differences are accurately described. These sex-qualified genotypes can be linked to the more general sex-agnostic intrinsic genotype that is shared by male and female mice of the same strain, to aggregate associated phenotypes at this level, and allow aggregation with G2P association data about the same strains from sources that distinguish sex-specific phenotypes (e.g. IMPC) and those that do not (e.g. MGI).
Conceptually, a sex-qualified genotype represents a superset of sequence features relative to a sex-agnostic intrinsic genotype, in that it specifies the background sex-chromosome complement of the genome. Thus, in the genotype partonomy, a sex-qualified genotype has as part a sex-agnostic genotype. This allows for the propagation of phenotypes associated with a sex-qualified genotype to the intrinsic genotype.
genotype
organismal genotype
sex-agnostic intrinsic genotype
In practice, most genotype instances are classified as sex-agnostic genotypes because they are not sex-specific. When a genotype is indicated to be that of a male or female, it implies a known sex chromosome complement in the genomic background. This requires us to distinguish separate 'sex-qualified' genotype instances for males and females that share a common 'sex-agnostic' genotype. For example, male and female mice of the same strain/background and containing the same set of genetic variations will have the same sex-agnostic intrinsic genotype, but different sex-qualified intrinsic genotypes (which take into account background sex chromosome sequence as identifying criteria for genotype instances).
genomic genotype (sex-agnostic)
An allele that varies in its sequence from what is considered the reference or canonical sequence at that locus. Note that what is considered the 'reference' vs. 'variant' sequence at a given locus may be context-dependent - so being 'variant' is more a role played in a particular situation.
The use of the descriptor 'variant' here is consistent with naming recommendations from the ACMG Guidelines paper here: PMID:25741868. Generally, the descriptive labels chosen for subtypes of variant allele conform to these recommendations as well, where 'variant' is used to cover mutant and polymorphic alleles.
alternate allele
sequence-variant feature
variant feature
1. A 'variant allele' contains a 'sequence alteration', or is itself a 'sequence alteration', that makes it vary_with some other allele to which it is being compared. But in any comparison of alternative sequences at a particular genomic location, the choice of a 'reference' vs the 'variant' is context-dependent - as comparisons in other contexts might consider a different feature to be the reference. So being 'variant' is more a role played in a particular situation - as an allele that is variant in one context/analysis may be considered reference in another.
GENO classifies a feature as 'variant' only in cases where we can be confident that the feature will *always* be considered to be variant with a reference feature. For example, experimentally induced/generated variations in model organism genes that are created expressly to vary from an established reference.
2. A variant allele can be variant along its entire extent, in which case it is considered a 'sequence alteration', or it can span a broader extent of sequence that contains sequence alteration(s) as part. An example of the former is a SNP, and an example of the latter is a variant gene allele that contains one or more point mutations in its sequence.
variant allele
A sequence collection comprised of all 'variant single locus complements' in a single genome, which together constitute the variant component of an intrinsic genotype.
1. Note that even a reference feature (e.g. a wild-type gene) that is a member of a single locus complement that contains a variant allele is included in this 'genomic variation complement'. Thus, the members of this 'genomic variation complement' (which is a sequence collection) are 'single locus variant complements'. Our axiom below uses has_part rather than has_member, however, to account for the fact that many 'genomic variation complements' have only one 'single locus variant complement' as members. So because has_member is not reflexive, it is not appropriate for these cases.
2. Most genotypes have only one altered locus (i.e. only one 'single-locus variant complement') that distinguishes them from some reference background. For example, the genotype instance 'fgf8a<ti282a/+>(AB)' exhibits a mutation at only one locus. But some genotypes vary at more than one locus (e.g. a double mutant that has alterations in the fgf8a gene and the shh gene).
genomic variation complement
The ZFIN background 'AB' that serves as a reference as part of the genotype fgf8a^ti282a/+ (AB)
A reference genome that represents the sequence of a genome from which a variant genome is derived (through the introduction of sequence alterations).
Here, a 'genomic background' would differ from a 'reference genome' in that 'background' implies a derivation of the variant from the background (which is the case for most MOD strains), whereas a reference is simply meant as a target for comparison. But in a sense all background genomes are by default reference, in that the derived variant genome is compared against it.
genomic background
OBI:genetic population background information
background genome
The reference/wild-type cd99l2 Danio rerio gene allele spans bases 27,004,426-27,021,059 on Chromosome 7. The "mn004Gt" represents an experimentally-created allele of this gene, in which sequence from a gene trap construct containing an RFP marker has been inserted at the cd99l2 gene locus. The resulting gene allele includes sequence from this construct that makes it longer than the reference gene sequence, and also alters its sequence in a way that prevents it from producing a functional product. The sequence extent of this cd99l2 gene allele is determined based on how its sequence aligns with that of the canonical gene and surrounding sequence in a reference genome.
http://useast.ensembl.org/Danio_rerio/Gene/Summary?g=ENSDARG00000056722
http://zfin.org/action/feature/feature-detail?zdbID=ZDB-ALT-111117-8
A genomic feature that represents one of a set of versions of a gene (i.e. a haplotype whose extent is that of a gene)
Regarding the distinction between a 'gene' and a 'gene allele': Every zebrafish genome contains a 'gene allele' for every zebrafish gene. Many will be 'wild-type' or at least functional gene alleles. But some may be alleles that are mutated or truncated so as to lack functionality. According to current SO criteria defining genes, a 'gene' no longer exists in the case of a non-functional or deleted variant. But the 'gene allele' does exist - and its extent is that of the remaining/altered sequence based on alignment with a reference gene. Even for completely deleted genes, an allele of the gene exists (and here is equivalent to the junction corresponding to where the gene would live based on a reference alignment).
This design allows us to classify genes and any variants of those genes (be they functional or not) as the same type of thing (i.e. a 'gene allele'), since classification is based on genomic position rather than functional capacity. This is practical for representation of variant genotypes, which often carry non-functional versions of a gene at a particular locus. What is important here is specifying what is present at a locus associated with a particular gene, whether or not it is a functional gene.
http://purl.obolibrary.org/obo/SO_0001023 ! allele
In SO, the concept of a 'gene' is functionally defined, in that a gene necessarily produces a functional product. By contrast, the concept of a 'gene allele' here is positionally defined - representing the sequence present at the location where a gene resides in a reference genome (based on sequence alignment). An Shh gene allele, for example, may be a fully functional wild-type version of the gene, a non-functional version carrying a deleterious point mutation, a truncated version of the gene, or even a complete deletion. In all these cases, an 'Shh gene allele' exists at the position where the canonical gene resides in the reference genome - even if the extent of this allele is different from that of the wild-type, or even zero in the case of the complete deletion.
A genomic feature being an allele_of a gene is based on its location in a host genome - not on its sequence. This means, for example, that the insertion of the human SMN2 gene into the genome of a mouse (see http://www.informatics.jax.org/allele/MGI:3056903) DOES NOT represent an allele_of the human SMN2 gene according to the GENO model - because it is located in a mouse genome, not a human one. Rather, this is a transgenic insertion that derives_sequence_from the human SMN2 gene. If this human SMN2 gene is inserted within the mouse SMN2 gene locus (e.g. used to replace mouse SMN2 gene), the feature it creates is an allele_of the mouse SMN2 gene (one that happens to match the sequence of the human ortholog of the gene). But again, it is not an allele_of the human SMN2 gene.
gene allele
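The positional criterion for allele_of described above can be sketched in code. This is only an illustrative sketch, not part of GENO: the `Gene` and `Feature` classes, their field names, and the `is_allele_of` helper are hypothetical, and "aligned to a gene locus" is reduced to a simple field comparison.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Gene:
    symbol: str
    genome: str  # taxon whose genome hosts the gene


@dataclass(frozen=True)
class Feature:
    label: str
    host_genome: str                               # genome the feature is located in
    locus_gene: Optional[Gene] = None              # gene locus it aligns to, if any
    derives_sequence_from: Optional[Gene] = None   # source of its sequence, if any


def is_allele_of(feature: Feature, gene: Gene) -> bool:
    """Positional test: the feature sits at the gene's locus in the gene's genome."""
    return feature.locus_gene == gene and feature.host_genome == gene.genome


# Gene names follow the wording of the comment above.
human_smn2 = Gene("SMN2", "Homo sapiens")
mouse_smn2 = Gene("SMN2", "Mus musculus")

# Human SMN2 inserted somewhere in a mouse genome, outside the mouse gene locus.
random_insertion = Feature("SMN2 transgenic insertion", "Mus musculus",
                           locus_gene=None, derives_sequence_from=human_smn2)

# Human SMN2 sequence used to replace the mouse gene at its own locus.
knock_in = Feature("SMN2 knock-in", "Mus musculus",
                   locus_gene=mouse_smn2, derives_sequence_from=human_smn2)
```

Under these assumptions neither feature is an allele_of the human gene, and only the knock-in is an allele_of the mouse gene; sequence origin is tracked separately through `derives_sequence_from`.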
A sequence that serves as a standard against which other sequences at the same location are compared.
The notion of a 'reference' in GENO is implemented at the level of 'biological sequence' rather than at the level of a sequence feature - i.e. we define a class for 'reference sequence' rather than 'reference sequence feature'. This is because it is at the *sequence* level that features of interest are determined to be variant or not. It is taken for granted that the *location* of the feature of interest is the same as that of the reference sequence to which it is compared, because an alignment process establishing common location always precedes the sequence comparison that determines if the feature is variant.
reference sequence
A reference sequence is one that serves as a standard against which 'variant' versions of the feature are compared, or against which located sequence features within the reference region are aligned in order to assign position information. Being 'reference' does not imply anything about the frequency or function of features bearing the sequence. Only that some agent has used it to serve a reference role in defining a variant or locating a sequence.
reference sequence
a collection of more than one sequence feature (i.e. a collection of discontinuous sequence features)
perhaps not same as SO:sequence collection, as here we explicitly include features that can have an extent of zero (and SO:sequence collection is a collection of regions that have an extent of at least one)
1. Note that members of this class can be features with extents of zero (e.g. junctions). This is likely different than the SO:sequence feature class which has members that are regions.
obsolete_sequence feature collection
A sequence feature collection comprised of discontiguous sequences from a single genome
Previously called 'genetic locus collection'. Difference between 'genetic' and 'genomic', as used here, is that 'genomic' implies a feature is a heritable part of some genome, while 'genetic' implies that it is part of some feature that is capable of contributing to gene expression in a cell or other biological system.
genomic feature collection
Conceptually, members of this collection are meant to be about the sum total genetic material in a single cell or organism. But these members need not be associated with an actual material in a real cell or organism individual. For example, things like a 'reference genome' may not actually represent the material genome of any individual cell or organism in reality. Here, there may be no genomic material referents of the sequences in such a collection because the genome is tied to an idealized, hypothetical cell or organism instance. The key is that conceptually, they are still tied to the idea of being contained in a single genome. In the case of a genotype, the individual sequence members are not all about the genetic material of a single cell or organism. Rather, it is the resolved sequence contained in the genotype that is meant to be about the total genomic sequence content of a genome - which we deem acceptable for classifying as a genetic locus collection.
obsolete_genomic feature collection
A single locus complement that serves as a standard against which 'variant' sequences are compared
reference allelic complement
reference single locus feature complement
Not required at present for any specific use case, so marking as exploratory and obsoleting for simplicity.
Eq Class axiom:
'single locus complement'
and (has_sequence_attribute some reference)
SC axioms:
'has member' exactly 0 'variant allele'
'has member' only 'reference genomic feature'
'has member' some 'reference genomic feature'
obsolete_reference single locus complement
A single locus complement in which at least one member allele is considered variant, and/or the total number of features in the complement deviates from the normal ploidy of the reference genome (e.g. trisomy 13).
variant allelic complement
variant single locus feature complement
Instances of this class are collections comprised of all versions of some defined feature present in a genome (e.g. alleles of a specific gene), where at least one member is variant (non-reference). In diploid genomes this complement is typically the two alleles at the same locus on homologous chromosomes.
This class also covers cases where deviant numbers of genes or chromosomes are present in a genome (e.g. trisomy of chromosome 21), even if their sequence is not variant.
variant single locus complement
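The two conditions in the definition above (a variant member, or a feature count that deviates from the expected ploidy) can be written as a small check. This is a minimal sketch under stated assumptions: the `Allele` class, the `is_variant` flag, and the default ploidy of 2 are hypothetical and not part of GENO.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Allele:
    name: str
    is_variant: bool  # True if the allele differs from the designated reference


def is_variant_complement(alleles: Sequence[Allele], expected_ploidy: int = 2) -> bool:
    """True if any member is variant, or the member count deviates from ploidy."""
    has_variant_member = any(a.is_variant for a in alleles)
    abnormal_count = len(alleles) != expected_ploidy
    return has_variant_member or abnormal_count


# A heterozygous locus: one variant allele, one reference allele.
heterozygous = [Allele("fgf8a<ti282a>", True), Allele("fgf8a<+>", False)]

# A trisomy: three copies whose sequences all match the reference.
trisomy = [Allele("chr21 copy", False)] * 3

# Two reference alleles at a diploid locus.
reference_pair = [Allele("fgf8a<+>", False), Allele("fgf8a<+>", False)]
```

The trisomy case shows why the count condition is needed: no member is variant in sequence, yet the complement as a whole is variant.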
A genome that varies at one or more loci from the sequence of some reference genome.
http://purl.obolibrary.org/obo/SO_0001506 ! variant_genome (definition of SO term here is too vague to know if has same meaning as GENO class here)
variant genome
An allele whose sequence matches what is considered to be the reference sequence at that location in the genome.
Being a 'reference allele' is a role or status assigned in the context of a specific dataset or analysis. In human variation datasets, 'reference' status is typically assigned based on factors such as being the most common in a population, being an ancestral allele, or being identified first as a prototypical example of some feature or gene. For example, 'reference alleles' in characterizing SNPs often represent the allele first characterized in a reference genome, or the most common allele in a population.
In model organism datasets, 'reference' alleles are typically (but not always) the 'wild-type' variant at a given locus, representing a functional and unaltered version of the feature that is part of a defined genomic background, and against which natural or experimentally-induced alterations are compared.
reference allele
A genomic feature known to exist, but remaining uncharacterized with respect to its identity (e.g. which allele exists at a given gene locus).
Used as a term of convenience for describing data reporting unspecified alleles in a genotype (i.e. in cases where zygosity for a given locus is not known). Typically recorded in genotype syntaxes as a ' /? '.
Not required at present for any specific use case, so marking as exploratory and obsoleting for simplicity.
Eq Class def: 'genomic feature'
and (has_sequence_attribute some unspecified)
An unspecified feature is known to exist as the partner of a characterized allele when the zygosity at that locus is not known. Its specific sequence/identity, however, is unknown (ie whether it is a reference or variant allele).
obsolete_unspecified feature
A junction found at a chromosomal position where an insertion has occurred on the homologous chromosome, such that the junction represents the reference feature paired with the hemizygously inserted feature.
hemizygous reference junction
Eliminating unnecessary defined/organizational classes. Former logical def:
junction
and (has_sequence_attribute some reference)
Subclass axiom:
is_variant_with some insertion
In the case of a transgenic insertion that creates a hemizygous locus, the reference locus that this insertion is variant_with is the junction on the homologous chromosome at the same position where the insertion occurred. This is the 'hemizygous reference' junction.
The junction-insertion pair represents the allelic complement at that locus, which is considered to be hemizygous. Most genotype syntaxes represent this hemizygous state with a ' /0' notation.
obsolete_reference junction
A gene that originates from the genome of a danio rerio.
danio rerio gene
A gene that originates from the genome of a homo sapiens.
homo sapiens gene
A gene that originates from the genome of a mus musculus.
mus musculus gene
A reference human sonic hedgehog (shh) gene spans bases 155,592,680-155,604,967 on Chromosome 7, according to genome build GRCh37, and produces a primary functional transcript that is 4454 bp in length and produces a 462 amino acid protein involved in cell signaling events behind various aspects of cell differentiation and development.
http://useast.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000164690
Note that this may be slightly different from the extent described in other gene databases, such as Entrez Gene: http://www.ncbi.nlm.nih.gov/gene/6469
A version/allele of a gene that serves as a standard against which variant genes are compared.
reference gene
Not required at present for any specific use case, so marking as exploratory and obsoleting for simplicity.
Eq Class axiom:
'gene allele'
and (has_sequence_attribute some reference)
SC axioms:
is_variant_with some 'gene allele'
is_reference_allele_of some gene
Being a 'reference gene' is a role or status assigned in the context of a specific dataset or analysis. In human variation datasets, 'reference' status is typically assigned based on factors such as being the most common version/allele in a population, being an ancestral allele, or being identified first as a prototypical example of a gene.
In model organism datasets, 'reference' genes are typically the 'wild-type' allele for a given gene, representing a functional and unaltered version of the gene that is part of a defined genomic background, and against which natural or experimentally-induced versions are compared.
obsolete_reference gene allele
obsolete_experimental insertion
gene trap insertion
A transgene that has been integrated into a chromosome in the host genome.
An integrated transgene differs from a transgenic insertion in that a transgenic insertion may contain a single transgene, a partial transgene that needs endogenous sequences from the host genome to become functional (e.g. an enhancer trap), or multiple transgenes (i.e. be polycistronic). Furthermore, the transgenic insertion may contain sequences in addition to its transgene(s), e.g. sequences flanking the transgene required for integration or replication/maintenance in the host genome. The term 'integrated transgene' covers individual transgenes that were delivered in whole or in part by a transgenic insertion.
An 'integrated transgene' differs from its parent 'transgene' in that transgenes can include genes introduced into a cell/organism on an extra-chromosomal plasmid that is never integrated into the host genome.
integrated transgene
A nucleic acid macromolecule that is part of a cell or virion and has been inherited from an ancestor cell or virion, and/or is capable of being replicated and inherited through successive generations of progeny.
1. Note that at present, a material genome and genetic material are necessarily part of some cell or virion. So a genomic library is not considered a material genome/genomic material - rather, we could say that this genomic library is a 'genomic material sample' that bears the concretization of some genome.
2. A challenging edge case is DNA experimentally delivered into a terminally differentiated cell that will never divide. Such material does technically meet our definition, since we are careful to say that the material must be *capable of* being stably inherited through subsequent generations. Thus, we would say that *if* the cell were to resume replication, the material would be heritable in this way.
1. Genomic material here is considered as a DNA or RNA molecule that is found in a cell or virus, and capable of being replicated and inherited by progeny cells or virus. As such, this nucleic acid is either chromosomal DNA, or some replicative epi-chromosomal plasmid or transposon. Genetic material is necessarily part of some 'material genome', and both are necessarily part of some cell or virion. So a genomic library is not considered a material genome/genetic material - rather, we could say that this genomic library is a 'genomic material sample' that bears the concretization of some genome.
2. Genomic material need not be inherited from an immediate ancestor cell or organism (e.g. a replicative plasmid or transposon acquired through some experimental modification), but such cases must be capable of being inherited by progeny cells or organisms.
genomic material
A material entity that represents all genetic material in a cell or virion. The material genome is typically a molecular aggregate of all the chromosomal DNA and epi-chromosomal DNA that represents all sequences that are heritable by progeny of a cell or virion.
physical genome
A genome is the collection of all nucleic acids in a cell or virus, representing all of an organism's hereditary information. It is typically DNA, but many viruses have RNA genomes. The genome includes both nuclear chromosomes (ie nuclear and micronucleus chromosomes) and cytoplasmic chromosomes stored in various organelles (e.g. mitochondrial or chloroplast chromosomes), and can in addition contain non-chromosomal elements such as replicative viruses, plasmids, and transposable elements.
Note that at present, a material genome and genetic material are necessarily part of some cell or virion. So a genomic library is not considered a material genome/genetic material - rather, we could say that this genomic library is a 'genomic material sample' that bears the concretization of some SO:genome.
material genome
a population of homo sapiens grouped together in virtue of their sharing some commonality (either an inherent attribute or an externally assigned role)
Consider http://semanticscience.org/resource/SIO_001062 ! human population ("A human population refers to a collection of human beings").
homo sapiens population
human population
A maximal collection of organisms of a single species that have been bred or experimentally manipulated with the goal of being genetically identical.
organism strain or breed
Two mice colonies with the same genotype information, but maintained in different labs, are different strains (many examples of this in MGI/IMSR)
strain or breed
A group comprised of organisms from a single taxonomic group (e.g. family, order, genus, species, or a strain or breed within a given taxon)
taxonomic group
mus musculus strain
danio rerio strain
sequence attribute that can inhere only in a collection of more than one sequence features
obsolete_sequence feature collection attribute
A quality inhering in a collection of discontinuous sequence features in a single genome that reside on the same macromolecule (e.g. the same chromosome).
in cis
A quality inhering in a collection of discontinuous sequence features in a single genome that reside on different macromolecules (e.g. different chromosomes).
in trans
An allelic state that describes the degree of similarity of features at a particular locus in the genome (i.e. whether the alleles or haplotypes are the same or different).
allelic state
derived from https://en.wikipedia.org/wiki/Zygosity
http://semanticscience.org/resource/SIO_001263
zygosity
hemizygous
heterozygous
homozygous
indeterminate zygosity
no-call zygosity
unknown zygosity
unspecified zygosity
indeterminate zygosity
MGI uses this term when zygosity is not known.
no-call zygosity
(this is how the GVF10 format/standard refers to loci without enough data to make an accurate call . . . see http://www.sequenceontology.org/resources/gvf.html#quick_gvf_examples)
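The zygosity terms above can be related to the genotype notations mentioned elsewhere in this file (' /0' for a hemizygous locus, ' /? ' for an unspecified partner allele). The following is a rough sketch only: encoding each allele as a string, and the `classify_zygosity` helper itself, are assumptions made for illustration and do not reflect how GENO or any MOD syntax is actually parsed.

```python
def classify_zygosity(allele_a: str, allele_b: str) -> str:
    """Map a pair of allele symbols at one locus to a zygosity label.

    Assumed encoding (hypothetical): "?" marks an unspecified allele,
    "0" marks the absence of a partner feature, anything else is an allele name.
    """
    pair = (allele_a, allele_b)
    if "?" in pair:
        return "unspecified zygosity"
    if "0" in pair:
        return "hemizygous"
    if allele_a == allele_b:
        return "homozygous"
    return "heterozygous"


examples = {
    ("ti282a", "+"): classify_zygosity("ti282a", "+"),
    ("ti282a", "ti282a"): classify_zygosity("ti282a", "ti282a"),
    ("mn004Gt", "0"): classify_zygosity("mn004Gt", "0"),
    ("ti282a", "?"): classify_zygosity("ti282a", "?"),
}
```

The unspecified case is checked first so that a locus with an unknown partner is never reported as hemizygous or heterozygous by accident.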
The disposition of an entity to be transmitted to subsequent generations following a genetic replication or organismal reproduction event.
We can use these terms to describe the heritability of genetic material or sequence features (e.g. chromosomal DNA or genes are heritable in that they are passed on to child cells/organisms). Such genetic material has a heritable disposition in a cell or virion, in virtue of its being replicated in its cellular host and inherited by progeny cells (such that the sequence content it encodes is stably propagated in the genetic material of subsequent generations of cells).
We can also use these terms to describe the heritability of phenotypes/conditions - e.g. the passage of a particular trait or disease across generations of reproducing cells/organisms.
heritability
heritable
non-heritable
The pattern in which a genetic trait or condition is passed from one generation to the next, as determined by the interactions between alleles of the causal gene, and between these alleles and the environment.
Considering alternate definitions:
- a disposition related to the phenotypic effect of a particular allele based on its inherited allelic state (i.e., the complement of alleles present at a particular locus)
- the disposition of an allele in a particular allelic state (e.g. heterozygous vs homozygous) to cause a particular phenotype.
The subtypes of inheritance pattern in this hierarchy are largely distinguished based on the underlying genetic mechanism, which will manifest in a characteristic pattern of traits in affected and unaffected family members. For example, 'autosomal dominant inheritance' defines an inheritance pattern that is caused by the interaction of alleles on non-sex chromosomes wherein the trait manifests even in heterozygotes - resulting in a characteristic pattern of 'dominant' inheritance across generations of individuals in a family.
condition inheritance
mode of inheritance
phenotypic inheritance pattern
Ontologically, an inheritance pattern can be considered a disposition of a genetic variant to cause a particular trait or phenotype when it is present in a particular genetic and environmental context. Here, "genetic context" refers to the allelic state of the variant, which depends on what other alleles exist at the same locus. Zygosities such as heterozygous and homozygous are simple, common examples of 'states' of an allele.
These genetic and environmental "interactions" of alleles play out at the level of the gene products produced by the causal alleles, and are observable in the pattern with which the trait caused by an allele is inherited across generations of individuals. Thus, an inheritance pattern such as dominance is not inherent to a single allele or its phenotype, but rather a result of the relationship between two alleles of a gene and the phenotype that results in a given environment. This also means that the 'dominance' of an allele is context dependent - Allele 1 can be dominant over Allele 2 in the context of Phenotype X, but recessive to Allele 3 in the context of Phenotype Y.
inheritance pattern
disposition inhering in a genetic locus variant that is realized in its inheritance by some offspring such that at least a partial variant-associated phenotype is apparent in heterozygotes
dominant inheritance
A mode of inheritance whereby a heterozygous individual expresses distinct traits or conditions associated with both alleles (e.g. an individual with an AB blood type).
Alt: A disposition inhering in a variant of a genetic locus that is realized in an inheritance pattern whereby two different variants are phenotypically expressed in a heterozygous individual (e.g. an individual with an AB blood type)
co-dominant inheritance
pure dominant inheritance
complete dominant inheritance
disposition inhering in a genetic locus variant that is realized in its inheritance by some offspring such that its associated phenotype is partially expressed in heterozygotes (i.e. the observed phenotype is intermediate between that of the two distinct loci)
incomplete dominant inheritance
semi-dominant inheritance
X-linked dominant inheritance
allosomal dominant inheritance
autosomal dominant inheritance
disposition inhering in a genetic locus variant that is realized in its inheritance by some offspring such that no variant-associated phenotype is apparent in heterozygotes
recessive inheritance
X-linked recessive inheritance
allosomal recessive inheritance
autosomal recessive inheritance
An attribute inhering in a feature that is designated to serve as a standard against which 'variant' versions of the same locus are compared.
Being 'reference' is a role or status assigned in the context of a data set or analysis framework. A given allele can be reference in one context and variant in another.
reference
unspecified life cycle stage
objective is to insert some specified sequence into the genome of a cell or virus
genetic insertion technique
mutagen treatment technique
a genetic alteration technique that creates a variant/allele of a known gene - either by prospective targeting of a specific gene through homologous recombination, or by retrospective sequence analysis to determine the insertion locus of a randomly integrated transgene (e.g. as done in gene trapping).
This is represented axiomatically by the requirement that a 'gene variant' (i.e. an allele/variant of a known gene) is the specified_output of this technique. This is contrasted with non-targeted/random insertions/alterations where the altered locus is not known, and therefore no variant allele of a gene is created.
targeted gene mutation technique
Is considered to be 'non-targeted' in the sense that the insertion occurs randomly and not through homologous recombination.
random genetic insertion technique
targeted genetic insertion technique
enhancer trapping technique
gene trapping technique
promoter trapping technique
targeted knock-in technique
random transgene insertion technique
A single locus complement that represents the collection of all chromosome sequences for a given chromosome in a single genome
obsolete_chromosome complement
A complete chromosome that has been abnormally duplicated in a genome, typically as the result of a meiotic non-disjunction event or unbalanced translocation
duplicate chromosome
This 'gained' chromosome is conceptually an 'insertion' in a genome that received two copies of a chromosome in a cell division following a non-disjunction event. As such, it qualifies as a type of sequence_alteration, and as an 'extra' chromosome.
gained aneusomic chromosome
0
A 'deletion' resulting from the loss of a complete chromosome, typically as the result of a meiotic non-disjunction event or unbalanced translocation.
This 'lost' chromosome is conceptually a 'deletion' in a genome that received zero copies of a chromosome in a cell division following a non-disjunction event. As such, it qualifies as a type of sequence_alteration. But it doesn't classify under SO:deletion because this class is defined as "the point at which one or more contiguous nucleotides were excised".
absent aneusomic chromosome
lost aneusomic chromosome
A large deletion or terminal addition of part of some non-homologous chromosome, as the result of an unbalanced translocation.
Novel sequence features gained in a genome are considered to be sequence alterations, including aneusomic chromosome segments gained through unbalanced translocation events, entire aneusomic chromosomes gained through a non-disjunction event during replication, or extrachromosomal replicons that become part of the heritable genome of a cell or organism.
aneuploid chromosomal segment
aneusomic chromosomal subregion/segment
partial aneusomic chromosomal element
Aneusomic chromosomal parts are examples of "partial aneuploidy" as described in http://en.wikipedia.org/wiki/Aneuploidy: "The terms "partial monosomy" and "partial trisomy" are used to describe an imbalance of genetic material caused by loss or gain of part of a chromosome. In particular, these terms would be used in the situation of an unbalanced translocation, where an individual carries a derivative chromosome formed through the breakage and fusion of two different chromosomes. In this situation, the individual would have three copies of part of one chromosome (two normal copies and the portion that exists on the derivative chromosome) and only one copy of part of the other chromosome involved in the derivative chromosome."
aneusomic chromosomal part
A part of some non-homologous chromosome that has been gained as the result of an unbalanced translocation event.
duplicate partial aneuploid chromosomal element
translocated duplicate chromosomal element
translocated duplicate chromosomal segment
Such additions of translocated chromosomal parts confer a trisomic condition to the duplicated region of the chromosome, and are thus considered to be 'variant single locus complements' in virtue of an abnormal number of features at a particular locus, rather than abnormal sequence within the locus.
gained aneusomic chromosomal segment
0
A deletion of a terminal portion of a chromosome resulting from an unbalanced translocation to another chromosome.
In our model, we consider this chromosomal region to be monosomic, and thus a variant single locus complement
dropped partial aneuploid chromosomal element
translocated absent chromosomal segment
truncated chromosome terminus
This is not a deletion in the sense defined by the Sequence Ontology in that it is not the result of an 'excision' of nucleotides, but an unbalanced translocation event. The allelic complement that results is comprised of the terminus or junction represented by this lost chromosomal segment, and the remaining normal segment in the homologous chromosome. The lost aneusomic chromosomal segment is typically accompanied by a gained aneusomic chromosomal segment from another chromosome.
Loss of translocated chromosomal parts can confer a monosomic condition to a region of the chromosome. This results in a 'variant single locus complement' - in virtue of an abnormal number of features at a particular locus, rather than abnormal sequence within the locus.
lost aneusomic chromosomal segment
A complete chromosome that has been abnormally duplicated, or the absence of a chromosome that has been lost, typically as the result of a non-disjunction event or unbalanced translocation
complete aneusomic chromosome
Large sequence features gained in a genome are considered to be sequence alterations (akin to insertions), including aneusomic chromosome segments gained through unbalanced translocation events, entire aneusomic chromosomes gained through a non-disjunction event during replication, or extrachromosomal replicons that become part of the heritable genome of a cell or organism.
Similarly, large sequence features lost from a genome are akin to deletions and therefore also considered sequence alterations. This includes the loss of chromosomal segments through unbalanced translocation events, and the loss of entire chromosomes through a non-disjunction event during replication.
aneusomic chromosome
Stub class to serve as root of hierarchy for imports of biological processes from GO-BP.
biological process
disomic zygosity
aneusomic zygosity
trisomic homozygous
trisomic heterozygous
A heterozygous quality inhering in a single locus complement comprised of two different variant alleles and no wild type locus (e.g. fgf8a<ti282a>/fgf8a<x15>).
trans-heterozygous
compound heterozygous
A sequence feature that references some biological macromolecule applied as a reagent in an experiment or technique (e.g. a morpholino expression plasmid, or oligonucleotide probe)
replaced with SO:engineered_region
extra-genomic sequence
obsolete_reagent sequence feature
A heterozygous quality inhering in a single locus complement comprised of one variant allele and one wild-type/reference allele (e.g. fgf8a<ti282a/+>).
simple heterozygous
A structurally or functionally defined component of a transgene (e.g. a promoter, a region coding for a fluorescent protein tag, etc)
transgene part
An attribute inhering in a locus that varies from some designated reference in virtue of alterations in its sequence or expression level
variant
An attribute inhering in a locus for which there is more than one version fixed in a population at some significant percentage (typically 1% or greater), where the locus is not considered to be either reference or a variant.
polymorphic
An attribute inhering in a feature bearing a sequence alteration that is present at very low levels in a given population (typically less than 1%), or that has been experimentally generated to alter the feature with respect to some reference sequence.
mutant
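The frequency conventions in the 'polymorphic' and 'mutant' definitions above can be summarized as a simple rule of thumb. This sketch is an assumption-laden illustration: the 1% cutoff is described in the definitions only as typical, and the `classify_by_frequency` function and its parameters are hypothetical.

```python
def classify_by_frequency(population_frequency: float,
                          experimentally_generated: bool = False,
                          threshold: float = 0.01) -> str:
    """Label a version of a locus using the 'typically 1%' convention.

    Versions that were experimentally generated, or that fall below the
    threshold, are labeled 'mutant'; versions at or above it 'polymorphic'.
    """
    if not 0.0 <= population_frequency <= 1.0:
        raise ValueError("population_frequency must be between 0 and 1")
    if experimentally_generated or population_frequency < threshold:
        return "mutant"
    return "polymorphic"


common_version = classify_by_frequency(0.25)
rare_version = classify_by_frequency(0.002)
induced_version = classify_by_frequency(0.0, experimentally_generated=True)
```

The threshold is a parameter rather than a constant because the definitions treat 1% as a convention, not a fixed criterion.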
A sequence feature (continuous extent of located biological sequence) that is located in the heritable genome of a cell or organism.
This class was created largely as a modeling convenience to support organizing data for schema definitions. We may consider obsoleting it if it ends up causing confusion or complicating classification of terms in the ontology.
1. A genomic feature is a continuous extent of sequence at a particular location in a genome, which can span any size from a complete chromosome, to a chromosomal band or region, to a gene, to a single base pair or even junction between base pairs (this would be a sequence feature with an extent of zero).
2. A feature being 'located in' a genome here means only that the abstract sequence it contains can be positioned in the genome of some organism by alignment with some reference genomic sequence. In this way, we can distinguish features that can be associated with a particular taxon and assigned genomic coordinates.
3. As sequence features, instances of genomic features are identified by both their inherent *sequence* and their *position* in a genome - as determined by an alignment with some reference sequence. Accordingly, the 'ATG' start codon in the coding DNA sequence of the human AKT gene and the 'ATG' start codon in the human SHH gene represent two distinct genomic features despite having the same sequence, in virtue of their different positions in the genome.
genomic feature
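The identity criteria in note 3 (sequence plus position) can be illustrated with a minimal Python sketch. It is not part of GENO: the class name GenomicFeature, its field names, and all reference names and coordinates are hypothetical placeholders, not real genome positions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenomicFeature:
    """Hypothetical record whose identity depends on sequence AND position."""
    sequence: str   # the inherent sequence, e.g. 'ATG'
    reference: str  # the reference sequence the position is anchored to
    start: int      # placeholder coordinate, not a real genome position
    end: int        # placeholder coordinate, not a real genome position


# Two start codons with the same sequence at different (made-up) positions.
akt_start = GenomicFeature("ATG", "refA", 100, 103)
shh_start = GenomicFeature("ATG", "refB", 500, 503)

same_sequence = akt_start.sequence == shh_start.sequence  # sequences match
same_feature = akt_start == shh_start  # positions differ, so features differ
```

Because equality is computed over all fields, two features sharing the sequence 'ATG' compare as distinct whenever their positions differ.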
A nucleic acid molecule that contains one or more sequences serving as a template for gene expression in a biological system (i.e. a cell or virion).
This class is different from genomic material in that genomic material is necessarily heritable, while genetic material includes genomic material, as well as any additional nucleic acids that participate in gene expression resulting in a cellular or organismal phenotype. So things like transiently transfected expression constructs would qualify as 'genetic material' but not 'genomic material'. Things like siRNAs and morpholinos affect gene expression indirectly (i.e. are not templates for gene expression), and therefore do not qualify as genetic material.
genetic material
An allele that is variant with respect to some wild-type allele, in virtue of its being very rare in a population (typically <1%), or being an experimentally-induced alteration that derives from a wild-type feature in a given strain.
Based on use of 'mutant' as described in PMID: 25741868 ACMG Guidelines
Not required for any specific use case at this point so removed for simplicity.
Formerly asserted as allele and inferred as variant allele.
Eq class definition:
allele
and (mutation or ('has subsequence' some mutation))
'Mutant' is typically contrasted with 'wild-type', where 'mutant' indicates a natural but very rare allele in a population (typically <1%), or an experimentally-induced variation that derives from a wild-type background locus for a given strain, which can be selected for in establishing a mutant line.
obsolete_mutant allele
A sequence alteration that is a very rare allele in a population (typically <1%), or an experimentally-induced variation that derives from a wild-type feature in a given strain.
mutation
A genetic feature that is not part of the chromosomal genome of a cell or virion, but rather a stable and heritable element that is replicated and passed on to progeny (e.g. a replicative plasmid or transposon)
Consider replacing with SO_0001038 ! extrachromosomal_mobile_genetic_element
episomal replicon
Extrachromosomal replicons are replicated and passed on to descendents, and thus part of the heritable genome of a cell or organism. In cases where the presence of such a replicon is novel or aberrant (i.e. not included in the reference for that genome), the replicon is considered a 'sequence alteration'.
extrachromosomal replicon
expression construct feature
expression construct
An allele that is fixed in a population at some stable level, typically > 1%. Polymorphic alleles reside at loci where more than one version exists at some significant frequency in a population.
PMID: 25741868 ACMG Guidelines
Polymorphic alleles are contrasted with mutant alleles (extremely rare variants that exist in <1% of a population), and 'wild-type alleles' (extremely common variants present in >99% of a population). Polymorphic alleles exist in equilibrium in a given population somewhere between these two extremes (i.e. >1% and <99%).
polymorphic allele
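The frequency ranges that separate mutant, polymorphic, and wild-type alleles can be sketched as a small Python classifier. This is an illustration only: the function name classify_by_frequency is hypothetical, and treating the 'typical' 1% and 99% values as hard cut-offs (with the boundary values themselves counted as polymorphic) is an assumption, since the text gives them only as typical figures.

```python
def classify_by_frequency(frequency: float) -> str:
    """Classify an allele by its population frequency (0.0 to 1.0).

    Assumption: the 'typical' cut-offs are treated as hard boundaries.
    Below 1% is mutant, above 99% is wild-type, the rest is polymorphic.
    """
    if not 0.0 <= frequency <= 1.0:
        raise ValueError("frequency must be between 0 and 1")
    if frequency < 0.01:
        return "mutant"
    if frequency > 0.99:
        return "wild-type"
    return "polymorphic"
```

Note that frequency alone does not capture experimentally-induced mutant alleles, which the definitions also classify as mutant regardless of prevalence.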
A polymorphic allele that is present at the highest frequency relative to other polymorphic variants at the same locus.
major allele
major polymorphic allele
A polymorphic allele that is not present at the highest frequency among all fixed variants at the locus (i.e. not the major polymorphic allele at a given locus).
minor allele
minor polymorphic allele
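The major/minor distinction can be sketched as a ranking over allele frequencies at one locus. The function name split_major_minor and the example frequencies are hypothetical, chosen only for illustration; ties for the highest frequency are resolved by input order, which is an arbitrary choice of this sketch.

```python
def split_major_minor(frequencies: dict) -> tuple:
    """Return (major allele, sorted minor alleles) for one polymorphic locus.

    `frequencies` maps an allele name to its population frequency.
    """
    if not frequencies:
        raise ValueError("at least one allele is required")
    # The major allele is the one present at the highest frequency.
    major = max(frequencies, key=frequencies.get)
    # Every other polymorphic allele at the locus is a minor allele.
    minors = sorted(name for name in frequencies if name != major)
    return major, minors


# Made-up frequencies for three alleles at a single locus.
major, minors = split_major_minor({"A": 0.62, "G": 0.30, "T": 0.08})
```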
A polymorphic allele that is determined from the sequence of a recent ancestor in a phylogenetic tree.
ancestral allele
ancestral polymorphic allele
An allele representing a highly common variant (typically >99% in a population), that typically exhibits canonical function, and against which rare and/or non-functional mutant alleles are often compared.
wild-type allele
'Wild-type' is typically contrasted with 'mutant', where 'wild-type' indicates a highly prevalent allele in a population (typically >99%), and/or some prototypical allele in a background genome that serves as a basis for some experimental alteration to generate a mutant allele, which can be selected for in establishing a mutant strain.
The notion of wild-type alleles is more common in model organism databases, where specific mutations are generated against a wild-type reference feature. Wild-type alleles are typically but not always used as reference alleles in sequence comparison/analysis applications. More than one wild-type sequence can exist for a given feature, but typically only one allele is deemed wild-type in the context of a single dataset or analysis.
wild-type allele
wild-type gene allele
A gene allele representing the most common variant in a population (typically >99% frequency), that exhibits canonical function, and against which rare and/or non-functional mutant gene alleles are compared in characterizing the phenotypic consequences of genetic variation.
wild-type gene
A gene altered in its expression level in the context of some experiment as a result of being targeted by gene-knockdown reagent(s) such as a morpholino or RNAi.
The identity of a given instance of a reagent-targeted gene is dependent on the experimental context of its knock-down - specifically what reagent was used and at what level. For example, the wild-type shha zebrafish gene targeted in experiment 1 by morpholino 1 and in experiment 2 by morpholino 2 represent two distinct instances of a 'reagent-targeted gene', despite sharing the same sequence and position.
reagent targeted gene
A transgene that is delivered as part of a DNA expression construct into a cell or organism in order to transiently express a specified product (i.e. it has not integrated into the host genome).
experimentally-expressed transgene
extrinsic transgene
transiently-expressed transgene
An allele attribute describing a highly common variant (typically >99% in a population), that typically exhibits canonical function, and against which rare and/or non-functional mutant alleles are compared.
wild-type
One of a set of sequence features that exist at a particular genomic locus.
The notion of an 'allele' defined here is distinct from that of a genetic 'variant' - as 'allele' includes features considered to be the 'reference' at a particular locus. The notion of an 'allele' captures the state of the sequence found at a genetic locus, be it variant or reference. The notion of a 'variant' implies a state that differs from some 'reference' sequence at that location. What is considered the 'reference' state at a particular locus may vary, depending on the context/goal of a particular analysis.
variable feature
An allele is a continuous extent of genomic sequence that spans a locus where variation is known to occur (i.e. where >1 version is known to exist). The minimal extent of an allele is the variable sequence itself (i.e. a 'sequence alteration'), but the notion of an allele includes larger regions that contain as proper parts one or more such alterations (e.g. a 'haplotype').
Our landscape analysis found mostly gene-centric definitions of 'allele' that represented a particular version of a gene, or variation within a gene sequence [1][2][3][4][5][6]. But we also found 'allele' used to refer to other types and extents of variation - including single nucleotide polymorphisms, repeat regions, and copy number variations [7][8][9][10][11], where such variations don't necessarily impact a gene.
To be maximally accommodating of how this term is used across research communities, GENO defines 'allele' broadly and allows alleles to span any locus or extent of sequence. While 'alleles' encountered in public datasets typically overlap a gene, GENO allows for alleles that do not. But we do define the subtype 'gene allele' that refers more specifically to a particular version of a gene.
[1] https://isogg.org/wiki/Allele (retrieved 2018-03-17)
[2] http://semanticscience.org/resource/allele (retrieved 2018-03-17)
[3] https://en.wikipedia.org/wiki/Allele (retrieved 2018-03-17)
[4] https://www.cancer.gov/publications/dictionaries/genetics-dictionary/def/allele (retrieved 2018-03-17)
[5] http://purl.obolibrary.org/obo/SO_0001023 (retrieved 2018-03-17)
[6] http://purl.obolibrary.org/obo/NCIT_C16277 (retrieved 2018-03-17)
[7] https://www.snpedia.com/index.php/Allele (retrieved 2018-03-17)
[8] https://en.wikipedia.org/wiki/Single-nucleotide_polymorphism (retrieved 2018-03-17)
[9] http://purl.obolibrary.org/obo/OGI_0000008 (retrieved 2018-03-17)
[10] http://purl.obolibrary.org/obo/OBI_0001352 (retrieved 2018-03-17)
[11] http://purl.phyloviz.net/ontology/typon#Allele (retrieved 2018-03-17)
allele
a sequence attribute of a chromosome or chromosomal region that has been abnormally duplicated or lost, as the result of a non-disjunction event or unbalanced translocation.
aneusomic
An allele of a gene that contains some sequence alteration.
A gene allele is 'variant' in virtue of its containing a sequence alteration that varies from some reference gene standard. But note that a gene allele that is variant in one context/dataset can be considered a reference in another context/dataset.
variant gene allele
The set of both shha gene alleles in a diploid zebrafish genome, e.g. fgf8a<ti282a/+>.
The collection of the individual base-pairs present at the position 24126737 in both copies of chromosome 5 in a diploid human genome.
The set of all sequence features occupying a particular genomic location across all homologous chromosomes in a genome.
TO DO: show a VCF representation of this example.
allelic complement
single locus feature complement
A genomic feature is any located feature in the genome, be it a junction, a single nucleotide, a gene, or an entire chromosome. The notion of a "complement" describes the collection of all elements in some defined set. The notion of "homologous" genomic features describes those that occupy the same locus on homologous chromosomes. Therefore, we use the term "single locus complement" to describe the set of all homologous features present at a particular genomic locus. This complement is typically a pair of two features in a diploid genome (with two copies of each chromosome). E.g. a gene pair, a QTL pair, a nucleotide pair for a SNP, or a pair of entire chromosomes.
single locus complement
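As a rough illustration of how a single locus complement might be handled in data, the Python sketch below takes the allele found on each chromosome copy at one locus and names the resulting allelic state. The function name describe_complement and the three coarse labels are assumptions made for this example; GENO's own zygosity classes are more granular than this.

```python
def describe_complement(alleles: list) -> str:
    """Name the allelic state of a single locus complement.

    `alleles` holds one entry per chromosome copy carrying the locus,
    e.g. ['ti282a', '+'] for a diploid locus with one variant allele.
    """
    if not alleles:
        raise ValueError("a complement needs at least one member")
    if len(alleles) == 1:
        # Only one copy of the locus exists, e.g. an X-linked locus in an XY male.
        return "hemizygous"
    # All copies identical means homozygous; any difference means heterozygous.
    return "homozygous" if len(set(alleles)) == 1 else "heterozygous"
```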
In an experiment where shha is targeted by MO1 and shhb is overexpressed from a transgenic expression construct, the extrinsic genotype captures the altered expression status of these two genes. A notation for representing such a genotype might describe this scenario as:
shha<MO1-1ng/ul>; shhb<pFLAG-mmusShhb>
This notation parallels those used for more traditional 'intrinsic' genotypes, where the affected gene is presented with its alteration in angled brackets < >. In the extrinsic genotype shown here, the variation in shha is affected by a specific concentration of an shha-targeting morpholino (instead of a mutation in the shha gene). And the variation in shhb is affected by its overexpression from a pFLAG Shhb expression construct.
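A minimal Python sketch of how such a label could be split into its gene and modifier parts is shown here. The function name parse_extrinsic_genotype and the regular expression are assumptions made for illustration; GENO does not define a formal grammar for this notation.

```python
import re

# One entry looks like gene<modifier>, e.g. shha<MO1-1ng/ul>.
ENTRY = re.compile(r"^(?P<gene>[^<>\s]+)<(?P<modifier>[^<>]+)>$")


def parse_extrinsic_genotype(label: str) -> list:
    """Split a semicolon-separated label into (gene, modifier) pairs."""
    entries = []
    for part in label.split(";"):
        match = ENTRY.match(part.strip())
        if match is None:
            raise ValueError(f"unrecognized entry: {part!r}")
        entries.append((match["gene"], match["modifier"]))
    return entries


parsed = parse_extrinsic_genotype("shha<MO1-1ng/ul>; shhb<pFLAG-mmusShhb>")
```

Each pair keeps the targeted gene alongside the reagent or construct responsible for its altered expression.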
A specification of the known state of gene expression across a genome, and how it varies from some baseline/reference state.
We acknowledge that this is not a 'genotype' in the traditional sense, but this terminological choice highlights similarities that play out in parallel modeling of intrinsic and extrinsic genotype partonomies, and parallel syntactic formats for labeling instances of these genotypes.
Our rationale here is that what we care about from the perspective of G2P associations is identifying genomic features that impact phenotype - where experimental approaches include permanent introduction of intrinsic modifications to genomic sequence, and transient introduction of extrinsic factors that modify expression of specific genes. As the former is described by the traditional notion of a genotype, it seems a rational leap to consider the latter akin to an 'extrinsic genotype' wherein the alterations are externally applied rather than inherent to the genome.
Finally, there is some precedent to thinking about such extrinsic modifications in terms of a genotype, in the EFO:0000513 ! genotype: "The total sum of the genetic information of an organism that is known and relevant to the experiment being performed, including chromosomal, plasmid, viral or other genetic material which has been introduced into the organism either prior to or during the experiment."
experimental genotype
expression genotype
An extrinsic genotype describes variation in the 'expression level' of genes in a cell or organism, as mediated by transient, gene-specific experimental interventions such as RNAi, morpholinos, TALENs, CRISPR, or construct overexpression. This concept is relevant primarily for model organisms and systems that are subjected to such interventions to determine how altered expression of specific genes may impact organismal or cellular phenotypes in the context of a particular experiment.
The 'extrinsic genotype' concept is contrasted with the more familiar notion of an 'intrinsic genotype', describing variation in the inherent genomic sequence (i.e. 'allelic state'). In G2P research, interventions affecting both genomic sequence and gene expression are commonly applied in order to assess the impact specific genomic features can have on phenotype and disease. It is in this context that we chose to model 'extrinsic' alterations in expression as genotypes - to support parallel conceptualization and representation of these different types of genetic variation that inform the discovery of G2P associations.
extrinsic genotype
A genotype that describes the total intrinsic and extrinsic variation across a genome at the time of a phenotypic assessment (where 'intrinsic' refers to variation in genomic sequence, as mediated by sequence alterations, and 'extrinsic' refers to variation in gene expression, as mediated through transient gene-specific interventions such as gene knockdown reagents or overexpression constructs).
Closest concept/definition we could find for this concept was for EFO:0000513 ! genotype: "The total sum of the genetic information of an organism that is known and relevant to the experiment being performed, including chromosomal, plasmid, viral or other genetic material which has been introduced into the organism either prior to or during the experiment."
An effective genotype is meant to summarize all factors related to genes and their expression that influence an observed phenotype - including 'intrinsic' alterations in genomic sequence, and gene-specific 'extrinsic' alterations in expression transiently introduced at the time of the phenotypic assessment.
effective genotype
A set of all targeted genes in a single genome in the context of a given experiment (e.g. both copies of the WT shha gene in a zebrafish exposed to shha-targeting morpholinos)
reagent-targeted gene complement
reagent-targeted gene complement
The set of all transgenes transiently expressed in a biological system in the context of a given experiment.
experimental transgene complement
transiently-expressed transgene complement
Consider the wild-type zebrafish shha gene in the context of being targeted by morpholino 1 vs morpholino 2 in separate experiments. These shha genes share identical sequence and position, but represent distinct instances of 'expression-variant gene' because of their different external context. This is important because these qualified features could have distinct phenotypes associated with them (just as two different sequence variants of the same gene can have potentially different associated phenotypes).
A gene altered in its expression level relative to some baseline of normal expression in the system under investigation (e.g. a cell line or model organism).
See SO classes under 'silenced gene' (e.g. 'gene silenced by RNA interference'). These seem to represent the concept of a qualified feature as I define it here, in that they are defined by alterations extrinsic to the sequence and position of the gene itself.
expression allele
Expression-variant genes are altered in their expression level through some modification or intervention external to their sequence and position. These may include endogenous mechanisms (e.g. direct epigenetic modifications that impact expression level, or altered regulatory networks controlling gene expression), or experimental interventions (e.g. targeting by a gene-knockdown reagent, or being transiently expressed as part of a transgenic construct in a host cell or organism).
The identity of a given instance of an expression-variant gene is dependent on how its level of expression is manipulated in a biological system (i.e. via targeting by gene-knockdown reagents, or being transiently overexpressed). So expression-variant genes have the additional identity criterion of the genetic context of their material bearer (external to their sequence and position) that impacts their level of expression in a biological system.
expression-variant gene
gene targeting reagent
sequence targeting reagent
gene knockdown reagent
A region within a gene that is specifically targeted by a gene knockdown reagent, typically in virtue of bearing sequence complementary to the reagent.
targeted gene segment
reagent-targeted gene subregion
A specification of the genetic state of an organism, whether complete (defined over the whole genome) or incomplete (defined over a subset of the genome). Genotypes typically describe this genetic state as a diff between some variant component and a canonical reference.
As information artifacts, genotypes specify the state of a genome by defining a diff between some canonical reference and a variant or alternate sequence that replaces the corresponding portion of the reference. We can consider a genotype then as a collection of these reference and variant features, along with some rule for operating on them to resolve a final single sequence. This is valid ontologically because we commit only to sequence features being GDCs - which allows for their concretization in either biological or informational patterns. Accordingly, a particular gene allele, such as shh<tbx292>, can be part of a genome in a biological sense and part of a genotype in an informational sense. This idea underpins the 'genotype partonomy' at the core of the GENO model that decomposes a complete genotype into its more fundamental parts, including alleles and allele complements, as described in the comment above.
Core definition above adapted from the GA4GH VMC data model definition here: https://docs.google.com/document/d/12E8WbQlvfZWk5NrxwLytmympPby6vsv60RxCeD5wc1E/edit#heading=h.4e32jj4jtmsl (retrieved 2018-04-09).
Note however that the VMC genotype concept likely is not intended to cover 'effective' and 'extrinsic' genotype concepts defined in GENO.
1. Scope of 'Genetic State':
'Genetic state' is considered quite broadly in GENO to describe two general kinds of 'states'. First is the traditional notion of 'allelic state' - defined as the complement of alleles present at a particular locus or loci in a genome (i.e. across all homologous chromosomes containing this locus). Here, a genotype can describe allelic state at a specific locus in a genome (an 'allelic genotype'), or describe the allelic state across the entire genome ('genomic genotype'). Second, this concept can also describe states of genomic features 'extrinsic' to their intrinsic sequence, such as the expression status of a gene as a result of being specifically targeted by experimental interventions such as RNAi, morpholinos, or CRISPRs.
2. Genotype Subtypes:
In GENO, we use the term 'intrinsic' for genotypes describing variation in genomic sequence, and 'extrinsic' for genotypes describing variation in gene expression (e.g. resulting from the targeted experimental knock-down or over-expression of endogenous genes). We use the term 'effective genotype' to describe the total intrinsic and extrinsic variation in a cell or organism at the time a phenotypic assessment is performed.
Two more precise concepts are subsumed by the notion of an 'intrinsic genotype': (1) 'allelic genotypes', which specify allelic state at a single genomic locus; and (2) 'genomic genotypes', which specify allelic state across an entire genome. In both cases, allelic state is typically specified in terms of a differential between a reference and a set of 1 or more known variant features.
3. The Genotype Partonomy:
'Genomic genotypes' describing sequence variation across an entire genome are 'decomposed' in GENO into a partonomy of more granular levels of variation. These levels are defined to be meaningful to biologists in their attempts to relate genetic variation to phenotypic features. They include 'genomic variation complement' (GVC), 'variant single locus complement' (VSLC), 'allele', 'haplotype', 'sequence alteration', and 'genomic background' classes. For example, the components of the zebrafish genotype "fgf8a<ti282a/ti282a>; fgf3<t24149/+>[AB]", described at zfin.org/ZDB-FISH-150901-9362, include the following elements:
- GVC: fgf8a<ti282a/ti282a>; fgf3<t24149/+> (total intrinsic variation in the genome)
- Genomic Background: AB (the reference against which the GVC is variant)
- VSLC1: fgf8a<ti282a/ti282a> (homozygous complement of gene alleles at one known variant locus)
- VSLC2: fgf3<t24149/+> (heterozygous complement of gene alleles at another known variant locus)
- Allele 1: fgf8a<ti282a> (variant version of the fgf8a gene, present in two copies)
- Allele 2: fgf3<t24149> (variant version of the fgf3 gene, present in one copy)
- Allele 3: fgf3<+> (wild-type version of the fgf3 gene, present in one copy)
- Sequence Alteration1: <ti282a> (the specific mutation within the fgf8a gene that makes it variant)
- Sequence Alteration2: <t24149> (the specific mutation within the fgf3 gene that makes it variant)
A graphical representation of this decomposition that maps each element to a visual depiction of the portion of a genome it denotes can be found here: https://github.com/monarch-initiative/GENO-ontology/blob/develop/README.md
One reason that explicit representation of these levels is important is that it is at these levels that phenotypic features are annotated to genetic variations in different clinical and model organism databases. For example, ZFIN typically annotates phenotypes to effective genotypes, MGI to intrinsic genotypes, Wormbase to variant alleles, and ClinVar to haplotypes and sequence alterations. The ability to decompose a genotype into representations at these levels allows us to "propagate phenotypes" up or down the partonomy (e.g. infer associations of phenotypes annotated to a genotype to its more granular levels of variation and the gene(s) affected). This helps to support integrated analysis of G2P data.
genotype
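The decomposition of the zebrafish genotype label in section 3 can be sketched in Python as follows. This is an illustrative sketch, not a GENO or ZFIN parser: the function name decompose_genotype, the dictionary keys, and the regular expressions are assumptions, and the code only handles labels of the form gene<a/b>; gene<a/b>[background], with '+' read as the wild-type allele.

```python
import re

GENOTYPE = re.compile(r"^(?P<gvc>.*?)\s*\[(?P<background>[^\]]+)\]$")
LOCUS = re.compile(r"^(?P<gene>[^<>\s]+)<(?P<a1>[^<>/]+)/(?P<a2>[^<>/]+)>$")


def decompose_genotype(label: str) -> dict:
    """Break a genotype label into the parts of the genotype partonomy."""
    match = GENOTYPE.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized genotype label: {label!r}")
    vslcs, alleles, alterations = [], [], []
    for part in match["gvc"].split(";"):
        part = part.strip()
        locus = LOCUS.match(part)
        if locus is None:
            raise ValueError(f"unrecognized locus entry: {part!r}")
        vslcs.append(part)  # one variant single locus complement per entry
        gene = locus["gene"]
        for state in (locus["a1"], locus["a2"]):
            allele = f"{gene}<{state}>"
            if allele not in alleles:
                alleles.append(allele)  # each distinct allele is listed once
            if state != "+" and state not in alterations:
                alterations.append(state)  # '+' carries no sequence alteration
    return {
        "gvc": match["gvc"],
        "background": match["background"],
        "vslcs": vslcs,
        "alleles": alleles,
        "sequence_alterations": alterations,
    }


parts = decompose_genotype("fgf8a<ti282a/ti282a>; fgf3<t24149/+>[AB]")
```

Keeping each level as its own entry is what would let phenotype annotations made at one level be related to the others.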
ZFIN does not annotate with a pre-composed phenotype ontology - all annotations compose phenotypes on-the-fly using a combination of PATO, ZFA, GO and other ontologies. So while there is no manually curated zebrafish phenotype ontology, the Upheno pipeline generates one automatically here: http://purl.obolibrary.org/obo/upheno/zp.owl
This ontology does not have a root 'phenotype' class, however, and so we generate our own in GENO as a stub placeholder for import of needed zebrafish phenotype classes.
zebrafish phenotype
an allelic state where a single allele exists at a particular locus in the organellar genome (mitochondrial or plastid) of a cell/organism.
homoplasmic
an allelic state where more than one type of allele exists at a particular locus in the organellar genome (mitochondrial or plastid) of a cell/organism.
heteroplasmic
hemizygous X-linked
hemizygous Y-linked
hemizygous insertion-linked
A genomic genotype that specifies the baseline sequence of a genome from which a variant genome is derived (through the introduction of sequence alterations).
Being a 'background genotype' implies the derivation of some variant from this background (which is the case for most model organism database genotypes/strains). This is a subtly different notion than being a 'reference genotype', which can be any genotype that serves as a basis for comparison. But in a sense all background genotypes are by default reference genotypes, in that the derived variant genotype is compared against it.
reference genotype
background genotype
The descriptor 1p22.3 = chromosome 1, short arm, region 2, band 2, sub-band 3. This is read as "one p two-two point three", not "one p twenty-two point three".
An extended part of a chromosome representing a term of convenience in order to hierarchically organize morphologically defined chromosome features: chromosome > arm > region > band > sub-band.
New term request for SO.
http://ghr.nlm.nih.gov/handbook/howgeneswork/genelocation and http://people.rit.edu/rhrsbi/GeneticsPages/Handouts/ChromosomeNomenclature.pdf, both of which define the nomenclature for the banding hierarchy we use here:
chromosome > arm > region > band > sub-band
Note that an alternate nomenclature for this hierarchy is here (http://www.ncbi.nlm.nih.gov/Class/MLACourse/Original8Hour/Genetics/chrombanding.html):
chromosome > arm > band > sub-band > sub-sub-band
chromosomal region
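The chromosome > arm > region > band > sub-band hierarchy used in descriptors such as 1p22.3 can be sketched as a small Python parser. The function name parse_band, the returned keys, and the regular expression are assumptions for illustration; the pattern assumes single-digit region and band numbers and an optional sub-band, and does not cover the alternate nomenclature mentioned above.

```python
import re

BAND = re.compile(
    r"^(?P<chromosome>\d+|[XY])"  # chromosome number or sex chromosome
    r"(?P<arm>[pq])"              # p = short arm, q = long arm
    r"(?P<region>\d)"             # region within the arm
    r"(?P<band>\d)"               # band within the region
    r"(?:\.(?P<sub_band>\d+))?$"  # optional sub-band after the point
)
ARM_NAMES = {"p": "short arm", "q": "long arm"}


def parse_band(descriptor: str) -> dict:
    """Split a cytogenetic band descriptor into its hierarchical parts."""
    match = BAND.match(descriptor)
    if match is None:
        raise ValueError(f"unrecognized band descriptor: {descriptor!r}")
    return {
        "chromosome": match["chromosome"],
        "arm": ARM_NAMES[match["arm"]],
        "region": int(match["region"]),
        "band": int(match["band"]),
        "sub_band": int(match["sub_band"]) if match["sub_band"] else None,
    }
```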
The descriptor 1p22.3 = chromosome 1, short arm, region 2, band 2, sub-band 3. This is read as "one p two-two point three", not "one p twenty-two point three".
http://ghr.nlm.nih.gov/handbook/howgeneswork/genelocation and http://people.rit.edu/rhrsbi/GeneticsPages/Handouts/ChromosomeNomenclature.pdf, both of which define the nomenclature for the banding hierarchy we use here:
chromosome > arm > region > band > sub-band
Note that an alternate nomenclature for this hierarchy is here (http://www.ncbi.nlm.nih.gov/Class/MLACourse/Original8Hour/Genetics/chrombanding.html):
chromosome > arm > band > sub-band > sub-sub-band
chromosome sub-band
chromosomal band brightness
chromosomal band intensity
gpos
gneg
gvar
gpos100
gpos75
gpos50
gpos25
A chromosome arm that is the shorter of the two arms of a given chromosome.
p-arm
stalk
short chromosome arm
A chromosome arm that is the longer of the two arms of a given chromosome.
q-arm
long chromosome arm
gpos66
gpos33
A transgene part whose sequence regulates the synthesis of a functional product, but which is not itself transcribed.
regulatory transgene region
A transgene part whose sequence is expressed in a gene product through transcription and/or translation.
coding transgene feature
expressed transgene region
reporter region
A transgene whose product is used as a selectable marker.
selectable marker transgene
A genotype that describes what is known about variation in a genome at a gross structural level, in terms of the number and appearance of chromosomes in the nucleus of a eukaryotic cell.
Derived from http://en.wikipedia.org/wiki/Karyotype (accessed 2017-03-28)
Karyotypes describe structural variation across a genome at the level of chromosomal morphology and banding patterns detectable in stained chromosomal spreads. This coarser level does not capture more granular levels of variation commonly represented in other forms of genotypes (e.g. specific alleles and sequence alterations).
A base karyotype representing a genome with no known structural variation can be as simple as '46XY', but karyotypes typically contain some gross variant component (such as a chromosome duplication or translocation).
karyotype
A genomic genotype where the genomic background specifies a male or female sex chromosome complement.
This modeling approach enables creation of separate genotype instances for data sources that report sex-specific phenotypes to ensure that sex-specific G2P differences are accurately described. These sex specific genotypes can be linked to the broader intrinsic genotype that is shared by male and female mice of the same strain, to aggregate associated phenotypes at this level, and allow aggregation with G2P association data about the same strains from sources that distinguish sex-specific phenotypes (e.g. IMPC) and those that do not (e.g. MGI).
In the genotype partonomy, a sex qualified genotype has as part a sex-agnostic genotype. This allows for the propagation of phenotypes associated with a sex-qualified genotype to the intrinsic genotype. Ontologically, this parthood is based on the fact that the background component of a sex-qualified genotype specifies the sex chromosomes while that of the sex-agnostic genotype does not. Thus, the sequence content of the sex-qualified genotype is a superset of that of the intrinsic genotype, with the latter being a proper part of the former.
intrinsic genotype (sex-specific)
sex-qualified genotype
sex-qualified intrinsic genotype
We distinguish the notion of a sex-agnostic intrinsic genotype, which does not specify whether the portion of the genome defining organismal sex is male or female, from the notion of a sex-qualified intrinsic genotype, which does. Male and female mice that contain the same background and genetic variation complement will have the same 'sex-agnostic intrinsic genotype', despite their genomes varying in their sex-chromosome complement. By contrast, these two mice would have different 'sex-qualified intrinsic genotypes', as this class takes background sex chromosome sequences into account in the identity criteria for its instances.
Conceptually, a sex-qualified genotype represents a superset of sequence features relative to a sex-agnostic intrinsic genotype, in that it specifies the background sex-chromosome complement of the genome.
genomic genotype (sex-qualified)
A genomic genotype where the genomic background specifies a male sex chromosome complement.
male intrinsic genotype
A genomic genotype where the genomic background specifies a female sex chromosome complement.
female intrinsic genotype
A background genotype whose sequence or identity is not known or specified.
unspecified background genotype
An exhaustive collection of all sequence features that make up an enumerated sequence feature collection with a well-defined set of members (e.g. all chromosomes in a genome, all variant alleles in a genome, all copies of a particular gene or allele in a genome, all alleles defined in a haplotype).
Not all sequence feature complements will be collections - i.e. in some cases the complement of all features of type X will consist of a single feature. For example, a 'single locus complement' for an X-linked locus in an XY male.
sequence feature complement
An exhaustive collection of all features of a specified type in a single genome (e.g. all chromosomes in a genome, all variant features in a genome, all copies of a given gene or allele in a genome).
In some cases there may be zero or only one member of such a complement, which is why this class is not defined to necessarily have some 'genomic feature' as a member.
genomic locus complement
A genomic feature is any located sequence feature in the genome, from a single nucleotide to a gene to an entire chromosome. A complement is the collection of all elements in a set (i.e. "the full number of things in a set"). Here, a 'genomic feature complement' is the set of all features in a single genome of a specified type (according to some defined inclusion criteria).
genomic feature complement
A genomic feature that is part of a gene, and delineated by some functional or structural role it serves (e.g. a promoter element, coding region, etc).
defined gene part
gene part
A transgene that codes for a product used as a reporter of gene expression or activity.
reporter transgene
A junction between bases, a deletion variant, a terminus at the end of a chromosome.
A genomic feature that has an extent of zero.
Former logical def:
'genomic feature'
and (has_extent value 0)
obsolete_null feature
An extrachromosomal replicon that is variant in a genome in virtue of its being a novel addition to the genome - i.e. it is not present in the reference for the genome in which it is found.
aberrant extrachromosomal replicon
exogenous extrachromosomal replicon
transgenic extrachromosomal replicon
Extrachromosomal replicons are replicated and passed on to descendants, and are thus part of the heritable genome of a cell or organism. In cases where the presence of such a replicon is exogenous or aberrant (i.e. not included in the reference for that genome), the replicon is considered a 'variant locus' and a 'sequence alteration'.
novel extrachromosomal replicon
A genomic feature that represents an entirely new replicon in the genome, e.g. an extrachromosomal replicon or an extra copy of a chromosome.
This class is defined so as to support classification of things like novel extrachromosomal replicons and aneusomic chromosomes as being variant alleles in a genome. These represent entirely new features in the genome - not variants of an existing feature.
Novel replicons are considered as an 'insertion' in a genome, and as such, qualify as types of sequence_alterations and variant alleles. There is no pre-existing locus that it modifies, however, and thus it is not really an 'allele of' a named locus. But conceptually, we still consider these to represent genetic variants and classify them as variant alleles.
novel replicon
An attribute of a genomic feature that represents a feature not previously found in a given genome, e.g. an extrachromosomal replicon or aneusomic third copy of a chromosome.
novel
A sequence feature representing the end of a sequence that is bounded only on one side (e.g. at the end of a chromosome or oligonucleotide).
terminus
An extent of continuous biological sequence, or a collection of such sequences, whose identity is dependent on both its sequence and its position.
GENO defines three levels of sequence-related artifacts, which are distinguished by their identity criteria.
1. 'Biological sequence' identity is dependent only on the ordering of units that comprise the sequence.
2. 'Sequence feature' identity is dependent on its sequence and the genomic position of the sequence (aligns with definition of 'sequence feature' in the Sequence Ontology).
3. 'Qualified sequence feature' identity is additionally dependent on some aspect of the physical context of the genetic material bearing the feature, extrinsic to its sequence and its genomic position. For example, its being targeted by gene knockdown reagents, its being transgenically expressed in a foreign cell from a recombinant expression construct, its having been epigenetically modified in a way that alters its expression level or pattern, or its being located in a specific cellular or anatomical location.
sequence feature or collection
An ordered collection of units representing successive monomers of a biological macromolecule.
GENO defines three levels of sequence-related artifacts, which are distinguished by their identity criteria.
1. 'Biological sequence' identity is dependent only on the ordering of units that comprise the sequence.
2. 'Sequence feature' identity is dependent on its sequence and the genomic position of the sequence (aligns with definition of 'sequence feature' in the Sequence Ontology).
3. 'Qualified sequence feature' identity is additionally dependent on some aspect of the physical context of the genetic material bearing the feature, extrinsic to its sequence and its genomic position. For example, its being targeted by gene knockdown reagents, its being transgenically expressed in a foreign cell from a recombinant expression construct, its having been epigenetically modified in a way that alters its expression level or pattern, or its being located in a specific cellular or anatomical location.
biomacromolecular sequence
state
VMC:State
'Sequences' differ from 'sequence features' in that instances are distinguished only by their inherent ordering of units, and not by any positional aspect related to alignment with some reference sequence. Accordingly, the 'ATG' translational start codon of the human AKT gene is the same *sequence* as the 'ATG' start codon of the human SHH gene, but these represent two distinct *sequence features* in virtue of their different positions in the genome.
biological sequence
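The distinction between the three identity levels can be illustrated with a minimal Python sketch. This is not part of GENO itself: the class names, the `reference` labels, and all coordinates are hypothetical placeholders chosen only to show how equality behaves at each level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BiologicalSequence:
    """Level 1: identity depends only on the ordering of units."""
    residues: str


@dataclass(frozen=True)
class SequenceFeature:
    """Level 2: identity depends on sequence plus genomic position."""
    sequence: BiologicalSequence
    reference: str  # hypothetical reference label
    begin: int      # placeholder coordinate, not a real position
    end: int        # placeholder coordinate, not a real position


@dataclass(frozen=True)
class QualifiedSequenceFeature:
    """Level 3: identity additionally depends on the bearer's context."""
    feature: SequenceFeature
    context: str


atg = BiologicalSequence("ATG")

# Two start codons with the same sequence at different (made-up) positions.
akt_start = SequenceFeature(atg, "ref_A", 100, 103)
shh_start = SequenceFeature(atg, "ref_B", 200, 203)

same_sequence = akt_start.sequence == shh_start.sequence  # same ordering of units
same_feature = akt_start == shh_start                     # different positions

# One feature, two external contexts (e.g. two different knockdown reagents).
shha = SequenceFeature(BiologicalSequence("ATGC"), "ref_C", 10, 14)
q1 = QualifiedSequenceFeature(shha, "targeted by morpholino 1")
q2 = QualifiedSequenceFeature(shha, "targeted by morpholino 2")
```

Here `same_sequence` is true while `same_feature` is false, and `q1` and `q2` compare unequal even though they wrap the same underlying feature.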
true
state
In the VMC model, the notion of a GENO:biological sequence is called the 'state' of an allele.
A sequence feature (or collection of features) whose identity is dependent on the context or state of its material bearer (in addition to its sequence and position). This context/state describes factors external to its inherent sequence and position that can influence its expression, such as being targeted by gene-knockdown reagents, or an epigenetic modification.
GENO defines three levels of sequence-related artifacts, which are distinguished by their identity criteria.
1. 'Biological sequence' identity is dependent only on the ordering of units that comprise the sequence.
2. 'Sequence feature' identity is dependent on its sequence and the genomic position of the sequence (aligns with definition of 'sequence feature' in the Sequence Ontology).
3. 'Qualified sequence feature' identity is additionally dependent on some aspect of the physical context of the genetic material bearing the feature, extrinsic to its sequence and its genomic position. For example, its being targeted by gene knockdown reagents, its being transgenically expressed in a foreign cell from a recombinant expression construct, its having been epigenetically modified in a way that alters its expression level or pattern, or its being located in a specific cellular or anatomical location.
qualified sequence feature or collection
Consider wild-type zebrafish shha gene in the context of being targeted by morpholino1 vs morpholino 2 in separate experiments. These shha genes share identical sequence and position, but represent distinct instances of a 'qualified sequence feature' because of their different external context. This is important because these qualified features could have distinct phenotypes associated with them (just as two different sequence variants of the same gene can have potentially different associated phenotypes).
A sequence feature whose identity is dependent on the context or state of its material bearer (in addition to its sequence and position). This context describes factors external to its inherent sequence and position. Examples include the context of being targeted by gene-knockdown reagents, or the context of being the target of an epigenetic modification.
GENO defines three levels of sequence-related artifacts, which are distinguished by their identity criteria.
1. 'Biological sequence' identity is dependent only on the ordering of units that comprise the sequence.
2. 'Sequence feature' identity is dependent on its sequence and the genomic position of the sequence (aligns with definition of 'sequence feature' in the Sequence Ontology).
3. 'Qualified sequence feature' identity is additionally dependent on some aspect of the physical context of the genetic material bearing the feature, extrinsic to its sequence and its genomic position. For example, its being targeted by gene knockdown reagents, its being transgenically expressed in a foreign cell from a recombinant expression construct, its having been epigenetically modified in a way that alters its expression level or pattern, or its being located in a specific cellular or anatomical location.
Modeling sequence entities at this 'qualified' level is useful when it is important to distinguish features with identical sequence and position as separate instances - based on their material bearers being found in different contexts. For example, a situation where the shha gene is targeted by two different morpholinos and phenotypes are assessed for each. This is analogous to two different alleles of the shha gene at the sequence feature level, and similarly worthy of being distinguished when considering how the resulting alteration in gene expression impacts the measured phenotypes of the host zebrafish.
qualified genomic feature
true
This axiom is an initial attempt to formalize the identity criteria of an extrinsic context that separates qualified sequence features from sequence features (i.e. the context of its material bearer). As we further develop our efforts here, this will be refined and made more precise.
true
Formalizes one identity criterion of the sequence feature component of a qualified sequence feature (which itself is identified by its sequence and its genomic position).
A set of all qualified sequence features of a specified type in a single genome.
In some cases there may be zero or only one member of such a complement, which is why this class is not defined to necessarily have some 'qualified genomic feature' as a member.
qualified genomic feature complement
A genotype that describes the total variation in heritable genomic sequence of a cell or organism, typically in terms of alterations from some reference or background genotype.
Genotype vs Genome in GENO: An (intrinsic) genotype is an information artifact representing an indirect syntax for specifying a genome sequence. This syntax has reference and variant components - a 'background genotype' and 'genomic variation complement' - that must be operated on to resolve a specific genome sequence. Specifically, the genome sequence is resolved by substituting all sequences specified by the 'genomic variation complement' for the corresponding sequences in the 'reference genome'. So, while the total sequence content represented in a genotype may be greater than that in a genome, the intended resolution of these sequences is to arrive at a single genome sequence. It is this end-point that we consider when holding that a genotype 'specifies' a genome.
1. A genomic genotype is a short-hand specification of a genome that uses a representational syntax comprised of information about a reference genome ('genomic background'), and all specific variants from this reference (the 'genomic variation complement'). Conceptually, this variant genome sequence can be resolved by substituting all sequences specified by the 'genomic variation complement' for the corresponding sequences in the reference 'genomic background' sequence.
2. 'Heritable' genomic sequence is that which is passed on to subsequent generations of cells/organisms, and includes all chromosomal sequences, the mitochondrial genome, and any transmissible extrachromosomal replicons.
intrinsic genotype
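The resolution step described above (substituting the sequences in the 'genomic variation complement' for the corresponding sequences in the background) can be sketched as follows. This is an illustrative toy under stated assumptions, not a GENO-defined procedure: the function name `resolve_genome`, the tuple layout for a variant, the 0-based end-exclusive coordinates, and the example sequences are all hypothetical, and variants are assumed not to overlap.

```python
from typing import Dict, List, Tuple

# Hypothetical variant layout: (reference name, begin, end, replacement sequence),
# using 0-based, end-exclusive coordinates on the background sequence.
Variant = Tuple[str, int, int, str]


def resolve_genome(background: Dict[str, str],
                   variation_complement: List[Variant]) -> Dict[str, str]:
    """Substitute each variant sequence into the background to get one genome."""
    resolved = dict(background)
    # Apply variants from right to left within each reference so that
    # substitutions that change length do not shift pending coordinates.
    ordered = sorted(variation_complement,
                     key=lambda v: (v[0], v[1]), reverse=True)
    for ref, begin, end, replacement in ordered:
        seq = resolved[ref]
        resolved[ref] = seq[:begin] + replacement + seq[end:]
    return resolved


# Made-up background and variation complement.
background = {"chr1": "AACCGGTT"}
variants = [
    ("chr1", 2, 3, "T"),  # substitution of one base
    ("chr1", 5, 7, ""),   # deletion of two bases
]
genome = resolve_genome(background, variants)
```

The genotype (background plus variants) carries more sequence content than the result, but resolving it yields a single genome sequence, here `genome["chr1"]`.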
DNA sequence
RNA sequence
amino acid sequence
obsolete_biological sequence or collection
obsolete_biological sequence collection
A sequence feature whose identity is additionally dependent on the cellular or anatomical location of the genetic material bearing the feature.
As a qualified sequence feature, the BRCA1c.5096G>A variant as materialized in a somatic breast epithelial cell could be distinguished as a separate entity from a BRCA1c.5096G>A variant in a different cell type or location (e.g. germline BRCA1 variant in a sperm cell).
location-qualified sequence feature
A sequence feature whose identity is additionally dependent on factors specifically influencing its level of expression in the context of a biological system (e.g. being targeted by gene-knockdown reagents, or driven from an exogenous expression system such as a recombinant construct).
expression-qualified sequence feature
A sequence feature position based on a genomic coordinate system, where the position specifies start and end coordinates based on its alignment with some reference genomic sequence.
This 'genomic position' concept differs from the faldo:Position concept in that the former describes the start AND end points/coordinates of a feature, while the latter describes a single point/coordinate at the beginning OR end of a feature.
genomic coordinates
remodeling notion of sequence feature position around the idea of a 'genomic locus'
obsolete_genomic position
phenotypic inheritance process
A sequence attribute inhering in a feature whose identity is not specified.
unspecified
An attribute describing a type of variation inhering in a sequence feature or collection.
allele attribute
variation attribute
An intrinsic genotype that specifies variation from a reference or background.
variant genomic genotype
An information entity that is intended to represent some biological sequence, sequence feature, qualified sequence feature, or a collection of one or more of these entities.
Eliminating classes that are not necessary or add unneeded complexity.
obsolete_sequence information entity
biological sequence residue
monomeric residue
biological sequence unit
deoxyribonucleic acid residue
DNA residue
ribonucleic acid residue
RNA residue
amino acid residue
An attribute, quality, or state of a sequence or sequence feature.
Sequence feature attributes can be based on qualities of the material bearers of the feature, for example, the staining intensity of a chromosomal band feature.
http://purl.obolibrary.org/obo/SO_0000400
sequence feature attribute
The location of a sequence feature as defined by its start and end position on some reference coordinate system.
1. The notion of a sequence feature location in the realm of biological sequences is analogous to a BFO:spatiotemporal region in the realm of physical entities. A spatiotemporal region can be 'occupied by' physical objects, while a location is 'occupied by' sequence features. Just as a spatiotemporal region is distinct from an object that occupies it, so too a location is distinct from a sequence feature that occupies it. A more concrete analogy can be drawn to the distinction between a street address (the location) and the building that occupies it (the sequence feature).
2. A sequence feature location is defined by its begin and end coordinates on a reference sequence, but it is not identified by a particular sequence that may reside there. The same location, defined by an alignment to some reference, may be occupied by different sequences in the genome of organism 1 vs that of organism 2.
sequence feature location
A sequence feature whose identity is additionally dependent on a chemical modification made to the genetic material bearing the feature (e.g. binding of transcriptional regulators, or epigenetic modifications including direct DNA methylation, or modification of histones associated with a feature)
modification-qualified sequence feature
1. The zebrafish "fgf8a<ti282a>/fgf8a<+>" allelic genotype describes the combination of gene alleles present at a specific gene locus (the fgf8a locus - which here has a heterozygous state).
2. The human allelic genotypes in the VCF records below describe the SNPs present at specific positions on Chromosome 20 in the human genome. The first record describes a heterozygous C/T allelic genotype at Chr20:2300608, and the second describes a homozygous G/G allelic genotype at Chr20:2301308.
##fileformat=VCFv4.2
##FORMAT=<ID=GT, Description="Genotype, 0=REF, 1=ALT">
#CHROM POS REF ALT FILTER FORMAT SAMP001
20 2300608 C T PASS GT 0/1
20 2301308 T G PASS GT 1/1
(derived from https://faculty.washington.edu/browning/beagle/intro-to-vcf.html)
3. Some allelic genotype formats encode the genotype as a single string - e.g. "GRCh38 Chr12:258635(A;T)" describes a heterozygous A/T allelic genotype of SNPs present at a specific position 258635 on human chromosome 12.
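How the two records in example 2 yield the stated allelic genotypes can be sketched in Python. This is a minimal illustration, assuming the simplified seven-column layout shown in that example (CHROM POS REF ALT FILTER FORMAT SAMP001) rather than the full VCF column set; the function name `allelic_genotype` is hypothetical, and missing genotype calls (".") are not handled.

```python
# The two data records from the example, in its simplified column layout.
records = """\
20 2300608 C T PASS GT 0/1
20 2301308 T G PASS GT 1/1
"""


def allelic_genotype(line):
    """Return (locus, alleles, allelic state) for one simplified record."""
    chrom, pos, ref, alt, _filter, _fmt, gt = line.split()
    # Index 0 is the reference allele; 1..n are the alternate alleles.
    alleles = [ref] + alt.split(",")
    # Treat phased ("|") and unphased ("/") separators the same way here.
    called = [alleles[int(i)] for i in gt.replace("|", "/").split("/")]
    state = "homozygous" if len(set(called)) == 1 else "heterozygous"
    return f"Chr{chrom}:{pos}", "/".join(called), state


results = [allelic_genotype(line) for line in records.splitlines()]
for locus, alleles, state in results:
    print(locus, alleles, state)
```

The GT field indexes into the list of reference and alternate alleles, so `0/1` resolves to C/T (heterozygous) and `1/1` to G/G (homozygous), matching the description in the example.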
A genotype that specifies the 'allelic state' at a particular locus in the genome - i.e. the set of alleles present at this locus across all homologous chromosomes.
single locus genotype
An 'allelic genotype' describes the set of alleles present at a particular locus ('allelic genotype') in the genome. This use of the term 'genotype' reflects its use in clinical genetics where variation has historically been assessed at a specific locus, and a genotype describes the allelic state at that particular location.
This contrasts with the use of the term 'genotype' in model organism communities, where it commonly describes the allelic state at all loci in a genome known to vary from an established reference or background.
allelic genotype
Exploratory class looking at creating more specific subtypes of associations, and defining identity criteria for each.
genotype-phenotype association
true
true
true
true
knockdown reagent targeted gene complement
A sequence alteration within the coding sequence of a gene.
Not required at this point, so marked exploratory and obsoleted.
Asserted under sequence_alteration.
obsolete_coding sequence alteration
A construct that contains a mobile P-element, holding sequences to be delivered to a target cell or genome.
P-element construct
An engineered region that is used to transfer foreign genetic material into a host cell.
engineered_genetic_vector
Constructs can be engineered to carry inserts of DNA from external sources, for purposes of cloning and propagation or gene expression in host cells.
Constructs are typically packaged as part of delivery systems such as plasmids or viral vectors.
engineered genetic construct
A transgene that is not chromosomally integrated in the host genome, but instead exists as part of an extra-chromosomal construct.
non-integrated transgene
extra-chromosomal transgene
A collection of more than one sequence feature.
http://purl.obolibrary.org/obo/SO_0001260 ! sequence_collection
obsolete_sequence feature collection
A set of genetically-linked alleles, that reside at different locations on the same chromosomal strand and are typically inherited/transmitted together.
Reconsidered definition of haplotype to better align with SO and GA4GH/VMC definitions.
Former SC axiom: has_member some allele.
Informed by https://isogg.org/wiki/Haplotype and https://en.wikipedia.org/wiki/Haplotype.
Haplotypes are 'complements' in that they include the set of all alleles in a defined region of the genome. Because they are genetically linked, the alleles comprising a haplotype are likely to be co-inherited and survive descent across many generations of reproduction. Haplotypes are located within 'haplotype blocks' - regions of DNA where no recombination occurs between alleles on the same chromosomal strand. The 'haplotype' within a 'haplotype block' may be comprised of any number of distinct alleles.
As described at https://en.wikipedia.org/wiki/Haplotype, the term 'haplotype' is commonly used to describe three specific cases of genetically-linked alleles within a haplotype block:
1. The first is a set of linked 'gene alleles' - specific versions of entire genes that reside in tightly linked clusters on a single chromosome.
2. The second is a set of linked 'single nucleotide polymorphism' (SNP) alleles that tend to occur together (i.e. be statistically associated).
3. A third and less common use is describing individual collections of specific mutations within a given genetic segment - typically short tandem repeats.
Each of these more specific definitions serves a purpose for a particular type of genetic analysis or use case - e.g. 'SNP allele' haplotypes are identified and analysed in studies to uncover the genetic basis of common disease by efforts like the International HapMap Project.
The GENO definition is broadly inclusive of these and any other scenarios where distinct 'alleles' of any kind on the same chromosomal strand are genetically linked and thus tend to be co-inherited across successive generations.
obsolete_haplotype
An attribute describing the number of copies of a feature present in a genome.
copy number
A relation used to describe an environment contextualizing the identity of an entity.
microsatellite alteration
A relation used to describe a process contextualizing the identity of an entity.
repeat region alteration
A quality inhering in an 'allelic complement' (aka a 'single locus complement') that describes the allelic variability found at a particular locus in the genome of a single cell/organism
allelic state
allelic dosage
an attribute inhering in a feature based on the total number or relative stoichiometry of copies present in a particular genome.
gene dosage
genetic dosage
A quality inhering in an allele based on its source/origin - typically the parent from which it was inherited.
allele origin
a quality of an allele in virtue of its having been inherited from a female parent.
maternal allele origin
a quality of an allele in virtue of its having been inherited from a male parent.
paternal allele origin
a quality of an allele in virtue of its having occurred through a de novo mutation, rather than being inherited from a parent.
de novo allele origin
a quality of an allele in virtue of its origin not being known.
unknown allele origin
A quality inhering in a particular allele in virtue of its presence only in the genome of non-germ cells in an organism, and therefore not inherited by subsequent generations.
somatic
a quality inhering in a feature in virtue of its presence only in the genome of gametes (germ cells).
germ-line
replaced by GENO:0000900 ! 'germline'
obsolete_gametic
2
An allelic genotype specifying the set of two alleles present at a locus in a diploid genome (i.e., a diploid 'single locus complement')
Alt: A sequence feature complement comprised of two haplotypes at a particular locus on paired homologous chromosomes in a diploid genome.
"Humans are diploid organisms; they have paired homologous chromosomes in their somatic cells, which contain two copies of each gene. An allele is one member of a pair of genes occupying a specific spot on a chromosome (called locus). Two alleles at the same locus on homologous chromosomes make up the individual’s genotype. A haplotype (a contraction of the term ‘haploid genotype’) is a combination of alleles at multiple loci that are transmitted together on the same chromosome. Haplotype may refer to as few as two loci or to an entire chromosome depending on the number of recombination events that have occurred between a given set of loci. Genewise haplotypes are established with markers within a gene; familywise haplotypes are established with markers within members of a gene family; and regionwise haplotypes are established within different genes in a region at the same chromosome. Finally, a diplotype is a matched pair of haplotypes on homologous chromosomes."
From https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4118015/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4118015/figure/sap-26-03-165-g002/
diplotype
A quality inhering in a collection of discontinuous sequence features in a single genome in virtue of their relative position on the same or separate chromosomes.
allelic phase
oryzias latipes strain
a quality of an allele in virtue of its having been inherited from either parent.
parental allele origin
unknown inheritance
The canonical allele that represents a single nucleotide variation in the BRCA2 gene, which can be described by various contextual alleles such as “NC_000013.11:g.32319070T>A” and “NG_012772.3:g.8591T>A”.
One of a set of sequence features or haplotypes that exist at a particular genetic locus. <see ClinGen Allele Model>
The notion of a 'canonical allele' is taken from the ClinGen Allele model (http://dataexchange.clinicalgenome.org/allele/). It is implemented in GENO to provide an ontological representation of this concept that will support data integration efforts, but may be replaced should an IRI become available from the ClinGen model.
http://dataexchange.clinicalgenome.org/allele/resource/canonical_allele/
ClinGen Allele Model (http://dataexchange.clinicalgenome.org/allele/)
As a 'sequence feature or collection' (sensu SO), a 'canonical allele' is considered here as an extent of biological sequence encoded in nucleic acid molecules of a cell or organism (as opposed to an information artifact that is about such a sequence). Canonical alleles can include haplotypes that contain more than one discontinuous sequence alteration that exist in cis on the same chromosomal strand.
In the ClinGen allele model, 'canonical alleles' are contrasted with 'contextual alleles'. Contextual alleles are informational representations that describe a canonical allele using a particular reference sequence. A single canonical allele can be described by many contextual alleles that each use a different reference sequence in their representation (e.g. different chromosomal or transcript references).
canonical allele
An informational artifact that describes a canonical allele by defining its sequence and position relative to a particular reference sequence.
The notion of a 'contextual allele' is taken from the ClinGen Allele model (http://dataexchange.clinicalgenome.org/allele/). It is implemented in GENO to provide an ontological representation of this concept that will support data integration efforts, but may be replaced should an IRI become available from the ClinGen model.
http://dataexchange.clinicalgenome.org/allele/resource/contextual_allele/
ClinGen Allele Model (http://dataexchange.clinicalgenome.org/allele/)
The notion of a 'contextual allele' derives from the ClinGen Allele model. Here, each genetic allele in a patient corresponds to a single 'canonical allele', which in turn may aggregate any number of 'contextual allele' representations that may be defined against different reference sequences. Accordingly, many contextual alleles can describe a single canonical allele. For example, the contextual alleles “NC_000013.11:g.32319070T>A” and “NG_012772.3:g.8591T>A” both describe the same underlying canonical allele, a single nucleotide variation, in the BRCA2 gene.
contextual allele
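The one-to-many relationship between a canonical allele and its contextual alleles can be sketched with a small Python data structure. This is an illustrative assumption, not the ClinGen or GENO data model: the class and field names and the `parse_contextual` helper are hypothetical, and the only real values are the two expressions quoted in the example above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ContextualAllele:
    """A description of an allele relative to one reference sequence."""
    reference_sequence: str
    expression: str  # position and change relative to that reference


@dataclass
class CanonicalAllele:
    """The underlying allele, aggregating any number of descriptions."""
    label: str
    contextual_alleles: List[ContextualAllele] = field(default_factory=list)


def parse_contextual(text: str) -> ContextualAllele:
    """Split 'REFERENCE:expression' into its two parts (hypothetical helper)."""
    reference, expression = text.split(":", 1)
    return ContextualAllele(reference, expression)


brca2_snv = CanonicalAllele("single nucleotide variation in BRCA2")
for text in ("NC_000013.11:g.32319070T>A", "NG_012772.3:g.8591T>A"):
    brca2_snv.contextual_alleles.append(parse_contextual(text))

references = [c.reference_sequence for c in brca2_snv.contextual_alleles]
```

Both contextual alleles hang off the same canonical allele and differ only in the reference sequence they are expressed against.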
A mode of inheritance whereby manifestation of a trait or condition occurs only when both affected and unaffected mitochondria are inherited (i.e. some mitochondria that do and some that do not contain the causative allele).
heteroplasmic mitochondrial inheritance
A mode of inheritance whereby manifestation of a trait or condition occurs only when affected mitochondria are inherited (i.e. mitochondria containing the causative allele)
homoplasmic mitochondrial inheritance
true
A generically dependent continuant that carries biological sequence that is part of or derived from a genome.
An abstract/organizational class to support data modeling, that includes genomic features, genomic feature complements, qualified genomic features and their complements, as well as genotypes that denote such entities.
genomic entity
A sequence feature representing a region of the genome over which there is little evidence for historical recombination, such that sequences it contains are typically co-inherited/transmitted across generations.
Consider http://purl.obolibrary.org/obo/SO_0000355 ! haplotype_block. And consider whether, as a defined region of genomic sequence where variation is known to occur, a haplotype block should be classified as a subtype of allele.
From DOI: 10.1126/science.1069424
Reconsidered definition of haplotype to better align with SO and GA4GH/VMC definitions.
Former SC axiom: has_part some haplotype
A 'haplotype block' is a continuous region of sequence that contains 'haplotypes' - sets of discontinuous, genetically-linked sequence alterations on the same chromosomal strand that are typically co-inherited within this haplotype block. The boundaries of haplotype blocks are defined in efforts to identify haplotypes that exist in organisms or populations. A haplotype block may span one sequence alteration or several, and may cover small or large chromosomal regions - depending on the number of recombination events that have occurred between the alterations defining the haplotype.
obsolete_haplotype block
A genotype that describes the total variation in heritable genomic sequence of a cell or organism, typically in terms of alterations from some reference or background genotype.
'Genomic Genotype' vs 'Genome' in GENO:
A genomic genotype is an information artifact with a representational syntax that can specify what is known about the complete sequence of a genome. This syntax describes 'reference' and 'variant' components - namely a 'background genotype' and 'genomic variation complement' - that must be operated on to resolve the genome sequence. Specifically, the genome sequence is determined by substituting all sequences specified by the 'genomic variation complement' for the corresponding sequences in the reference 'background genotype'. So, while the total sequence content described in a genotype may exceed that of a single genome (in that it includes a reference genome and variation complement), the intended resolution of these sequences is to arrive at a single genome sequence. It is this end-point that we consider when asserting that a genotype 'specifies' a genome.
1. A genomic genotype is a short-hand specification of a genome that uses a representational syntax comprised of information about a reference genome ('genomic background'), and all specific variants from this reference (the 'genomic variation complement'). Conceptually, this variant genome sequence can be resolved by substituting all sequences specified by the 'genomic variation complement' for the corresponding sequences in the reference 'genomic background' sequence.
2. 'Heritable' genomic sequence is that which is passed on to subsequent generations of cells/organisms, and includes all chromosomal sequences, the mitochondrial genome, and any transmissible extrachromosomal replicons.
genomic genotype
gametic
A quality inhering in a particular allele in virtue of its presence in the genome of germ cells in an organism, and therefore having the potential to be inherited by subsequent generations.
germline
A quality inhering in a particular allele in virtue of its presence only in a particular type of cell in an organism (e.g. somatic vs germ cells)
Cellular context of an allele is typically defined in the context of evaluating an individual organism, as alleles that are somatic in one organism can be germline in others.
allele cellular context
The location of a sequence feature in a genome, defined by its start and end position on some reference genomic coordinate system
In GENO, the notion of a Genomic Locus plays the same role as that of a FALDO:Region in the design pattern for describing the location of a feature of interest. We define this specific GENO class because the ontological semantics of the FALDO:Region class are not clear, nor is how it fits into the GENO model. We will work to resolve these questions and ideally converge these concepts in the future.
We don't link a Genomic Locus to a specific reference sequence because the FALDO model (which GENO adopts with the exception of swapping GENO:Genomic Locus for FALDO:Region) allows the start and end positions of a region to be defined on separate reference sequences. So while a given locus is conceptually associated with a single reference, in practice it can be pragmatic to define start and stop on different reference sequences.
In practice, GENO advocates describing biology at the level of genomic features - i.e. define specific terms for genes as genomic features, and not duplicate representation of the loci where each gene resides. So we would have a class representing the human Shh gene as a 'genomic feature', but not parallel this with a 'human Shh gene locus' class. The utility of the 'genomic locus' class in the ontology is primarily to be clear about the distinction, but we would only use it in modeling data if absolutely needed.
For example, we would define an 'HLA gene block' as a subclass of 'genomic feature', and assert that HLA-A, HLA-B, and HLA-C genes are part/subsequences of this HLA gene block (as opposed to modeling this as an 'HLA locus' and asserting that the HLA-A, HLA-B, and HLA-C genes occupy this locus).
genomic location
VMC:Location
1. A genomic locus is defined by its begin and end coordinates on a reference genome, but it is not identified by a particular sequence that may reside there. In GENO, we say that a genomic locus is *occupied_by* a sequence feature. This sequence feature is identified by its sequence and its position in the genome (i.e. the locus it occupies). So the ATG sequence beginning the ORF of the human Shh gene shares the same sequence as the ATG beginning the ORF of the human Akt gene. But these are distinct sequence features because they occupy different loci in the genome. We call sequence features that are located in a genome 'genomic features'.
2. A particular locus (e.g. the human Shh gene locus) may be occupied by different sequence features (e.g. different 'alleles' of the Shh gene). Within the genome of a single (diploid) organism, there is potential for two sequence features to exist at such a locus (i.e. two different Shh alleles). Across the genomes of all cells comprising members of a species, many more alleles of the Shh gene may exist. These alleles occupy the same genomic locus with different sequences (or no sequence at all in the case of a complete Shh deletion).
3. The notion of a genomic locus in the realm of biological sequences is analogous to a BFO:spatiotemporal region in the realm of physical entities. A spatiotemporal region can be occupied_by physical objects, while a genomic locus is occupied_by sequence features. Just as a spatiotemporal region is distinct from an object that occupies it, so too a genomic locus is distinct from a sequence feature that occupies it. A more concrete analogy can be drawn to the distinction between a street address (the locus) and the building that occupies it (the sequence feature).
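The locus/feature distinction in notes 1-3 can be sketched in Python. This is a minimal illustration, not part of GENO: the class names, the `occupied_by` helper, the reference name, the coordinates and the sequences are all hypothetical placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class GenomicLocus:
    """A location only: reference plus begin/end coordinates, no sequence."""
    reference: str
    begin: int
    end: int


@dataclass(frozen=True)
class GenomicFeature:
    """A sequence feature identified by its sequence and the locus it occupies."""
    sequence: str
    locus: GenomicLocus


def occupied_by(features: Iterable[GenomicFeature]) -> Dict[GenomicLocus, Set[GenomicFeature]]:
    """Group features by the locus they occupy (the 'street address')."""
    occupants = defaultdict(set)
    for feature in features:
        occupants[feature.locus].add(feature)
    return dict(occupants)


# Hypothetical loci and alleles; coordinates and sequences are invented.
locus_a = GenomicLocus("hypothetical_ref", 100, 110)
locus_b = GenomicLocus("hypothetical_ref", 500, 503)
allele_1 = GenomicFeature("ACGTACGTAC", locus_a)
allele_2 = GenomicFeature("ACGTTCGTAC", locus_a)
deletion = GenomicFeature("", locus_a)  # complete deletion: no sequence at the locus
other = GenomicFeature("ATG", locus_b)

occupants = occupied_by([allele_1, allele_2, deletion, other])
```

Here `occupants[locus_a]` holds three distinct alleles of one locus, including the empty-sequence deletion, while the locus itself stays the same object regardless of what occupies it.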
genomic locus
true
A material entity that is an organism, derived from an organism, or composed of organisms (e.g. a cell line, biosample, tissue culture, population, etc).
useful organizational term to collect entities that have genomes/genotypes.
organismal entity
The molecular product resulting from transcription of a single gene (either a protein or RNA molecule)
gene product
reporter role
selectable marker role
selectable marker region
A genome whose sequence is identical to that of a genome sequence considered to be the reference.
reference genome
A haplotype is an allele that represents one of many possible versions of a 'haplotype block', which defines a region of genomic sequence that is typically 'co-inherited' across generations due to a lack of historically observed recombination within it.
Informed by https://isogg.org/wiki/Haplotype and https://en.wikipedia.org/wiki/Haplotype and http://purl.obolibrary.org/obo/SO_0001024 ! haplotype.
1. Haplotypes typically contain 'genetically-linked' sequence alterations that are likely to be co-inherited and survive descent across many generations of reproduction. A common use of 'haplotype' is in phasing of patient WGS or WES data, where this term refers to sequence containing two or more alterations that are believed to be 'in cis' on the same chromosomal strand. GENO's definition is consistent with but more inclusive than this view, allowing for haplotypes with one or zero established alterations as long as there is a low probability of recombination within the region it spans (such that alterations found in cis are likely to remain in cis across successive generations). As a result, GENO considers any allele that spans an extent greater than that of a single sequence alteration to represent a haplotype, if there is an expectation of low recombination frequency within the allele. For example, a 'gene allele' is a particular version of a gene that contains one or more sequence alterations, and represents a haplotype that spans the extent of a gene.
2. The relationship between 'haplotype' and 'haplotype block' is analogous to the relationship between 'gene allele' and 'gene' - a 'gene allele' is one of many possible instances of a 'gene', while a 'haplotype' is one of many possible instances of a 'haplotype block'. In this sense, a gene allele can be considered to be a haplotype whose extent is that of a gene (if it is the case that there is a lack of historic recombination within the extent of the gene)
3. As described at https://en.wikipedia.org/wiki/Haplotype, the term 'haplotype' is commonly used to describe three specific cases of genetically-linked alleles within a haplotype block:
a. The first is a set of linked 'gene alleles' - specific versions of entire genes that reside in tightly linked clusters on a single chromosome.
b. The second is a region containing linked 'single nucleotide polymorphism' (SNP) alleles that tend to occur together on a chromosomal strand (i.e. be statistically associated).
c. A third and less common use is describing individual collections of specific mutations within a given genetic segment - typically short tandem repeats.
Each of these more specific definitions serves a purpose for a particular type of genetic analysis or use case - e.g. 'SNP allele' haplotypes are identified and analysed in studies to uncover the genetic basis of common disease by efforts like the International HapMap Project.
The GENO definition of haplotype is broadly inclusive of these and any other scenarios where distinct 'alleles' of any kind on the same chromosomal strand are genetically linked and thus tend to be co-inherited across successive generations. This typically covers the case of a 'gene allele' (a particular version of a gene) - as most genes span regions of a size that there is little chance of recombination events moving cis alterations onto separate chromosomes.
haplotype
A sequence feature representing a region of the genome over which there is little evidence for historical recombination, such that sequences it contains are typically co-inherited/transmitted across generations.
Derived from DOI: 10.1126/science.1069424 and http://purl.obolibrary.org/obo/SO_0000355 ! haplotype_block.
A haplotype block is a class of genomic sequence defined by a lack of evidence for historical recombination, such that sequence alterations within it tend to be co-inherited across successive generations. A haplotype is considered to be one of many possible versions of a 'haplotype block' - defined by the set of co-inherited alterations it contains. In this sense, the relationship between 'haplotype' and 'haplotype block' is analogous to the relationship between 'gene allele' and 'gene' - a 'gene allele' is one of many possible instances of a 'gene', while a 'haplotype' is one of many possible instances of a 'haplotype block'.
The boundaries of haplotype blocks are defined in efforts to identify haplotypes that exist in organisms or populations. A haplotype block may span one sequence alteration or several, and may cover small or large chromosomal regions - depending on the number of recombination events that have occurred between the alterations defining the haplotype.
haplotype block
molecular function
A biological process whose specific outcome is the progression of an integrated living unit: an anatomical structure (which may be a subcellular structure, cell, tissue, or organ), or organism over time from an initial condition to a later condition. [database_cross_reference: GOC:isa_complete]
developmental process
pulling in HP 'phenotypic abnormality' root here
human phenotypic abnormality
Stub class to serve as root of hierarchy for imports of human developmental stages from the Human Developmental Stages Ontology.
A spatiotemporal region encompassing some part of the life cycle of an organism.
human life cycle stage
information content entity
Examples of information content entities include journal articles, data, graphical layouts, and graphs.
an information content entity is an entity that is generically dependent on some artifact and stands in relation of aboutness to some entity
information_content_entity 'is_encoded_in' some digital_entity in obi before split (040907). information_content_entity 'is_encoded_in' some physical_document in obi before split (040907).
Previous. An information content entity is a non-realizable information entity that 'is encoded in' some digital or physical entity.
PERSON: Chris Stoeckert
OBI_0000142
information content entity
information content entity
ontology metadata
data about an ontology part
where to place this depends on if we take the organismal view or the quality centric view.
mammalian phenotype
Mus musculus
Stub class to serve as root of hierarchy for imports of virus types from relevant ontologies or terminologies.
Viruses
Danio rerio
Oryzias latipes
Homo sapiens
A processual entity that realizes a plan which is the concretization of a plan specification.
Stub class to serve as root of hierarchy for experimental techniques and processes, defined in GENO or imported from ontologies such as OBI and ERO.
planned process
reagent role
a population is a collection of individuals from the same taxonomic class living, counted or sampled at a particular site or in a particular area
population
An assay which generates data about a genotype from a specimen of genomic DNA. A variety of techniques and instruments can be used to produce information about sequence variation at particular genomic positions.
genotyping assay
A genetic transformation that renders a gene non-functional, e.g. due to a point mutation, or the removal of all, or part of, the gene using recombinant methods.
A genetic transformation that involves the insertion of a protein coding cDNA sequence at a particular locus in an organism's chromosome. Typically, this is done in mice since the technology for this process is more refined, and because mouse embryonic stem cells are easily manipulated. The difference between knock-in technology and transgenic technology is that a knock-in involves a gene inserted into a specific locus, and is a "targeted" insertion.
targeted gene knock-out technique
targeted gene knock-in technique
Stub class to serve as root of hierarchy for imports from NCBI Taxonomy.
organism
the introduction, alteration or integration of genetic material into a cell or organism
genetic modification technique
'Value' label chosen here according to http://www.uwgb.edu/heuerc/2D/ColorTerms.html
Was parent of chromosomal band intensity before moving this class to live as a sequence feature attribute.
color value
obsolete_color brightness
female
male
phenotypic sex
A material entity that consists of two or more organisms, viruses, or viroids.
A group of organisms of the same taxonomic group grouped together in virtue of their sharing some commonality (either an inherent attribute or an externally assigned role).
collection of organisms
A domestic group, or a number of domestic groups linked through descent (demonstrated or stipulated) from a common ancestor, marriage, or adoption.
family
Morpholino oligos are synthesized from four different Morpholino subunits, each of which contains one of the four genetic bases (A, C, G, T) linked to a 6-membered morpholine ring. Eighteen to 25 subunits of these four subunit types are joined in a specific order by non-ionic phosphorodiamidate intersubunit linkages to give a Morpholino.
morpholino_oligo
The descriptor 1p22.3 = chromosome 1, short arm, region 2, band 2, sub-band 3. This is read as "one p two-two point three", not "one p twenty-two point three".
A region of the chromosome between the centromere and the telomere. Human chromosomes have two arms, the p arm (short) and the q arm (long) which are separated from each other by the centromere.
Formerly http://purl.obolibrary.org/obo/GENO_0000613, replaced by SO term.
http://ghr.nlm.nih.gov/handbook/howgeneswork/genelocation and http://people.rit.edu/rhrsbi/GeneticsPages/Handouts/ChromosomeNomenclature.pdf, both of which define the nomenclature for the banding hierarchy we use here:
chromosome > arm > region > band > sub-band
Note that an alternate nomenclature for this hierarchy is here (http://www.ncbi.nlm.nih.gov/Class/MLACourse/Original8Hour/Genetics/chrombanding.html):
chromosome > arm > band > sub-band > sub-sub-band
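The first nomenclature (chromosome > arm > region > band > sub-band) can be illustrated by splitting a descriptor such as 1p22.3 into its parts. The sketch below is an assumption-laden illustration: the `BandDescriptor` type, the `parse_band` function and the regular expression are hypothetical, cover only simple descriptors of this shape, and keep everything after the point as a single sub-band string.

```python
import re
from typing import NamedTuple, Optional


class BandDescriptor(NamedTuple):
    chromosome: str
    arm: str                 # 'p' (short arm) or 'q' (long arm)
    region: str
    band: str
    sub_band: Optional[str]  # None when the descriptor has no '.n' suffix


# Hypothetical pattern: chromosome, arm, one region digit, one band digit,
# then an optional '.sub-band' suffix.
_PATTERN = re.compile(r"^(\d+|X|Y)([pq])(\d)(\d)(?:\.(\d+))?$")


def parse_band(descriptor: str) -> BandDescriptor:
    """Split a descriptor like '1p22.3' into its hierarchical parts."""
    match = _PATTERN.match(descriptor)
    if match is None:
        raise ValueError(f"not a recognised band descriptor: {descriptor!r}")
    return BandDescriptor(*match.groups())


parsed = parse_band("1p22.3")
```

Because region and band are separate single digits, `parsed.region` and `parsed.band` are both "2", which matches the spoken form "two-two" rather than "twenty-two".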
chromosome arm
Any extent of continuous biological sequence.
GENO defines three levels of sequence-related artifacts, which are distinguished by their identity criteria.
1. 'Biological sequence' identity is dependent only on the ordering of units that comprise the sequence.
2. 'Sequence feature' identity is dependent on its sequence and the genomic position of the sequence (aligns with definition of 'sequence feature' in the Sequence Ontology).
3. 'Qualified sequence feature' identity is additionally dependent on some aspect of the physical context of the genetic material bearing the feature, extrinsic to its sequence and its genomic position. For example, its being targeted by gene knockdown reagents, its being transgenically expressed in a foreign cell from a recombinant expression construct, its having been epigenetically modified in a way that alters its expression level or pattern, or its being located in a specific cellular or anatomical location.
A sequence feature is an extent of biological sequence. An instance of a sequence feature is identified by both its sequence (inherent ordering of units representing nucleic acid or amino acid monomers) and its position (start and stop coordinates based on alignment with some reference feature). By contrast, 'biological sequences' are identified and distinguished only by their inherent sequence, and not their position. Accordingly, the 'ATG' start codon in the coding DNA sequence of the human AKT gene is the same 'sequence' as the 'ATG' start codon in the human SHH gene, but these represent two distinct 'sequence features' in virtue of their different positions in the genome.
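The three identity criteria described above can be mirrored by value equality on nested records. This is only a sketch under stated assumptions: the class names, field names, reference names, coordinates and context strings are hypothetical and are not GENO terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BiologicalSequence:
    """Identity depends only on the ordering of units."""
    units: str


@dataclass(frozen=True)
class SequenceFeature:
    """Identity depends on the sequence plus its position on a reference."""
    sequence: BiologicalSequence
    reference: str
    start: int
    stop: int


@dataclass(frozen=True)
class QualifiedSequenceFeature:
    """Identity additionally depends on some extrinsic physical context."""
    feature: SequenceFeature
    context: str


atg = BiologicalSequence("ATG")
# Invented positions for the two start codons discussed in the text.
akt_atg = SequenceFeature(atg, "hypothetical_ref_1", 10, 13)
shh_atg = SequenceFeature(atg, "hypothetical_ref_2", 20, 23)
knocked_down = QualifiedSequenceFeature(shh_atg, "targeted by knockdown reagent")
untreated = QualifiedSequenceFeature(shh_atg, "no reagent present")
```

The two start codons compare equal as sequences but unequal as features, and the two qualified features wrap the same feature yet differ because of their context.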
sequence_feature
true
Formalizes the first identity criterion for a sequence feature: its sequence.
true
Formalizes the second identity criterion for a sequence feature: its genomic position. We use the FALDO model to represent positional information, which links features to positional information through an instance of a Region class that represents the mapping of the feature onto some reference sequence. (But features can also be linked to Positions directly through the location property).
A region of known length which may be used to manufacture a longer region.
assembly_component
A contiguous sequence derived from sequence assembly. Has no gaps, but may contain N's from unavailable bases.
contig
0
The point at which one or more contiguous nucleotides were excised.
deleted_sequence
nucleotide deletion
nucleotide_deletion
SO:1000033
SO:0000159
SOFA
http://en.wikipedia.org/wiki/Nucleotide_deletion
deletion
enhancer
A regulatory_region composed of the TSS(s) and binding sites for TF_complexes of the basal transcription machinery.
promoter
A region of nucleotide sequence that has translocated to a new position.
transchr
translocated sequence
SO:0000199
DBVAR
translocation
SSLP
simple sequence length polymorphism
simple sequence length variation
SO:0000207
simple_sequence_length_variation
sequence length variation
SO:0000248
sequence_length_variation
See here for a list of engineered regions in ZFIN: http://zfin.org/cgi-bin/webdriver?MIval=aa-markerselect.apg&marker_type=REGION&query_results=t&compare=contains&WINSIZE=25.
Includes things like loxP sites, inducible promoters, ires elements, etc.
engineered_foreign_gene
A repeat_region containing repeat_units of 2 to 10 bp repeated in tandem.
http://en.wikipedia.org/wiki/Microsatellite_%28genetics%29
A defined feature that includes any type of VNTR or SSLP locus.
microsatellite
RNAi_reagent
Structural unit composed of a nucleic acid molecule which controls its own replication through the interaction of specific proteins at one or more origins of replication.
A complete chromosome sequence.
chromosome
The descriptor 1p22.3 = chromosome 1, short arm, region 2, band 2, sub-band 3. This is read as "one p two-two point three", not "one p twenty-two point three".
A cytologically distinguishable feature of a chromosome, often made visible by staining, and usually alternating light and dark.
http://ghr.nlm.nih.gov/handbook/howgeneswork/genelocation and http://people.rit.edu/rhrsbi/GeneticsPages/Handouts/ChromosomeNomenclature.pdf, both of which define the nomenclature for the banding hierarchy we use here:
chromosome > arm > region > band > sub-band
Note that an alternate nomenclature for this hierarchy is here (http://www.ncbi.nlm.nih.gov/Class/MLACourse/Original8Hour/Genetics/chrombanding.html):
chromosome > arm > band > sub-band > sub-sub-band
'Band' is a term of convenience in order to hierarchically organize morphologically defined chromosome features: chromosome > arm > region > band > sub-band.
chromosome band
centromere
Obsoleted as we did not want to commit to constructs being plasmids, but rather wanted a classification of more general types of engineered regions used to replicate and deliver sequence to target cells/genomes. Replaced by GENO:0000856 ! engineered genetic construct.
obsolete_engineered_plasmid
The sequence of one or more nucleotides added between two adjacent nucleotides in the sequence.
insertion
nucleotide insertion
nucleotide_insertion
SO:1000034
SO:0000667
DBVAR
SOFA
insertion
SNPs are single base pair positions in genomic DNA at which different sequence alternatives exist in normal individuals in some population(s), wherein the least frequent variant has an abundance of 1% or greater.
single nucleotide polymorphism
SO:0000694
SOFA
SNP
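The 1% abundance criterion in the SNP definition can be expressed as a small check over the bases observed at one position in a population sample. This is a sketch only: the function name `is_snp`, its `min_percent` parameter and the input format (one base per sampled chromosome) are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable


def is_snp(observed_bases: Iterable[str], min_percent: int = 1) -> bool:
    """Return True if at least two alternatives exist at the position and
    the least frequent one reaches the abundance threshold (default 1%)."""
    counts = Counter(observed_bases)
    if len(counts) < 2:
        return False  # no sequence alternatives at this position
    total = sum(counts.values())
    least_frequent = min(counts.values())
    # Integer comparison avoids floating-point rounding at the threshold.
    return least_frequent * 100 >= min_percent * total
```

For example, a position where 1 of 100 sampled chromosomes carries G and the rest carry A meets the threshold, while 1 of 1000 does not.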
A junction is a boundary between regions. A boundary has an extent of zero.
junction
A region (or regions) that includes all of the sequence elements necessary to encode a functional transcript. A gene may include regulatory regions, transcribed regions and/or other functional sequence regions.
Regarding the distinction between a 'gene' and a 'gene allele':
Every zebrafish genome contains a 'gene allele' for every zebrafish gene. Many will be 'wild-type' or at least functional gene alleles. But some may be alleles that are mutated or truncated so as to lack functionality. According to current SO criteria defining genes, a 'gene' no longer exists in the case of a non-functional or deleted variant. But the 'gene allele' does exist - and its extent is that of the remaining/altered sequence based on alignment with a reference gene. Even for completely deleted genes, an allele of the gene exists (and here is equivalent to the junction corresponding to where the gene would live based on a reference alignment).
A gene is any 'gene allele' that produces a functional transcript (ie one capable of translation into a protein, or independent functioning as an RNA), when encoded in the genome of some cell or virion.
gene
A quantitative trait locus (QTL) is a polymorphic locus which contains alleles that differentially affect the expression of a continuously distributed phenotypic trait. Usually it is a marker described by statistical association to quantitative variation in the particular phenotypic trait that is thought to be controlled by the cumulative action of alleles at multiple loci.
quantitative trait locus
QTL
An attribute to describe a region that was modified in vitro.
engineered
construct
engineered_region
An extended region of sequence corresponding to a defined feature that is a proper part of a chromosome, e.g. a chromosomal 'arm', 'region', or 'band'.
chromosomal feature
gross chromosomal part
chromosome part
A gene that has been transferred naturally or by any of a number of genetic engineering techniques into a cell or organism where it is foreign (i.e. does not belong to the host genome).
Transgenes can exist as integrated into the host genome, or extra-chromosomally on replicons or transiently carried/expressed vectors. What matters is that they are active in the context of a foreign biological system (typically a cell or organism).
Note that transgenes as defined here are not necessarily from a different taxon than that of the host genome. For example, a Mus musculus gene over-expressed from a chromosomally-integrated expression construct in a Mus musculus genome qualifies as a transgene because it is exogenous to the host genome.
transgene
A multiple nucleotide polymorphism with alleles of common length > 1, for example AAA/TTT.
multiple nucleotide polymorphism
SO:0001013
MNP
A variation that increases or decreases the copy number of a given region.
CNP
CNV
copy number polymorphism
copy number variation
SO:0001019
SOFA
http://en.wikipedia.org/wiki/Copy_number_variation
copy_number_variation
A collection of sequence features (typically a collection of chromosomes) that covers the sum genetic material within a cell or virion (where 'genetic material' refers to any nucleic acid that is part of a cell or virion and has been inherited from an ancestor cell or virion, and/or can be replicated and inherited by its progeny)
Genotype vs Genome in GENO: An (intrinsic) genotype is an information artifact representing an indirect syntax for specifying a genome sequence. This syntax has reference and variant components - a 'reference genome' and 'genomic variation complement' - that must be operated on to resolve a specific genome sequence. Specifically, the genome sequence is resolved by substituting all sequences specified by the 'genomic variation complement' for the corresponding sequences in the 'reference genome'. So, while the total sequence content represented in a genotype may be greater than that in a genome, the intended resolution of these sequences is to arrive at a single genome sequence.
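The resolution step described above (substituting the sequences of the variation complement into the reference) can be sketched as a string operation. This is an illustrative assumption, not a GENO algorithm: the function name, the `(start, end, replacement)` tuple format with 0-based half-open coordinates, and the requirement that variations do not overlap are all choices made for the example.

```python
from typing import List, Tuple

# (start, end, replacement): replace reference[start:end] with replacement.
Variation = Tuple[int, int, str]


def resolve_genome(reference: str, variation_complement: List[Variation]) -> str:
    """Resolve a genome sequence from a reference and a set of
    non-overlapping variations given in reference coordinates."""
    resolved = reference
    # Apply right-to-left so earlier reference coordinates stay valid.
    for start, end, replacement in sorted(variation_complement, reverse=True):
        resolved = resolved[:start] + replacement + resolved[end:]
    return resolved


# Toy reference with one substitution and one two-base deletion.
genome = resolve_genome("ACGTACGT", [(1, 2, "T"), (4, 6, "")])
```

The genotype here carries the eight-base reference plus the variant sequences, yet resolves to the single six-base genome sequence "ATGTGT".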
'genome sequence'
A genome is considered the complement of all heritable sequence features in a given cell or organism (chromosomal or extrachromosomal). This is typically a collection of chromosomes, but in some organisms (e.g. bacteria) it may be a single chromosomal entity. For this reason 'genome' classifies under 'sequence feature complement' rather than 'sequence feature collection'.
genome
A few examples highlighting the distinction of 'sequence alterations' from their parent 'variant allele':
1. Consider NM_000059.3(BRCA2):c.631G>A variation in the BRCA2 gene. This mutation of a single nucleotide creates a gene allele whose extent is that of the entire BRCA2 gene. This version of the full BRCA2 gene is a 'variant allele', while the extent of sequence spanning just the single altered base is a 'sequence alteration'. See https://www.ncbi.nlm.nih.gov/snp/80358871.
2. Consider the NM_000059.3(BRCA2):c.132_133ins8 variation in the BRCA2 gene. This 8 bp insertion creates a gene allele whose extent is that of the entire BRCA2 gene. This version of the full BRCA2 gene is a 'variant allele', while the extent of sequence spanning just the 8 bp insertion is a 'sequence alteration'. See https://www.ncbi.nlm.nih.gov/snp/483353112.
3. Consider the NM_000059.3(BRCA2):c.22_23delAG variation in the BRCA2 gene. This 2 bp deletion creates a gene allele whose extent is that of the entire BRCA2 gene. This version of the full BRCA2 gene is a 'variant allele', while the junction where the deletion occurred is a 'sequence alteration' with an extent of zero. See https://www.ncbi.nlm.nih.gov/snp/483353112.
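The extents in the three examples above can be made concrete with a small record type. This is a hedged sketch: the `SequenceAlteration` class and its fields are hypothetical, and the eight inserted bases are written as placeholder N characters because the source does not give them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceAlteration:
    ref: str  # reference bases replaced ("" for a pure insertion)
    alt: str  # bases present in the variant allele ("" for a pure deletion)

    @property
    def extent(self) -> int:
        """Length of the altered sequence as it exists in the variant allele."""
        return len(self.alt)


substitution = SequenceAlteration(ref="G", alt="A")      # c.631G>A
insertion = SequenceAlteration(ref="", alt="N" * 8)      # c.132_133ins8, bases unknown here
deletion = SequenceAlteration(ref="AG", alt="")          # c.22_23delAG
```

The substitution has an extent of one base, the insertion eight, and the deletion zero, which corresponds to the junction described in example 3. In each case the enclosing 'variant allele' would still span the whole gene.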
A sequence_alteration is a sequence_feature whose extent is the deviation from another sequence.
sequence variation
SO:1000004
SO:1000007
SO:0001059
SOFA
1. A 'sequence alteration' is an allele whose sequence deviates in its entirety from that of other features found at the same genomic location (i.e. it deviates along its entire extent). In this sense, 'sequence alterations' represent the minimal extent an allele can take (i.e. that which is variable with some other feature along its entire sequence). An example is a SNP or insertion.
Alleles whose extent goes beyond the specific sequence that is known to be variable are not sequence alterations. These are alleles that represent alternate versions of some larger, named feature. The classic example here is a 'gene allele', which spans the extent of an entire gene, and contains one or more sequence alterations (regions known to vary) as part.
2. Sequence alterations are not necessarily 'variant' in the sense defined in GENO (i.e. being 'variant with' some reference sequence). In any comparison of alleles at a particular location, the choice of a 'reference' is context-dependent - as comparisons in other contexts might consider a different allele to be the reference. So while sequence alterations are usually considered 'variant' in the context in which they are considered, this variant status may not hold at all times. For this reason, the 'sequence alteration' class is not made an rdfs:subClassOf 'variant allele'.
For a particular instance of a sequence alteration, however, we may in some cases be able to rdf:type it as a 'variant allele' and a 'sequence alteration', in situations where we can be confident that the feature will *never* be considered a reference. For example, experimentally generated mutations in model organism genes that are created expressly to vary from an established reference.
3. Note that we consider novel features gained in a genome to be sequence alterations, including aneusomic chromosomes gained through a non-disjunction event during replication, or extrachromosomal replicons that become part of the heritable genome of a cell or organism.
sequence_alteration
An insertion that derives from another organism, via the use of recombinant DNA technology.
transgenic insertion
SO:0001218
transgenic_insertion
A region which is the result of some arbitrary experimental procedure. The procedure may be carried out with biological material or inside a computer.
experimental_feature
A construct which is designed to integrate into a genome and produce a fusion transcript between exons of the gene into which it inserts and a reporter element in the construct. Gene traps contain a splice acceptor, do not contain promoter elements for the reporter, and are mutagenic. Gene traps may be bicistronic with the second cassette containing a promoter driving a selectable marker.
gene_trap_construct
A construct which is designed to integrate into a genome and express a reporter when inserted in close proximity to a promoter element. Promoter traps typically do not contain promoter elements and are mutagenic.
promoter_trap_construct
A construct which is designed to integrate into a genome and express a reporter when the expression from a basic minimal promoter is enhanced by genomic enhancer elements. Enhancer traps contain promoter elements and are not usually mutagenic.
enhancer_trap_construct
SNVs are single base pair positions in genomic DNA at which different sequence alternatives exist.
single nucleotide variant
kareneilbeck
Thu Oct 08 11:37:49 PDT 2009
SO:0001483
SOFA
SNV
A biological_region characterized as a single heritable trait in a phenotype screen. The heritable phenotype may be mapped to a chromosome but generally has not been characterized to a specific gene locus.
heritable_phenotypic_marker
'GRCh37.p10' (a human reference genome build)
A genome sequence that is used as a standard against which other genome sequences are compared, or into which alterations are intentionally introduced.
reference genome sequence
A sequence alteration whereby the copy number of a given region is greater than the reference sequence.
copy number gain
gain
kareneilbeck
Mon Feb 28 01:54:09 PST 2011
SO:0001742
DBVAR
copy_number_gain
A sequence alteration whereby the copy number of a given region is less than the reference sequence.
copy number loss
loss
kareneilbeck
Mon Feb 28 01:55:02 PST 2011
SO:0001743
DBVAR
copy_number_loss
Uniparental disomy is a sequence_alteration where a diploid individual receives two copies for all or part of a chromosome from one parent and no copies of the same chromosome or region from the other parent.
UPD
uniparental disomy
kareneilbeck
Mon Feb 28 02:01:05 PST 2011
SO:0001744
DBVAR
http://en.wikipedia.org/wiki/Uniparental_disomy
UPD
Uniparental disomy is a sequence_alteration where a diploid individual receives two copies for all or part of a chromosome from the mother and no copies of the same chromosome or region from the father.
maternal uniparental disomy
kareneilbeck
Mon Feb 28 02:03:01 PST 2011
SO:0001745
maternal_uniparental_disomy
Uniparental disomy is a sequence_alteration where a diploid individual receives two copies for all or part of a chromosome from the father and no copies of the same chromosome or region from the mother.
paternal uniparental disomy
kareneilbeck
Mon Feb 28 02:03:30 PST 2011
SO:0001746
paternal_uniparental_disomy
A structural sequence alteration where there are multiple equally plausible explanations for the change.
complex
kareneilbeck
Wed Mar 23 03:21:19 PDT 2011
SO:0001784
DBVAR
complex_structural_alteration
kareneilbeck
Fri Mar 25 02:27:41 PDT 2011
SO:0001785
DBVAR
structural_alteration
Formerly http://purl.obolibrary.org/obo/GENO_0000067, replaced with SO term.
regulatory element
regulatory gene region
regulatory_region
Any change in genomic DNA caused by a single event.
SO:1000002
SOFA
substitution
When no simple or well defined DNA mutation event describes the observed DNA change, the keyword "complex" should be used. Usually there are multiple equally plausible explanations for the change.
complex substitution
SO:1000005
SOFA
complex_substitution
A single nucleotide change which has occurred at the same position of a corresponding nucleotide in a reference sequence.
point mutation
SO:1000008
SOFA
http://en.wikipedia.org/wiki/Point_mutation
point_mutation
Change of a pyrimidine nucleotide, C or T, into another pyrimidine nucleotide, or change of a purine nucleotide, A or G, into another purine nucleotide.
SO:1000009
transition
A substitution of a pyrimidine, C or T, for another pyrimidine.
pyrimidine transition
SO:1000010
pyrimidine_transition
A transition of a cytidine to a thymine.
C to T transition
SO:1000011
C_to_T_transition
The transition of cytidine to thymine occurring at a pCpG site as a consequence of the spontaneous deamination of 5'-methylcytidine.
C to T transition at pCpG site
SO:1000012
C_to_T_transition_at_pCpG_site
T to C transition
SO:1000013
T_to_C_transition
A substitution of a purine, A or G, for another purine.
purine transition
SO:1000014
purine_transition
A transition of an adenine to a guanine.
A to G transition
SO:1000015
A_to_G_transition
A transition of a guanine to an adenine.
G to A transition
SO:1000016
G_to_A_transition
Change of a pyrimidine nucleotide, C or T, into a purine nucleotide, A or G, or vice versa.
SO:1000017
http://en.wikipedia.org/wiki/Transversion
transversion
Change of a pyrimidine nucleotide, C or T, into a purine nucleotide, A or G.
pyrimidine to purine transversion
SO:1000018
pyrimidine_to_purine_transversion
A transversion from cytidine to adenine.
C to A transversion
SO:1000019
C_to_A_transversion
C to G transversion
SO:1000020
C_to_G_transversion
A transversion from T to A.
T to A transversion
SO:1000021
T_to_A_transversion
A transversion from T to G.
T to G transversion
SO:1000022
T_to_G_transversion
Change of a purine nucleotide, A or G, into a pyrimidine nucleotide, C or T.
purine to pyrimidine transversion
SO:1000023
purine_to_pyrimidine_transversion
A transversion from adenine to cytidine.
A to C transversion
SO:1000024
A_to_C_transversion
A transversion from adenine to thymine.
A to T transversion
SO:1000025
A_to_T_transversion
A transversion from guanine to cytidine.
G to C transversion
SO:1000026
G_to_C_transversion
A transversion from guanine to thymine.
G to T transversion
SO:1000027
G_to_T_transversion
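The transition and transversion definitions above reduce to a check of whether both bases fall in the same chemical class. The function below is a minimal sketch of that rule for DNA bases; the name `classify_substitution` and the returned strings are illustrative and are not SO identifiers.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base DNA substitution as a transition
    (purine<->purine or pyrimidine<->pyrimidine) or a transversion
    (purine<->pyrimidine)."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases:
        raise ValueError("expected single DNA bases A, C, G or T")
    if ref == alt:
        raise ValueError("reference and alternate bases are identical")
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"
```

For instance, C to T and A to G classify as transitions, while C to A and G to T classify as transversions, in line with the class definitions listed above.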
A sequence alteration which included an insertion and a deletion, affecting 2 or more bases.
SO:1000032
http://en.wikipedia.org/wiki/Indel
Indels can have a different number of bases than the corresponding reference sequence.
indel
One or more nucleotides are added between two adjacent nucleotides in the sequence; the inserted sequence derives from, or is identical in sequence to, nucleotides adjacent to insertion point.
nucleotide duplication
nucleotide_duplication
SO:1000035
duplication
A continuous nucleotide sequence is inverted in the same position.
inversion
SO:1000036
DBVAR
SOFA
inversion
A tandem duplication where the individual regions are in the same orientation.
direct tandem duplication
SO:1000039
direct_tandem_duplication
A tandem duplication where the individual regions are not in the same orientation.
inverted tandem duplication
mirror duplication
SO:1000040
inverted_tandem_duplication
A duplication consisting of 2 identical adjacent regions.
erverted
tandem duplication
SO:1000173
DBVAR
tandem_duplication
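The three tandem duplication definitions above differ only in the relative orientation of the two adjacent copies. The sketch below illustrates that distinction on plain DNA strings. It is an illustration under stated assumptions, not GENO or SO tooling: the function names are hypothetical, the input is assumed to be exactly the two adjacent copies, and "not in the same orientation" is interpreted here as the second copy being the reverse complement of the first.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse-complement an uppercase or lowercase DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def classify_tandem_duplication(region: str) -> Optional[str]:
    """Classify a region assumed to consist of two adjacent copies.

    Returns a term name from this section, or None when the two halves are
    neither identical nor reverse complements (assumed reading of
    'not in the same orientation').
    """
    region = region.upper()
    if not region or len(region) % 2 != 0:
        return None
    half = len(region) // 2
    first, second = region[:half], region[half:]
    if first == second:
        # Checked first, so a palindromic copy is reported as direct.
        return "direct_tandem_duplication"
    if second == reverse_complement(first):
        return "inverted_tandem_duplication"
    return None
```

A real classifier would work from alignment coordinates and strand information rather than exact string comparison; the string form is used here only to make the orientation rule concrete.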
Stub class to serve as root of hierarchy for imports of developmental stages from Uberon or taxon-specific vocabularies (such as ZFIN stage terms).
life cycle stage
Stub class to serve as root of hierarchy for imports of anatomical entities from UBERON, CARO, or taxon-specific anatomy ontologies.
http://purl.obolibrary.org/obo/CARO_0000000
anatomical entity
Stub node that gathers root classes from various taxon-specific phenotype ontologies, as connectors for bringing classes from these ontologies into the GENO framework.
1. From OGMS: A (combination of) quality(ies) of an organism determined by the interaction of its genetic make-up and environment that differentiates specific instances of a species from other instances of the same species (from OGMS, and used in OBI, but treatment as a quality is at odds with previous OBI discussions and their treatment of 'comparative phenotype assessment', where a phenotype is described as a quality or disposition).
2. From OBI calls: a quality or disposition that inheres in an organism or part of an organism towards some growth environment.
Phenotype
Animals exhibit variations compared to a given control.
'Variant' is the given label of the root class in the Worm Phenotype ontology. Renaming it here to be consistent with our hierarchy of phenotype classes.
Variant
c. elegans phenotype
worm phenotype
abnormal(ly) malformed endocardium cell
abnormal(ly) absent dorso-rostral cluster
abnormal(ly) disrupted diencephalon development
abnormal(ly) disrupted neutrophil aggregation
abnormal(ly) absent adaxial cell
association
Equivalent to: http://www.informatics.jax.org/marker/MGI:98297
mus musculus shh gene
http://zfin.org/ZDB-GENE-980526-166
danio rerio shha gene
http://zfin.org/ZDB-GENE-040123-1
danio rerio cdkn1ca gene
Equivalent to: http://www.ensembl.org/Gene/Summary?g=ENSG00000164690
Codes for: http://www.uniprot.org/uniprot/Q15465
homo sapiens SHH gene
exploratory term
exemplar term
Initially created so that integrated transgene is inferred as a child of sequence_alteration.