GENO is an OWL model of genotypes, their more fundamental sequence components, and links to related biological and experimental entities. At present many parts of the model are exploratory and set to undergo refactoring. In addition, many classes and properties have GENO URIs but are placeholders for classes that will be imported from an external ontology (e.g. SO, ChEBI, OBI, etc). Furthermore, ongoing work will implement a model of genotype-to-phenotype associations. This will support description of asserted and inferred relationships between genotypes, phenotypes, and environments, and the evidence/provenance behind these associations.
Documentation is under development as well, and for now a slidedeck is available at http://www.slideshare.net/mhb120/brush-icbo-2013
Used to annotate axioms that define identity criteria for instances of a class.
is_identity_criteria
proabalistic_quantifier
begin
end
location
The reference is the resource that the position value is anchored to. For example, a contig or chromosome in a genome assembly.
reference
is part of
has part
A relation used to link sequence entities (sequences, features, qualified features, and collections thereof) to their 'attributes'.
Used in lieu of RO/BFO has_quality, as that relation is defined to apply to independent continuant bearers, whereas sequence entities are generically dependent continuants.
http://purl.obolibrary.org/obo/so_has_quality
has_sequence_attribute
A relation between a material information bearer or material genetic sequence bearer and generically dependent continuant that carries information or sequence content that the bearer encodes
materializes
Shortcut relation expanding to bearer_of some (concretizes some . . . ), linking a material information bearer or sequence macromolecule to some ICE or GDC sequence.
bears_concretization_of
is_genotype_of
A relationship that holds between a biological entity and some level of genetic variation present in its genome.
This relation aims to be equally as broad/inclusive as RO:0002200 ! has_phenotype.
The biological entity can be an organism, a group of organisms that share a common genotype, or organism-derived entities such as cell lines or biospecimens. The genotype can be any of the various flavors of genotypes/allelotypes defined in GENO (intrinsic genotype, extrinsic genotype, effective genotype), or any genetic variation component of a genotype including variant alleles or sequence alterations.
has_genotype
An antisymmetric, irreflexive (normally transitive) relation between a whole and a distinct part (source: SIO)
No proper part relation anymore in RO/BFO?
http://semanticscience.org/resource/SIO_000053
has_proper_part
A relationship between an entity that carries a sequence (e.g. a sequence feature), and the sequence itself.
has_sequence_component
'Sequence' in the context of GENO is an abstract entity representing an ordered collection of monomeric units as carried in a biological macromolecule.
has_sequence
A geno:intrinsic genotype 'specifies' a SO:genome.
A geno:karyotype 'specifies' a geno:karyotype feature collection.
A relationship between an information content entity representing a specification, and the entity it specifies.
obsolete_specifies
Created subproperties 'approximates_sequence' and 'resolves to sequence'. Genotypes and other sequence variant artifacts are not always expected to completely specify a sequence, but rather provide some approximation based on available knowledge. The 'resolves_to_sequence' property can be used when the sequence variant artifact is able to completely resolve a sequence, and the 'approximates_sequence' property can be used when it does not.
obsolete_approximates_sequence
Created subproperties 'approximates_sequence' and 'resolves to sequence'. Genotypes and other sequence variant artifacts are not always expected to completely specify a sequence, but rather provide some approximation based on available knowledge. The 'resolves_to_sequence' property can be used when the sequence variant artifact is able to completely resolve a sequence, and the 'approximates_sequence' property can be used when it does not.
obsolete_resolves_to_sequence
An asymmetric, irreflexive (normally transitive) relation between a part and its distinct whole.
http://semanticscience.org/resource/SIO_000093
is_proper_part_of
is_sequence_of
is_subject_of
obsolete_is_specified_by
shortcut relation used to link a phenotype directly to a genotype of an organism
is_phenotype_of_organism_with_genotype
is_phenotype_with_genotype
phenotype_has_genotype
Might expand to something like:
phenotype and (is_phenotype_of some (organism and (has_part some ('material genome' and (is_subject_of some (genome and (is_specified_by some genotype)))))))
obsolete_is_phenotype_of_genotype
A relation to link variant loci, phenotypes, or disease to the type of inheritance process they are involved in, based on how the genetic interactions between alleles at the causative locus determine the pattern of inheritance of a specific phenotype/disease from one generation to the next.
Exploratory/temporary property, as we formalize our phenotypic inheritance model.
obsolete_participates_in_inheritance_process
A relation between a sequence entity (i.e. a sequence, feature, or qualified feature) and a part of this entity that is variant in terms of its sequence, position, or expression.
has_variant_part
is_variant_part_of
A relation between a sequence entity (i.e. a sequence, feature, or qualified feature) and a part of this entity that is not variant.
has_reference_sequence_part
has_reference_part
is_reference_part_of
The allele instance <fgf8a^ti282a> is_allele_of the gene class 'danio rerio fgf8a'.
Note that the allele <fgf8a^ti282a> may not be an instance of the danio rerio fgf8a gene class, given that we adopt the SO definition of genes as 'producing a functional product'. If the <fgf8a^ti282a> allele is nonfunctional or null, it is an allele_of the danio rerio fgf8a gene class, but not an instance (rdf:type) of this class. It is, however, an instance of the 'danio rerio fgf8a gene locus' class, as being a 'gene locus' as defined in GENO requires only occupying the genomic position for a gene, not necessarily producing a functional product.
A relation linking an instance of a variable feature (aka an allele) to a class of genomic feature it is an instance of (typically a gene class).
Domain = allele
Range = punned gene class
In owl models where alleles are instances of gene classes, this relation links an owl:Individual to an owl:Class, and thus 'puns' the gene class IRI.
is_sequence_variant_of
is_allele_of
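The punning described above can be illustrated with a minimal sketch over plain (subject, predicate, object) tuples. All IRIs below are hypothetical placeholders chosen for illustration, not real GENO or ZFIN identifiers, and the helper functions are assumptions of this sketch rather than part of any library.

```python
# Hypothetical prefixed names; none of these are real GENO or ZFIN identifiers.
RDF_TYPE = "rdf:type"
OWL_CLASS = "owl:Class"
IS_ALLELE_OF = "geno:is_allele_of"

GENE = "ex:danio_rerio_fgf8a"               # gene class
LOCUS = "ex:danio_rerio_fgf8a_gene_locus"   # gene locus class
ALLELE = "ex:fgf8a_ti282a"                  # allele instance

triples = {
    (GENE, RDF_TYPE, OWL_CLASS),
    (LOCUS, RDF_TYPE, OWL_CLASS),
    (ALLELE, RDF_TYPE, LOCUS),       # the allele is an instance of the locus class
    (ALLELE, IS_ALLELE_OF, GENE),    # instance-to-class link: puns the gene IRI
}


def declared_classes(graph):
    """IRIs explicitly typed as owl:Class."""
    return {s for s, p, o in graph if p == RDF_TYPE and o == OWL_CLASS}


def punned_classes(graph):
    """Declared classes that also occur as the object of a non-typing property."""
    classes = declared_classes(graph)
    return {o for s, p, o in graph if p != RDF_TYPE and o in classes}


print(punned_classes(triples))
```

Here the gene IRI is used both as a class and as the target of an instance-level link, while the null allele is typed only as a member of the gene locus class, not of the gene class.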
A relation used to link a variant locus instance to the gene class it is a variant of (in terms of its sequence or expression level).
is_variant_instance_of
Formerly grouped is_allele_of and is_expression_variant_of properties under feature to class property (now renamed has_affected_locus)
Domain = genomic feature instance
Range = punned gene class IRI
obsolete_is_genetic_variant_of
A relation linking a gene class to a sequence-variant or expression-variant of the gene.
has_variant_instance
Formerly grouped has_allele and has_expression_variant properties under class to feature property (now renamed locus_affected_by)
Domain = punned gene class
Range = genomic feature
obsolete_has_genetic_variant
A relation linking a gene class to one of its sequence-variant loci/alleles.
Domain = punned gene class
Range = allele/variable locus
has_sequence_variant
has_allele
A relation between a gene targeting reagent (e.g. a morpholino or RNAi) and the class of gene it targets.
This is intended to be used as an instance-class relation, used for linking an instance of a gene targeting reagent to the class of gene whose instances it targets.
targets_gene
A relation that holds between an instance of a genotype or variant sequence feature or collection, and a genomic feature class (typically a gene) that is affected in its sequence or expression.
has_affected_locus
This is an organizational grouping class to collect all relations used to link instances of sequence features or qualified sequence features to genomic feature classes. For example, is_allele_of links a gene allele instance to its gene class (genes are represented as classes in our OWL model). Such links support phenotype propagation from alleles to genes for Monarch Initiative use cases. Use of these properties effectively puns gene class IRIs into owl:individuals in a given RDF dataset.
has_affected_feature
A relation between an expression-variant gene (i.e. integrated transgenes or knockdown reagent targeted genes), and the class of gene it represents.
Domain = expression variant locus.
Range = punned gene class
This relation links an expression-variant gene instance (targeted or transgenic) to the class of gene that it represents. For transient transgenes, this is the gene, the coding sequence need only to contain as part an expressed region from a given gene to stand in an is_expression_variant_of relation to the gene class.
is_expression_variant_of
A relation between a genomic feature class (typically a gene class) and an instance of a sequence feature or qualified sequence feature that represents or affects some change in the sequence or expression of the genomic feature.
class_to_feature_relation
This is an organizational grouping class to collect all relations used to link genomic locus classes (typically genes) to instances of a genomic feature, sequence feature, or qualified sequence feature. For example, linking a gene class IRI to an instance of an allele of that gene class. Such links support phenotype propagation from features/variants to genes (e.g. for Monarch Initiative use cases).
is_locus_affected_by
A relation between a gene class and a gene targeting reagent that targets it.
is_target_of
Domain = punned gene class
Range = gene knockdown reagent
is_gene_target_of
A relation linking a gene class to an expression-variant of that gene.
Domain = punned gene class
Range = expression variant locus
has_expression_variant_instance
has_expression_variant
A relation between two sequence features at a given genomic locus that vary in their sequence or level of expression.
Decided there was no need for a contrasting is_expression_variant_with property, so removed it and this parent grouping property.
This property is most commonly used to relate two different alleles of a given gene. It is not a relation between an allele and the gene it is a variant of.
obsolete_is_variant_with
A relation between two instances of a given gene that vary in their level of expression as a result of external factors influencing expression (e.g. gene-knockdown reagents, epigenetic modification, alteration of endogenous gene-regulation pathways).
obsolete_is_expression_variant_with
A relation used to describe a context or conditions that define and/or identify an entity.
Used in Monarch Data to link associations to qualifying contexts (e.g. environments or developmental stages) where the association applies. For example, a qualifying environment represents a context where genotype-phenotype associations apply - where the environment is an identity criterion for the association.
Used in GENO to describe physical context of materialized sequence features that represent identifying criteria for instances of qualified sequence features.
has_qualifying_context
has_qualifier
A relation to link a single locus complement to its zygosity.
has_zygosity
A relationship between a reference locus/allele and the gene class it is an allele of.
is_reference_allele_of
Consider obsoleting - it is likely sufficient to use the parent has_sequence_attribute property - a separate property to link to the staining intensity attribute is not really needed.
has_color_value
Used to link a gross chromosomal sequence feature (chromosome part) to a color value quality that inheres in the sequence feature in virtue of the staining pattern of the chromosomal DNA in which the sequence is materialized.
has_staining_intensity
Used to link a gene targeting reagent such as a morpholino, to an instance of a reagent targeted gene variant.
Relation between a molecular agent and its molecular target.
is_targeted_by
1. Used to specify derivation of transgene components from a gene class, or an engineered construct instance.
2. Used to specify the strain of origin of an allele (i.e. that an allele was originally isolated from a specific background strain, and propagated into new genetic backgrounds).
3. Used to indicate derivation of a variant mouse genotype from an ES cell line used in generating the modified mice (IMPC)
Relationship between a sequence feature and a distinct, non-overlapping feature from which it derives part or all of its sequence.
sequence_derives_from
A relationship between a variant locus/allele and the gene class it is an allele of.
is_variant_allele_of
Relationship between a sex-qualified genotype and intrinsic genotype, created specifically to support propagation of phenotypes asserted on the former to the latter for Monarch Initiative use cases.
has_sex_agnostic_part
A relation between a mutant locus/allele (i.e. a rare variant present in less than 1% of a population, or an experimentally-altered variant such as a knocked-out gene in a model organism), and the gene it is a variant of.
is_mutant_allele_of
A relationship between a polymorphic locus/allele and the gene class it is an allele of.
is_polymorphic_allele_of
A relationship between a wild-type locus/allele and the gene class it is an allele of.
is_wild_type_allele_of
An organizational class to hold relations of parthood between sequences/features.
has_sequence_part
is_sequence_part_of
Relationship between an intrinsic genotype and a sex-qualified genotype, created specifically to support propagation of phenotypes asserted on the latter to the former for Monarch Initiative use cases.
is_sex_agnostic_part_of
A relation between two sequence features at a particular genomic locus that vary in their sequence (in whole or in part).
is_sequence_variant_with
This property is most commonly used to relate two different alleles of a given gene (e.g. a wt and mutant instance of the BRCA2 gene). It is not a relation between an allele and the class-level gene it is a variant of (for this, use is_allele_of).
is_variant_with
organizational property to hold imports from faldo.
faldo properties
A relation linking a qualified sequence feature to its component sequence feature.
has_sequence_feature_component
In GENO we define three levels of sequence artifacts: (1) biological sequences, (2) sequence features, and (3) qualified sequence features. The identity criteria for a 'biological sequence' include only its inherent sequence (the ordered string of units that comprise it). The identity criteria for a 'sequence feature' include its sequence and position (where it resides - i.e. its location based on how it maps to a reference or standard). The identity criteria for a 'qualified sequence feature' include its component sequence feature (defined by its sequence and position), and the material context of its bearer in a cell or organism. This context can include direct epigenetic modification, or being targeted by gene knockdown reagents such as morpholinos or RNAi, or being transiently overexpressed from a transgenic construct in a cell or organism.
has_sequence_feature
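The three levels of identity criteria described above can be sketched with frozen dataclasses, where equality is determined by exactly the fields that serve as identity criteria at each level. The class and field names here are assumptions made for illustration; they are not GENO terms, and the context is reduced to a plain string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BiologicalSequence:
    # Identity: the ordered string of units only.
    residues: str


@dataclass(frozen=True)
class SequenceFeature:
    # Identity: sequence plus position on some reference.
    sequence: BiologicalSequence
    reference: str
    begin: int
    end: int


@dataclass(frozen=True)
class QualifiedSequenceFeature:
    # Identity: component feature plus the material context of its bearer.
    feature: SequenceFeature
    context: str


seq = BiologicalSequence("ACGT")
f1 = SequenceFeature(seq, "chr1", 100, 103)
f2 = SequenceFeature(seq, "chr2", 500, 503)
q1 = QualifiedSequenceFeature(f1, "unmodified")
q2 = QualifiedSequenceFeature(f1, "targeted by morpholino")

print(f1.sequence == f2.sequence, f1 == f2)   # same sequence, different features
print(q1.feature == q2.feature, q1 == q2)     # same feature, different qualified features
```

Two features can share a sequence yet differ by position, and two qualified features can share a feature yet differ by context, which mirrors the layering of criteria in the text.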
has_inferred_phenotype
Property chain to propagate inferred phenotype associations 'up' a genotype partonomy in the direction of sequence alteration -> VL -> VSLC -> GVC -> genotype.
Property chain to propagate inferred phenotype associations 'down' a genotype partonomy from a sex-qualified intrinsic genotype to the components of a sex-agnostic intrinsic genotype.
Property chain to propagate inferred phenotype associations 'down' a genotype partonomy in the direction of genotype -> GVC -> VSLC -> VL -> sequence alteration.
Property chain to propagate inferred phenotype associations from an intrinsic genotype component (e.g. a (sequence-)variant locus instance) to a gene class.
Property chain to propagate inferred phenotype associations from a (sequence-)variant locus instance to a gene class (to support cases where the phenotype association is made at the level of the variant gene locus).
Property chain to propagate inferred phenotype associations from an extrinsic genotype component (e.g. an expression-variant gene instance) to a gene class.
Property chain to propagate inferred phenotype associations from an expression-variant gene instance to a gene class (to support cases where the phenotype association is made at the level of the expression-variant gene).
Property chain to propagate inferred phenotype associations 'down' a genotype partonomy just from a sex-qualified intrinsic genotype to the immediate sex-agnostic intrinsic genotype. (An additional property chain is needed to then propagate to the intrinsic genotype components)
Proposal for a property linking variants to smaller components that are regulatory, and therefore should not inherit phenotypes.
obsolete_has_regulatory_part
A relation linking a sequence_alteration to the gene it alters.
is_within_allele_of
obsolete_is_alteration_within
has_asserted_phenotype
Proposal for a property linking regulatory elements to larger features of which they are a part.
is_regulatory_part_of
A relation linking a sequence feature to its component Position that represents an identifying criterion for sequence feature instances.
For representing positional data, we advocate use of the FALDO model, which links to positional information through an instance of a Region class that represents the mapping of the feature onto some reference sequence. The positional_component property in GENO is meant primarily to formalize the identity criteria of sequence features and qualified sequence features, to illustrate the distinction between them.
obsolete_has_position_component
A relation between a nucleic acid or amino acid sequence or sequence feature, and one of its monomeric units (nucleotide or amino acid residues)
has_sequence_unit
A relation between two sequences or features that are considered variant with each other along their entire extents.
is_completely_variant_with
has_condition
Note that we currently do not have a property chain to propagate phenotypes to genes across sequence_derives_from relation (e.g. in cases where a Tg insertion derives expressed sequence from some gene)
The property chains below are defined as explicitly as possible, but many could be shortened if we used the inferred_to_cause_condition property to construct the property chains. Where this is the case, it is noted in the annotations on the property chains.
Below are the different kinds/paths of propagation we desire:
1. Propagation 'down' a genotype (from larger components to smaller ones)
2. Propagation 'up' a genotype (from smaller components to larger ones)
3. From sex-qualified genotypes down to the sex-agnostic genotype and its components (but not 'up' to a sex-qualified genotype).
4. From an effective genotype to its intrinsic and extrinsic components.
5. From genotype components to genes (note here that a separate chain is needed to propagate conditions asserted on a sequence alteration to the gene, because of the fact that the link to the gene is from the variant locus/allele).
6. (Exploratory). There are cases where we may also want inter-genotype propagation (i.e. propagation that extends beyond moving up or down a single genotype). For example, if a phenotype is asserted on a sex-qualified intrinsic genotype, we want it to infer down through its component sex-agnostic intrinsic genotype and then up to any effective genotypes of which this sex-agnostic intrinsic genotype is a part. Given the data in hand, however, the conditions for this will likely never occur, so probably ok not to implement a chain to support this.
Note that we do not want to propagate phenotypes up from sex-agnostic genotypes to sex-qualified ones (e.g. from shha<tbx392>/shha<tbx392> [AB] to shha<tbx392>/shha<tbx392> [AB](male)), because it may not be the case that a phenotype assessed without consideration to sex will apply on a sex-specific background. So we would not create a property chain to propagate inferred condition associations from sex-agnostic intrinsic genotypes and their parts to sex-qualified intrinsic genotypes and effective genotypes that contain them (such as: has_variant_part o has_sex_agnostic_part o has_variant_part o 'causes condition').
inferred_to_cause_condition
This is a case of inter-genotype phenotype propagation, requiring propagation down one genotype and then up another. Given the data in hand, however, the conditions for this will likely never occur, so it is probably OK not to have this chain.
This property chain propagates a phenotype asserted on a sex-qualified intrinsic genotype, down to its sex-agnostic genotype part, and then up to a parent effective genotype that has it as a variant part. I think this is OK in all cases, so we can implement this as the one case where we can have inter-genotype pheno propagation. But as noted, there will likely be no data that actually meets criteria to use this chain, so we can probably leave it out.
Property chain to propagate inferred condition associations 'up' a genotype partonomy in the direction of sequence alteration -> VL -> VSLC -> GVC -> genotype.
Property chain to propagate inferred condition associations from an effective genotype through a sex-qualified intrinsic genotype, through a sex-agnostic intrinsic genotype, to the component variant parts of this sex-agnostic genotype.
Property chain to propagate inferred condition associations 'down' a genotype partonomy from a sex-qualified intrinsic genotype to the components of a sex-agnostic intrinsic genotype. This chain in particular is needed to get the conditions to move past the sex-agnostic genotype and down to its parts.
The following shorter chain would also suffice here:
is_variant_part_of o inferred_to_cause_condition
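As a rough illustration of how such a chain behaves, the sketch below applies a two-step property chain (x p y, y q z implies x r z) to a set of plain triples until no new facts appear. The function name `apply_chain` and the entity names are hypothetical; an OWL reasoner would normally do this work, and this is only a toy approximation of the chain `is_variant_part_of o inferred_to_cause_condition -> inferred_to_cause_condition`.

```python
def apply_chain(triples, first, second, inferred):
    """Saturate the rule: (x first y) and (y second z) => (x inferred z)."""
    facts = set(triples)
    while True:
        new = {
            (x, inferred, z)
            for (x, p, y) in facts if p == first
            for (y2, q, z) in facts if q == second and y2 == y
        } - facts
        if not new:
            return facts
        facts |= new


PART_OF = "is_variant_part_of"
CAUSES = "inferred_to_cause_condition"

# Hypothetical instance data: a small genotype partonomy and one association.
asserted = {
    ("sequence_alteration_1", PART_OF, "variant_locus_1"),
    ("variant_locus_1", PART_OF, "genotype_1"),
    ("genotype_1", CAUSES, "condition_X"),
}

closed = apply_chain(asserted, PART_OF, CAUSES, CAUSES)
for fact in sorted(closed - asserted):
    print(fact)
```

Because the inferred property is the same as the second step of the chain, repeated application carries the condition from the genotype down through each level of the partonomy without requiring is_variant_part_of itself to be transitive.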
Property chain to propagate inferred condition associations 'down' a genotype partonomy in the direction of genotype -> GVC -> VSLC -> VL -> sequence alteration.
Property chain to propagate inferred condition associations from an effective genotype through a sex-qualified intrinsic genotype, through a sex-agnostic intrinsic genotype, through the component variant parts of this sex-agnostic genotype, and to the affected gene.
Property chain to propagate inferred condition associations 'down' a genotype partonomy from a sex-qualified intrinsic genotype to the components of a sex-agnostic intrinsic genotype. This chain in particular is needed to get the conditions to propagate to genes.
The shorter chain below would also suffice for this propagation:
has_allele o inferred_to_cause_condition
Property chain to propagate inferred condition associations from a sequence alteration through the variant locus to a gene class. (Separate chains are needed to propagate from the variant locus to the gene class, and another to propagate from a genotype, GVC, or VSLC to the gene class).
NOTE that I don't need this property chain if I have a property chain to infer a has_affected_locus link from a sequence alteration to a gene when the link is asserted from the variant locus to the gene:
is_variant_part_of o has_affected_locus --> has_affected_locus
Obsolete comment: Property chain to propagate inferred condition associations from an intrinsic genotype, GVC, or VSLC to a gene class. (A separate chain is needed to propagate from the variant locus to the gene class, and another to propagate from a sequence alteration to the gene class).
The following, shorter chain, would also suffice here:
has_allele o inferred_to_cause_condition -> inferred_to_cause_condition
Property chain to propagate inferred condition associations from an intrinsic genotype, GVC, or VSLC to an affected gene class, or from an extrinsic genotype or component to an affected gene class.
The following, shorter chain, would also suffice here:
has_affected_locus o inferred_to_cause_condition -> inferred_to_cause_condition
Note that a separate chain is needed to propagate from the variant locus to the gene class, and another to propagate from a sequence alteration to the gene class (in cases where the link to gene is through the variant locus rather than the seq alteration).
Property chain to propagate inferred condition associations from a variant locus instance to a gene class (to support cases where the phenotype association is made directly at the level of the variant locus/allele).
Property chain to propagate inferred condition associations from an effective genotype through a sex-qualified intrinsic genotype to a sex-agnostic intrinsic genotype.
Property chain to propagate inferred condition associations 'down' a genotype partonomy just from a sex-qualified intrinsic genotype to the immediate sex-agnostic intrinsic genotype. (An additional property chain is needed to then propagate to the intrinsic genotype components)
inferred_to_contribute_to_condition
inferred_to_correlate_with_condition
LOINC:LA6668-3
pathogenic_for_condition
LOINC:LA26332-9
likely_pathogenic_for_condition
Relation between an entity and a condition (disease, phenotype) which it does not cause or contribute to.
non-causal_for_condition
LOINC:LA6675-8
benign_for_condition
LOINC:LA26334-5
likely_benign_for_condition
LOINC:LA26333-7
has_uncertain_significance_for_condition
A relation used to describe a process contextualizing the identity of an entity.
has_qualifying_process
A relation used to describe an environment contextualizing the identity of an entity.
has_qualifying_environment
is_candidate_variant_for
is_about is a (currently) primitive relation that relates an information artifact to an entity.
is about
A relation between a planned process and a continuant participating in that process that is not created during the process. The presence of the continuant during the process is explicitly specified in the plan specification which the process realizes the concretization of.
has_specified_input
A relation between a planned process and a continuant participating in that process. The presence of the continuant at the end of the process is explicitly specified in the objective specification which the process realizes the concretization of.
has_specified_output
a relation between a specifically dependent continuant (the dependent) and an independent continuant (the bearer), in which the dependent specifically depends on the bearer for its existence
inheres_in
a relation between an independent continuant (the bearer) and a specifically dependent continuant (the dependent), in which the dependent specifically depends on the bearer for its existence
bearer of
a relation between a continuant and a process, in which the continuant is somehow involved in the process
participates in
a relation between a process and a continuant, in which the continuant is somehow involved in the process
has participant
A journal article is an information artifact that inheres in some number of printed journals. For each copy of the printed journal there is some quality that carries the journal article, such as a pattern of ink. The quality (a specifically dependent continuant) concretizes the journal article (a generically dependent continuant), and both depend on that copy of the printed journal (an independent continuant).
A relationship between a specifically dependent continuant and a generically dependent continuant, in which the generically dependent continuant depends on some independent continuant in virtue of the fact that the specifically dependent continuant also depends on that same independent continuant. Multiple specifically dependent continuants can concretize the same generically dependent continuant.
concretizes
a relation between an independent continuant (the bearer) and a quality, in which the quality specifically depends on the bearer for its existence
has quality
has_role
a relation between an independent continuant (the bearer) and a disposition, in which the disposition specifically depends on the bearer for its existence
has disposition
derives from
starts during
ends during
x overlaps y if and only if there exists some z such that x has part z and z part of y
overlaps
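The definition of overlaps can be checked mechanically once the parts of each entity are known. The sketch below assumes a hypothetical `has_part` mapping from each entity to the set of its parts, and tests whether some z is a part of both x and y. The entity names are made up for illustration.

```python
def overlaps(x, y, has_part):
    """True iff some z exists with (x has part z) and (z part of y)."""
    parts_of_x = has_part.get(x, set())
    parts_of_y = has_part.get(y, set())
    return bool(parts_of_x & parts_of_y)


# Hypothetical parthood data: two features sharing one region, and a third that shares none.
has_part = {
    "feature_A": {"region_1", "region_2"},
    "feature_B": {"region_2", "region_3"},
    "feature_C": {"region_4"},
}

print(overlaps("feature_A", "feature_B", has_part))
print(overlaps("feature_A", "feature_C", has_part))
```

Since the shared part z is found by set intersection, the check is symmetric in x and y, as expected for an overlap relation.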
x is in taxon y if and only if y is an organism, and the relationship between x and y is one of: part of (reflexive), developmentally preceded by, derives from, secreted by, expressed.
in taxon
A relationship that holds between a biological entity and a phenotype. Here a phenotype is construed broadly as any kind of quality of an organism part, a collection of these qualities, or a change in quality or qualities (e.g. abnormally increased temperature). The subject of this relationship can be an organism (where the organism has the phenotype, i.e. the qualities inhere in parts of this organism), a genomic entity such as a gene or genotype (if modifications of the gene or the genotype causes the phenotype), or a condition such as a disease (such that if the condition inheres in an organism, then the organism has the phenotype).
has phenotype
phenotype of
temporally related to
p has direct input c iff c is a participant in p, c is present at the start of p, and the state of c is modified during p.
has input
p has output c iff c is a participant in p, c is present at the end of p, and c is not present at the beginning of p.
has output
is member of
Example 1: a collection of sequences such as a genome being comprised of separate sequences of chromosomes
Example 2: a collection of information entities such as a genotype being comprised of a background component and a variant component
has member is a mereological relation between a collection and an item.
has member
input of
output of
obsolete_formed as result of
Holds between molecular entities a and b when the execution of a activates or inhibits the activity of b
molecularly controls
x has subsequence y iff all of the sequence parts of y are sequence parts of x
has subsequence
is subsequence of
inverse of downstream of sequence of
is upstream of sequence of
x is downstream of the sequence of y iff either (1) x and y have sequence units, and all units of x are downstream of all units of y, or (2) x and y are sequence units, and x is either immediately downstream of y, or transitively downstream of y.
is downstream of sequence of
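Case (1) of the downstream definition can be sketched under a simplifying assumption that is not part of the source: sequence units are represented as integer offsets along a single strand of one reference, with larger offsets lying further downstream. The function names are hypothetical.

```python
def is_downstream_of(x_units, y_units):
    """Case (1): every unit of x is downstream of every unit of y.

    Assumes units are integer offsets on the same strand of the same
    reference, with larger values further downstream.
    """
    if not x_units or not y_units:
        raise ValueError("both sequences must have at least one unit")
    # All units of x exceed all units of y iff min(x) > max(y).
    return min(x_units) > max(y_units)


def is_upstream_of(x_units, y_units):
    """Inverse relation: x is upstream of y iff y is downstream of x."""
    return is_downstream_of(y_units, x_units)


promoter = range(0, 5)     # offsets 0..4
exon = range(10, 15)       # offsets 10..14
straddler = range(3, 8)    # offsets 3..7, shares offsets with the promoter

print(is_downstream_of(exon, promoter))
print(is_downstream_of(straddler, promoter))
print(is_upstream_of(promoter, exon))
```

A feature that shares any unit with the other, or interleaves with it, is neither upstream nor downstream under this reading, because the condition must hold for all pairs of units.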
Relation between a research artifact and an entity it is used to study, in virtue of its replicating or approximating features of the studied entity.
To Do: decide on scope of this relation - inclusive of computational models in domain, or only physical models? Restricted to linking biological systems and phenomena? Inclusive of only diseases in range, or broader?
Matthew Brush
The driving use case for this relation was to link a biological model system such as a cell line or model organism to a disease it is used to investigate, in virtue of the model system exhibiting features similar to that of the disease of interest.
is model of
The genetic variant 'NM_007294.3(BRCA1):c.110C>A (p.Thr37Lys)' causes or contributes to the disease 'familial breast-ovarian cancer'.
An environment of exposure to arsenic causes or contributes to the phenotype of patchy skin hyperpigmentation, and the disease 'skin cancer'.
A relationship between an entity (a genotype, genetic variation or environment) and a condition (a phenotype or disease) where the entity has some causal or contributing role that influences the condition.
Note that relationships of phenotypes to organisms/strains that bear them, or diseases they are manifest in, should continue to use RO:0002200 ! 'has phenotype' and RO:0002201 ! 'phenotype of'.
Genetic variations can span any level of granularity from a full genome or genotype to an individual gene or sequence alteration. These variations can be represented at the physical level (DNA/RNA macromolecules or their parts, as in the ChEBI ontology and Molecular Sequence Ontology) or at the abstract level (generically dependent continuant sequence features that are carried by these macromolecules, as in the Sequence Ontology and Genotype Ontology). The causal relations in this hierarchy can be used in linking either physical or abstract genetic variations to phenotypes or diseases they cause or contribute to.
Environments include natural environments or exposures, experimentally applied conditions, or clinical interventions.
causes or contributes to condition
A relationship between an entity (a genotype, genetic variation or environment) and a condition (a phenotype or disease) where the entity has a causal role for the condition.
causes condition
A relationship between an entity (a genotype, genetic variation or environment) and a condition (a phenotype or disease) where the entity has some contributing role in the manifestation of the condition.
contributes to condition
A relationship between an entity (a genotype, genetic variation or environment) and a condition (a phenotype or disease) where the entity influences the severity with which a condition manifests in an individual.
contributes to expressivity of condition
contributes to severity of condition
A relationship between an entity (a genotype, genetic variation or environment) and a condition (a phenotype or disease) where the entity influences the frequency of the condition in a population.
contributes to penetrance of condition
contributes to frequency of condition
A relationship between an entity (a genotype, genetic variation or environment) and a condition (a phenotype or disease) where the entity prevents or reduces the severity of a condition.
Genetic variations can span any level of granularity from a full genome or genotype to an individual gene or sequence alteration. These variations can be represented at the physical level (DNA/RNA macromolecules or their parts, as in the ChEBI ontology and Molecular Sequence Ontology) or at the abstract level (generically dependent continuant sequence features that are carried by these macromolecules, as in the Sequence Ontology and Genotype Ontology). The causal relations in this hierarchy can be used in linking either physical or abstract genetic variations to phenotypes or diseases they cause or contribute to.
Environments include natural environments or exposures, experimentally applied conditions, or clinical interventions.
is preventative for condition
A relationship between an entity and a condition (phenotype or disease) with which it exhibits a statistical dependence relationship.
correlated with condition
association has object
association has predicate
association has subject
The position value is the offset along the reference where this position is found. Thus only the position value in combination with the reference determines where a position is.
position
Property linking a sequence or sequence feature to an integer representing its length in terms of the number of units in the sequence.
has_extent
Property linking a sequence to a string representing the ordered units that comprise the sequence (e.g. 'atgcagctagctaccgtcgatcg').
has_sequence_string
ObsoleteDataProperty
The 'rank' quantifier in Bgee gene-anatomy associations, which indicates the importance/specificity of a gene's expression in a given anatomy relative to expression in other anatomies for the same gene.
Property to link an assertion or association with some value quantifying its relevance or ranking.
has_quantifier
The starting position of a sequence region in 0-based coordinates.
ClinGen Allele Model (http://datamodel.clinicalgenome.org/allele/)
start_position
The ending position of a sequence region in 0-based coordinates.
ClinGen Allele Model (http://datamodel.clinicalgenome.org/allele/)
end_position
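As a rough sketch of how reference, start_position, end_position and has_extent fit together, the class below models a region on a reference. The source only says the coordinates are 0-based; treating the interval as half-open (start inclusive, end exclusive) is an assumption made here, as are the class name, the helper `from_one_based`, and the reference name "chr7".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    reference: str       # resource the positions are anchored to
    start_position: int  # 0-based, inclusive (assumed)
    end_position: int    # 0-based, exclusive (assumed)

    def __post_init__(self):
        if not 0 <= self.start_position <= self.end_position:
            raise ValueError("need 0 <= start_position <= end_position")

    @property
    def extent(self) -> int:
        # Number of sequence units covered; zero for a junction.
        return self.end_position - self.start_position

    @classmethod
    def from_one_based(cls, reference: str, first: int, last: int) -> "Region":
        # Convert a 1-based, inclusive interval to 0-based, half-open.
        return cls(reference, first - 1, last)


# Illustration: an interval written as bases 27,004,426-27,021,059,
# read here as 1-based and inclusive (an assumption).
locus = Region.from_one_based("chr7", 27004426, 27021059)
```

With this convention the extent is simply the difference of the two positions, and a zero-extent region (start equal to end) represents a junction between two units.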
Both strands
A position that is exactly known.
Exact position
Positive strand
Superclass for the general concept of a position on a sequence. The sequence is designated with the reference predicate.
Position
1
1
A region describes a length of sequence with a start position and end position that represents a feature on a sequence, e.g. a gene
Region
Negative strand
Part of the coordinate system denoting on which strand the feature can be found. If you do not yet know which strand the feature is on, you should tag the position with just this class. If you know more you should use one of the subclasses. This means a region described with a '.' in GFF3. A GFF3 unstranded position does not have this type in FALDO -- those are just a 'position'.
Stranded position
Julius Caesar
Verdi’s Requiem
the Second World War
your body mass index
BFO 2 Reference: In all areas of empirical inquiry we encounter general terms of two sorts. First are general terms which refer to universals or types: animal, tuberculosis, surgical procedure, disease. Second, are general terms used to refer to groups of entities which instantiate a given universal but do not correspond to the extension of any subuniversal of that universal because there is nothing intrinsic to the entities in question by virtue of which they – and only they – are counted as belonging to the given group. Examples are: animal purchased by the Emperor; tuberculosis diagnosed on a Wednesday; surgical procedure performed on a patient from Stockholm; person identified as candidate for clinical trial #2056-555; person who is signatory of Form 656-PPV; painting by Leonardo da Vinci. Such terms, which represent what are called ‘specializations’ in [81
Entity doesn't have a closure axiom because the subclasses don't necessarily exhaust all possibilities. For example, Werner Ceusters' 'portions of reality' include 4 sorts: entities (as BFO construes them), universals, configurations, and relations. It is an open question as to whether entities as construed in BFO will at some point also include these other portions of reality. See, for example, 'How to track absolutely everything' at http://www.referent-tracking.com/_RTU/papers/CeustersICbookRevised.pdf
An entity is anything that exists or has existed or will exist. (axiom label in BFO2 Reference: [001-001])
entity
BFO 2 Reference: Continuant entities are entities which can be sliced to yield parts only along the spatial dimension, yielding for example the parts of your table which we call its legs, its top, its nails. ‘My desk stretches from the window to the door. It has spatial parts, and can be sliced (in space) in two. With respect to time, however, a thing is a continuant.’ [60, p. 240
Continuant doesn't have a closure axiom because the subclasses don't necessarily exhaust all possibilities. For example, in an expansion involving bringing in some of Ceusters' other portions of reality, questions are raised as to whether universals are continuants
A continuant is an entity that persists, endures, or continues to exist through time while maintaining its identity. (axiom label in BFO2 Reference: [008-002])
continuant
continuant
BFO 2 Reference: every occurrent that is not a temporal or spatiotemporal region is s-dependent on some independent continuant that is not a spatial region
BFO 2 Reference: s-dependence obtains between every process and its participants in the sense that, as a matter of necessity, this process could not have existed unless these or those participants existed also. A process may have a succession of participants at different phases of its unfolding. Thus there may be different players on the field at different times during the course of a football game; but the process which is the entire game s-depends_on all of these players nonetheless. Some temporal parts of this process will s-depend_on on only some of the players.
Occurrent doesn't have a closure axiom because the subclasses don't necessarily exhaust all possibilities. An example would be the sum of a process and the process boundary of another process.
Simons uses different terminology for relations of occurrents to regions: Denote the spatio-temporal location of a given occurrent e by 'spn[e]' and call this region its span. We may say an occurrent is at its span, in any larger region, and covers any smaller region. Now suppose we have fixed a frame of reference so that we can speak not merely of spatio-temporal but also of spatial regions (places) and temporal regions (times). The spread of an occurrent, (relative to a frame of reference) is the space it exactly occupies, and its spell is likewise the time it exactly occupies. We write 'spr[e]' and 'spl[e]' respectively for the spread and spell of e, omitting mention of the frame.
An occurrent is an entity that unfolds itself in time or it is the instantaneous boundary of such an entity (for example a beginning or an ending) or it is a temporal or spatiotemporal region which such an entity occupies_temporal_region or occupies_spatiotemporal_region. (axiom label in BFO2 Reference: [077-002])
occurrent
occurrent
a chair
a heart
a leg
a molecule
a spatial region
an atom
an orchestra.
an organism
the bottom right portion of a human torso
the interior of your mouth
b is an independent continuant = Def. b is a continuant which is such that there is no c and no t such that b s-depends_on c at t. (axiom label in BFO2 Reference: [017-002])
independent continuant
independent continuant
a process of cell-division, a beating of the heart
a process of meiosis
a process of sleeping
the course of a disease
the flight of a bird
the life of an organism
your process of aging.
p is a process = Def. p is an occurrent that has temporal proper parts and for some time t, p s-depends_on some material entity at t. (axiom label in BFO2 Reference: [083-003])
BFO 2 Reference: The realm of occurrents is less pervasively marked by the presence of natural units than is the case in the realm of independent continuants. Thus there is here no counterpart of ‘object’. In BFO 1.0 ‘process’ served as such a counterpart. In BFO 2.0 ‘process’ is, rather, the occurrent counterpart of ‘material entity’. Those natural – as contrasted with engineered, which here means: deliberately executed – units which do exist in the realm of occurrents are typically either parasitic on the existence of natural units on the continuant side, or they are fiat in nature. Thus we can count lives; we can count football games; we can count chemical reactions performed in experiments or in chemical manufacturing. We cannot count the processes taking place, for instance, in an episode of insect mating behavior. Even where natural units are identifiable, for example cycles in a cyclical process such as the beating of a heart or an organism’s sleep/wake cycle, the processes in question form a sequence with no discontinuities (temporal gaps) of the sort that we find for instance where billiard balls or zebrafish or planets are separated by clear spatial gaps. Lives of organisms are process units, but they too unfold in a continuous series from other, prior processes such as fertilization, and they unfold in turn in continuous series of post-life processes such as post-mortem decay. Clear examples of boundaries of processes are almost always of the fiat sort (midnight, a time of death as declared in an operating theater or on a death certificate, the initiation of a state of war)
process
process
an atom of element X has the disposition to decay to an atom of element Y
certain people have a predisposition to colon cancer
children are innately disposed to categorize objects in certain ways.
the cell wall is disposed to filter chemicals in endocytosis and exocytosis
BFO 2 Reference: Dispositions exist along a strength continuum. Weaker forms of disposition are realized in only a fraction of triggering cases. These forms occur in a significant number of cases of a similar type [89
b is a disposition means: b is a realizable entity & b’s bearer is some material entity & b is such that if it ceases to exist, then its bearer is physically changed, & b’s realization occurs when and because this bearer is in some special physical circumstances, & this realization occurs in virtue of the bearer’s physical make-up. (axiom label in BFO2 Reference: [062-002])
disposition
disposition
the disposition of this piece of metal to conduct electricity.
the disposition of your blood to coagulate
the function of your reproductive organs
the role of being a doctor
the role of this boundary to delineate where Utah and Colorado meet
To say that b is a realizable entity is to say that b is a specifically dependent continuant that inheres in some independent continuant which is not a spatial region and is of a type instances of which are realized in processes of a correlated type. (axiom label in BFO2 Reference: [058-002])
realizable entity
realizable entity
the ambient temperature of this portion of air
the color of a tomato
the length of the circumference of your waist
the mass of this piece of gold.
the shape of your nose
the shape of your nostril
a quality is a specifically dependent continuant that, in contrast to roles and dispositions, does not require any further process in order to be realized. (axiom label in BFO2 Reference: [055-001])
quality
quality
Reciprocal specifically dependent continuants: the function of this key to open this lock and the mutually dependent disposition of this lock: to be opened by this key
of one-sided specifically dependent continuants: the mass of this tomato
of relational dependent continuants (multiple bearers): John’s love for Mary, the ownership relation between John and this statue, the relation of authority between John and his subordinates.
the disposition of this fish to decay
the function of this heart: to pump blood
the mutual dependence of proton donors and acceptors in chemical reactions [79
the mutual dependence of the role predator and the role prey as played by two organisms in a given interaction
the pink color of a medium rare piece of grilled filet mignon at its center
the role of being a doctor
the shape of this hole.
the smell of this portion of mozzarella
b is a relational specifically dependent continuant = Def. b is a specifically dependent continuant and there are n > 1 independent continuants c1, … cn which are not spatial regions, are such that for all 1 ≤ i < j ≤ n, ci and cj share no common parts, and are such that for each 1 ≤ i ≤ n, b s-depends_on ci at every time t during the course of b’s existence (axiom label in BFO2 Reference: [131-004])
b is a specifically dependent continuant = Def. b is a continuant & there is some independent continuant c which is not a spatial region and which is such that b s-depends_on c at every time t during the course of b’s existence. (axiom label in BFO2 Reference: [050-003])
Specifically dependent continuant doesn't have a closure axiom because the subclasses don't necessarily exhaust all possibilities. We're not sure what else will develop here, but for example there are questions such as what are promises, obligations, etc.
specifically dependent continuant
specifically dependent continuant
John’s role of husband to Mary is dependent on Mary’s role of wife to John, and both are dependent on the object aggregate comprising John and Mary as member parts joined together through the relational quality of being married.
the priest role
the role of a boundary to demarcate two neighboring administrative territories
the role of a building in serving as a military target
the role of a stone in marking a property boundary
the role of subject in a clinical trial
the student role
BFO 2 Reference: One major family of examples of non-rigid universals involves roles, and ontologies developed for corresponding administrative purposes may consist entirely of representatives of entities of this sort. Thus ‘professor’, defined as follows, b instance_of professor at t =Def. there is some c, c instance_of professor role & c inheres_in b at t. denotes a non-rigid universal and so also do ‘nurse’, ‘student’, ‘colonel’, ‘taxpayer’, and so forth. (These terms are all, in the jargon of philosophy, phase sortals.) By using role terms in definitions, we can create a BFO conformant treatment of such entities drawing on the fact that, while an instance of professor may be simultaneously an instance of trade union member, no instance of the type professor role is also (at any time) an instance of the type trade union member role (any more than any instance of the type color is at any time an instance of the type length). If an ontology of employment positions should be defined in terms of roles following the above pattern, this enables the ontology to do justice to the fact that individuals instantiate the corresponding universals – professor, sergeant, nurse – only during certain phases in their lives.
b is a role means: b is a realizable entity and b exists because there is some single bearer that is in some special physical, social, or institutional set of circumstances in which this bearer does not have to be and b is not such that, if it ceases to exist, then the physical make-up of the bearer is thereby changed. (axiom label in BFO2 Reference: [061-001])
role
role
The entries in your database are patterns instantiated as quality instances in your hard drive. The database itself is an aggregate of such patterns. When you create the database you create a particular instance of the generically dependent continuant type database. Each entry in the database is an instance of the generically dependent continuant type IAO: information content entity.
the pdf file on your laptop, the pdf file that is a copy thereof on my laptop
the sequence of this protein molecule; the sequence that is a copy thereof in that protein molecule.
b is a generically dependent continuant = Def. b is a continuant that g-depends_on one or more other entities. (axiom label in BFO2 Reference: [074-001])
generically dependent continuant
generically dependent continuant
the function of a hammer to drive in nails
the function of a heart pacemaker to regulate the beating of a heart through electricity
the function of amylase in saliva to break down starch into sugar
BFO 2 Reference: In the past, we have distinguished two varieties of function, artifactual function and biological function. These are not asserted subtypes of BFO:function however, since the same function – for example: to pump, to transport – can exist both in artifacts and in biological entities. The asserted subtypes of function that would be needed in order to yield a separate monohierarchy are not artifactual function, biological function, etc., but rather transporting function, pumping function, etc.
A function is a disposition that exists in virtue of the bearer’s physical make-up and this physical make-up is something the bearer possesses because it came into being, either through evolution (in the case of natural biological entities) or through intentional design (in the case of artifacts), in order to realize processes of a certain sort. (axiom label in BFO2 Reference: [064-001])
function
a flame
a forest fire
a human being
a hurricane
a photon
a puff of smoke
a sea wave
a tornado
an aggregate of human beings.
an energy wave
an epidemic
the undetached arm of a human being
BFO 2 Reference: Material entities (continuants) can preserve their identity even while gaining and losing material parts. Continuants are contrasted with occurrents, which unfold themselves in successive temporal parts or phases [60
BFO 2 Reference: Object, Fiat Object Part and Object Aggregate are not intended to be exhaustive of Material Entity. Users are invited to propose new subcategories of Material Entity.
BFO 2 Reference: ‘Matter’ is intended to encompass both mass and energy (we will address the ontological treatment of portions of energy in a later version of BFO). A portion of matter is anything that includes elementary particles among its proper or improper parts: quarks and leptons, including electrons, as the smallest particles thus far discovered; baryons (including protons and neutrons) at a higher level of granularity; atoms and molecules at still higher levels, forming the cells, organs, organisms and other material entities studied by biologists, the portions of rock studied by geologists, the fossils studied by paleontologists, and so on. Material entities are three-dimensional entities (entities extended in three spatial dimensions), as contrasted with the processes in which they participate, which are four-dimensional entities (entities extended also along the dimension of time). According to the FMA, material entities may have immaterial entities as parts – including the entities identified below as sites; for example the interior (or ‘lumen’) of your small intestine is a part of your body. BFO 2.0 embodies a decision to follow the FMA here.
A material entity is an independent continuant that has some portion of matter as proper or improper continuant part. (axiom label in BFO2 Reference: [019-002])
material entity
material entity
Stub class to serve as root of hierarchy for imports of molecular entities from ChEBI ontology.
molecular entity
nucleic acid
A cultured cell population that represents a genetically stable and homogenous population of cultured cells that shares a common propagation history (i.e. has been successively passaged together in culture).
cell line
Stub class to serve as root of hierarchy for imports of cell types from CL or other cell terminologies.
cell
1. Stub class to serve as root of hierarchy for imports from an ontology of environment and experimental conditions.
2. Need to consider how to model environments in a way that covers ENVO and XCO content in a consistent and coherent way. A couple of classes under Exploratory Class are relevant here. Consider how we might approach environments/conditions using an EQ approach analogous to how phenotypes are defined (i.e. consider environments/conditions as qualities inhering in some entity).
In ENVO's alignment with the Basic Formal Ontology, this class is being considered as a subclass of a proposed BFO class "system". The relation "environed_by" is also under development. Roughly, a system which includes a material entity (at least partially) within its site and causally influences that entity may be considered to environ it. Following the completion of this alignment, this class' definition and the definitions of its subclasses will be revised.
environmental system
A technique is a planned process used to accomplish a specific activity or task.
technique
A stem cell line comprised of embryonic stem cells, totipotent cells cultured from an early embryo.
embryonic stem cell line
A cell line comprised of stem cells, relatively undifferentiated cells that retain the ability to divide and proliferate to provide progenitor cells that can differentiate into specialized cell types.
stem cell line
Example zebrafish intrinsic genotype:
Genotype = fgf8a<ti282a/+>; shha<tb392/tb392> (AB)
reference component (genomic background) = AB
variant component ('genomic variation complement') = fgf8a<ti282a/+>; shha<tb392/tb392>
. . . and within this variant component, there are two 'variant single locus complements' represented:
allele complement 1 = fgf8a<ti282a/+>
allele complement 2 = shha<tb392/tb392>
and within each of these 'variant single locus complements' there is one or more variant gene locus member:
in complement 1: fgf8a<ti282a>
in complement 2: shha<tb392>
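The decomposition in the zebrafish example can be sketched as a small parser. This is a minimal illustration, not a GENO or ZFIN tool: the function name, the returned dictionary keys, and the assumed label grammar (`gene<allele/allele>` complements separated by `;`, followed by the background in parentheses, with `+` read as the wild-type allele) are all assumptions made for the example.

```python
import re

# Assumed shape: "<variant component> (<background>)"
GENOTYPE_RE = re.compile(r"^(?P<variant>.*?)\s*\((?P<background>[^()]*)\)\s*$")
# Assumed shape of one single locus complement: gene<allele1/allele2>
COMPLEMENT_RE = re.compile(r"^(?P<gene>[^<>\s]+)<(?P<a1>[^/<>]+)/(?P<a2>[^/<>]+)>$")


def parse_genotype(label):
    """Split a genotype label into background, variant component,
    single locus complements, and variant gene locus members."""
    match = GENOTYPE_RE.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized genotype label: {label!r}")
    complements = []
    for part in match.group("variant").split(";"):
        part = part.strip()
        if not part:
            continue
        comp = COMPLEMENT_RE.match(part)
        if comp is None:
            raise ValueError(f"unrecognized allele complement: {part!r}")
        gene = comp.group("gene")
        alleles = [comp.group("a1"), comp.group("a2")]
        # '+' is assumed to mark the wild-type (reference) allele.
        variant_members = sorted({f"{gene}<{a}>" for a in alleles if a != "+"})
        complements.append(
            {"complement": part, "gene": gene, "variant_members": variant_members}
        )
    return {
        "background": match.group("background").strip(),
        "variant_component": match.group("variant").strip(),
        "complements": complements,
    }


parsed = parse_genotype("fgf8a<ti282a/+>; shha<tb392/tb392> (AB)")
```

A homozygous complement yields a single variant member because both copies carry the same allele, while a heterozygous complement with `+` yields only the non-reference allele.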
An intrinsic genotype that does not specify the sex determining chromosomal features of its bearer (i.e. does not indicate the background sex chromosome complement)
This modeling approach allows us to create separate genotype instances for data sources that report sex-specific phenotypes, to ensure that sex-specific G2P differences are accurately described. These sex-qualified genotypes can be linked to the more general sex-agnostic intrinsic genotype that is shared by male and female mice of the same strain, to aggregate associated phenotypes at this level, and allow aggregation with G2P association data about the same strains from sources that distinguish sex-specific phenotypes (e.g. IMPC) and those that do not (e.g. MGI).
Conceptually, a sex-qualified genotype represents a superset of sequence features relative to a sex-agnostic intrinsic genotype, in that it specifies the background sex-chromosome complement of the genome. Thus, in the genotype partonomy, a sex-qualified genotype has as part a sex-agnostic genotype. This allows for the propagation of phenotypes associated with a sex-qualified genotype to the intrinsic genotype.
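The propagation described above can be illustrated with a toy in-memory model. Everything here is hypothetical (the `Genotype` class, the `propagate_phenotypes` function, the strain and phenotype labels); it only shows the idea that phenotypes recorded on a sex-qualified genotype flow to the sex-agnostic genotype it has as part.

```python
from dataclasses import dataclass, field


@dataclass
class Genotype:
    label: str
    parts: list = field(default_factory=list)      # genotypes this one has as part
    phenotypes: set = field(default_factory=set)   # phenotype labels asserted here


def propagate_phenotypes(genotype):
    """Copy phenotypes from a genotype down to every genotype it has as part."""
    for part in genotype.parts:
        part.phenotypes |= genotype.phenotypes
        propagate_phenotypes(part)


# Hypothetical strain with one sex-agnostic genotype shared by both sexes.
agnostic = Genotype("strain X")
male = Genotype("strain X (male)", parts=[agnostic], phenotypes={"phenotype A"})
female = Genotype("strain X (female)", parts=[agnostic], phenotypes={"phenotype B"})

for sex_qualified in (male, female):
    propagate_phenotypes(sex_qualified)
```

After propagation the sex-agnostic genotype aggregates the phenotypes of both sex-qualified genotypes, while each sex-qualified genotype keeps only its own.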
genotype
organismal genotype
sex-agnostic intrinsic genotype
In practice, most genotype instances are classified as sex-agnostic genotypes because they are not sex-specific. When a genotype is indicated to be that of a male or female, it implies a known sex chromosome complement in the genomic background. This requires us to distinguish separate 'sex-qualified' genotype instances for males and females that share a common 'sex-agnostic' genotype. For example, male and female mice of the same strain/background that contain the same set of genetic variations will have the same sex-agnostic intrinsic genotype, but different sex-qualified intrinsic genotypes (which take into account background sex chromosome sequence as identifying criteria for genotype instances).
intrinsic genotype (sex-agnostic)
An allele that contains a sequence alteration, or is itself a sequence alteration, making it variant with some reference allele.
The use of the descriptor 'variant' here is consistent with naming recommendations from the ACMG Guidelines paper here: PMID:25741868. Generally, the descriptive labels chosen for subtypes of variant allele conform to these recommendations as well, where 'variant' is used to cover mutant and polymorphic alleles.
alternate allele
sequence-variant feature
variant feature
A particular allele is 'variant' in virtue of its containing a sequence alteration that differs from some reference allele standard. But note that an allele that is variant in one context/dataset can be considered a reference in another context/dataset.
variant allele
A sequence collection comprised of all 'variant single locus complements' in a single genome, which together constitute the variant component of an intrinsic genotype.
1. Note that even a reference locus (e.g. a wild-type gene) that is a member of a single locus complement that contains a variant allele is included in this 'genomic variation complement'. Thus, the members of this 'genomic variation complement' (which is a sequence collection) are 'single locus variant complements'. Our axiom below uses has_part rather than has_member, however, to account for the fact that many 'genomic variation complements' have only one 'single locus variant complement' as a member. Because has_member is not reflexive, it is not appropriate for these cases.
2. Most genotypes have only one altered locus (ie only one 'single-locus variant complement') that distinguishes them from some reference background. For example, the genotype instance 'fgf8a<ti282a/+> (AB)' exhibits a mutation at only one locus. But some genotypes vary at more than one locus (e.g. a double mutant that has alterations in the fgf8a gene and the shh gene).
genomic variation complement
The ZFIN background 'AB' that serves as a reference as part of the genotype fgf8a^ti282a/+ (AB)
A reference genome that represents the sequence of a genome from which a variant genome is derived (through the introduction of sequence alterations).
Here, a 'genomic background' would differ from a 'reference genome' in that 'background' implies a derivation of the variant from the background (which is the case for most MOD strains), whereas a reference is simply meant as a target for comparison. But in a sense all background genomes are by default reference, in that the derived variant genome is compared against it.
genomic background
OBI:genetic population background information
background genome
The reference/wild-type cd99l2 Danio rerio gene locus spans bases 27,004,426-27,021,059 on Chromosome 7. "mn004Gt" represents an experimentally created variant of this gene, in which sequence from a gene trap construct containing an RFP marker has been inserted into the cd99l2 gene locus. The resulting sequence-variant gene locus includes sequence from this construct that makes it longer than the reference gene sequence, and results in it no longer producing a functional transcript. The sequence extent of this variant cd99l2 gene is determined based on how its remaining sequences align with that of the reference gene and surrounding sequence.
http://useast.ensembl.org/Danio_rerio/Gene/Summary?g=ENSDARG00000056722
http://zfin.org/action/feature/feature-detail?zdbID=ZDB-ALT-111117-8
A genomic feature that is defined positionally, as the sequence present where a gene resides in some reference genome (ie the extent of genomic sequence that corresponds to the location of a functional gene in some reference genome).
Regarding the distinction between a 'gene' and a 'gene locus': Every zebrafish genome contains a 'gene locus' for every zebrafish gene. The majority are likely to be 'wild-type' or at least functional variants. But some may be variants that are mutated or truncated so as to lack functionality. According to current SO criteria defining genes, a 'gene' no longer exists in the case of a non-functional or deleted variant. But a 'gene locus' does exist - and its extent is that of the remaining/altered sequence as it aligns with the reference gene. Even for completely deleted genes, the gene locus remains (and here is equivalent to the junction corresponding to where the gene would live according to this alignment).
This design allows us to classify genes and any variants of those genes (be they functional or not) as the same type of thing (ie a 'gene locus'), since classification is based on genomic context rather than some functional capability. This is practical for treatment of MOD alleles and genotypes where many alleles represent non-functional variants of a gene at a particular locus (ie 'null alleles'). What is important here is specifying what is present at a locus associated with a particular gene, whether or not it is a functional gene.
gene locus
1. The concept of a 'gene' in the Sequence Ontology is functionally defined, in that a gene necessarily produces a functional product. The notion of a 'gene locus' in GENO is positionally-defined, based on an alignment with some reference genome. So an Shh 'gene locus', for example, need not produce a functional product - only be located at the position of the Shh gene based on alignment with a reference genome. This is important in the context of representing genetic variation, where an Shh allele may be non-functional, incomplete, or even completely deleted from a genome. In all these cases, the 'Shh gene locus' remains at the position where the gene resides in the reference genome.
2. The extent of a gene locus is typically that of a functional reference gene located here, as defined by a reference genome build. In some cases, sequence alteration(s) can create a 'gene variant' whose sequence and/or length varies from that of the reference locus. The extent of such variants is defined relative to that of the reference gene sequence, based on alignment of the variant sequence. Gene variants might simply contain one or several point mutations and maintain the core functionality of the wild-type gene, or they may be null alleles that lack complete functionality as a result of deletion and/or replacement of all or a portion of the gene. In the case of a completely deleted gene, its extent is zero - ie the junction that aligns with where the gene would exist in a reference genome. For engineered transgenes where there is no reference gene to align with, the extent is that of the cistron responsible for producing the gene product.
gene allele
A sequence feature that serves as a standard against which 'variant' versions of the sequence feature are compared, or against which located sequence features within the reference region are aligned in order to assign position information.
This is a defined organizational class to collect all 'reference' sequence entities.
reference sequence
Being 'reference' does not imply anything about the nature, prevalence, or function of a sequence, only that some agent has used it to serve a reference role in defining a variant or locating a sequence.
reference sequence feature
2
a collection of more than one sequence feature (i.e. a collection of discontinuous sequence features)
perhaps not same as SO:sequence collection, as here we explicitly include features that can have an extent of zero (and SO:sequence collection is a collection of regions that have an extent of at least one)
1. Note that members of this class can be features with extents of zero (e.g. junctions). This is likely different than the SO:sequence feature class which has members that are regions.
obsolete_sequence feature collection
2
A sequence feature collection comprised of discontiguous sequences from a single genome
Previously called 'genetic locus collection'. Difference between 'genetic' and 'genomic', as used here, is that 'genomic' implies a feature is a heritable part of some genome, while 'genetic' implies that it is part of some feature that is capable of contributing to gene expression in a cell or other biological system.
genetic locus collection
genomic feature collection
Conceptually, members of this collection are meant to be about the sum total genetic material in a single cell or organism. But these members need not be associated with an actual material in a real cell or organism individual. For example, things like a 'reference genome' may not actually represent the material genome of any individual cell or organism in reality. Here, there may be no genomic material referents of the sequences in such a collection because the genome is tied to an idealized, hypothetical cell or organism instance. The key is that conceptually, they are still tied to the idea of being contained in a single genome. In the case of a genotype, the individual sequence members are not all about the genetic material of a single cell or organism. Rather, it is the resolved sequence contained in the genotype that is meant to be about the total genomic sequence content of a genome - which we deem acceptable for classifying as a genetic locus collection.
obsolete_genomic locus collection
0
A single locus complement that serves as a standard against which 'alternate' sequences are compared
reference allelic complement
formerly reference single locus complement
reference single locus complement
A single locus complement where at least one member is variant, and/or the total number of members of the locus deviates from the normal ploidy of the genome (e.g. a trisomic chromosome).
variant allelic complement
Instances of this class are collections comprised of all versions of a specific locus present in a cell or virion where at least one locus is variant (ie non-wild type). This is most commonly a pair of two gene loci on homologous chromosomes in a diploid genome.
This class also covers cases where deviant numbers of genes or chromosomes are present (e.g. trisomy of chromosome 21), even if their sequence is wild-type.
variant single locus complement
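The two conditions described above (at least one variant member, or a member count that deviates from normal ploidy) can be sketched as a simple check. This is only an illustration and not part of GENO: the function name `is_variant_complement`, the `(name, is_variant)` tuple layout, and the default ploidy of 2 are assumptions made for the example.

```python
def is_variant_complement(members, normal_ploidy=2):
    """Return True if a single locus complement would count as 'variant'.

    members: list of (allele_name, is_variant) tuples (hypothetical layout).
    normal_ploidy: expected copy number of the locus (assumed diploid).
    """
    has_variant_member = any(is_variant for _, is_variant in members)
    deviates_from_ploidy = len(members) != normal_ploidy
    return has_variant_member or deviates_from_ploidy


# Illustrative complements: a wild-type pair, three wild-type copies
# (e.g. trisomy of chromosome 21), and a heterozygous gene pair.
wild_type_pair = [("chr21", False), ("chr21", False)]
trisomy_21 = [("chr21", False), ("chr21", False), ("chr21", False)]
heterozygous_pair = [("fgf8a<ti282a>", True), ("fgf8a<+>", False)]
```

Under these assumptions the trisomic complement is variant even though every copy is wild-type in sequence, which matches the note about deviant chromosome numbers.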
A genome that varies at one or more loci from the sequence of some reference genome.
http://purl.obolibrary.org/obo/SO_0001506 ! variant_genome (definition of SO term here is too vague to know if has same meaning as GENO class here)
variant genome
A locus/allele that serves as a standard against which alternate versions of the locus are compared.
reference locus
Being a 'reference locus/allele' is a role or status assigned in the context of a specific dataset or analysis. In human variation datasets, 'reference' status is typically assigned based on factors such as being the most common in a population, being an ancestral locus/allele, or being identified first as a prototypical example of some locus or gene. For example, 'reference loci' in characterizing SNPs often represent the allele first characterized in a reference genome, or the most common allele in a population.
In model organism datasets, 'reference' loci/alleles are typically (but not always) the 'wild-type' variant for a given locus, representing a functional and unaltered version of the locus that is part of a defined genomic background, and against which natural or experimentally-induced alterations are compared.
reference allele
A genomic feature known to exist, but remaining uncharacterized with respect to its identity (e.g. which allele exists at a given gene locus).
Used as a term of convenience for describing data reporting unspecified alleles in a genotype (i.e. in cases where zygosity for a given locus is not known). Typically recorded in genotype syntaxes as a ' /? '.
unspecified locus
An unspecified feature is known to exist as the partner of a characterized allele when the zygosity at that locus is not known. Its specific sequence/identity, however, is unknown (ie whether it is a reference or variant allele).
unspecified feature
A junction found at a chromosomal position where an insertion has occurred on the homologous chromosome, such that the junction represents the reference locus paired with the hemizygously inserted feature.
hemizygous reference junction
In the case of a transgenic insertion that creates a hemizygous locus, the reference locus that this insertion is variant_with is the junction on the homologous chromosome at the same position where the insertion occurred. This is the 'hemizygous reference' junction.
The junction-insertion pair represents the allelic complement at that locus, which is considered to be hemizygous. Most genotype syntaxes represent this hemizygous state with a ' /0' notation.
reference junction
A gene that originates from the genome of a danio rerio.
danio rerio gene
A gene that originates from the genome of a homo sapiens.
homo sapiens gene
A gene that originates from the genome of a mus musculus.
mus musculus gene
A reference human sonic hedgehog (shh) gene spans bases 155,592,680-155,604,967 on Chromosome 7, according to genome build GRCh37, and produces a primary functional transcript that is 4454 bp in length and produces a 462 amino acid protein involved in cell signaling events behind various aspects of cell differentiation and development.
http://useast.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000164690
Note that this may be slightly different than the extent described in other gene databases, such as Entrez Gene: http://www.ncbi.nlm.nih.gov/gene/6469
A version/allele of a gene that serves as a standard against which variant genes are compared.
reference gene
reference gene locus
Being a 'reference gene' is a role or status assigned in the context of a specific dataset or analysis. In human variation datasets, 'reference' status is typically assigned based on factors such as being the most common version/allele in a population, being an ancestral allele, or being identified first as a prototypical example of a gene.
In model organism datasets, 'reference' genes are typically the 'wild-type' allele for a given gene, representing a functional and unaltered version of the gene that is part of a defined genomic background, and against which natural or experimentally-induced versions are compared.
reference gene allele
obsolete_experimental insertion
gene trap insertion
A transgene that has been integrated into a chromosome in the host genome.
An integrated transgene differs from a transgenic insertion in that a transgenic insertion may contain no actual transgenes or multiple transgenes, and it may contain sequences in addition to its transgene(s). For example, the insertion may include sequences flanking the transgene, or it may be polycistronic and contain more than one transgene. So the term 'integrated transgene' covers only a single transgene integrated as part of a transgenic insertion.
An 'integrated transgene' differs from its parent 'transgene' in that transgenes can include genes introduced into a cell/organism on an extra-chromosomal plasmid that is never integrated into the host genome.
integrated transgene
A nucleic acid macromolecule that is part of a cell or virion and has been inherited from an ancestor cell or virion, and/or is capable of being replicated and inherited through successive generations of progeny.
1. Note that at present, a material genome and genetic material are necessarily part of some cell or virion. So a genomic library is not considered a material genome/genomic material - rather, we could say that this genomic library is a 'genomic material sample' that bears the concretization of some genome.
2. A challenging edge case is experimentally delivered DNA into a terminally differentiated cell that will never divide. Such material does technically meet our definition - since we are careful to say that the material must be *capable of* being stably inherited through subsequent generations. Thus, we would say that *if* the cell were to resume replication, the material would be heritable in this way.
1. Genomic material here is considered as a DNA or RNA molecule that is found in a cell or virus, and capable of being replicated and inherited by progeny cells or virus. As such, this nucleic acid is either chromosomal DNA, or some replicative epi-chromosomal plasmid or transposon. Genetic material is necessarily part of some 'material genome', and both are necessarily part of some cell or virion. So a genomic library is not considered a material genome/genetic material - rather, we could say that this genomic library is a 'genomic material sample' that bears the concretization of some genome.
2. Genomic material need not be inherited from an immediate ancestor cell or organism (e.g. a replicative plasmid or transposon acquired through some experimental modification), but such cases must be capable of being inherited by progeny cells or organisms.
genomic material
A material entity that represents all genetic material in a cell or virion. The material genome is typically a molecular aggregate of all the chromosomal DNA and epi-chromosomal DNA that represents all sequences that are heritable by progeny of a cell or virion.
physical genome
A genome is the collection of all nucleic acids in a cell or virus, representing all of an organism's hereditary information. It is typically DNA, but many viruses have RNA genomes. The genome includes both nuclear chromosomes (ie nuclear and micronucleus chromosomes) and cytoplasmic chromosomes stored in various organelles (e.g. mitochondrial or chloroplast chromosomes), and can in addition contain non-chromosomal elements such as replicative viruses, plasmids, and transposable elements.
Note that at present, a material genome and genetic material are necessarily part of some cell or virion. So a genomic library is not considered a material genome/genetic material - rather, we could say that this genomic library is a 'genomic material sample' that bears the concretization of some SO:genome.
material genome
a population of homo sapiens grouped together in virtue of their sharing some commonality (either an inherent attribute or an externally assigned role)
Consider http://semanticscience.org/resource/SIO_001062 ! human population ("A human population refers to a collection of human beings").
homo sapiens population
human population
A maximal collection of organisms of a single species that have been bred or experimentally manipulated with the goal of being genetically identical.
organism strain or breed
two mice with the same genotype information, but maintained in different labs, are different strains (many examples of this in MGI/IMSR)
strain or breed
A group comprised of organisms from a single taxonomic group (e.g. family, order, genus, species, or a strain or breed within a given taxon)
taxonomic group
mus musculus strain
danio rerio strain
sequence attribute that can inhere only in a collection of more than one sequence features
obsolete_sequence feature collection attribute
A quality inhering in a collection of discontinuous sequence features in a single genome that reside on the same macromolecule (e.g. the same chromosome).
in cis
A quality inhering in a collection of discontinuous sequence features in a single genome that reside on different macromolecules (e.g. different chromosomes).
in trans
An allelic state that specifically describes the degree of similarity of alleles at a particular locus in the *chromosomal* genome (i.e. whether the alleles are the same or different).
allelic state
http://semanticscience.org/resource/SIO_001263
zygosity
hemizygous
heterozygous
homozygous
indeterminate zygosity
no-call zygosity
unknown zygosity
unspecified zygosity
indeterminate zygosity
MGI uses this term when zygosity is not known.
no-call zygosity
(this is how the GVF10 format/standard refers to loci without enough data to make an accurate call . . . see http://www.sequenceontology.org/resources/gvf.html#quick_gvf_examples)
The disposition of an entity to be transmitted to subsequent generations following a genetic replication or organismal reproduction event.
We can use these terms to describe the heritability of genetic material or sequence features (e.g. chromosomal DNA or genes are heritable in that they are passed on to child cells/organisms). Such genetic material has a heritable disposition in a cell or virion, in virtue of its being replicated in its cellular host and inherited by progeny cells (such that the sequence content it encodes is stably propagated in the genetic material of subsequent generations of cells).
We can also use these terms to describe the heritability of phenotypes/conditions - e.g. the passage of a particular trait or disease across generations of reproducing cells/organisms.
heritability
heritable
non-heritable
The disposition of a genetic variant to cause a particular phenotype or disease based on its allelic state (e.g. heterozygous vs homozygous)
Considering alternate definitions:
- a disposition related to the phenotypic effect of a particular allele based on its inherited allelic state (i.e., the complement of alleles present at a particular locus)
- the disposition of an allele in a particular allelic state (e.g. heterozygous vs homozygous) to cause a particular phenotype.
phenotypic heritability
phenotypic inheritance pattern
condition inheritance
disposition inhering in a genetic locus variant that is realized in its inheritance by some offspring such that at least a partial variant-associated phenotype is apparent in heterozygotes
dominant inheritance
A mode of inheritance whereby a heterozygous individual expresses distinct traits or conditions associated with both alleles (e.g. an individual with an AB blood type).
Alt: A disposition inhering in a variant of a genetic locus that is realized in an inheritance pattern whereby two different variants are phenotypically expressed in a heterozygous individual (e.g. an individual with an AB blood type)
co-dominant inheritance
pure dominant inheritance
complete dominant inheritance
disposition inhering in a genetic locus variant that is realized in its inheritance by some offspring such that its associated phenotype is partially expressed in heterozygotes (i.e. the observed phenotype is intermediate between that of the two distinct loci)
incomplete dominant inheritance
semi-dominant inheritance
X-linked dominant inheritance
allosomal dominant inheritance
autosomal dominant inheritance
disposition inhering in a genetic locus variant that is realized in its inheritance by some offspring such that no variant-associated phenotype is apparent in heterozygotes
recessive inheritance
X-linked recessive inheritance
allosomal recessive inheritance
autosomal recessive inheritance
An allele attribute inhering in a locus that is designated to serve as a standard against which 'variant' versions of the same locus are compared.
Being 'reference' is a role or status assigned in the context of a data set or analysis framework. A given allele can be reference in one context and variant in another.
reference
unspecified life cycle stage
objective is to insert some specified sequence into the genome of a cell or virus
genetic insertion technique
mutagen treatment technique
a genetic alteration technique that creates a variant/allele of a known gene - either by prospective targeting of a specific gene through homologous recombination, or by retrospective sequence analysis to determine the insertion locus of a randomly integrated transgene (e.g. as done in gene trapping).
This is represented axiomatically by the requirement that a 'gene variant' (i.e. an allele/variant of a known gene) is the specified_output of this technique. This is contrasted to non-targeted/random insertions/alterations where the altered locus is not known, and therefore no variant allele of a gene is created.
targeted gene mutation technique
Is considered to be 'non-targeted' in the sense that the insertion occurs randomly and not through homologous recombination.
random genetic insertion technique
targeted genetic insertion technique
enhancer trapping technique
gene trapping technique
promoter trapping technique
targeted knock-in technique
random transgene insertion technique
A single locus complement that represents the collection of all chromosome sequences for a given chromosome in a single genome
chromosome complement
An aneuploid chromosomal element that is a complete duplicated chromosome resulting from a meiotic non-disjunction event.
duplicate chromosome
non-disjunct duplicate chromosome
novel aneusomic chromosome
This 'gained' chromosome is conceptually an 'insertion' in a genome that received two copies of a chromosome in a cell division following a non-disjunction event. As such, it qualifies as a type of sequence_alteration, and as an 'extra' chromosome.
gained aneusomic chromosome
0
A null locus that represents a missing chromosome, typically resulting from a meiotic non-disjunction event
This 'lost' chromosome is conceptually a 'deletion' in a genome that received zero copies of a chromosome in a cell division following a non-disjunction event. As such, it qualifies as a type of sequence_alteration, and as a 'null' chromosome. But it does not classify under SO:deletion because this class is defined as "the point at which one or more contiguous nucleotides were excised".
absent aneusomic chromosome
absent monosomic chromosome complement
non-disjunct absent chromosome
lost aneusomic chromosome
A genomic feature that is a portion of a chromosome that has been abnormally lost or duplicated (as a fusion to a non-homologous chromosome), as the result of an unbalanced translocation.
aneuploid chromosomal segment
aneusomic chromosomal subregion/segment
partial aneusomic chromosomal element
Partial aneuploidy: The terms "partial monosomy" and "partial trisomy" are used to describe an imbalance of genetic material caused by loss or gain of part of a chromosome. In particular, these terms would be used in the situation of an unbalanced translocation, where an individual carries a derivative chromosome formed through the breakage and fusion of two different chromosomes. In this situation, the individual would have three copies of part of one chromosome (two normal copies and the portion that exists on the derivative chromosome) and only one copy of part of the other chromosome involved in the derivative chromosome.
http://en.wikipedia.org/wiki/Aneuploidy
We consider novel sequence features gained in a genome to be sequence alterations, including aneusomic chromosome segments gained through unbalanced translocation events, entire aneusomic chromosomes gained through a non-disjunction event during replication, or extrachromosomal replicons that become part of the heritable genome of a cell or organism.
aneusomic chromosomal part
A genomic feature that is a portion of a chromosome that has been abnormally duplicated as a fusion to a non-homologous chromosome as the result of an unbalanced translocation.
In our model, we consider this chromosomal region to be trisomic, and thus a variant single locus complement
duplicate partial aneuploid chromosomal element
translocated duplicate chromosomal element
translocated duplicate chromosomal segment
gained aneusomic chromosomal segment
0
A sequence alteration representing the absence of sequence resulting from an unbalanced translocation to another chromosome.
In our model, we consider this chromosomal region to be monosomic, and thus a variant single locus complement
dropped partial aneuploid chromosomal element
translocated absent chromosomal segment
truncated chromosome terminus
This is not a deletion in the sense defined by the Sequence Ontology in that it is not the result of an 'excision' of nucleotides, but an unbalanced translocation event.
The allelic complement that results is comprised of the null allele (terminus or junction) represented by this lost chromosomal segment, and the remaining normal segment in the homologous chromosome.
The lost aneusomic chromosomal segment is typically accompanied by a gained aneusomic chromosomal segment from another chromosome.
lost aneusomic chromosomal segment
A genomic feature that is a complete chromosome that has been abnormally duplicated or lost, typically as the result of a non-disjunction event or unbalanced translocation
complete aneusomic chromosome
We consider large sequence features gained in a genome to be sequence alterations (akin to insertions), including aneusomic chromosome segments gained through unbalanced translocation events, entire aneusomic chromosomes gained through a non-disjunction event during replication, or extrachromosomal replicons that become part of the heritable genome of a cell or organism.
Similarly, large sequence features lost from a genome are akin to deletions and therefore also considered sequence alterations. This includes the loss of chromosome segments through unbalanced translocation events, and the loss of entire chromosomes through a non-disjunction event during replication.
aneusomic chromosome
Stub class to serve as root of hierarchy for imports of biological processes from GO-BP.
biological process
disomic zygosity
aneusomic zygosity
trisomic homozygous
trisomic heterozygous
A heterozygous quality inhering in a single locus complement comprised of two different variant alleles and no wild-type locus (e.g. fgf8a<ti282a>/fgf8a<x15>).
compound heterozygous
A sequence feature that references some biological macromolecule applied as a reagent in an experiment or technique (e.g. a morpholino expression plasmid, or oligonucleotide probe)
replaced with SO:engineered_region
extra-genomic sequence
obsolete_reagent sequence feature
A heterozygous quality inhering in a locus complement comprised of one variant locus and one wild-type/reference locus (e.g. fgf8a<ti282a>/fgf8a<+>).
simple heterozygous
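The zygosity distinctions drawn in these definitions can be sketched as a small classifier over a two-member allelic complement. This is a minimal sketch under stated assumptions, not a GENO artifact: the function name and the string encoding are hypothetical, with `+` standing for the wild-type/reference allele, `0` for the absent partner in the ' /0' hemizygous notation, and `?` for the unspecified partner in the ' /? ' notation.

```python
def classify_zygosity(allele_a, allele_b, reference="+"):
    """Classify a two-member allelic complement (hypothetical encoding).

    "?" marks an unspecified partner allele, "0" an absent partner
    (hemizygous), and `reference` the wild-type/reference allele.
    """
    pair = {allele_a, allele_b}
    if "?" in pair:
        return "unspecified zygosity"
    if "0" in pair:
        return "hemizygous"
    if allele_a == allele_b:
        return "homozygous"
    if reference in pair:
        # one variant allele paired with the wild-type/reference allele
        return "simple heterozygous"
    # two different variant alleles and no wild-type allele
    return "compound heterozygous"


simple = classify_zygosity("ti282a", "+")      # fgf8a<ti282a>/fgf8a<+>
compound = classify_zygosity("ti282a", "x15")  # fgf8a<ti282a>/fgf8a<x15>
```

The order of the checks matters in this sketch: unspecified and hemizygous partners are tested first so that they are never mistaken for a second variant allele.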
A structurally or functionally defined component of a transgene (e.g. a promoter, a region coding for a fluorescent protein tag, etc)
transgene feature
An allele attribute inhering in a locus that varies from some designated reference in virtue of alterations in its sequence or expression level
variant
An allele attribute inhering in a locus for which there is more than one version fixed in a population at some significant percentage (typically 1% or greater), where the locus is not considered to be either reference or a variant.
polymorphic
An allele attribute inhering in a locus bearing a sequence alteration that is present at very low levels in a given population (typically less than 1%), or that has been experimentally generated to alter the locus with respect to some reference sequence.
mutant
A sequence feature that is part of the heritable genome of a cell or organism, whose identity is defined by its sequence and genomic position.
genomic locus
A genomic feature (aka genomic locus) is an extent of sequence at a particular location in a genome, which can span any size from a complete chromosome, to chromosomal bands or regions, to single genes, to a single base pair or even junction between base pairs (i.e. feature with an extent of zero).
It has a defined position in a genome, determined by its alignment with a genomic reference sequence. Instances of genomic features are identified by both their sequence and their position in a genome.
genomic feature
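One way to picture a feature that may have an extent of zero is a small record with a reference, a start, and an end. The sketch below is an assumption-laden illustration, not how GENO encodes position: the class name, its fields, and the choice of interbase (0-based, half-open) coordinates are all hypothetical, chosen so that a junction between two bases has `start == end`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenomicFeature:
    """A located feature on a reference sequence (hypothetical structure).

    Coordinates are interbase (0-based, half-open), an assumption of this
    sketch, so a junction between two bases has start == end.
    """
    reference: str
    start: int
    end: int

    def __post_init__(self):
        if not 0 <= self.start <= self.end:
            raise ValueError("require 0 <= start <= end")

    @property
    def extent(self) -> int:
        """Number of bases spanned; zero for a junction."""
        return self.end - self.start


single_base = GenomicFeature("chr5", 1000, 1001)
junction = GenomicFeature("chr5", 1000, 1000)
```

Two features with the same sequence but different positions compare as different records here, which mirrors the statement that instances are identified by both sequence and position (sequence is left out of the sketch for brevity).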
A nucleic acid molecule that contains one or more sequences serving as a template for gene expression in a biological system (ie a cell or virion).
This class is different from genomic material in that genomic material is necessarily heritable, while genetic material includes genomic material, as well as any additional nucleic acids that participate in gene expression resulting in a cellular or organismal phenotype. So things like transiently transfected expression constructs would qualify as 'genetic material' but not 'genomic material'. Things like siRNAs and morpholinos affect gene expression indirectly (i.e. are not templates for gene expression), and therefore do not qualify as genetic material.
genetic material
An allele/locus that is variant with respect to some wild-type allele, in virtue of its being a very rare variant in a population (typically <1%), or being an experimentally-induced alteration that derives from a wild-type background locus for a given strain.
mutant locus
Based on use of 'mutant' as described in PMID: 25741868 ACMG Guidelines
'Mutant' is typically contrasted with 'wild-type', where 'mutant' indicates a natural but very rare allele in a population (typically <1%), or an experimentally-induced variation that derives from a wild-type background locus for a given strain, which can be selected for in establishing a mutant line.
mutant allele
A sequence alteration that is a very rare allele in a population (typically <1%), or an experimentally-induced variation that derives from a wild-type background locus for a given strain.
mutation
A genetic feature that is not part of the chromosomal genome of a cell or virion, but rather a stable and heritable element that is replicated and passed on to progeny (e.g. a replicative plasmid or transposon)
Consider replacing with SO_0001038 ! extrachromosomal_mobile_genetic_element
episomal replicon
Extrachromosomal replicons are replicated and passed on to descendents, and thus part of the heritable genome of a cell or organism. In cases where the presence of such a replicon is novel or aberrant (i.e. not included in the reference for that genome), the replicon is considered a 'sequence alteration'.
extrachromosomal replicon
expression construct feature
expression construct
A variant locus/allele that is fixed in a population at some stable level, typically > 1%. Polymorphic alleles are variants of loci where more than one version exists at significant frequencies in a population.
polymorphic locus
PMID: 25741868 ACMG Guidelines
Polymorphic loci/alleles are contrasted with mutant alleles (extremely rare variants that exist in <1% of a population), and 'wild-type alleles' (extremely common variants present in >99% of a population).
polymorphic allele
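The frequency-based contrast between mutant, polymorphic, and wild-type alleles can be sketched as a threshold check. The function name and parameters below are hypothetical, and the 1% and 99% cut-offs are only the 'typical' values quoted in the definitions, not fixed rules. Frequency is also only one criterion: an experimentally-induced alteration counts as mutant regardless of how common it is, which this sketch does not model.

```python
def classify_by_frequency(frequency, rare_below=0.01, common_above=0.99):
    """Label an allele from its population frequency (illustrative only).

    frequency: fraction of the population carrying the allele, 0.0 to 1.0.
    rare_below / common_above: typical thresholds, adjustable per dataset.
    """
    if not 0.0 <= frequency <= 1.0:
        raise ValueError("frequency must be between 0 and 1")
    if frequency < rare_below:
        return "mutant"
    if frequency > common_above:
        return "wild-type"
    return "polymorphic"
```

For example, under the default thresholds an allele at 0.5% frequency is labelled mutant, one at 30% polymorphic, and one at 99.5% wild-type.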
A polymorphic locus that is present at the highest frequency relative to other polymorphic variants of the same locus.
major allele
major polymorphic locus
major polymorphic allele
A polymorphic locus that is not present at the highest frequency among all fixed variants at the locus (i.e. not the major polymorphic allele for a given locus).
minor allele
minor polymorphic locus
minor polymorphic allele
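Given the frequencies of the polymorphic alleles at one locus, the major/minor distinction reduces to picking the most frequent allele. The sketch below assumes a plain dictionary from allele name to frequency; the function name and the example values are hypothetical.

```python
def split_major_minor(frequencies):
    """Split polymorphic alleles at one locus into major and minor.

    frequencies: dict mapping allele name -> population frequency.
    Returns (major_allele, sorted list of minor alleles). If two alleles
    tie for the highest frequency, the first one in the dict is returned
    as major, which is an arbitrary choice made by this sketch.
    """
    if not frequencies:
        raise ValueError("at least one allele is required")
    major = max(frequencies, key=frequencies.get)
    minors = sorted(allele for allele in frequencies if allele != major)
    return major, minors


major, minors = split_major_minor({"A": 0.6, "G": 0.3, "T": 0.1})
```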
A polymorphic locus that is determined from the sequence of a recent ancestor in a phylogenetic tree.
ancestral allele
ancestral polymorphic locus
ancestral polymorphic allele
An allele representing a highly common variant (typically >99% in a population), that typically exhibits canonical function, and against which rare and/or non-functional mutant loci are compared.
WT locus
wild-type allele
'Wild-type' is typically contrasted with 'mutant', where 'wild-type' indicates a highly prevalent allele in a population (typically >99%), and/or some prototypical allele in a background genome that serves as a basis for some experimental alteration to generate a mutant allele, which can be selected for in establishing a mutant strain.
The notion of wild-type alleles is more common in model organism databases, where specific mutations are generated against a wild-type reference feature. Wild-type alleles are typically but not always used as reference alleles in sequence comparison/analysis applications. More than one wild-type sequence can exist for a given feature, but typically only one allele is deemed wild-type in the context of a single dataset or analysis.
wild-type allele
wild-type gene allele
A gene locus representing a single most common variant (typically >99% in a population), that typically exhibits canonical function, and against which rare and/or non-functional mutant genes are compared in characterizing the phenotypic consequences of genetic variation.
wild-type gene
A gene altered in its expression level in the context of some experiment as a result of being targeted by gene-knockdown reagent(s) such as a morpholino or RNAi.
The identity of a given instance of a reagent-targeted gene is dependent on the experimental context of its knock-down - specifically what reagent was used and at what level. For example, the wild-type shha zebrafish gene targeted in experiment 1 by morpholino 1 and in experiment 2 by morpholino 2 represent two distinct instances of a 'reagent-targeted gene', despite sharing the same sequence and position.
reagent targeted gene
A transgene that is delivered as part of a DNA expression construct into a cell or organism in order to transiently express a specified product (i.e. it has not integrated into the host genome).
experimentally-expressed transgene
extrinsic transgene
transiently-expressed transgene
An allele attribute describing a highly common variant (typically >99% in a population), that typically exhibits canonical function, and against which rare and/or non-functional mutant loci are compared.
wild-type
A sequence feature representing one of a set of coexisting sequence variants at a particular genomic locus.
variable locus
http://purl.obolibrary.org/obo/SO_0001023 ! allele
The notion of an 'allele' defined here is distinct from that of a genetic 'variant' - as 'allele' includes features considered to be the 'reference' at a particular locus. Thus, alleles capture the state of the sequence found at a genetic locus, be it variant or reference.
The most common use of the term refers to alleles of defined genes. But GENO uses the term broadly to include variants of any defined marker or extent of sequence in a genome. So in addition to genes, we can talk about alleles of single nucleotides (e.g. SNVs / SNPs), QTLs, microsatellite regions, or even large/structural chromosomal regions.
allele
a sequence attribute of a chromosome or chromosomal region that has been abnormally duplicated or lost, as the result of a non-disjunction event or unbalanced translocation.
aneusomic
An allele of a gene that has as part some sequence alteration.
A 'variant locus' being an allele_of a gene is based on its location in a host genome - not strictly on its sequence. This means, for example, that the insertion of the human SMN2 gene as a transgene into the genome of a mouse (see http://www.informatics.jax.org/allele/MGI:3056903) DOES NOT represent an allele_of the human SMN2 gene according to the GENO model - because it is located in a mouse genome, not a human one. Rather, this is a transgene that derives_sequence_from the human SMN2 gene.
If this human gene is inserted to replace the mouse SMN2 gene, it becomes an allele_of the mouse SMN2 gene (one that happens to match the sequence of the human ortholog of the gene). If this human gene is randomly inserted at an unspecified location in the mouse genome, it does not create/represent an allele of any gene - because it is not tied to a known genomic locus.
variant gene locus
A gene allele is 'variant' in virtue of its containing a sequence alteration that varies from some reference gene standard. But note that a gene allele that is variant in one context/dataset can be considered a reference in another context/dataset.
variant gene allele
The set of two copies of the shha gene in a diploid zebrafish genome, e.g. fgf8a<ti282a/+>.
The collection of the individual nucleotides present at position 24126737 across all copies of chromosome 5 in a particular human genome.
TO DO: show a VCF representation of this example.
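As a placeholder for the VCF example, the following is a minimal sketch of how the chromosome 5 position above might appear as a single VCF data line. The REF and ALT alleles and the heterozygous genotype are hypothetical, since the example states only the chromosome and position; the helper name vcf_record is also an assumption made for illustration.

```python
# Hypothetical values: only the chromosome and position come from the example.
CHROM, POS = "5", 24126737
REF, ALT = "A", "G"      # assumed alleles, for illustration only
GENOTYPE = "0/1"         # assumed heterozygous diploid sample

def vcf_record(chrom, pos, ref, alt, gt):
    """Build one tab-delimited VCF data line with a single GT sample column."""
    # Column order: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
    fields = [chrom, str(pos), ".", ref, alt, ".", ".", ".", "GT", gt]
    return "\t".join(fields)

record = vcf_record(CHROM, POS, REF, ALT, GENOTYPE)
print(record)
```

The GT value lists one allele index per chromosome copy, which is how the "all copies of chromosome 5" aspect of the complement would be carried in this format.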
A genomic feature complement comprised of the set of all homologous features in a single genome that are found at a particular genomic location/span.
Possible definitions for this concept:
1. the set of all homologous features in a single genome that are found at a particular genomic location/span.
2. the set of all homologous instances of a particular feature in a single genome.
3. the set of all versions of a sequence found at a particular genetic locus in a single genome.
allelic complement
allelic set
- A genomic feature is any located feature in the genome, be it a junction, a single nucleotide, a gene, or an entire chromosome.
- The notion of a "complement" describes the collection of all elements in some defined set.
- The notion of "homologous" genomic features describes those that occupy the same locus on homologous chromosomes.
- Therefore, we use the term "single locus complement" to describe the set of all homologous features present at a particular genomic locus.
- This complement is typically a pair of two features in a diploid genome (with two copies of each chromosome). E.g. a gene pair, a QTL pair, a nucleotide pair for a SNP, or a pair of entire chromosomes.
single locus complement
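The notes above can be illustrated with a small sketch that gathers every feature found at one locus in a single genome. The class GenomicFeature, the function single_locus_complement, and the locus and chromosome names are hypothetical and are not part of GENO.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GenomicFeature:
    locus: str            # hypothetical locus identifier
    chromosome_copy: str  # which homologous chromosome carries the feature
    sequence: str         # sequence state found at the locus

def single_locus_complement(genome_features, locus):
    """Return all homologous features present at one locus in one genome."""
    return [f for f in genome_features if f.locus == locus]

# A toy diploid genome with one autosomal locus and one X-linked locus (XY male).
genome = [
    GenomicFeature("geneA", "chr1_maternal", "ACGT"),
    GenomicFeature("geneA", "chr1_paternal", "ACGA"),
    GenomicFeature("geneX", "chrX", "TTGA"),
]
pair = single_locus_complement(genome, "geneA")
single = single_locus_complement(genome, "geneX")
print(len(pair), len(single))
```

In this toy genome the autosomal locus yields a pair, while the X-linked locus yields a complement with a single member.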
In an experiment where shha is targeted by MO1 and shhb is overexpressed from a transgenic expression construct, the extrinsic genotype describes the 'expression altered' shha gene targeted by MO1, and the shhb gene as overexpressed from its construct.
A genotype that describes the transient variation in gene expression in a cell or organism during an experiment, as mediated through gene-specific interventions (e.g. gene knockdown reagents such as RNAi or morpholinos, overexpression constructs).
experimental genotype
Extrinsic genotypes describe all genes in a cell or organism whose expression is transiently increased or decreased through gene-specific interventions in an experiment where phenotypic assessment is made.
extrinsic genotype
A genotype that describes the total variation in genomic sequence, along with transient variation in gene expression during an experiment as mediated through gene-specific interventions (e.g. gene knockdown reagents such as RNAi or morpholinos, or overexpression constructs). An effective genotype is meant to summarize all factors related to genes and their expression that influence an observed phenotype.
EFO:0000513 ! genotype: "The total sum of the genetic information of an organism that is known and relevant to the experiment being performed, including chromosomal, plasmid, viral or other genetic material which has been introduced into the organism either prior to or during the experiment."
A genotype that describes the total variation in genomic sequence, along with all genes whose expression is transiently increased or decreased through gene-specific interventions in an experiment. An effective genotype is meant to summarize all factors related to genes and their expression that influence an observed phenotype.
effective genotype
A set of all targeted genes in a single genome in the context of a given experiment (e.g. both copies of the WT shha gene in a zebrafish exposed to shha-targeting morpholinos)
reagent-targeted gene complement
reagent-targeted gene complement
The set of all transgenes transiently expressed in a biological system in the context of a given experiment.
experimental transgene complement
transiently-expressed transgene complement
Consider wild-type zebrafish shha gene in the context of being targeted by morpholino1 vs morpholino 2 in separate experiments. These shha genes share identical sequence and position, but represent distinct instances of an 'expression-variant gene' because of their different external context. This is important because these qualified features could have distinct phenotypes associated with them (just as two different sequence variants of the same gene can have potentially different associated phenotypes).
A gene altered in its expression level relative to some baseline of normal expression in the system under investigation (e.g. a cell line or model organism).
See SO classes under 'silenced gene' (e.g. 'gene silenced by RNA interference'). These seem to represent the concept of a qualified feature as I define it here, in that they are defined by alterations extrinsic to the sequence and position of the gene itself.
expression allele
Expression-variant genes are altered in their expression level through some modification or intervention external to their sequence and position. These may include endogenous mechanisms (e.g. direct epigenetic modifications that impact expression level, or altered regulatory networks controlling gene expression), or experimental interventions (e.g. targeting by a gene-knockdown reagent, or being transiently expressed as part of a transgenic construct in a host cell or organism).
The identity of a given instance of an expression-variant gene is dependent on how its level of expression is manipulated in a biological system (i.e. via targeting by gene-knockdown reagents, or being transiently overexpressed). So expression-variant genes have the additional identity criterion of a genetic context of their material bearer (external to sequence and position) that impacts their level of expression in a biological system.
expression-variant gene
gene knockdown reagent
A region within a gene that is specifically targeted by a gene knockdown reagent, typically in virtue of bearing sequence complementary to the reagent.
targeted gene segment
reagent-targeted gene subregion
An information content entity that describes a genome by specifying the total variation in genomic sequence and/or gene expression, relative to some established background.
The concept of a 'genotype' is considered broadly in GENO to describe the total variation in genetic sequence and/or expression across a genome. The more specific concept of an 'intrinsic genotype' describes variation in heritable genomic *sequence*, while the concept of an 'extrinsic genotype' describes the set of genes that varies transiently in *expression* as a result of some experimentally induced, gene-specific targeted knock-down or overexpression. The concept of an 'effective genotype' describes both intrinsic and extrinsic variation (i.e. = intrinsic + extrinsic genotype information).
genotype
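The relationship "effective = intrinsic + extrinsic" can be sketched as a simple composition of data structures. The class and field names below are hypothetical, as is the background name "AB"; this is an illustration of the idea rather than the GENO axiomatization.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IntrinsicGenotype:
    background: str               # reference/background the variants derive from
    sequence_variants: frozenset  # heritable sequence variation

@dataclass(frozen=True)
class ExtrinsicGenotype:
    expression_variants: frozenset  # genes transiently altered in expression

@dataclass(frozen=True)
class EffectiveGenotype:
    intrinsic: IntrinsicGenotype
    extrinsic: ExtrinsicGenotype

    def all_variation(self):
        """Union of sequence-level and expression-level variation."""
        return self.intrinsic.sequence_variants | self.extrinsic.expression_variants

# Hypothetical fish: one heritable allele plus one morpholino-targeted gene.
intrinsic = IntrinsicGenotype("AB", frozenset({"fgf8a<ti282a>"}))
extrinsic = ExtrinsicGenotype(frozenset({"shha targeted by MO1"}))
effective = EffectiveGenotype(intrinsic, extrinsic)
print(sorted(effective.all_variation()))
```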
ZFIN do not annotate with a pre-composed phenotype ontology - all annotations compose phenotypes on-the-fly using a combination of PATO, ZFA, GO and other ontologies. So while there is no manually curated zebrafish phenotype ontology, the Upheno pipeline generates one automatically here: http://purl.obolibrary.org/obo/upheno/zp.owl
This ontology does not have a root 'phenotype' class, however, and so we generate our own in GENO as a stub placeholder for import of needed zebrafish phenotype classes.
zebrafish phenotype
an allelic state where a single allele exists at a particular locus in the organellar genome (mitochondrial or plastid) of a cell/organism.
homoplasmic
an allelic state where more than one type of allele exists at a particular locus in the organellar genome (mitochondrial or plastid) of a cell/organism.
heteroplasmic
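The two organellar allelic states defined above differ only in how many distinct allele types are observed at the locus. A minimal sketch, assuming alleles are given as plain strings and using the hypothetical function name organellar_allelic_state:

```python
def organellar_allelic_state(alleles):
    """Classify the alleles observed at one organellar locus in a cell/organism."""
    distinct = set(alleles)
    if not distinct:
        raise ValueError("at least one allele observation is required")
    # One allele type across all organellar genome copies: homoplasmic.
    # More than one allele type: heteroplasmic.
    return "homoplasmic" if len(distinct) == 1 else "heteroplasmic"

print(organellar_allelic_state(["A", "A", "A"]))
print(organellar_allelic_state(["A", "G", "A"]))
```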
hemizygous X-linked
hemizygous Y-linked
hemizygous insertion-linked
An intrinsic genotype that specifies the baseline sequence of a genome from which a variant genome is derived (through the introduction of sequence alterations).
Being a 'background genotype' implies the derivation of some variant from this background (which is the case for most model organism database genotypes/strains). This is a subtly different notion than being a 'reference genotype', which can be any genotype that serves as a basis for comparison. But in a sense all background genotypes are by default reference genotypes, in that the derived variant genotype is compared against it.
reference genotype
background genotype
The descriptor 1p22.3 = chromosome 1, short arm, region 2, band 2, sub-band 3. This is read as "one p two-two point three", not "one p twenty-two point three".
An extended part of a chromosome representing a term of convenience in order to hierarchically organize morphologically defined chromosome features: chromosome > arm > region > band > sub-band.
New term request for SO.
http://ghr.nlm.nih.gov/handbook/howgeneswork/genelocation and http://people.rit.edu/rhrsbi/GeneticsPages/Handouts/ChromosomeNomenclature.pdf, both of which define the nomenclature for the banding hierarchy we use here:
chromosome > arm > region > band > sub-band
Note that an alternate nomenclature for this hierarchy is here (http://www.ncbi.nlm.nih.gov/Class/MLACourse/Original8Hour/Genetics/chrombanding.html):
chromosome > arm > band > sub-band > sub-sub-band
chromosomal region
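The chromosome > arm > region > band > sub-band hierarchy can be made concrete with a small parser for descriptors such as 1p22.3. This is a sketch under simplifying assumptions: the function name parse_band and the regular expression are hypothetical, region and band are each taken to be a single digit, and anything after the decimal point is treated as the sub-band.

```python
import re

# chromosome, arm (p/q), one-digit region, one-digit band, optional sub-band
BAND_PATTERN = re.compile(r"^(\d+|X|Y)([pq])(\d)(\d)(?:\.(\d+))?$")

def parse_band(descriptor):
    """Split a cytogenetic band descriptor into its hierarchical parts."""
    match = BAND_PATTERN.match(descriptor)
    if match is None:
        raise ValueError(f"unrecognized band descriptor: {descriptor!r}")
    chromosome, arm, region, band, sub_band = match.groups()
    return {
        "chromosome": chromosome,
        "arm": "short" if arm == "p" else "long",
        "region": int(region),
        "band": int(band),
        "sub_band": int(sub_band) if sub_band is not None else None,
    }

print(parse_band("1p22.3"))
```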
The descriptor 1p22.3 = chromosome 1, short arm, region 2, band 2, sub-band 3. This is read as "one p two-two point three", not "one p twenty-two point three".
http://ghr.nlm.nih.gov/handbook/howgeneswork/genelocation and http://people.rit.edu/rhrsbi/GeneticsPages/Handouts/ChromosomeNomenclature.pdf, both of which define the nomenclature for the banding hierarchy we use here:
chromosome > arm > region > band > sub-band
Note that an alternate nomenclature for this hierarchy is here (http://www.ncbi.nlm.nih.gov/Class/MLACourse/Original8Hour/Genetics/chrombanding.html):
chromosome > arm > band > sub-band > sub-sub-band
chromosome sub-band
chromosomal band brightness
chromosomal band intensity
gpos
gneg
gvar
gpos100
gpos75
gpos50
gpos25
A chromosome arm that is the shorter of the two arms of a given chromosome.
p-arm
stalk
short chromosome arm
A chromosome arm that is the longer of the two arms of a given chromosome.
q-arm
long chromosome arm
gpos66
gpos33
A transgene feature whose sequence regulates the synthesis of a functional product, but which is not itself transcribed.
regulatory transgene feature
A transgene feature whose sequence is expressed in a gene product through transcription and/or translation.
coding transgene feature
expressed transgene feature
reporter feature
A transgene whose product is used as a selectable marker.
selectable marker transgene
The complement of sequence features specified by a karyotype.
Note that this collection of features can be as simple as '46XY', but typically contains some gross variant component (such as a chromosome duplication or translocation).
karyotype feature complement
An intrinsic genotype where the genomic background specifies a male or female sex chromosome complement.
This modeling approach enables creation of separate genotype instances for data sources that report sex-specific phenotypes to ensure that sex-specific G2P differences are accurately described. These sex specific genotypes can be linked to the broader intrinsic genotype that is shared by male and female mice of the same strain, to aggregate associated phenotypes at this level, and allow aggregation with G2P association data about the same strains from sources that distinguish sex-specific phenotypes (e.g. IMPC) and those that do not (e.g. MGI).
In the genotype partonomy, a sex qualified genotype has as part a sex-agnostic genotype. This allows for the propagation of phenotypes associated with a sex-qualified genotype to the intrinsic genotype. Ontologically, this parthood is based on the fact that the background component of a sex-qualified genotype specifies the sex chromosomes while that of the sex-agnostic genotype does not. Thus, the sequence content of the sex-qualified genotype is a superset of that of the intrinsic genotype, with the latter being a proper part of the former.
intrinsic genotype (sex-specific)
sex-qualified genotype
sex-qualified intrinsic genotype
We distinguish the notion of a sex-agnostic intrinsic genotype, which does not specify whether the portion of the genome defining organismal sex is male or female, from the notion of a sex-qualified intrinsic genotype, which does. Male and female mice that contain the same background and genetic variation complement will have the same 'sex-agnostic intrinsic genotype', despite their genomes varying in their sex-chromosome complement. By contrast, these two mice would have different 'sex-qualified intrinsic genotypes', as this class takes background sex chromosome sequences into account in the identity criteria for its instances.
Conceptually, a sex-qualified genotype represents a superset of sequence features relative to a sex-agnostic intrinsic genotype, in that it specifies the background sex-chromosome complement of the genome.
intrinsic genotype (sex-qualified)
A sex-qualified intrinsic genotype where the genomic background contains a male sex chromosome complement.
male intrinsic genotype
A sex-qualified intrinsic genotype where the genomic background contains a female sex chromosome complement.
female intrinsic genotype
A background genotype whose sequence or identity is not known or specified.
unspecified background genotype
An exhaustive collection of all sequence features in a collection that meet some specified criteria (e.g. all chromosomes in a genome, all variant loci in a genome, all copies of a given gene or locus in a genome).
Not all sequence feature complements will be collections - i.e. in some cases the complement of all features of type X will consist of a single feature. For example, a 'single locus complement' for an X-linked locus in an XY male.
sequence feature complement
An exhaustive collection of all features (genomic loci) in a single genome that complete a set defined by some specified inclusion criteria. (e.g. all chromosomes in a genome, all variant loci in a genome, all copies of a given gene or locus in a genome).
In some cases there may be zero or only one member of such a complement, which is why this class is not necessarily a 'sequence feature collection' (which has 2 or more members).
genomic locus complement
A genomic locus is any located feature in the genome, from a single nucleotide to a gene to an entire chromosome. A complement is the collection of all elements in a set (i.e. "the full number of things in a set"). Here, a 'genomic locus complement' is the set of all features in a single genome that complete a set defined by some specified inclusion criteria.
genomic feature complement
A genomic feature that is part of a gene, and delineated by some functional or structural role it serves (e.g. a promoter element, coding region, etc).
defined gene part
gene locus part
gene part
A transgene that codes for a product used as a reporter of gene expression or activity.
reporter transgene
0
A junction between bases, a deletion variant, a terminus at the end of a chromosome.
A genomic feature that has an extent of zero.
null locus
null feature
An extrachromosomal replicon that is variant in a genome in virtue of its being a novel addition to the genome - i.e. it is not present in the reference for the genome in which it is found.
aberrant extrachromosomal replicon
exogenous extrachromosomal replicon
transgenic extrachromosomal replicon
Extrachromosomal replicons are replicated and passed on to descendants, and are thus part of the heritable genome of a cell or organism. In cases where the presence of such a replicon is exogenous or aberrant (i.e. not included in the reference for that genome), the replicon is considered a 'variant locus' and a 'sequence alteration'.
novel extrachromosomal replicon
A genomic feature that represents an entirely new replicon in the genome, e.g. an extrachromosomal replicon or an extra copy of a chromosome.
This class is defined so as to support classification of things like novel extrachromosomal replicons and aneusomic chromosomes as being variant loci/alleles in a genome. These represent entirely new loci in the genome - not variants of an existing locus.
Novel replicons are considered 'insertions' in a genome, and as such, qualify as types of sequence_alterations and variant alleles. There is no pre-existing locus that they modify, however, and thus they are not really an 'allele of' a named locus. But conceptually, we still consider these to represent genetic variants and classify them as variant alleles.
novel replicon
An attribute of a genomic feature that represents a feature not previously found in a given genome, e.g. an extrachromosomal replicon or aneusomic third copy of a chromosome.
novel
A null locus representing the end of a sequence feature that is bounded only on one side (e.g. at the end of a chromosome or oligonucleotide).
terminus
An extent of biological sequence, or a collection of such extents, whose identity is dependent on both its sequence and its position.
GENO defines three levels of sequence-related artifacts, which are distinguished by their identity criteria.
1. 'Biological sequence' identity is dependent only on the ordering of units that comprise the sequence.
2. 'Sequence feature' identity is dependent on its sequence and the genomic position of the sequence (aligns with definition of 'sequence feature' in the Sequence Ontology).
3. 'Qualified sequence feature' identity is additionally dependent on some aspect of the physical context of the genetic material bearing the feature, extrinsic to its sequence and its genomic position. For example, its being targeted by gene knockdown reagents, its being transgenically expressed in a foreign cell from a recombinant expression construct, its having been epigenetically modified in a way that alters its expression level or pattern, or its being located in a specific cellular or anatomical location.
sequence feature or collection
'Sequences' differ from 'sequence features' in that instances are distinguished only by their inherent ordering of units, and not by any positional aspect related to alignment with some reference sequence. Accordingly, the 'ATG' translational start codon of the human AKT gene is the same *sequence* as the 'ATG' start codon of the human SHH gene, but these represent two distinct sequence features in virtue of their different positions in the genome.
An ordered collection of units representing successive monomers of a biological macromolecule.
biomacromolecular sequence
GENO defines three levels of sequence-related artifacts, which are distinguished by their identity criteria.
1. 'Biological sequence' identity is dependent only on the ordering of units that comprise the sequence.
2. 'Sequence feature' identity is dependent on its sequence and the genomic position of the sequence (aligns with definition of 'sequence feature' in the Sequence Ontology).
3. 'Qualified sequence feature' identity is additionally dependent on some aspect of the physical context of the genetic material bearing the feature, extrinsic to its sequence and its genomic position. For example, its being targeted by gene knockdown reagents, its being transgenically expressed in a foreign cell from a recombinant expression construct, its having been epigenetically modified in a way that alters its expression level or pattern, or its being located in a specific cellular or anatomical location.
biological sequence
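The three levels of identity criteria listed above can be sketched with value-equality classes, where each level adds one more field to what counts as "the same" instance. The class names mirror the GENO terms, but the field names, the reference label, and all coordinates are hypothetical placeholders rather than real genome positions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BiologicalSequence:
    residues: str  # identity: ordering of units only

@dataclass(frozen=True)
class SequenceFeature:
    sequence: BiologicalSequence
    reference: str  # identity additionally depends on genomic position
    start: int
    end: int

@dataclass(frozen=True)
class QualifiedSequenceFeature:
    feature: SequenceFeature
    context: str  # identity additionally depends on context of the material bearer

atg = BiologicalSequence("ATG")
# Two start codons with the same sequence at different (made-up) positions.
start_codon_1 = SequenceFeature(atg, "chrA", 100, 102)
start_codon_2 = SequenceFeature(atg, "chrB", 500, 502)

# One gene feature targeted by two different morpholinos.
gene = SequenceFeature(BiologicalSequence("ACGTACGT"), "chrC", 10, 17)
targeted_by_mo1 = QualifiedSequenceFeature(gene, "targeted by morpholino1")
targeted_by_mo2 = QualifiedSequenceFeature(gene, "targeted by morpholino2")

print(start_codon_1.sequence == start_codon_2.sequence)  # same sequence
print(start_codon_1 == start_codon_2)                    # distinct features
print(targeted_by_mo1 == targeted_by_mo2)                # distinct qualified features
```

Frozen dataclasses compare by field values, so equality here directly reflects the identity criteria at each level.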
true
A sequence feature (or collection of features) whose identity is dependent on the context or state of its material bearer (in addition to its sequence and position). This context/state describes factors external to its inherent sequence and position that can influence its expression, such as being targeted by gene-knockdown reagents, or an epigenetic modification.
GENO defines three levels of sequence-related artifacts, which are distinguished by their identity criteria.
1. 'Biological sequence' identity is dependent only on the ordering of units that comprise the sequence.
2. 'Sequence feature' identity is dependent on its sequence and the genomic position of the sequence (aligns with definition of 'sequence feature' in the Sequence Ontology).
3. 'Qualified sequence feature' identity is additionally dependent on some aspect of the physical context of the genetic material bearing the feature, extrinsic to its sequence and its genomic position. For example, its being targeted by gene knockdown reagents, its being transgenically expressed in a foreign cell from a recombinant expression construct, its having been epigenetically modified in a way that alters its expression level or pattern, or its being located in a specific cellular or anatomical location.
qualified sequence feature or collection
Consider wild-type zebrafish shha gene in the context of being targeted by morpholino1 vs morpholino 2 in separate experiments. These shha genes share identical sequence and position, but represent distinct instances of a 'qualified sequence feature' because of their different external context. This is important because these qualified features could have distinct phenotypes associated with them (just as two different sequence variants of the same gene can have potentially different associated phenotypes).
A sequence feature whose identity is dependent on the context or state of its material bearer (in addition to its sequence and position). This context describes factors external to its inherent sequence and position that can influence its expression, such as being targeted by gene-knockdown reagents, or being the target of epigenetic modification.
GENO defines three levels of sequence-related artifacts, which are distinguished by their identity criteria.
1. 'Biological sequence' identity is dependent only on the ordering of units that comprise the sequence.
2. 'Sequence feature' identity is dependent on its sequence and the genomic position of the sequence (aligns with definition of 'sequence feature' in the Sequence Ontology).
3. 'Qualified sequence feature' identity is additionally dependent on some aspect of the physical context of the genetic material bearing the feature, extrinsic to its sequence and its genomic position. For example, its being targeted by gene knockdown reagents, its being transgenically expressed in a foreign cell from a recombinant expression construct, its having been epigenetically modified in a way that alters its expression level or pattern, or its being located in a specific cellular or anatomical location.
Modeling sequence entities at this 'qualified' level is useful when it is important to distinguish features with identical sequence and position as separate instances - based on their material bearers being found in different contexts. For example, a situation where the shha gene is targeted by two different morpholinos and phenotypes assessed for each. This is analogous to two different alleles of the shha gene at the sequence feature level, and similarly worthy of being distinguished when considering how the resulting alteration in gene expression impacts the measured phenotypes of the host zebrafish.
qualified sequence feature
true
This axiom is an initial attempt to formalize the identity criteria of an extrinsic context that separates qualified sequence features from sequence features (i.e. the context of its material bearer). As we further develop our efforts here this will be refined and made more precise.
true
Formalizes one identity criterion of the sequence feature component of a qualified sequence feature (which itself is identified by its sequence and its genomic position).
A set of all qualified sequence features of a specified type in a single genome.
qualified sequence feature complement
A genotype that describes the total variation in heritable genomic sequence of a cell or organism, typically in terms of alterations from some reference or background genotype.
Genotype vs Genome in GENO: An (intrinsic) genotype is an information artifact representing an indirect syntax for specifying a genome sequence. This syntax has reference and variant components - a 'background genotype' and 'genomic variation complement' - that must be operated on to resolve a specific genome sequence. Specifically, the genome sequence is resolved by substituting all sequences specified by the 'genomic variation complement' for the corresponding sequences in the 'reference genome'. So, while the total sequence content represented in a genotype may be greater than that in a genome, the intended resolution of these sequences is to arrive at a single genome sequence. It is this end-point that we consider when holding that a genotype 'specifies' a genome.
1. A genotype is a short-hand specification of a genome that uses a representational syntax comprised of information about a reference genome ('genomic background'), and all specific variants from this reference (the 'genomic variation complement'). Conceptually, this variant genome sequence can be resolved by substituting all sequences specified by the 'genomic variation complement' for the corresponding sequences in the reference 'genomic background' sequence.
2. *Heritable* genomic sequence is that which is passed on to subsequent generations of cells/organisms, and includes all chromosomal sequences, the mitochondrial genome, and any heritable extrachromosomal sequences.
intrinsic genotype
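The resolution step described above, substituting the sequences of the 'genomic variation complement' into the background, can be sketched on a toy string. The function name resolve_genome and the (start, end, replacement) tuple format with 0-based half-open coordinates are assumptions made for this illustration; real genomes and variant representations are far more involved.

```python
def resolve_genome(background, variation_complement):
    """Substitute each variant into the background sequence.

    variation_complement: iterable of (start, end, replacement) tuples using
    0-based, half-open coordinates on the background. Variants must not overlap.
    """
    pieces = []
    cursor = 0
    for start, end, replacement in sorted(variation_complement):
        if start < cursor:
            raise ValueError("variants overlap")
        pieces.append(background[cursor:start])  # unchanged background up to the variant
        pieces.append(replacement)               # variant sequence replaces background[start:end]
        cursor = end
    pieces.append(background[cursor:])           # remainder of the background
    return "".join(pieces)

# Toy background with one substitution and one insertion (both made up).
background = "ACGTACGT"
variants = [(2, 3, "T"), (5, 5, "GG")]
resolved = resolve_genome(background, variants)
print(resolved)
```

An empty replacement models a deletion and an empty interval (start equal to end) models an insertion, so the same substitution operation covers the common alteration types.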
DNA sequence
RNA sequence
amino acid sequence
obsolete_biological sequence or collection
obsolete_biological sequence collection
A sequence feature whose identity is additionally dependent on the cellular or anatomical location of the genetic material bearing the feature.
As a qualified sequence feature, the BRCA1 c.5096G>A variant as materialized in a somatic breast epithelial cell could be distinguished as a separate entity from a BRCA1 c.5096G>A variant in a different cell type or location (e.g. germline BRCA1 variant in a sperm cell).
location-qualified sequence feature
A sequence feature whose identity is additionally dependent on factors specifically influencing its level of expression in the context of a biological system (e.g. being targeted by gene-knockdown reagents, or driven from an exogenous expression system such as a recombinant construct).
expression-qualified sequence feature
A sequence feature position based on a genomic coordinate system, where the position specifies start and end coordinates based on its alignment with some reference genomic sequence.
This 'genomic position' concept differs from the faldo:Position concept in that the former describes the start AND end points/coordinates of a feature, while the latter describes a single point/coordinate at the beginning OR end of a feature.
genomic coordinates
genomic position
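The distinction between a single-point position and a start-and-end 'genomic position' can be sketched as two small classes. These are illustrative structures only and are not the FALDO or GENO API; the class names, the field names, the reference label, and the assumption of 1-based inclusive coordinates are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Position:
    """A single point on a reference, in the spirit of faldo:Position."""
    reference: str  # e.g. a contig or chromosome in a genome assembly
    value: int

@dataclass(frozen=True)
class GenomicPosition:
    """Start AND end coordinates of a feature on one reference."""
    begin: Position
    end: Position

    def __post_init__(self):
        if self.begin.reference != self.end.reference:
            raise ValueError("begin and end must share a reference")
        if self.begin.value > self.end.value:
            raise ValueError("begin must not exceed end")

    def length(self):
        # Assumes 1-based, inclusive coordinates.
        return self.end.value - self.begin.value + 1

span = GenomicPosition(Position("chrA", 100), Position("chrA", 102))
print(span.length())
```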
phenotypic inheritance process
A sequence attribute inhering in a locus whose identity is not specified.
unspecified
An attribute describing a type of variation inhering in a sequence feature or collection.
allele attribute
variation attribute
An information content entity that describes the structural chromosomal alterations present in a genome.
karyotype
An intrinsic genotype that specifies variation from a reference or background.
variant intrinsic genotype
An information entity that is intended to represent some biological sequence, sequence feature, qualified sequence feature, or a collection of one or more of these entities.
sequence information entity
biological sequence residue
monomeric residue
biological sequence unit
deoxyribonucleic acid residue
DNA residue
ribonucleic acid residue
RNA residue
amino acid residue
An attribute, quality, or state of a sequence or sequence feature.
Sequence feature attributes can be based on qualities of the material bearers of the feature, for example, the staining intensity of a chromosomal band feature.
http://purl.obolibrary.org/obo/SO_0000400
sequence feature attribute
An attribute of a sequence feature related to the location of its starting and ending residues according to some reference coordinate system.
sequence feature position
A sequence feature whose identity is additionally dependent on a chemical modification made to the genetic material bearing the feature (e.g. binding of transcriptional regulators, or epigenetic modifications including direct DNA methylation, or modification of histones associated with a feature)
modification-qualified sequence feature
An example of an allelotype for a zebrafish might describe a specific combination of variants and their zygosity at a specific locus in the genome, e.g. "fgf8a<ti282a>/fgf8a<+>". An example allelotype for a human locus might similarly describe a heterozygous state at a specific position on human chromosome 12 as "GRCh38 Chr12:258635(A;T)"
A sequence information entity that specifies the 'allelic state' of a defined locus in the genome - i.e. what allele(s) are present at this locus across all homologous chromosomes and their zygosity.
allelic state information
single locus genotype
The term 'genotype' has varied meanings/scopes in the domains of research and medicine. In some settings we talk about the genotype of a cell or organism, which describes an entire genome in terms of a background and variations/diffs from this background. But we also talk about the genotype at a particular locus, which describes more narrowly the 'allelic state' at a single location in the genome (i.e. the alleles present at a location and their zygosity).
In GENO, we reserve the term 'genotype' for the former case, describing variation across a full genome of a cell or organism, and use the term 'allelic genotype' to refer to the latter, to describe variation present across all features at a single defined locus in the genome (i.e. the allelic state of the locus).
allelic genotype
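An allelic genotype string in the slash-separated style of the zebrafish example, "fgf8a<ti282a>/fgf8a<+>", can be split into its alleles and a simple zygosity label. This is a rough sketch: the function name parse_allelic_genotype is hypothetical, and the zygosity rule (one allele is hemizygous, identical alleles are homozygous, otherwise heterozygous) is a simplification that ignores more specific states.

```python
def parse_allelic_genotype(text):
    """Split a slash-separated allelic genotype and assign a coarse zygosity."""
    alleles = tuple(part.strip() for part in text.split("/"))
    if len(alleles) == 1:
        zygosity = "hemizygous"
    elif len(set(alleles)) == 1:
        zygosity = "homozygous"
    else:
        zygosity = "heterozygous"
    return {"alleles": alleles, "zygosity": zygosity}

state = parse_allelic_genotype("fgf8a<ti282a>/fgf8a<+>")
print(state)
```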
Exploratory class looking at creating more specific subtypes of associations, and defining identity criteria for each.
genotype-phenotype association
true
true
true
true
knockdown reagent targeted gene complement
A sequence alteration within the coding sequence of a gene.
coding sequence alteration
A construct that contains a mobile P-element, holding sequences to be delivered to a target cell or genome.
P-element construct
An engineered region that is used to transfer foreign genetic material into a host cell.
engineered_genetic_vector
Constructs can be engineered to carry inserts of DNA from external sources, for purposes of cloning and propagation or gene expression in host cells.
Constructs are typically packaged as part of delivery systems such as plasmids or viral vectors.
engineered genetic construct
A transgene that is not chromosomally integrated in the host genome, but instead exists as part of an extra-chromosomal construct.
non-integrated transgene
extra-chromosomal transgene
2
A collection of more than one sequence feature.
http://purl.obolibrary.org/obo/SO_0001260 ! sequence_collection
sequence feature collection
2
A set of two or more sequence alterations on the same chromosomal strand that tend to be transmitted together.
Haplotypes are most commonly comprised of two or more single-nucleotide polymorphisms or other small alterations that affect a particular gene.
haplotype
An attribute describing the number of copies of a feature present in a genome.
copy number
A relation used to describe an environment contextualizing the identity of an entity.
microsatellite alteration
A relation used to describe a process contextualizing the identity of an entity.
repeat region alteration
A quality inhering in an 'allelic complement' (aka a 'single locus complement') that describes the allelic variability found at a particular locus in the genome of a single cell/organism
allelic state
allelic dosage
an attribute inhering in a feature based on the total number or relative stoichiometry of copies present in a particular genome.
gene dosage
genetic dosage
A quality inhering in an allele based on its source/origin - typically the parent from which it was inherited.
allele origin
a quality of an allele in virtue of its having been inherited from a female parent.
maternal allele origin
a quality of an allele in virtue of its having been inherited from a male parent.
paternal allele origin
a quality of an allele in virtue of its having occurred through a de novo mutation, rather than having been inherited from a parent.
de novo allele origin
a quality of an allele in virtue of its origin not being known.
unknown allele origin
a quality inhering in a feature in virtue of its presence only in the genome of non-germ cells.
somatic
a quality inhering in a feature in virtue of its presence only in the genome of gametes (germ cells).
germ-line
gametic
2
A sequence feature collection comprised of two haplotypes at a particular locus on paired homologous chromosomes.
"Humans are diploid organisms; they have paired homologous chromosomes in their somatic cells, which contain two copies of each gene. An allele is one member of a pair of genes occupying a specific spot on a chromosome (called locus). Two alleles at the same locus on homologous chromosomes make up the individual’s genotype. A haplotype (a contraction of the term ‘haploid genotype’) is a combination of alleles at multiple loci that are transmitted together on the same chromosome. Haplotype may refer to as few as two loci or to an entire chromosome depending on the number of recombination events that have occurred between a given set of loci. Genewise haplotypes are established with markers within a gene; familywise haplotypes are established with markers within members of a gene family; and regionwise haplotypes are established within different genes in a region at the same chromosome. Finally, a diplotype is a matched pair of haplotypes on homologous chromosomes."
From https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4118015/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4118015/figure/sap-26-03-165-g002/
diplotype
A quality inhering in a collection of discontinuous sequence features in a single genome in virtue of their relative position on the same or separate chromosomes.
allelic phase
oryzias latipes strain
a quality of an allele in virtue of its having been inherited from either parent.
parental allele origin
unknown inheritance
The canonical allele that represents a single nucleotide variation in the BRCA2 gene, which can be described by various contextual alleles such as “NC_000013.11:g.32319070T>A” and “NG_012772.3:g.8591T>A”.
One of a set of sequence features or haplotypes that exist at a particular genetic locus.
The notion of a 'canonical allele' is taken from the ClinGen Allele model (http://datamodel.clinicalgenome.org/allele/). It is implemented in GENO to provide an ontological representation of this concept that will support data integration efforts, but may be replaced should an IRI become available from the ClinGen model.
http://datamodel.clinicalgenome.org/allele/resource/canonical_allele/
ClinGen Allele Model (http://datamodel.clinicalgenome.org/allele/)
As a 'sequence feature or collection' (sensu SO), a 'canonical allele' is considered here as an extent of biological sequence encoded in nucleic acid molecules of a cell or organism (as opposed to an information artifact that is about such a sequence). Canonical alleles can include haplotypes that consist of more than one discontinuous sequence feature that exist in cis on the same chromosomal strand (as such haplotypes are often linked to a particular disease).
In the ClinGen allele model, 'canonical alleles' are contrasted with 'contextual alleles'. Contextual alleles are informational representations that describe a canonical allele, using a particular reference sequence. A single canonical allele can be described by many contextual alleles that each use a different reference sequence in their representation.
canonical allele
An informational artifact that describes a canonical allele by defining its sequence and position relative to a particular reference sequence.
The notion of a 'contextual allele' is taken from the ClinGen Allele model (http://datamodel.clinicalgenome.org/allele/). It is implemented in GENO to provide an ontological representation of this concept that will support data integration efforts, but may be replaced should an IRI become available from the ClinGen model.
http://datamodel.clinicalgenome.org/allele/resource/contextual_allele/
ClinGen Allele Model (http://datamodel.clinicalgenome.org/allele/)
The notion of a 'contextual allele' derives from the ClinGen Allele model. Here, each genetic allele in a patient corresponds to a single 'canonical allele', which in turn may aggregate any number of 'contextual allele' representations that may be defined against different reference sequences. Accordingly, many contextual alleles can describe a single canonical allele. For example, the contextual alleles “NC_000013.11:g.32319070T>A” and “NG_012772.3:g.8591T>A” both describe the same underlying canonical allele, a single nucleotide variation, in the BRCA2 gene.
contextual allele
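The many-to-one relationship between contextual and canonical alleles described above can be sketched as a simple lookup. This is only an illustration: the identifier "canonical_allele_1" and the helper names are hypothetical and are not taken from GENO or the ClinGen model; the two HGVS-style expressions are the ones quoted in the comment above.

```python
from collections import defaultdict

# Contextual allele expression -> canonical allele identifier.
# "canonical_allele_1" is a made-up placeholder, not a real ClinGen ID.
contextual_to_canonical = {
    "NC_000013.11:g.32319070T>A": "canonical_allele_1",
    "NG_012772.3:g.8591T>A": "canonical_allele_1",
}


def group_by_canonical(mapping):
    """Collect the contextual alleles that describe each canonical allele."""
    groups = defaultdict(list)
    for contextual, canonical in mapping.items():
        groups[canonical].append(contextual)
    return dict(groups)


def same_canonical_allele(a, b, mapping):
    """True if two contextual alleles describe the same canonical allele."""
    return mapping[a] == mapping[b]


groups = group_by_canonical(contextual_to_canonical)
```

Under these assumptions, both expressions fall into one group, which mirrors the statement that a single canonical allele can aggregate several contextual representations.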
A mode of inheritance whereby manifestation of a trait or condition occurs only when both affected and unaffected mitochondria are inherited (i.e. some mitochondria that do and some that do not contain the causative allele).
heteroplasmic mitochondrial inheritance
A mode of inheritance whereby manifestation of a trait or condition occurs only when affected mitochondria are inherited (i.e. mitochondria containing the causative allele)
Homoplasmic mitochondrial inheritance
molecular function
A biological process whose specific outcome is the progression of an integrated living unit: an anatomical structure (which may be a subcellular structure, cell, tissue, or organ), or organism over time from an initial condition to a later condition. [database_cross_reference: GOC:isa_complete]
developmental process
pulling in HP 'phenotypic abnormality' root here
human phenotypic abnormality
Stub class to serve as root of hierarchy for imports of human developmental stages from the Human Developmental Stages Ontology.
A spatiotemporal region encompassing some part of the life cycle of an organism.
human life cycle stage
information content entity
Examples of information content entities include journal articles, data, graphical layouts, and graphs.
an information content entity is an entity that is generically dependent on some artifact and stands in relation of aboutness to some entity
information_content_entity 'is_encoded_in' some digital_entity in obi before split (040907). information_content_entity 'is_encoded_in' some physical_document in obi before split (040907).
Previous. An information content entity is a non-realizable information entity that 'is encoded in' some digital or physical entity.
PERSON: Chris Stoeckert
OBI_0000142
information content entity
information content entity
ontology metadata
data about an ontology part
where to place this depends on if we take the organismal view or the quality centric view.
mammalian phenotype
Mus musculus
Stub class to serve as root of hierarchy for imports of virus types from relevant ontologies or terminologies.
Viruses
Danio rerio
Oryzias latipes
Homo sapiens
A processual entity that realizes a plan which is the concretization of a plan specification.
Stub class to serve as root of hierarchy for experimental techniques and processes, defined in GENO or imported from ontologies such as OBI and ERO.
planned process
reagent role
a population is a collection of individuals from the same taxonomic class living, counted or sampled at a particular site or in a particular area
population
An assay which generates data about a genotype from a specimen of genomic DNA. A variety of techniques and instruments can be used to produce information about sequence variation at particular genomic positions.
genotyping assay
A genetic transformation that renders a gene non-functional, e.g. due to a point mutation, or the removal of all, or part of, the gene using recombinant methods.
A genetic transformation that involves the insertion of a protein coding cDNA sequence at a particular locus in an organism's chromosome. Typically, this is done in mice since the technology for this process is more refined, and because mouse embryonic stem cells are easily manipulated. The difference between knock-in technology and transgenic technology is that a knock-in involves a gene inserted into a specific locus, and is a "targeted" insertion.
targeted gene knock-out technique
targeted gene knock-in technique
Stub class to serve as root of hierarchy for imports from NCBI Taxonomy.
organism
the introduction, alteration or integration of genetic material into a cell or organism
genetic modification technique
'Value' label chosen here according to http://www.uwgb.edu/heuerc/2D/ColorTerms.html
Was parent of chromosomal band intensity before moving this class to live as a sequence feature attribute.
color value
obsolete_color brightness
female
male
phenotypic sex
A material entity that consists of two or more organisms, viruses, or viroids.
A group of organisms of the same taxonomic group grouped together in virtue of their sharing some commonality (either an inherent attribute or an externally assigned role).
collection of organisms
A domestic group, or a number of domestic groups linked through descent (demonstrated or stipulated) from a common ancestor, marriage, or adoption.
family
Morpholino oligos are synthesized from four different Morpholino subunits, each of which contains one of the four genetic bases (A, C, G, T) linked to a 6-membered morpholine ring. Eighteen to 25 subunits of these four subunit types are joined in a specific order by non-ionic phosphorodiamidate intersubunit linkages to give a Morpholino.
morpholino_oligo
The descriptor 1p22.3 = chromosome 1, short arm, region 2, band 2, sub-band 3. This is read as "one p two-two point three", not "one p twenty-two point three".
A region of the chromosome between the centromere and the telomere. Human chromosomes have two arms, the p arm (short) and the q arm (long) which are separated from each other by the centromere.
Formerly http://purl.obolibrary.org/obo/GENO_0000613, replaced by SO term.
http://ghr.nlm.nih.gov/handbook/howgeneswork/genelocation and http://people.rit.edu/rhrsbi/GeneticsPages/Handouts/ChromosomeNomenclature.pdf, both of which define the nomenclature for the banding hierarchy we use here:
chromosome > arm > region > band > sub-band
Note that an alternate nomenclature for this hierarchy is here (http://www.ncbi.nlm.nih.gov/Class/MLACourse/Original8Hour/Genetics/chrombanding.html):
chromosome > arm > band > sub-band > sub-sub-band
chromosome arm
'Sequence features' differ from 'sequences' in that instances are identified and distinguished based on both their inherent sequence, and by their position in some genomic coordinate system. Accordingly, the 'ATG' start codon in the CDS of the human AKT gene is the same 'sequence' as the 'ATG' start codon in the human SHH gene, but these represent two distinct sequence features in virtue of their different positions when mapped to a reference feature (i.e. a reference gene, chromosome sequence).
An extent of biological sequence (i.e. a positionally defined ordering of units representing monomers of a biological macromolecule). An instance of a sequence feature is identified by both its sequence (inherent ordering of units) and its position (numerical start and stop coordinates based on alignment with some reference feature).
GENO defines three levels of sequence-related artifacts, which are distinguished by their identity criteria.
1. 'Biological sequence' identity is dependent only on the ordering of units that comprise the sequence.
2. 'Sequence feature' identity is dependent on its sequence and the genomic position of the sequence (aligns with definition of 'sequence feature' in the Sequence Ontology).
3. 'Qualified sequence feature' identity is additionally dependent on some aspect of the physical context of the genetic material bearing the feature, extrinsic to its sequence and its genomic position. For example, its being targeted by gene knockdown reagents, its being transgenically expressed in a foreign cell from a recombinant expression construct, its having been epigenetically modified in a way that alters its expression level or pattern, or its being located in a specific cellular or anatomical location.
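The three levels of identity criteria listed above can be sketched with value-based equality. The class names, the 'context' field, the reference names, and the coordinates are all hypothetical, chosen only for illustration; the field names 'begin', 'end', and 'reference' echo the positional properties used elsewhere in GENO.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BiologicalSequence:
    # Level 1: identity depends only on the ordering of units.
    units: str


@dataclass(frozen=True)
class SequenceFeature:
    # Level 2: identity depends on the sequence and its position
    # on some reference.
    sequence: BiologicalSequence
    reference: str
    begin: int
    end: int


@dataclass(frozen=True)
class QualifiedSequenceFeature:
    # Level 3: identity additionally depends on some extrinsic context.
    feature: SequenceFeature
    context: str


atg = BiologicalSequence("ATG")
# Reference names and coordinates are invented placeholders.
akt_start = SequenceFeature(atg, "ref_A", 1000, 1002)
shh_start = SequenceFeature(atg, "ref_B", 5000, 5002)

same_sequence = akt_start.sequence == shh_start.sequence  # same 'ATG' sequence
same_feature = akt_start == shh_start                     # different positions

knocked_down = QualifiedSequenceFeature(shh_start, "targeted by knockdown reagent")
untreated = QualifiedSequenceFeature(shh_start, "no reagent")
same_qualified = knocked_down == untreated                # different contexts
```

In this sketch the two start codons compare equal as sequences but not as features, and the same feature in two contexts yields two distinct qualified features.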
sequence feature
true
Formalizes the second identity criterion for a sequence feature: its genomic position. We use the FALDO model to represent positional information, which links features to positional information through an instance of a Region class that represents the mapping of the feature onto some reference sequence. (But features can also be linked to Positions directly through the location property.)
true
Formalizes the first identity criterion for a sequence feature: its sequence.
A region of known length which may be used to manufacture a longer region.
assembly_component
A contiguous sequence derived from sequence assembly. Has no gaps, but may contain N's from unavailable bases.
contig
0
The point at which one or more contiguous nucleotides were excised.
deleted_sequence
nucleotide deletion
nucleotide_deletion
SO:1000033
SO:0000159
SOFA
http://en.wikipedia.org/wiki/Nucleotide_deletion
deletion
enhancer
A regulatory_region composed of the TSS(s) and binding sites for TF_complexes of the basal transcription machinery.
promoter
A region of nucleotide sequence that has translocated to a new position.
transchr
translocated sequence
SO:0000199
DBVAR
translocation
SSLP
simple sequence length polymorphism
simple sequence length variation
SO:0000207
simple_sequence_length_variation
sequence length variation
SO:0000248
sequence_length_variation
See here for a list of engineered regions in ZFIN: http://zfin.org/cgi-bin/webdriver?MIval=aa-markerselect.apg&marker_type=REGION&query_results=t&compare=contains&WINSIZE=25.
Includes things like loxP sites, inducible promoters, ires elements, etc.
engineered_foreign_gene
A repeat_region containing repeat_units of 2 to 10 bp repeated in tandem.
http://en.wikipedia.org/wiki/Microsatellite_%28genetics%29
A defined locus that includes any type of VNTR or SSLP locus.
microsatellite
RNAi_reagent
Structural unit composed of a nucleic acid molecule which controls its own replication through the interaction of specific proteins at one or more origins of replication.
A complete chromosome sequence.
chromosome
The descriptor 1p22.3 = chromosome 1, short arm, region 2, band 2, sub-band 3. This is read as "one p two-two point three", not "one p twenty-two point three".
A cytologically distinguishable feature of a chromosome, often made visible by staining, and usually alternating light and dark.
http://ghr.nlm.nih.gov/handbook/howgeneswork/genelocation and http://people.rit.edu/rhrsbi/GeneticsPages/Handouts/ChromosomeNomenclature.pdf, both of which define the nomenclature for the banding hierarchy we use here:
chromosome > arm > region > band > sub-band
Note that an alternate nomenclature for this hierarchy is here (http://www.ncbi.nlm.nih.gov/Class/MLACourse/Original8Hour/Genetics/chrombanding.html):
chromosome > arm > band > sub-band > sub-sub-band
'Band' is a term of convenience in order to hierarchically organize morphologically defined chromosome features: chromosome > arm > region > band > sub-band.
chromosome band
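The banding hierarchy chromosome > arm > region > band > sub-band can be illustrated by splitting a descriptor such as 1p22.3 into its parts. This is a minimal sketch that assumes single-digit region and band numbers and an optional sub-band after the dot; the pattern and function name are hypothetical, and the sketch does not cover the full cytogenetic nomenclature.

```python
import re

# Assumed simple form: chromosome, arm (p or q), one region digit,
# one band digit, and an optional ".sub-band".
BAND_PATTERN = re.compile(
    r"^(?P<chromosome>[0-9]+|[XY])"
    r"(?P<arm>[pq])"
    r"(?P<region>[0-9])"
    r"(?P<band>[0-9])"
    r"(?:\.(?P<sub_band>[0-9]+))?$"
)


def parse_band(descriptor):
    """Split a band descriptor into chromosome, arm, region, band, sub-band."""
    match = BAND_PATTERN.match(descriptor)
    if match is None:
        raise ValueError(f"unrecognized band descriptor: {descriptor!r}")
    parts = match.groupdict()
    # 'p' is the short arm, 'q' is the long arm.
    parts["arm"] = "short" if parts["arm"] == "p" else "long"
    return parts


parsed = parse_band("1p22.3")
```

For 1p22.3 this yields chromosome 1, short arm, region 2, band 2, sub-band 3, matching the reading given in the comment above.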
centromere
Obsoleted as we did not want to commit to constructs being plasmids, but rather wanted a classification of more general types of engineered regions used to replicate and deliver sequence to target cells/genomes. Replaced by GENO:0000856 ! engineered genetic construct.
obsolete_engineered_plasmid
The sequence of one or more nucleotides added between two adjacent nucleotides in the sequence.
insertion
nucleotide insertion
nucleotide_insertion
SO:1000034
SO:0000667
DBVAR
SOFA
insertion
SNPs are single base pair positions in genomic DNA at which different sequence alternatives exist in normal individuals in some population(s), wherein the least frequent variant has an abundance of 1% or greater.
single nucleotide polymorphism
SO:0000694
SOFA
SNP
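The frequency criterion in the SNP definition above (least frequent variant at 1% or greater) can be sketched as a simple check. The function name and the input format (a mapping from allele to population frequency) are assumptions made for illustration, and the definition is applied literally.

```python
def is_snp(allele_frequencies, threshold=0.01):
    """Return True if at least two alleles are present at the position
    and the least frequent one reaches the threshold (1% by default)."""
    if len(allele_frequencies) < 2:
        return False
    return min(allele_frequencies.values()) >= threshold


common_variant = is_snp({"A": 0.97, "G": 0.03})
rare_variant = is_snp({"A": 0.999, "G": 0.001})
```

Under this reading, a position whose minor allele occurs at 3% qualifies as a SNP, while one whose minor allele occurs at 0.1% does not.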
A junction is a boundary between regions. A boundary has an extent of zero.
junction
A gene locus consisting of all sequence elements that facilitate the production of a functional transcript (ie one capable of translation into a protein, or independent functioning as an RNA), when encoded in the genome of some cell or virion. A gene may include regulatory regions, transcribed regions and/or other functional sequence regions. (from SO:gene)
Regarding the distinction between a 'gene' and a 'gene locus': Every zebrafish genome contains a 'gene locus' for every zebrafish gene. The majority are likely to be 'wild-type' or at least functional variants. But some may be variants that are mutated or truncated so as to lack functionality. According to current SO criteria defining genes, a 'gene' no longer exists in the case of a non-functional or deleted variant. But a 'gene locus' does exist - and its extent is that of the remaining/altered sequence as it aligns with the reference gene. Even for completely deleted genes, the gene locus remains (and here is equivalent to the junction corresponding to where the gene would live according to this alignment).
gene
A quantitative trait locus (QTL) is a polymorphic locus which contains alleles that differentially affect the expression of a continuously distributed phenotypic trait. Usually it is a marker described by statistical association to quantitative variation in the particular phenotypic trait that is thought to be controlled by the cumulative action of alleles at multiple loci.
quantitative trait locus
QTL
An attribute to describe a region that was modified in vitro.
engineered
construct
engineered_region
An extended region of sequence corresponding to a defined feature that is a proper part of a chromosome, e.g. a chromosomal 'arm', 'region', or 'band'.
chromosomal feature
gross chromosomal part
chromosome part
A gene that has been transferred naturally or by any of a number of genetic engineering techniques into a cell or organism where it is foreign (i.e. exogenous/extraneous to the host genome).
Transgenes can exist as integrated into the host genome, or extra-chromosomally on replicons or transiently carried/expressed vectors. What matters is that they are active in the context of a foreign biological system (typically a cell or organism).
Note that transgenes as defined here are not necessarily from a different taxon than that of the host genome. For example, a Mus musculus gene over-expressed from a chromosomally-integrated expression construct in a Mus musculus genome qualifies as a transgene because it is exogenous to the endogenous host genome.
transgene
A multiple nucleotide polymorphism with alleles of common length > 1, for example AAA/TTT.
multiple nucleotide polymorphism
SO:0001013
MNP
A variation that increases or decreases the copy number of a given region.
CNP
CNV
copy number polymorphism
copy number variation
SO:0001019
SOFA
http://en.wikipedia.org/wiki/Copy_number_variation
copy_number_variation
A collection of sequence features (typically a collection of chromosomes) that covers the sum genetic material within a cell or virion (where 'genetic material' refers to any nucleic acid that is part of a cell or virion and has been inherited from an ancestor cell or virion, and/or can be replicated and inherited by its progeny)
Genotype vs Genome in GENO: An (intrinsic) genotype is an information artifact representing an indirect syntax for specifying a genome sequence. This syntax has reference and variant components - a 'reference genome' and 'genomic variation complement' - that must be operated on to resolve a specific genome sequence. Specifically, the genome sequence is resolved by substituting all sequences specified by the 'genomic variation complement' for the corresponding sequences in the 'reference genome'. So, while the total sequence content represented in a genotype may be greater than that in a genome, the intended resolution of these sequences is to arrive at a single genome sequence.
'genome sequence'
A genome is considered the complement of all heritable sequence features in a given cell or organism (chromosomal or extrachromosomal). This is typically a collection of chromosomes, but in some organisms (e.g. bacteria) it may be a single chromosomal entity. For this reason 'genome' classifies under 'sequence feature complement' rather than 'sequence feature collection'.
genome
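The resolution of a genome sequence from a genotype, as described in the 'Genotype vs Genome' comment above, can be sketched as substitution of variant sequences into a reference. This is a toy illustration on a single string: the function name, the (begin, end, replacement) tuple format, and the 0-based half-open coordinates are assumptions, not part of GENO.

```python
from typing import List, Tuple


def resolve_genome(reference: str, variation_complement: List[Tuple[int, int, str]]) -> str:
    """Substitute each (begin, end, replacement) into the reference.

    Coordinates are 0-based, half-open, and refer to the reference.
    A variant with begin == end replaces a zero-extent junction,
    i.e. it is an insertion.
    """
    pieces = []
    cursor = 0
    for begin, end, replacement in sorted(variation_complement):
        if begin < cursor:
            raise ValueError("overlapping variant regions")
        pieces.append(reference[cursor:begin])  # unchanged reference sequence
        pieces.append(replacement)              # variant sequence
        cursor = end
    pieces.append(reference[cursor:])
    return "".join(pieces)


reference = "ACGTACGT"
# One single-base substitution and one insertion at a junction.
variants = [(2, 3, "T"), (5, 5, "GG")]
genome = resolve_genome(reference, variants)
```

The genotype here carries more sequence content than the result (the reference plus the variants), but resolving it yields a single genome sequence.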
A sequence_alteration is a sequence_feature whose extent is the deviation from another sequence.
sequence variation
SO:1000004
SO:1000007
SO:0001059
SOFA
A 'sequence alteration' is any variant allele that varies from a reference along its entire extent (i.e. is completely_variant_with some reference). Other variant alleles can differ only at specific altered bases within the complete sequence of the locus. For example, the CFTRdF508 allele of the cystic fibrosis transmembrane conductance regulator gene is a 'variant allele' of the CFTR gene that is several thousand bases long, and contains a deletion of just three nucleotides that comprise the codon for phenylalanine (F) at aa position 508. A sequence alteration, by contrast, is a variant allele that differs in its entirety from another sequence. For example, a SNP is a sequence alteration that represents a variant of a locus with an extent of one base. Similarly, an insertion of transgenic sequence is a sequence alteration that represents a variant locus that differs along its full extent from some reference sequence (which in this case is just a junction where the transgene was inserted).
Note that we consider novel loci gained in a genome to be sequence alterations, including aneusomic chromosomes gained through a non-disjunction event during replication, or extrachromosomal replicons that become part of the heritable genome of a cell or organism.
sequence_alteration
An insertion that derives from another organism, via the use of recombinant DNA technology.
transgenic insertion
SO:0001218
transgenic_insertion
A region which is the result of some arbitrary experimental procedure. The procedure may be carried out with biological material or inside a computer.
experimental_feature
A construct which is designed to integrate into a genome and produce a fusion transcript between exons of the gene into which it inserts and a reporter element in the construct. Gene traps contain a splice acceptor, do not contain promoter elements for the reporter, and are mutagenic. Gene traps may be bicistronic with the second cassette containing a promoter driving a selectable marker.
gene_trap_construct
A construct which is designed to integrate into a genome and express a reporter when inserted in close proximity to a promoter element. Promoter traps typically do not contain promoter elements and are mutagenic.
promoter_trap_construct
A construct which is designed to integrate into a genome and express a reporter when the expression from a basic minimal promoter is enhanced by genomic enhancer elements. Enhancer traps contain promoter elements and are not usually mutagenic.
enhancer_trap_construct
SNVs are single base pair positions in genomic DNA at which different sequence alternatives exist.
single nucleotide variant
kareneilbeck
Thu Oct 08 11:37:49 PDT 2009
SO:0001483
SOFA
SNV
A biological_region characterized as a single heritable trait in a phenotype screen. The heritable phenotype may be mapped to a chromosome but generally has not been characterized to a specific gene locus.
heritable_phenotypic_marker
'GRCh37.p10' (a human reference genome build)
A genome that is used as a standard against which other genome sequences are compared, or into which alterations are intentionally introduced.
reference genome
A sequence alteration whereby the copy number of a given region is greater than the reference sequence.
copy number gain
gain
kareneilbeck
Mon Feb 28 01:54:09 PST 2011
SO:0001742
DBVAR
copy_number_gain
A sequence alteration whereby the copy number of a given region is less than the reference sequence.
copy number loss
loss
kareneilbeck
Mon Feb 28 01:55:02 PST 2011
SO:0001743
DBVAR
copy_number_loss
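The copy number gain and loss definitions above both compare an observed copy number against the reference. A minimal sketch of that comparison follows; the function name and the choice to return None when the counts are equal are assumptions.

```python
def classify_copy_number(observed: int, reference: int):
    """Classify a region by comparing its copy number to the reference."""
    if observed > reference:
        return "copy_number_gain"
    if observed < reference:
        return "copy_number_loss"
    return None  # no copy number change relative to the reference


gain = classify_copy_number(3, 2)
loss = classify_copy_number(1, 2)
```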
Uniparental disomy is a sequence_alteration where a diploid individual receives two copies for all or part of a chromosome from one parent and no copies of the same chromosome or region from the other parent.
UPD
uniparental disomy
kareneilbeck
Mon Feb 28 02:01:05 PST 2011
SO:0001744
DBVAR
http://en.wikipedia.org/wiki/Uniparental_disomy
UPD
Uniparental disomy is a sequence_alteration where a diploid individual receives two copies for all or part of a chromosome from the mother and no copies of the same chromosome or region from the father.
maternal uniparental disomy
kareneilbeck
Mon Feb 28 02:03:01 PST 2011
SO:0001745
maternal_uniparental_disomy
Uniparental disomy is a sequence_alteration where a diploid individual receives two copies for all or part of a chromosome from the father and no copies of the same chromosome or region from the mother.
paternal uniparental disomy
kareneilbeck
Mon Feb 28 02:03:30 PST 2011
SO:0001746
paternal_uniparental_disomy
A structural sequence alteration where there are multiple equally plausible explanations for the change.
complex
kareneilbeck
Wed Mar 23 03:21:19 PDT 2011
SO:0001784
DBVAR
complex_structural_alteration
kareneilbeck
Fri Mar 25 02:27:41 PDT 2011
SO:0001785
DBVAR
structural_alteration
Formerly http://purl.obolibrary.org/obo/GENO_0000067, replaced with SO term.
regulatory element
regulatory gene region
regulatory_region
Any change in genomic DNA caused by a single event.
SO:1000002
SOFA
substitution
When no simple or well defined DNA mutation event describes the observed DNA change, the keyword "complex" should be used. Usually there are multiple equally plausible explanations for the change.
complex substitution
SO:1000005
SOFA
complex_substitution
A single nucleotide change which has occurred at the same position of a corresponding nucleotide in a reference sequence.
point mutation
SO:1000008
SOFA
http://en.wikipedia.org/wiki/Point_mutation
point_mutation
Change of a pyrimidine nucleotide, C or T, into another pyrimidine nucleotide, or change of a purine nucleotide, A or G, into another purine nucleotide.
SO:1000009
transition
A substitution of a pyrimidine, C or T, for another pyrimidine.
pyrimidine transition
SO:1000010
pyrimidine_transition
A transition of a cytidine to a thymine.
C to T transition
SO:1000011
C_to_T_transition
The transition of cytidine to thymine occurring at a pCpG site as a consequence of the spontaneous deamination of 5'-methylcytidine.
C to T transition at pCpG site
SO:1000012
C_to_T_transition_at_pCpG_site
T to C transition
SO:1000013
T_to_C_transition
A substitution of a purine, A or G, for another purine.
purine transition
SO:1000014
purine_transition
A transition of an adenine to a guanine.
A to G transition
SO:1000015
A_to_G_transition
A transition of a guanine to an adenine.
G to A transition
SO:1000016
G_to_A_transition
Change of a pyrimidine nucleotide, C or T, into a purine nucleotide, A or G, or vice versa.
SO:1000017
http://en.wikipedia.org/wiki/Transversion
transversion
Change of a pyrimidine nucleotide, C or T, into a purine nucleotide, A or G.
pyrimidine to purine transversion
SO:1000018
pyrimidine_to_purine_transversion
A transversion from cytidine to adenine.
C to A transversion
SO:1000019
C_to_A_transversion
C to G transversion
SO:1000020
C_to_G_transversion
A transversion from T to A.
T to A transversion
SO:1000021
T_to_A_transversion
A transversion from T to G.
T to G transversion
SO:1000022
T_to_G_transversion
Change of a purine nucleotide, A or G , into a pyrimidine nucleotide C or T.
purine to pyrimidine transversion
SO:1000023
purine_to_pyrimidine_transversion
A transversion from adenine to cytidine.
A to C transversion
SO:1000024
A_to_C_transversion
A transversion from adenine to thymine.
A to T transversion
SO:1000025
A_to_T_transversion
A transversion from guanine to cytidine.
G to C transversion
SO:1000026
G_to_C_transversion
A transversion from guanine to thymine.
G to T transversion
SO:1000027
G_to_T_transversion
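The transition and transversion classes defined above follow directly from whether the reference and alternate bases are purines (A, G) or pyrimidines (C, T). The sketch below classifies a single-base substitution into the four intermediate classes; the function name is hypothetical, and the returned strings reuse the underscore-style labels listed above.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base DNA substitution as a transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases:
        raise ValueError("expected single DNA bases A, C, G or T")
    if ref == alt:
        raise ValueError("reference and alternate bases are identical")
    if ref in PURINES and alt in PURINES:
        return "purine_transition"
    if ref in PYRIMIDINES and alt in PYRIMIDINES:
        return "pyrimidine_transition"
    if ref in PYRIMIDINES:
        return "pyrimidine_to_purine_transversion"
    return "purine_to_pyrimidine_transversion"


c_to_t = classify_substitution("C", "T")
g_to_t = classify_substitution("G", "T")
```

The leaf classes (for example C_to_T_transition) could be derived in the same way by combining the two bases into the label.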
A sequence alteration which included an insertion and a deletion, affecting 2 or more bases.
SO:1000032
http://en.wikipedia.org/wiki/Indel
Indels can have a different number of bases than the corresponding reference sequence.
indel
One or more nucleotides are added between two adjacent nucleotides in the sequence; the inserted sequence derives from, or is identical in sequence to, nucleotides adjacent to insertion point.
nucleotide duplication
nucleotide_duplication
SO:1000035
duplication
A continuous nucleotide sequence is inverted in the same position.
inversion
SO:1000036
DBVAR
SOFA
inversion
A tandem duplication where the individual regions are in the same orientation.
direct tandem duplication
SO:1000039
direct_tandem_duplication
A tandem duplication where the individual regions are not in the same orientation.
inverted tandem duplication
mirror duplication
SO:1000040
inverted_tandem_duplication
A duplication consisting of 2 identical adjacent regions.
erverted
tandem duplication
SO:1000173
DBVAR
tandem_duplication
Stub class to serve as root of hierarchy for imports of developmental stages from Uberon or taxon-specific vocabularies such as ZFIN stage terms.
life cycle stage
Stub class to serve as root of hierarchy for imports of anatomical entities from UBERON, CARO, or taxon-specific anatomy ontologies.
http://purl.obolibrary.org/obo/CARO_0000000
anatomical entity
Stub node that gathers root classes from various taxon-specific phenotype ontologies, as connectors for bringing classes from these ontologies into the GENO framework.
1. From OGMS: A (combination of) quality(ies) of an organism determined by the interaction of its genetic make-up and environment that differentiates specific instances of a species from other instances of the same species (from OGMS, and used in OBI, but treatment as a quality is at odds with previous OBI discussions and their treatment of 'comparative phenotype assessment', where a phenotype is described as a quality or disposition)
2. From OBI calls: quality or disposition inheres in organism or part of an organism towards some growth environment
Phenotype
Animals exhibit variations compared to a given control.
'Variant' is the given label of the root class in the Worm Phenotype ontology. Renaming it here to be consistent with our hierarchy of phenotype classes.
Variant
c. elegans phenotype
worm phenotype
abnormal(ly) malformed endocardium cell
abnormal(ly) absent dorso-rostral cluster
abnormal(ly) disrupted diencephalon development
abnormal(ly) disrupted neutrophil aggregation
abnormal(ly) absent adaxial cell
association
Equivalent to: http://www.informatics.jax.org/marker/MGI:98297
mus musculus shh gene
http://zfin.org/ZDB-GENE-980526-166
danio rerio shha gene
http://zfin.org/ZDB-GENE-040123-1
danio rerio cdkn1ca gene
Equivalent to: http://www.ensembl.org/Gene/Summary?g=ENSG00000164690
Codes for: http://www.uniprot.org/uniprot/Q15465
homo sapiens SHH gene
exploratory term
exemplar term
Initially created such that integrated transgene infers as child of sequence_alteration.