Begum Durgahee
CODAMONO
Chris Mungall
Erick Antezana
Francesco Strozzi
Joachim Baran
Karen Eilbeck
Michel Dumontier
Raoul Bonnal
Robert Hoehndorf
Stephen Kan (GitHub: Helisquinde)
Takatomo Fujisawa
Toshiaki Katayama
Joachim Baran
The Genomic Feature and Variation Ontology (GFVO) is modeled to represent genomic data using the Resource Description Framework (RDF). It captures the contents of data files that adhere to the Generic Feature Format Version 3 (GFF3, http://www.sequenceontology.org/resources/gff3.html), the Gene Transfer Format (GTF, http://mblab.wustl.edu/GTF22.html), the Genome Variation Format Version 1 (GVF, http://www.sequenceontology.org/resources/gvf.html), and the Variant Call Format (VCF, http://vcftools.sourceforge.net/specs.html). The creation of the ontology was inspired by previous work of Robert Hoehndorf on RDF2OWL (http://code.google.com/p/rdf2owl).
CC0 [https://creativecommons.org/publicdomain/zero/1.0/].
Genomic Feature and Variation Ontology (GFVO)
Best Practices
1.) The class hierarchy, object- and datatype-property hierarchies are modeled to match SIO's hierarchies.
2.) Every class and property has a label (rdfs:label) as well as a description (rdfs:comment). The description may contain an example of how a class/property is being applied.
3.) Classes that are directly related to the GFF3, GTF, GVF or VCF specifications have a link-out to the corresponding specification document as a provenance indicator (via rdfs:isDefinedBy).
4.) Link-outs to Wikipedia are provided for classes wherever possible.
5.) Ontology terms are encoded in the URIs using camel case, i.e. each letter that follows a white space in an ontology term is capitalized and the white space is then removed.
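The camel-case convention of rule 5 can be sketched in a few lines of Python. This is an illustration only: the function name `label_to_term` and the `is_property` flag are hypothetical and not part of GFVO. The sketch assumes that class terms capitalize every word while property terms keep their first word unchanged, as suggested by names such as "AlleleCount" and "hasAttribute" that appear in this ontology.

```python
def label_to_term(label: str, is_property: bool = False) -> str:
    """Turn a label such as "Allele Count" into a camel-case term.

    Assumption: classes capitalize every word; properties keep the
    first word as written (e.g. "has attribute" -> "hasAttribute").
    """
    words = label.split()
    if not words:
        raise ValueError("label must contain at least one word")
    # Capitalize only the first letter so acronyms like "DNA" survive.
    parts = [word[0].upper() + word[1:] for word in words]
    if is_property:
        parts[0] = words[0]
    return "".join(parts)


print(label_to_term("Allele Count"))                     # AlleleCount
print(label_to_term("has attribute", is_property=True))  # hasAttribute
```

Only the first letter of each word is touched, so a label like "DNA Sequence" keeps its acronym and becomes "DNASequence".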
https://github.com/BioInterchange/Ontologies
1.0.6
Links to an entity for which supportive information is being provided.
describes
Links to additional annotations about an entity.
has annotation
Links out to aggregate information for an entity.
has attribute
References an entity or resource that provides supporting/refuting evidence.
has evidence
Denotes the first entity of an ordered part relationship.
has first part
Links out to an identifier.
has identifier
Links out to an entity that is the input of a "Process" subclass.
has input
Denotes the last entity of an ordered part relationship.
has last part
Denotes membership for "Collection", "Catalog" and "File" instances.
has member
Denotes a compositional relationship to other entities, where the ordering of the composition of entities carries meaning.
has ordered part
Links out to an entity that is the output of a "Process" subclass.
has output
Denotes a compositional relationship to other entities.
has part
Denotes the participation of other entities in processes.
has participant
Links out to an entity that provides qualitative information.
has quality
Denotes information origin.
has source
References an entity about which information is provided.
is about
Denotes that an entity is affected by another entity.
is affected by
Denotes the trailing occurrence or succession of the subject with regard to the object.
is after
Denotes that an entity is an attribute of the entity that this property links out to.
is attribute of
Denotes the leading occurrence or precedence of the subject with regard to the object.
is before
Denotes the process or method that created an entity.
is created by
Provides a description of the subject via reference to an object that provides further information on the subject.
is described by
Denotes the location of a genomic feature on a landmark.
is located on
Denotes that an entity is an intrinsic component of an encapsulating entity.
is part of
Denotes participation with another entity.
is participant in
References an entity or resource that provides refuting evidence.
is refuted by
Denotes that an entity is the source of the entity that this property links out to.
is source of
Denotes spatio-temporal relations to other entities.
is spatiotemporally related to
References an entity or resource that provides supporting evidence.
is supported by
Denotes a temporally constrained "isPartOf" relationship. The temporal restriction expresses that the relationship is not universally true.
This property can be used to express "Derives_from" relations in GFF3.
is temporarily part of
References another entity or resource.
references
References an entity, where additional information is provided to augment the reference.
refers to
Representation of any literal that is associated with a GFVO class instance. Domain restrictions might apply. For example, "Codon Sequence" entities restrict "has value" to be a non-empty string consisting of A, C, G, or T letters, and whose length is a multiple of 3.
has value
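As a minimal sketch of such a domain restriction, the "Codon Sequence" constraint on "has value" can be checked with the pattern `([ACGT]{3})+` that this ontology lists for that class. The helper name `is_valid_codon_sequence` is hypothetical, and the check runs outside of any RDF tooling.

```python
import re

# Pattern listed for "Codon Sequence": one or more A/C/G/T triplets.
CODON_SEQUENCE = re.compile(r"([ACGT]{3})+")


def is_valid_codon_sequence(value: str) -> bool:
    """Return True if value is non-empty, uses only A, C, G, T and
    has a length that is a multiple of 3."""
    # fullmatch ensures the whole literal obeys the pattern.
    return CODON_SEQUENCE.fullmatch(value) is not None


print(is_valid_codon_sequence("ATGTAA"))  # True
print(is_valid_codon_sequence("ATGT"))    # False
```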
Alias
An alias is an alternative name for an entity. The use of an alias is mostly secondary, whereas instances of the "Name" class should be used to denote primary names.
Encodes for the "Alias" attribute in GFF3 and GVF.
Alias
http://en.wikipedia.org/wiki/Pseudonym
AlleleCount
Count of a specific allele in genotypes.
Encodes for "AC" additional information in VCF files.
Allele Count
0.0
1.0
0.0
1.0
AlleleFrequency
Proportion of a particular gene allele in a gene pool or genotype.
Encodes for "AF" additional information in VCF files.
Allele Frequency
http://en.wikipedia.org/wiki/Allele_frequency
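To illustrate how "Allele Count" and "Allele Frequency" relate, the following sketch derives both from a list of genotype strings in the slash-separated notation described for the "Genotype" class. It is a simplification under stated assumptions: the function names are hypothetical, every allele in every genotype is counted, and the VCF convention of reporting "AC" and "AF" for alternate alleles only is not modeled.

```python
from collections import Counter
from typing import Dict, Iterable


def allele_counts(genotypes: Iterable[str]) -> Counter:
    """Count each allele across slash-separated genotype strings."""
    counts: Counter = Counter()
    for genotype in genotypes:
        counts.update(genotype.split("/"))
    return counts


def allele_frequencies(genotypes: Iterable[str]) -> Dict[str, float]:
    """Proportion of each allele among all observed alleles."""
    counts = allele_counts(genotypes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one genotype is required")
    return {allele: count / total for allele, count in counts.items()}


print(allele_frequencies(["A/G", "A/A", "G/G"]))  # {'A': 0.5, 'G': 0.5}
```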
[A-Z]
AminoAcid
"Amino Acid" encodes for the "Variant_aa" and "Reference_aa" attributes in GVF files. Linking an "Amino Acid" instance to a "Reference Sequence" or "Sequence Variant" instance denotes the genomic context of the amino acid.
Amino Acid
http://en.wikipedia.org/wiki/Amino_acid
AncestralSequence
Denotes an ancestral allele of a feature.
May be used to denote the "ancestral allele" ("AA" additional information) of VCF formatted files.
Ancestral Sequence
http://en.wikipedia.org/wiki/Ancestral_reconstruction
ArrayComparativeGenomicHybridization
Feature provenance is based on array-comparative genomic hybridization.
Used by the "data-source" structured pragma in GVF.
Array Comparative Genomic Hybridization
Attribute
An attribute denotes characteristics of an entity. At this stage, "Quality" is the only direct subclass of "Attribute", whose subclasses denote qualitative properties such as sex ("Female", "Male", "Hermaphrodite"), zygosity ("Hemizygous", "Heterozygous", "Homozygous"), etc.
The object property "has quality" (or subproperties thereof) should be utilized to express qualities of entities. The "hasAttribute" object property should be used to denote relationships to "Object" or "Process" instances, unless there is a better object property suitable to represent the relationship between entities.
Attribute
AverageCoverage
Average coverage depth for a genomic locus (a region or single base pair), i.e. the average number of reads representing a given nucleotide in the reconstructed sequence.
Captures the "technology-platform" structured pragma in GVF files ("Average_coverage" tag).
Average Coverage
http://en.wikipedia.org/wiki/Shotgun_sequencing#Coverage
BaseQuality
Root mean square base quality.
Accounts for "BQ" additional information in VCF files.
Base Quality
BiologicalEntity
A biological entity is an entity that contains genomic material or utilizes genomic material during its existence. Genomic material itself is represented as subclasses of "Chemical Entity".
Biological Entity
BiopolymerSequencing
Information about features and variants is based on biopolymer sequencing.
This class is not directly instantiated, but its subclasses "DNA Sequencing" and "RNA Sequencing" are used to describe the "data-source" structured pragma in GVF.
Biopolymer Sequencing
http://en.wikipedia.org/wiki/Sequencing
Breakpoint
A breakpoint describes the source or destination of a zero-length sequence alteration. These alterations are typically insertions, deletions or translocations according to the GVF specification (see "Breakpoint_detail" in http://sequenceontology.org/resources/gvf.html). Breakpoint coordinates should be provided using classes of the Feature Annotation Location Description Ontology.
The class encodes for the "Breakpoint_detail" and "Breakpoint_range" attributes in GVF.
Breakpoint
Catalog
A catalog is a specialization of a "Collection", where all of its contents are of the same type. The requirement of same type cannot be enforced formally via this ontology; data providers need to verify this condition manually or programmatically, or alternatively, use the more generic "Collection" class instead.
Catalog
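Since the same-type requirement of a "Catalog" has to be verified programmatically or manually, a check along the following lines could be used. This is a sketch under assumptions: the members are given as hypothetical (identifier, class name) pairs that a data provider has already extracted from their data, and `is_homogeneous` is not part of GFVO or any RDF library.

```python
from typing import Iterable, Tuple


def is_homogeneous(members: Iterable[Tuple[str, str]]) -> bool:
    """Return True if all (identifier, class name) pairs share one class.

    An empty member list is treated as valid.
    """
    types = {class_name for _, class_name in members}
    return len(types) <= 1


catalog = [("feature1", "Feature"), ("feature2", "Feature")]
mixed = [("feature1", "Feature"), ("genotype1", "Genotype")]
print(is_homogeneous(catalog))  # True
print(is_homogeneous(mixed))    # False
```

If the check fails, the more generic "Collection" class is the appropriate container.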
Cell
A cell is a biological unit that in itself forms a living organism or is part of a larger organism that is composed of many other cells. The subclasses "Germline Cell" and "Somatic Cell" can be used to denote the biological material that was used in an experiment.
Cell
http://en.wikipedia.org/wiki/Cell_(biology)
ChemicalEntity
A chemical entity is an entity related to biochemistry. This class is typically not instantiated, but instead, its subclasses "Amino Acid", "Chromosome", "Peptide Sequence", etc., are used to represent specific chemical entities.
Chemical Entity
Chromosome
A chromosome can be used as an abstract representation of a (not necessarily named) chromosome to represent ploidy within a data set. The "Chromosome" instance is then used for denoting the locus of phased genotypes.
For placing genomic features ("Feature" class instances) on a chromosome, contig, scaffold, etc., please see the "Landmark" class. It is encouraged that Sequence Ontology terms are used to annotate a Landmark with a biological type (such as "chromosome", "contig", etc.).
Encodes for "sequencing-scope" pragma in GVF.
Chromosome
http://en.wikipedia.org/wiki/Chromosome
CircularHelix
A circular helix structure.
Can be used to indicate a true "Is_circular" attribute in GFF3 and GVF.
Circular Helix
http://en.wikipedia.org/wiki/Circular_DNA
0
2
CodingFrameOffset
Coding frame offset of a genomic feature that is a coding sequence or other genomic feature that contributes to transcription and translation. A feature's coding frame offset can be either 0, 1, or 2.
It is referred to as "frame" in GTF, but called "phase" in GFF3 and GVF. "phase" is defined in GVF, but unused.
Coding Frame Offset
http://en.wikipedia.org/wiki/Reading_frame
([ACGT]{3})+
CodonSequence
A codon sequence is a nucleotide sequence underlying a potential amino acid sequence. Codon sequences are three bases in length or multiples thereof.
Encodes for "Variant_codon" and "Reference_codon" attributes in GVF.
Codon Sequence
http://en.wikipedia.org/wiki/Codon
Collection
A collection is a container for genomic data. A collection may contain information about genomic data including -- but not limited to -- contents of GFF3, GTF, GVF and VCF files. The latter are better represented by "File" class instances, whereas the result of unions or intersections between different "File" class instances should be captured within this format-independent "Collection" class. When importing data whose provenance is not file based, instances of "Collection" should be utilized (e.g., database exports).
Collection
Comment
A comment is a remark about a piece of information, an observation or statement. In the context of GFF3, GVF, etc., genomic feature and variation descriptions, "isAfter" and "isBefore" relationships should be used to indicate where a comment is situated between pragma or feature statements of GFF3, GTF, GVF or VCF files.
For specific descriptions or textual annotations of genomic features, the use of the "Note" class is encouraged.
Comment
ConditionalGenotypeQuality
Conditional genotype quality expressed in the form of a "Phred Score". It denotes the score of the genotype being wrong under the assumption/condition that the genomic site is a sequence variant.
Encodes for "GQ" additional information in VCF files.
Conditional Genotype Quality
Contig
A contig is a contiguous DNA sequence that has been assembled from shorter overlapping DNA segments. "Contig" is a specialization of a "Collection" and should be used to aggregate features, but not for indicating that a "Landmark" is representing a contig. It is encouraged that the latter is annotated by a term of the Sequence Ontology.
Encodes for "sequencing-scope" in GVF and the "contig" information field in VCF.
Contig
http://en.wikipedia.org/wiki/Contig
0.0
0.0
0
Coverage
Number of nucleic acid sequence reads for a particular genomic locus (a region or single base pair).
Accounts for "DP" additional information in VCF files.
Coverage
DNAMicroarray
Feature information is based on DNA microarray probes.
Used by the "data-source" structured pragma in GVF.
DNA Microarray
http://en.wikipedia.org/wiki/DNA_microarray_experiment
([ACGT])+
DNASequence
A DNA sequence is a sequence of nucleotides.
It can be used to describe "FASTA" annotations in both GFF3 and GVF files as well as short sequences in VCF files.
DNA Sequence
http://en.wikipedia.org/wiki/Dna_sequence
DNASequencing
Information about features and variants is based on DNA sequencing.
Used by the "data-source" structured pragma in GVF.
DNA Sequencing
http://en.wikipedia.org/wiki/DNA_sequencing
Exome
Representation of an exome. Features that constitute the exome may be linked via one or more "Collection", "Catalog", "Contig", "Scaffold" or "File" instances.
"Exome" can be used for describing data contents of the "sequencing-scope" pragma in GVF files.
Exome
http://en.wikipedia.org/wiki/Exome
ExperimentalMethod
An experimental method is a procedure that yields an experimental outcome (result). Experimental methods can be in vivo, in vitro or in silico procedures that are well described and can be referenced.
Encodes for "source" column contents of GFF3, GTF, and GVF file formats as well as the "CHROM" column in VCF. Can be used to describe the "capture-method" pragma in GVF; it can describe "VALIDATED" additional information in VCF.
Experimental Method
ExternalReference
A cross-reference to associate an entity to a representation in another database.
Encodes for the "Dbxref" attribute in GFF3 and GVF. Can be used to describe the contents of the "source" column in GTF files. Captures the "genome-build" pragma, "source-method", "attribute-method", "phenotype-description", and "phased-genotypes" structured pragmas in GVF. Accounts for the "assembly" and "pedigreeDB" information fields, and "DB", "H2", "H3", "1000G" additional information in VCF.
External Reference
Feature
The feature class captures information about genomic sequence features and variations. A genomic feature can be a large object, such as a chromosome or contig, down to single base-pair reference or variant alleles.
Feature
Female
Denoting sex of a female individual, who is defined as an individual producing ova.
This quality can be used to encode for the "sex" pragma in GVF files.
Female
http://en.wikipedia.org/wiki/Female
File
A file represents the contents of a GFF3, GTF, GVF or VCF file. It can capture genomic meta-data that is specific to any of these file formats. The result of unions, intersections or other operations between "File" class instances should be captured with the generic "Collection" class, which is format independent.
File
ForwardReferenceSequenceFrameshift
Denotes a frameshift forward in the reference sequence.
Encodes for the "Target" attribute in GFF3, the "Target" attribute and "sequence-alignment" pragma in GVF, and "CIGAR" additional information in VCF.
Forward Reference Sequence Frameshift
FragmentReadPlatform
Details about the fragment-read (single-end read) sequencing technology used to gather the data in a set.
Encodes for the "technology-platform-read-type" pragma in GVF.
Fragment Read Platform
FunctionalSpecification
A functional specification of bioinformatics data, i.e. the specification of genomic material that potentially has biological function. This class should not be directly instantiated, but instead, its subclass "Genotype" should be used.
Functional Specification
GameticPhase
Denotes the presence of information that required capturing the gametic phase. For diploid organisms, this quality indicates that information is available about which chromosome of a chromosome pair contributed data.
"Gametic Phase" encodes for the "Phase" attribute in GVF. It encodes for "GT" and "PS" additional information in VCF.
Gametic Phase
http://en.wikipedia.org/wiki/Gametic_phase
Genome
Representation of a genome. Genomic features that constitute the genome may be linked via one or more "Collection", "Catalog", "Contig", "Scaffold" or "File" instances.
"Genome" can be used for describing data contents of the "genome-build" and "sequencing-scope" pragmas in GVF files.
Genome
http://en.wikipedia.org/wiki/Genome
GenomeAnalysis
A genome analysis denotes the type of procedure that was carried out to derive information from a genome assembly.
"Genome Analysis" can be instantiated for cases where an application of "FILTER" in VCF cannot be linked to "Genotyping" or "Variant Calling", which are subclasses of "Genome Analysis". If possible, further annotation should be provided to indicate the actually utilized filter type.
Genome Analysis
http://en.wikipedia.org/wiki/Genomics#Genome_analysis
GenomicAscertainingMethod
Provides information about the source of data.
Subclasses of the class can be used to encode for the "data-source" structured pragma in GVF.
Genomic Ascertaining Method
Genotype
The genotype is the genetic information captured in a particular genome. It can also refer to one or more populations, if statistical distributions are provided that assign genetic codes to groups of individuals.
A genotype is denoted by a string that holds a slash-separated list of alleles ("has value" property). The length of the list depends on the ploidy of the studied species as well as the sequencing technique used.
Example: "A/G" denotes a genotype with alleles "A" and "G".
Encodes for the "Genotype" attribute in GVF and "GT" additional information in VCF.
Genotype
http://en.wikipedia.org/wiki/Genotype
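The slash-separated genotype notation can be split into its alleles as sketched below. The function `parse_genotype` is a hypothetical helper, not part of GFVO; it only handles the "/" separator described for this class and rejects empty alleles.

```python
from typing import List


def parse_genotype(value: str) -> List[str]:
    """Split a genotype literal such as "A/G" into its alleles."""
    alleles = value.split("/")
    if any(allele == "" for allele in alleles):
        raise ValueError(f"empty allele in genotype {value!r}")
    return alleles


alleles = parse_genotype("A/G")
print(alleles)       # ['A', 'G']
print(len(alleles))  # 2
```

The number of alleles returned reflects the list length mentioned above, so a diploid call yields two entries.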
Genotyping
Genotyping is the process of determining the genetics of an individual or sample. The genotype itself is expressed as the difference of genetic make-up compared to a reference genome.
Applicable to the "FILTER" information field in VCF.
Genotyping
http://en.wikipedia.org/wiki/Genotyping
GermlineCell
The germline feature class captures information about genomic sequence features arising from germline cells.
VCF files permit the explicit annotation with "Somatic Cell" via "SOMATIC" additional information. The absence of that field does not imply the presence of germline cell material, though. Describes the "genomic-source" pragma in GVF.
Germline Cell
http://en.wikipedia.org/wiki/Germline
Haplotype
A "Haplotype" is a collection of "Genotype" or "Sequence Variant" instances. It can imply that a set of genes is inherited as a group, or alternatively, that the set of genotypes or sequence variants has a biological function when acting together (e.g., there exists a disease association).
Haplotype instances should only catalog a single type, i.e. either "Genotype" or "Sequence Variant" instances; they should not mix both types (see also "Catalog").
Encodes for "HQ" additional information in VCF.
Haplotype
http://en.wikipedia.org/wiki/Haplotype
HelixStructure
"Helix Structure" denotes the physical shape of biopolymers.
The subclasses "Circular Helix" and "Watson-Crick Helix" can be used for encoding the "Is_circular" attribute in GFF3 and GVF files.
Helix Structure
http://en.wikipedia.org/wiki/DNA_helix
Hemizygous
A sequence alteration with hemizygous alleles.
This quality can be used to directly encode for the "Zygosity" attribute in GVF files and it indirectly describes genotypes in VCF files.
Hemizygous
http://en.wikipedia.org/wiki/Zygosity#Hemizygous
Heritage
Heritage denotes the passing of traits from parents or ancestors. Passed traits may not be visible as a phenotype, but instead, might only manifest as genetic inheritance.
Heritage
http://en.wikipedia.org/wiki/Heredity
Hermaphrodite
Denoting sex of an individual that contains both male and female gametes.
Hermaphrodite
http://en.wikipedia.org/wiki/Hermaphrodite
Heterozygous
A sequence alteration with heterozygous alleles.
This quality can be used to directly encode for the "Zygosity" attribute in GVF files and it indirectly describes genotypes in VCF files.
Heterozygous
http://en.wikipedia.org/wiki/Zygosity#Heterozygous
Homozygous
A sequence alteration with homozygous alleles.
This quality can be used to directly encode for the "Zygosity" attribute in GVF files and it indirectly describes genotypes in VCF files.
Homozygous
http://en.wikipedia.org/wiki/Zygosity#Homozygous
Identifier
An identifier labels an entity with preferably a single term that is interpreted as an accession. An accession labels entities that are part of a collection of similar type. More generic naming of entities can be achieved using the "Label" class.
Encodes for the "seqid" column in GFF3 and GVF; encodes for the "seqname" column in GTF and "CHROM" column in VCF. Captures the "ID" attribute in GFF3 and GVF. Suitable for expression values of "individual-id" and "technology-platform-machine-id" pragmas in GVF. Encodes for the "ID" key/value property in VCF.
Identifier
http://en.wikipedia.org/wiki/Identifier
InformationContentEntity
An information content entity is a data structure or data type that requires background information or specific domain knowledge to be interpreted correctly. Information content entities can be of simple structure, such as "Label" that only requires the application of "has value" to be meaningful, or, they can be of complex structure such as "Locus" which becomes meaningful with multiple FALDO annotations.
Information Content Entity
Label
A label is a term or short list of terms that describe an entity for the purpose of distinguishing it from entities of similar type. If the labels of entities are sufficiently unique to actually identify them, consider using the "Identifier" class instead.
Encodes for the "PEDIGREE" information field in VCF.
Label
Landmark
A landmark establishes a coordinate system for features. Landmarks can be chromosomes, contigs, scaffolds or other constructs that can harbor "Feature" class instances. For expressing ploidy within a data set, please refer to the "Chromosome" class.
To annotate a landmark with a biological type, it is encouraged to use terms of the Sequence Ontology, but not the classes "Chromosome", "Scaffold" and "Contig". The latter classes are used for describing ploidy within a dataset as well as offering means of data aggregation.
Encodes for the "seqid" column in GFF3 and GVF; encodes for the "seqname" column in GTF and "CHROM" column in VCF. Captures the "sequence-region" pragma in GFF3 and GVF as well as their "FASTA" annotation. Encodes for "DNA", "RNA" and "Protein" "##"-lines in GTF. Captures the "contig" information field in VCF.
Landmark
Likelihood
A "Likelihood" is a probability of a certain event occurring.
For use with "GL" and "GP" additional information in VCF files.
Likelihood
LikelihoodOfHeterogeneousPloidy
"Likelihood of Heterogeneous Ploidy" expresses the likelihood of genotypes in absence of copy number data.
Specifically designed to encode for values of "GLE" additional information in VCF files.
Likelihood of Heterogeneous Ploidy
Locus
A locus refers to a position within a designated genomic landmark. Actual locus coordinates should be provided using classes of the Feature Annotation Location Description Ontology.
The class encodes for the "start", "end" and "strand" columns in GFF3, GTF, and GVF and for the "POS" column in VCF. It also encodes the "Start_range" and "End_range" attributes in GVF.
Locus
http://en.wikipedia.org/wiki/Locus_(genetics)
Male
Denoting sex of a male individual, who is defined as an individual producing spermatozoa.
This quality can be used to encode for the "sex" pragma in GVF files.
Male
http://en.wikipedia.org/wiki/Male
MappingQuality
Root mean square mapping quality.
Encodes values of the "MQ" additional information in VCF files.
Mapping Quality
Match
Denotes a match between the reference sequence and target sequence.
Encodes for the "Target" attribute in GFF3, the "Target" attribute and "sequence-alignment" pragma in GVF, and "CIGAR" additional information in VCF.
Match
MaterialEntity
A material entity represents a physical object. In the context of genomic features and variations, material entities are cells, organisms, sequences, chromosomes, etc.
"Material Entity" should not be instantiated as such, instead, it is suggested that its subclasses "Genome", "DNA Sequence", "Sample Count", etc. are appropriated.
Material Entity
MaternalHeritage
Maternal heritage is the passing of traits from a female to her descendants.
Currently unused, but might be applicable to phased genotypes in GVF and VCF files; included for future use.
Maternal Heritage
http://en.wikipedia.org/wiki/Maternal_effect
Name
A name assigns an entity a non-formal term (or multiples thereof) that can provide information about the entity's identity. Unlike an "Identifier", a name should not be considered unique.
Encodes for the "feature" column in GTF. Captures the "genome-build" pragma in GFF3 and GVF. Captures the "population", "technology-platform-name" pragmas in GVF.
Name
Note
A note is a short textual description about an entity. It provides a formal or semi-formal description of an entity, as opposed to a "Comment".
Encodes for the "sample-description" pragma and "Comment" key/value pairs in structured attributes in GVF. Captures "Description" key/value pairs in information fields and "SB" information field in VCF.
0
NumberOfReads
Number of reads supporting a particular feature or variant.
Can encode for "MQ0" additional information in VCF files, if additional annotations are provided to denote a mapping quality of zero for the given count. In GVF files, the class accounts for the "Variant_reads" attribute.
Number of Reads
Object
An object is a concrete entity that realizes a concept and encapsulates data associated with said concept. Objects typically represent tangible entities, such as "Chromosome" or "DNA Sequence", but also entities such as "Identifier", "Average Coverage" or other computational or mathematical entities.
Since an object describes a large body of entities, its use is discouraged. Where applicable, one of its subclasses should be used instead.
Object
PairedEndReadPlatform
Details about the paired-end read sequencing technology used to gather the data in a set.
Encodes for the "technology-platform-read-type" pragma in GVF.
Paired End Read Platform
PaternalHeritage
Paternal heritage is the passing of traits from a male to his descendants.
Currently unused, but might be applicable to phased genotypes in GVF and VCF files; included for future use.
Paternal Heritage
http://en.wikipedia.org/wiki/Maternal_effect#Paternal_effect_genes
([A-Z])+
PeptideSequence
A peptide sequence is an ordered sequence of amino acid residues, but which may not necessarily be a protein sequence. For encoding sequences of proteins, the subclass "Protein Sequence" should be used.
Encodes for "FASTA" annotation in GFF3 and GVF.
Peptide Sequence
http://en.wikipedia.org/wiki/Peptide_sequence
Phenotype
A phenotype description represents additional information about a sequenced individual's phenotype. A sequenced individual is represented by instances of the "Sequenced Individual" class.
Encodes for the "phenotype-description" structured pragma in GVF.
Phenotype
http://en.wikipedia.org/wiki/Phenotype
0
PhredScore
The Phred score can be used to assign quality scores to base calls of DNA sequences.
GVF supports the use of Phred scores in the "score" column, but this information needs to be obtained/given by the data provider. In VCF files, the "QUAL" column and the "PL", "HQ", and "PQ" additional information carry Phred scores that can be encoded as "Phred Score".
GFVO's "Score" and "Phred Score" cannot be defined as equivalent to the Sequence Ontology terms "score" (SO:0001685) and "quality_value" (SO:0001686) due to differences in inheritance between the two ontology implementations. GFVO defines "Phred Score" as a subclass of "Score", but the Sequence Ontology defines "score" as a sibling of "quality_value".
Phred Score
http://en.wikipedia.org/wiki/Phred_score
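For readers unfamiliar with the scale, a Phred score Q relates to the estimated error probability P of a base call by Q = -10 * log10(P). The conversion is sketched below; the function names are hypothetical, and the ontology itself only stores the resulting score via "has value".

```python
import math


def phred_from_error_probability(probability: float) -> float:
    """Convert an error probability (0 < P <= 1) into a Phred score."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in the interval (0, 1]")
    return -10.0 * math.log10(probability)


def error_probability_from_phred(score: float) -> float:
    """Convert a non-negative Phred score back into an error probability."""
    if score < 0.0:
        raise ValueError("Phred scores are non-negative")
    return 10.0 ** (-score / 10.0)


# An error probability of 1 in 1000 corresponds to a score of about 30.
print(round(phred_from_error_probability(0.001), 6))
```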
PrenatalCell
A prenatal feature is purportedly associated with prenatal cells; the GVF specification declares this feature type under the pragma directive "##genomic-source", but does not describe its semantics, and the referenced Logical Observation Identifiers Names and Codes (LOINC, http://loinc.org) do not define the meaning or intended usage of the term "prenatal" either.
Prenatal Cell
http://en.wikipedia.org/wiki/Prenatal
true
Process
A process denotes a temporally dependent entity. It can be thought of as a function, where input data is transformed by an algorithm to produce certain output data.
Since a process describes a large number of entities, its direct use is discouraged. At least "Experimental Method" or one of its subclasses should be used instead.
Process
ProteinSequence
A protein sequence is a peptide sequence which represents the primary structure of a protein.
Encodes for "sequencing-scope" pragma in GVF.
Protein Sequence
http://en.wikipedia.org/wiki/Peptide_sequence
Proteome
Representation of a proteome. Features that constitute or contribute to the proteome may be linked via one or more "Collection", "Catalog", "Contig", "Scaffold" or "File" instances.
It is envisioned that "Proteome" could be used for describing data contents of the "sequencing-scope" pragma in GVF files.
Proteome
http://en.wikipedia.org/wiki/Proteome
Quality
Quality is a specific attribute that is strongly associated with an entity, but whose values are varying and disjunct. Qualities are finite enumerations, such as sex ("Female", "Male", "Hermaphrodite"), heritage ("Maternal", "Paternal"), but they also make use of the "has value" datatype property such as "Coding Frame Offset" (either "0", "1" or "2").
For encoding numerical qualities, see "Base Quality" and "Mapping Quality", or, "Phred Score" and "Conditional Genotype Quality", which are sub-classes of "Score".
Quality
Quantity
A property of a phenomenon, body, or substance, where the property has a value that can be expressed by means of a number. This class is typically not directly instantiated, but instead, its subclasses "Allele Frequency", "Average Coverage", etc. are used.
Quantity
RNASequencing
Information about features and variants is based on RNA sequencing.
Used by the "data-source" structured pragma in GVF.
RNA Sequencing
http://en.wikipedia.org/wiki/RNA-Seq
ReferenceSequence
Denotes the reference sequence of a feature. The reference sequence is of importance when dealing with genomic variation data, which is expressed by the "Variant" class.
Encodes for the "Reference_seq" and "Sequence_context" attributes in GVF and the "REF" column in VCF.
Reference Sequence
http://en.wikipedia.org/wiki/Reference_sequence
ReferenceSequenceGap
Denotes a gap in the reference sequence for an alignment.
Encodes for the "Target" attribute in GFF3, the "Target" attribute and "sequence-alignment" pragma in GVF, and "CIGAR" additional information in VCF.
Reference Sequence Gap
ReverseReferenceSequenceFrameshift
Denotes a frameshift backwards (reverse) in the reference sequence.
Encodes for the "Target" attribute in GFF3, the "Target" attribute and "sequence-alignment" pragma in GVF, and "CIGAR" additional information in VCF.
Reverse Reference Sequence Frameshift
Sample
A sample is a limited quantity of a chemical entity, which is typically used (destructively/non-destructively) in a scientific analysis or test.
It can be applied to describe contents of the "sample-description" pragma in GVF files or the "SAMPLE" information field in VCF files.
Sample
http://en.wikipedia.org/wiki/Sample_(material)
SampleCount
Number of samples in the dataset.
Encodes for "NS" additional information in VCF files.
Sample Count
0.0
1.0
0.0
1.0
SampleMixture
Sample mixture determines the proportion of various tissues/cell types in a biological sample that has been taken as part of a biopsy. The sum of various sample mixtures belonging to the same sample should equal 1.
Expresses the "Mixture" key/value pair in "SAMPLE" fields in VCF.
Sample Mixture
http://en.wikipedia.org/wiki/Biopsy
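The constraint stated for "Sample Mixture" (the mixtures of one sample should sum to 1) can be checked with a minimal sketch such as the following. The function name, the tolerance, and the assumption that each individual proportion lies between 0 and 1 are illustrative and not part of the ontology.

```python
import math


def mixtures_sum_to_one(proportions, tolerance=1e-9):
    """Check hypothetical "Sample Mixture" values that belong to one sample.

    Assumption: every proportion lies in [0, 1] and all proportions of the
    same sample are passed in together.
    """
    if any(p < 0.0 or p > 1.0 for p in proportions):
        return False
    # Floating-point sums rarely hit 1.0 exactly, so compare with a tolerance.
    return math.isclose(sum(proportions), 1.0, abs_tol=tolerance)
```

For instance, a biopsy described as 70% tumour tissue, 20% stroma and 10% blood would pass the check, whereas a description covering only 90% of the sample would not.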
Scaffold
A scaffold is the aggregation of multiple contigs to form a larger continuous sequencing region. "Scaffold" is a specialization of a "Collection" and should be used to aggregate features, but not for indicating that a "Landmark" is representing a scaffold. It is encouraged that the latter is annotated by a term of the Sequence Ontology.
Encodes for "sequencing-scope" in GVF.
Scaffold
http://en.wikipedia.org/wiki/Contig
Score
A measure that permits the ranking of entities.
Directly encodes for the "score" column in GFF3, GTF and GVF files; if the actual scoring algorithm is known, then "Phred Score" might be used to encode for the values of the "score" column in GVF files. The class can encapsulate information of the "score-method" pragma in GVF files. For VCF files, the subclasses "Phred Score" and "Conditional Genotype Quality" should be used.
GFVO's "Score" and "Phred Score" cannot be defined as equivalent to the Sequence Ontology terms "score" (SO:0001685) and "quality_value" (SO:0001686) due to differences in inheritance between the two ontology implementations. GFVO defines "Phred Score" as a subclass of "Score", but the Sequence Ontology defines "score" as a sibling of "quality_value".
Score
http://en.wikipedia.org/wiki/Score_(statistics)
([ACGTUWSMKRYBDHVN\-]+|\~[0-9]*|\.|!|\^)
Sequence
A sequence provides information about any biopolymer sequences. Specialized subclasses are provided to denote specialized instances of sequences, such as "Codon Sequence", "Reference Sequence", "Protein Sequence", etc.
Can be used to encode for the "sequencing-scope" pragma in GVF files. See subclasses for applications in both GFF3 and GVF files.
Sequence
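The regular expression listed just before this entry appears to constrain the literal values of sequences. That reading is an assumption based on the pattern's position and on its resemblance to the value syntax of "Reference_seq" and "Variant_seq" in GVF. A minimal sketch of validating a value against it, with a hypothetical helper name:

```python
import re

# Pattern copied verbatim from this document; treating it as a constraint on
# sequence literals is an assumption.
SEQUENCE_PATTERN = re.compile(r"([ACGTUWSMKRYBDHVN\-]+|\~[0-9]*|\.|!|\^)")


def is_valid_sequence_literal(value):
    """Return True if the whole string matches the pattern."""
    return SEQUENCE_PATTERN.fullmatch(value) is not None
```

Under this pattern, upper-case IUPAC nucleotide strings (optionally containing "-") are accepted, as are the single symbols ".", "!" and "^" and a "~" optionally followed by digits. Lower-case input is rejected.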
SequenceAlignment
A sequence alignment denotes the congruence of two sequences.
In GFF3/GVF, a sequence alignment can be a nucleotide-to-nucleotide or protein-to-nucleotide alignment (see "The Gap Attribute", http://sequenceontology.org/resources/gff3.html). "Sequence Alignment Operation" class instances denote the actual steps that constitute the sequence alignment.
Encodes for the "Target" attribute in GFF3/GVF files as well as the "sequence-alignment" pragma in GVF files. Can encode "CIGAR" additional information of VCF files.
Sequence Alignment
http://en.wikipedia.org/wiki/Sequence_alignment
SequenceAlignmentOperation
A sequence alignment operation captures the type of alignment (see "Sequence Alignment") between a reference sequence and target sequence. Note that a "Sequence Alignment Operation" is situated in a linked list, where the order of the alignment operations is of significance.
Its subclasses are used to encode for the "Target" attribute and "sequence-alignment" pragma in GVF, and for "CIGAR" additional information in VCF.
Sequence Alignment Operation
http://en.wikipedia.org/wiki/Sequence_alignment
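As an illustration of the ordered chain of alignment operations, the following sketch parses a GFF3 "Gap" attribute value into a sequence of (class name, span) pairs. The mapping from Gap operation codes to GFVO class names is an assumption, and the names "Match" and "ForwardReferenceSequenceFrameshift" are taken on trust from the wider ontology rather than from this section. A Python list stands in for the linked list here because it preserves order.

```python
import re

# Assumed mapping of GFF3 Gap operation codes to GFVO class names.
GAP_CODE_TO_CLASS = {
    "M": "Match",
    "I": "ReferenceSequenceGap",  # gap inserted into the reference sequence
    "D": "TargetSequenceGap",     # gap inserted into the target sequence
    "F": "ForwardReferenceSequenceFrameshift",
    "R": "ReverseReferenceSequenceFrameshift",
}


def parse_gap(gap):
    """Turn a Gap value such as "M8 D3 M6" into ordered (class, span) pairs."""
    operations = []
    for token in gap.split():
        match = re.fullmatch(r"([MIDFR])(\d+)", token)
        if match is None:
            raise ValueError(f"unrecognised Gap operation: {token!r}")
        code, length = match.groups()
        # The length corresponds to a "Span" attached to the operation.
        operations.append((GAP_CODE_TO_CLASS[code], int(length)))
    return operations
```

Each pair would become one "Sequence Alignment Operation" instance with an attached "Span", linked to its successor in the order given.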
SequenceVariant
Describes specific sequence alterations of a genomic feature. A variant is related to "Reference Sequence" class instances, which denote the sequence that serves as a basis for sequence alteration comparisons.
Encodes for the "Variant_seq" attribute in GVF. Captures the "ALT" column and "ALT" information field in VCF.
Sequence Variant
http://en.wikipedia.org/wiki/Mutation
SequencedIndividual
An abstract representation of a particular individual for representing aggregated sequencing information. "Sequenced Individual" can also be used to denote complex heritage relationships in genomic samples.
Encodes for the "individual" attribute and "multi-individual" pragma in GVF.
Sequenced Individual
SequencingTechnologyPlatform
Details about the sequencing/microarray technology used to gather the data in a set.
Encodes for the "technology-platform-class" pragma and is composite for aggregating information of the pragma statements "technology-platform-name", "technology-platform-version", "technology-platform-machine-id", "technology-platform-read-length", "technology-platform-read-type", "technology-platform-read-pair-span", "technology-platform-average-coverage", as well as the structured pragma "technology-platform" in GVF.
Sequencing Technology Platform
http://en.wikipedia.org/wiki/Read_(Biology)
Sex
Biological sex of a sequenced individual. Subclasses "Female" and "Male" can be used to encode for the "sex" pragma in GVF files. The subclass "Hermaphrodite" is included for potential future use cases.
Sex
http://en.wikipedia.org/wiki/Sex
SomaticCell
The "Somatic Cell" class captures information about genomic sequence features arising from somatic cells.
Encodes for "genomic-source" pragma in GVF and "SOMATIC" additional information in VCF.
Somatic Cell
http://en.wikipedia.org/wiki/Somatic
0
Span
A span is an attribute denoting the number of nucleotides or peptides that an entity covers.
This is directly used in conjunction with "Sequence Alignment Operation" subclasses to express the number of nucleotides a sequence alignment match ranges over, which can be used in conjunction with GFF3/GVF files. The class also covers "technology-platform-read-length" and "technology-platform-read-pair-span" pragmas in GVF files.
Span
TargetSequenceGap
Denotes a gap in the target sequence for an alignment.
Encodes for the "Target" attribute in GFF3, for the "Target" attribute and "sequence-alignment" pragma in GVF, and for "CIGAR" additional information in VCF.
Target Sequence Gap
TotalNumberOfAlleles
Total number of alleles in called genotypes.
Encodes for "AN" additional information in VCF files.
Total Number of Alleles
TotalNumberOfReads
Total number of reads covering a feature or variant.
Covers the "Total_reads" attribute in GVF files and the "DP" additional information field in VCF files.
Total Number of Reads
VariantCalling
Denotes the technique of calling genomic feature variants in a genome assembly.
Applicable to the "FILTER" information field in VCF as well as the "variant-calling" pragma in GVF.
Variant Calling
http://en.wikipedia.org/wiki/SNV_calling_from_NGS_data
Version
A "Version" names a release of software, a dataset, or another resource. Versions can follow the common "major.minor.patch" version format, but are not restricted in any way. The version can also incorporate the dataset name (e.g., "HGNC19").
Encodes for the "gff-version" and "gvf-version" pragma statements in GFF3 and GVF, respectively. Encodes for the "gff-version" "##"-line type in GTF. Captures the "fileformat" meta-information line in VCF. Encodes for "file-version" and "technology-platform-version" pragmas in GVF.
Version
http://en.wikipedia.org/wiki/Versioning
WatsonCrickHelix
Helical structure as first proposed by James Watson and Francis Crick, whose work was greatly influenced by discoveries of Rosalind Franklin and Maurice Wilkins.
Can be used to indicate a false "Is_circular" attribute in GFF3 and GVF.
Watson-Crick Helix
http://en.wikipedia.org/wiki/Non-helical_models_of_DNA_structure#Proposal_of_Watson.E2.80.93Crick_helical_structure
Zygosity
Zygosity denotes the similarities of a specific allele in the genome of an organism. Subclasses can be utilized to encode zygosity directly (e.g., the "Zygosity" attribute in GVF files), or to encode zygosity indirectly by inferring it from genotype descriptions (as is the case in VCF files).
Zygosity
http://en.wikipedia.org/wiki/Zygosity