A formal ontology in the domain of biological and clinical statistics
1.58
Alfred Hero
Anna Maria Masci
Barry Smith
Chris Stoeckert
Jie Zheng
Marcy Harris
OBCS stands for the Ontology of Biological and Clinical Statistics. OBCS is an ontology in the domain of biological and clinical statistics. It is aligned with the Basic Formal Ontology (BFO) and the Ontology for Biomedical Investigations (OBI). OBCS imports all possible biostatistics terms in OBI and includes many additional biostatistics terms, some of which were proposed and discussed in the OBI face-to-face workshop in Ann Arbor in 2012.
OBCS: Ontology of Biological and Clinical Statistics
OWL-DL
Yongqun "Oliver" He
Yu Lin
BFO CLIF specification label
BFO CLIF specification label
Person:Alan Ruttenberg
Really of interest to developers only
Relates an entity in the ontology to the term that is used to represent it in the CLIF specification of BFO2
NIAID GSCID-BRC alternative term
An alternative term used by the National Institute of Allergy and Infectious Diseases (NIAID) Genomic Sequencing Centers for Infectious Diseases (GSCID) and Bioinformatics Resource Centers (BRC).
NIAID GSCID-BRC alternative term
NIAID GSCID-BRC metadata working group
PERSON: Chris Stoeckert, Jie Zheng
Description
Description
An account of the content of the resource.
Description may include but is not limited to: an abstract,
table of contents, reference to a graphical representation
of content or a free-text account of the content.
definition
textual definition
definition
definition
2012-04-05:
Barry Smith
The official OBI definition, explaining the meaning of a class or property: 'Shall be Aristotelian, formalized and normalized. Can be augmented with colloquial definitions' is terrible.
Can you fix to something like:
A statement of necessary and sufficient conditions explaining the meaning of an expression referring to a class or property.
Alan Ruttenberg
Your proposed definition is a reasonable candidate, except that it is very common that necessary and sufficient conditions are not given. Mostly they are necessary, occasionally they are necessary and sufficient or just sufficient. Often they use terms that are not themselves defined and so they effectively can't be evaluated by those criteria.
On the specifics of the proposed definition:
We don't have definitions of 'meaning' or 'expression' or 'property'. For 'reference' in the intended sense I think we use the term 'denotation'. For 'expression', I think you mean symbol, or identifier. For 'meaning' it differs for class and property. For class we want documentation that lets the intended reader determine whether an entity is an instance of the class, or not. For property we want documentation that lets the intended reader determine, given a pair of potential relata, whether the assertion that the relation holds is true. The 'intended reader' part suggests that we also specify who, we expect, would be able to understand the definition, and also generalizes over human and computer reader to include textual and logical definition.
Personally, I am more comfortable weakening definition to documentation, with instructions as to what is desirable.
We also have the outstanding issue of how to aim different definitions to different audiences. A clinical audience reading ChEBI wants a different sort of documentation/definition from a chemistry-trained audience, and similarly there is a need for a definition that is adequate for an ontologist to work with.
GROUP:OBI:<http://purl.obolibrary.org/obo/obi>
PERSON:Daniel Schober
The official OBI definition, explaining the meaning of a class or property. Shall be Aristotelian, formalized and normalized. Can be augmented with colloquial definitions.
The official definition, explaining the meaning of a class or property. Shall be Aristotelian, formalized and normalized. Can be augmented with colloquial definitions.
definition
editor note
An administrative note intended for its editor. It may not be included in the publication version of the ontology, so it should contain nothing necessary for end users to understand the ontology.
GROUP:OBI:<http://purl.obfoundry.org/obo/obi>
PERSON:Daniel Schober
editor note
mathematical formula
A normal distribution probability density function has a formula of:
f(x) = 1/(√(2 π) σ) e^-((x - μ)^2/(2 σ^2))
an annotation property that represents a mathematical formula.
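As an illustration of the kind of formula this annotation property carries, the normal density quoted above can be evaluated directly. This is a minimal Python sketch, not part of OBCS; the function name normal_pdf and its default parameters are assumptions made for the example.

```python
import math


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Evaluate f(x) = 1/(sqrt(2*pi)*sigma) * exp(-((x - mu)^2 / (2*sigma^2))).

    `normal_pdf`, `mu` and `sigma` are illustrative names (assumptions),
    mirroring the symbols mu and sigma in the quoted formula.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coefficient = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# Usage: density of the standard normal distribution at its mean.
peak = normal_pdf(0.0)
```

The density is symmetric around mu, so normal_pdf(mu + d, mu, sigma) and normal_pdf(mu - d, mu, sigma) agree for any offset d.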
Asiyah Yu Lin, Jie Zheng, Yongqun He
has curation status
OBI_0000281
PERSON:Alan Ruttenberg
PERSON:Bill Bug
PERSON:Melanie Courtot
has curation status
definition source
Discussion on obo-discuss mailing-list, see http://bit.ly/hgm99w
GROUP:OBI:<http://purl.obolibrary.org/obo/obi>
PERSON:Daniel Schober
definition source
Formal citation, e.g. an identifier in an external database, to indicate or attribute the source(s) of the definition. Free text may also be used to indicate or attribute the source(s) of the definition. EXAMPLE: Author Name, URI, MeSH Term C04, PUBMED ID, Wiki uri on 31.01.2007
curator note
An administrative note of use for a curator but of no use for a user
PERSON:Alan Ruttenberg
curator note
term editor
20110707, MC: label update to term editor and definition modified accordingly. See http://code.google.com/p/information-artifact-ontology/issues/detail?id=115.
GROUP:OBI:<http://purl.obolibrary.org/obo/obi>
Name of editor entering the term in the file. The term editor is a point of contact for information regarding the term. The term editor may be, but is not always, the author of the definition, which may have been worked upon by several people
PERSON:Daniel Schober
term editor
alternative term
An alternative name for a class or property which means the same thing as the preferred name (semantically equivalent)
GROUP:OBI:<http://purl.obolibrary.org/obo/obi>
PERSON:Daniel Schober
alternative term
Source
Source
A reference to a resource from which the present resource
is derived.
The present resource may be derived from the Source resource
in whole or in part. Recommended best practice is to reference
the resource by means of a string or number conforming to a
formal identification system.
editor preferred label
editor preferred term
editor preferred term
GROUP:OBI:<http://purl.obolibrary.org/obo/obi>
PERSON:Daniel Schober
The concise, meaningful, and human-friendly name for a class or property preferred by the ontology developers. (US-English)
editor preferred label
example of usage
A phrase describing how a class name should be used. May also include other kinds of examples that facilitate immediate understanding of a class semantics, such as widely known prototypical subclasses or instances of the class. Although essential for high level terms, examples for low level terms (e.g., Affymetrix HU133 array) are not
GROUP:OBI:<http://purl.obolibrary.org/obo/obi>
PERSON:Daniel Schober
example of usage
ISA alternative term
Requested by Alejandra Gonzalez-Beltran
https://sourceforge.net/tracker/?func=detail&aid=3603413&group_id=177891&atid=886178
ISA alternative term
Person: Philippe Rocca-Serra
Person: Alejandra Gonzalez-Beltran
ISA tools project (http://isa-tools.org)
An alternative term used by the ISA tools project (http://isa-tools.org).
BFO OWL specification label
BFO OWL specification label
Really of interest to developers only
Relates an entity in the ontology to the name of the variable that is used to represent it in the code that generates the BFO OWL file from the lispy specification.
label
IEDB alternative term
An alternative term used by the IEDB.
IEDB
IEDB alternative term
PERSON:Randi Vita, Jason Greenbaum, Bjoern Peters
has associated axiom(nl)
An axiom associated with a term expressed using natural language
Person:Alan Ruttenberg
Person:Alan Ruttenberg
has associated axiom(nl)
has associated axiom(fol)
An axiom expressed in first order logic using CLIF syntax
Person:Alan Ruttenberg
Person:Alan Ruttenberg
has associated axiom(fol)
imported from
For external terms/classes, the ontology from which the term was imported
GROUP:OBI:<http://purl.obolibrary.org/obo/obi>
PERSON:Alan Ruttenberg
PERSON:Melanie Courtot
imported from
temporal interpretation
https://code.google.com/p/obo-relations/wiki/ROAndTime
temporal interpretation
elucidation
Person:Barry Smith
Primitive terms in a highest-level ontology such as BFO are terms which are so basic to our understanding of reality that there is no way of defining them in a non-circular fashion. For these, therefore, we can provide only elucidations, supplemented by examples and by axioms
elucidation
person:Alan Ruttenberg
part of
Everything is part of itself. Any part of any part of a thing is itself part of that thing. Two distinct things cannot be part of each other.
Occurrents are not subject to change and so parthood between occurrents holds for all the times that the part exists. Many continuants are subject to change, so parthood between continuants will only hold at certain times, but this is difficult to specify in OWL. See https://code.google.com/p/obo-relations/wiki/ROAndTime
Parthood requires the part and the whole to have compatible classes: only an occurrent can be part of an occurrent; only a process can be part of a process; only a continuant can be part of a continuant; only an independent continuant can be part of an independent continuant; only an immaterial entity can be part of an immaterial entity; only a specifically dependent continuant can be part of a specifically dependent continuant; only a generically dependent continuant can be part of a generically dependent continuant. (This list is not exhaustive.)
A continuant cannot be part of an occurrent: use 'participates in'. An occurrent cannot be part of a continuant: use 'has participant'. A material entity cannot be part of an immaterial entity: use 'has location'. A specifically dependent continuant cannot be part of an independent continuant: use 'inheres in'. An independent continuant cannot be part of a specifically dependent continuant: use 'bearer of'.
a core relation that holds between a part and its whole
http://www.obofoundry.org/ro/#OBO_REL:part_of
my brain is part of my body (continuant parthood, two material entities)
my stomach cavity is part of my stomach (continuant parthood, immaterial entity is part of material entity)
part of
part_of
this day is part of this year (occurrent parthood)
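The three properties stated for 'part of' (everything is part of itself, parts of parts are parts, two distinct things cannot be part of each other) correspond to reflexivity, transitivity and antisymmetry. The following Python sketch checks them on a toy set of pairs; the entity names and helper functions are hypothetical and are not taken from the ontology.

```python
from itertools import product


def reflexive_transitive_closure(entities, asserted):
    """Close asserted (part, whole) pairs under reflexivity and transitivity."""
    closure = set(asserted) | {(e, e) for e in entities}
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


def is_antisymmetric(relation):
    """True if no two distinct things are related in both directions."""
    return all(a == b or (b, a) not in relation for (a, b) in relation)


# Hypothetical toy data: (part, whole) pairs.
entities = {"neuron", "brain", "body"}
asserted = {("neuron", "brain"), ("brain", "body")}
part_of = reflexive_transitive_closure(entities, asserted)
```

In this toy data, the pair ("neuron", "body") is inferred rather than asserted, and adding ("body", "neuron") would make the relation fail the antisymmetry check.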
has part
Everything has itself as a part. Any part of any part of a thing is itself part of that thing. Two distinct things cannot have each other as a part.
Occurrents are not subject to change and so parthood between occurrents holds for all the times that the part exists. Many continuants are subject to change, so parthood between continuants will only hold at certain times, but this is difficult to specify in OWL. See https://code.google.com/p/obo-relations/wiki/ROAndTime
Parthood requires the part and the whole to have compatible classes: only an occurrent can have an occurrent as part; only a process can have a process as part; only a continuant can have a continuant as part; only an independent continuant can have an independent continuant as part; only a specifically dependent continuant can have a specifically dependent continuant as part; only a generically dependent continuant can have a generically dependent continuant as part. (This list is not exhaustive.)
A continuant cannot have an occurrent as part: use 'participates in'. An occurrent cannot have a continuant as part: use 'has participant'. An immaterial entity cannot have a material entity as part: use 'location of'. An independent continuant cannot have a specifically dependent continuant as part: use 'bearer of'. A specifically dependent continuant cannot have an independent continuant as part: use 'inheres in'.
a core relation that holds between a whole and its part
has part
has_part
my body has part my brain (continuant parthood, two material entities)
my stomach has part my stomach cavity (continuant parthood, material entity has part immaterial entity)
this year has part this day (occurrent parthood)
realized in
Paraphrase of elucidation: a relation between a realizable entity and a process, where there is some material entity that is bearer of the realizable entity and participates in the process, and the realizable entity comes to be realized in the course of the process
[copied from inverse property 'realizes'] to say that b realizes c at t is to assert that there is some material entity d & b is a process which has participant d at t & c is a disposition or role of which d is bearer_of at t & the type instantiated by b is correlated with the type instantiated by c. (axiom label in BFO2 Reference: [059-003])
is realized by
realized in
realized_in
this disease is realized in this disease course
this fragility is realized in this shattering
this investigator role is realized in this investigation
realizes
Paraphrase of elucidation: a relation between a process and a realizable entity, where there is some material entity that is bearer of the realizable entity and participates in the process, and the realizable entity comes to be realized in the course of the process
realizes
this disease course realizes this disease
this investigation realizes this investigator role
this shattering realizes this fragility
to say that b realizes c at t is to assert that there is some material entity d & b is a process which has participant d at t & c is a disposition or role of which d is bearer_of at t & the type instantiated by b is correlated with the type instantiated by c. (axiom label in BFO2 Reference: [059-003])
preceded by
An example is: translation preceded_by transcription; aging preceded_by development (not however death preceded_by aging). Where derives_from links classes of continuants, preceded_by links classes of processes. Clearly, however, these two relations are not independent of each other. Thus if cells of type C1 derive_from cells of type C, then any cell division involving an instance of C1 in a given lineage is preceded_by cellular processes involving an instance of C. The assertion P preceded_by P1 tells us something about Ps in general: that is, it tells us something about what happened earlier, given what we know about what happened later. Thus it does not provide information pointing in the opposite direction, concerning instances of P1 in general; that is, that each is such as to be succeeded by some instance of P. Note that an assertion to the effect that P preceded_by P1 is rather weak; it tells us little about the relations between the underlying instances in virtue of which the preceded_by relation obtains. Typically we will be interested in stronger relations, for example in the relation immediately_preceded_by, or in relations which combine preceded_by with a condition to the effect that the corresponding instances of P and P1 share participants, or that their participants are connected by relations of derivation, or (as a first step along the road to a treatment of causality) that the one process in some way affects (for example, initiates or regulates) the other.
http://www.obofoundry.org/ro/#OBO_REL:preceded_by
is preceded by
preceded by
preceded_by
precedes
precedes
has measurement unit label
has measurement unit label
is about
7/6/2009 Alan Ruttenberg. Following discussion with Jonathan Rees, and introduction of "mentions" relation. Weaken the is_about relationship to be primitive.
We will try to build it back up by elaborating the various subproperties that are more precisely defined.
Some currently missing phenomena that should be considered "about" are predications - "The only person who knows the answer is sitting beside me", Allegory, Satire, and other literary forms that can be topical without explicitly mentioning the topic.
Smith, Ceusters, Ruttenberg, 2000 years of philosophy
This document is about information artifacts and their representations
is about
is_about is a (currently) primitive relation that relates an information artifact to an entity.
person:Alan Ruttenberg
denotes
2009-11-10 Alan Ruttenberg. Old definition said the following to emphasize the generic nature of this relation. We no longer have 'specifically denotes', which would have been primitive, so make this relation primitive.
g denotes r =def
r is a portion of reality
there is some c that is a concretization of g
every c that is a concretization of g specifically denotes r
A person's name denotes the person. A variable name in a computer program denotes some piece of memory. Lexically equivalent strings can denote different things, for instance "Alan" can denote different people. In each case of use, there is a case of the denotation relation obtaining, between "Alan" and the person that is being named.
Conversations with Barry Smith, Werner Ceusters, Bjoern Peters, Michel Dumontier, Melanie Courtot, James Malone, Bill Hogan
denotes
denotes is a primitive, instance-level, relation obtaining between an information content entity and some portion of reality. Denotation is what happens when someone creates an information content entity E in order to specifically refer to something. The only relation between E and the thing is that E can be used to 'pick out' the thing. This relation connects those two together. Freedictionary.com sense 3: To signify directly; refer to specifically
person:Alan Ruttenberg
is quality measurement of
8/6/2009 Alan Ruttenberg: The strategy is to be rather specific with this relationship. There are other kinds of measurements that are not of qualities, such as those that measure time. We will add these as separate properties for the moment and see about generalizing later
Alan Ruttenberg
From the second IAO workshop [Alan Ruttenberg 8/6/2009: not completely current, though bringing in comparison is probably important]
This one is the one we are struggling with at the moment. The issue is what a measurement measures. On the one hand saying that it measures the quality would include it "measuring" the bearer = referring to the bearer in the measurement. However this makes comparisons of two different things not possible. On the other hand not having it inhere in the bearer, on the face of it, breaks the audit trail.
Werner suggests a solution based on "Magnitudes" a proposal for which we are awaiting details.
--
From the second IAO workshop, various comments, [commented on by Alan Ruttenberg 8/6/2009]
unit of measure is a quality, e.g. the length of a ruler.
[We decided to hedge on what units of measure are, instead talking about measurement unit labels, which are the information content entities that are about whatever measurement units are. For IAO we need that information entity in any case. See the term measurement unit label]
[Some struggling with the various subflavors of is_about. We subsequently removed the relation represents, and describes until and only when we have a better theory]
a represents b means either a denotes b or a describes b
describe:
a describes b means a is about b and a allows an inference of at least one quality of b
We have had a long discussion about denotes versus describes.
From the second IAO workshop: An attempt at tying the quality to the measurement datum more carefully.
a is a magnitude means a is a determinate quality particular inhering in some bearer b existing at a time t that can be represented/denoted by an information content entity e that has parts denoting a unit of measure, a number, and b. The unit of measure is an instance of the determinable quality.
From the second meeting on IAO:
An attempt at defining assay using Barry's "reliability" wording
assay:
process and has_input some material entity
and has_output some information content entity
and which is such that instances of this process type reliably generate
outputs that describe the input.
is quality measurement of
m is a quality measurement of q at t when
q is a quality
there is a measurement process p that has specified output m, a measurement datum, that is about q
has coordinate unit label
has coordinate unit label
relating a Cartesian spatial coordinate datum to a unit label that together with the values represent a point
is duration of
Person:Alan Ruttenberg
is duration of
relates a process to a time-measurement-datum that represents the duration of the process
has time stamp
Alan Ruttenberg
has time stamp
relates a time stamped measurement datum to the time measurement datum that denotes the time when the measurement was taken
has measurement datum
Alan Ruttenberg
has measurement datum
relates a time stamped measurement datum to the measurement datum that was measured
has probability distribution
a relation between a data set and a probability distribution
Yongqun He, Jie Zheng, Asiyah Yu Lin
is_supported_by_data
Philly 2011 workshop
The relation between a data item and a conclusion where the conclusion is the output of a data interpreting process and the data item is used as an input to that process
The relation between the conclusion "Gene tpbA is involved in EPS production" and the data items produced using two sets of organisms, one being a tpbA knockout, the other being tpbA wildtype tested in polysaccharide production assays and analyzed using an ANOVA.
is_supported_by_data
OBI
OBI
has_specified_input
8/17/09: specified inputs of one process are not necessarily specified inputs of a larger process that it is part of. This is in contrast to how 'has participant' works.
PERSON: Bjoern Peters
PERSON: Larry Hunter
PERSON: Melanie Courtot
A relation between a planned process and a continuant participating in that process that is not created during the process. The presence of the continuant during the process is explicitly specified in the plan specification which the process realizes the concretization of.
PERSON: Alan Ruttenberg
has_specified_input
see is_input_of example_of_usage
is_specified_input_of
PERSON:Bjoern Peters
A relation between a planned process and a continuant participating in that process that is not created during the process. The presence of the continuant during the process is explicitly specified in the plan specification which the process realizes the concretization of.
Alan Ruttenberg
is_specified_input_of
some Autologous EBV(Epstein-Barr virus)-transformed B-LCL (B lymphocyte cell line) is_input_for instance of Chromium Release Assay described at https://wiki.cbil.upenn.edu/obiwiki/index.php/Chromium_Release_assay
has_specified_output
PERSON: Bjoern Peters
PERSON: Larry Hunter
PERSON: Melanie Courtot
A relation between a planned process and a continuant participating in that process. The presence of the continuant at the end of the process is explicitly specified in the objective specification which the process realizes the concretization of.
PERSON: Alan Ruttenberg
has_specified_output
is_specified_output_of
PERSON:Bjoern Peters
A relation between a planned process and a continuant participating in that process. The presence of the continuant at the end of the process is explicitly specified in the objective specification which the process realizes the concretization of.
Alan Ruttenberg
is_specified_output_of
achieves_planned_objective
A cell sorting process achieves the objective specification 'material separation objective'
BP, AR, PPPB branch
PPPB branch derived
This relation obtains between a planned process and an objective specification when the criteria specified in the objective specification are met at the end of the planned process.
achieves_planned_objective
modified according to email thread from 1/23/09 in accordance with DT and PPPB branch
has grain
PAPER: Granularity, scale and collectivity: When size does and does not matter, Alan Rector, Jeremy Rogers, Thomas Bittner, Journal of Biomedical Informatics 39 (2006) 333-349
has grain
the relation of the cells in the skin of the finger to the finger, in which an indeterminate number of grains are parts of the whole by virtue of being grains in a collective that is part of the whole, and in which removing one granular part does not necessarily damage or diminish the whole. Ontological Whether there is a fixed, or nearly fixed number of parts - e.g. fingers of the hand, chambers of the heart, or wheels of a car - such that there can be a notion of a single one being missing, or whether, by contrast, the number of parts is indeterminate - e.g., cells in the skin of the hand, red cells in blood, or rubber molecules in the tread of the tire of the wheel of the car.
Discussion in Karlsruhe with, among others, Alan Rector, Stefan Schulz, Marijke Keet, Melanie Courtot, and Alan Ruttenberg. Definition taken from the definition of granular parthood in the cited paper. Needs work to put into standard form
PERSON: Alan Ruttenberg
has category label
has category label
A relation between a categorical measurement data item and the categorical label that indicates the value of that data item on the categorical scale.
has value specification
has value specification
PERSON: James A. Overton
OBI
A relation between an information content entity and a value specification that specifies its value.
inheres in
A dependent inheres in its bearer at all times for which the dependent exists.
a relation between a specifically dependent continuant (the dependent) and an independent continuant (the bearer), in which the dependent specifically depends on the bearer for its existence
inheres in
inheres_in
this fragility inheres in this vase
this red color inheres in this apple
bearer of
A bearer can have many dependents, and its dependents can exist for different periods of time, but none of its dependents can exist when the bearer does not exist.
a relation between an independent continuant (the bearer) and a specifically dependent continuant (the dependent), in which the dependent specifically depends on the bearer for its existence
bearer of
bearer_of
is bearer of
this apple is bearer of this red color
this vase is bearer of this fragility
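'inheres in' and 'bearer of' describe the same dependence from opposite ends, so one can be derived from the other by swapping the relata. A small Python sketch using the examples given for these relations; the helper name inverse is an assumption made for illustration.

```python
def inverse(relation):
    """Swap subject and object in every pair of a binary relation."""
    return {(b, a) for (a, b) in relation}


# (dependent, bearer) pairs, taken from the usage examples of 'inheres in'.
inheres_in = {
    ("this fragility", "this vase"),
    ("this red color", "this apple"),
}

# (bearer, dependent) pairs, derived rather than asserted separately.
bearer_of = inverse(inheres_in)
```

Taking the inverse twice returns the original pairs, which is what one would expect of a pair of inverse properties.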
participates in
a relation between a continuant and a process, in which the continuant is somehow involved in the process
participates in
participates_in
this blood clot participates in this blood coagulation
this input material (or this output material) participates in this process
this investigator participates in this investigation
has participant
Has_participant is a primitive instance-level relation between a process, a continuant, and a time at which the continuant participates in some way in the process. The relation obtains, for example, when this particular process of oxygen exchange across this particular alveolar membrane has_participant this particular sample of hemoglobin at this particular time.
a relation between a process and a continuant, in which the continuant is somehow involved in the process
has participant
http://www.obofoundry.org/ro/#OBO_REL:has_participant
has_participant
this blood coagulation has participant this blood clot
this investigation has participant this investigator
this process has participant this input material (or this output material)
is concretized as
A journal article is an information artifact that inheres in some number of printed journals. For each copy of the printed journal there is some quality that carries the journal article, such as a pattern of ink. The journal article (a generically dependent continuant) is concretized as the quality (a specifically dependent continuant), and both depend on that copy of the printed journal (an independent continuant).
A relationship between a generically dependent continuant and a specifically dependent continuant, in which the generically dependent continuant depends on some independent continuant in virtue of the fact that the specifically dependent continuant also depends on that same independent continuant. A generically dependent continuant may be concretized as multiple specifically dependent continuants.
An investigator reads a protocol and forms a plan to carry out an assay. The plan is a realizable entity (a specifically dependent continuant) that concretizes the protocol (a generically dependent continuant), and both depend on the investigator (an independent continuant). The plan is then realized by the assay (a process).
is concretized as
concretizes
A journal article is an information artifact that inheres in some number of printed journals. For each copy of the printed journal there is some quality that carries the journal article, such as a pattern of ink. The quality (a specifically dependent continuant) concretizes the journal article (a generically dependent continuant), and both depend on that copy of the printed journal (an independent continuant).
A relationship between a specifically dependent continuant and a generically dependent continuant, in which the generically dependent continuant depends on some independent continuant in virtue of the fact that the specifically dependent continuant also depends on that same independent continuant. Multiple specifically dependent continuants can concretize the same generically dependent continuant.
An investigator reads a protocol and forms a plan to carry out an assay. The plan is a realizable entity (a specifically dependent continuant) that concretizes the protocol (a generically dependent continuant), and both depend on the investigator (an independent continuant). The plan is then realized by the assay (a process).
concretizes
function of
A function inheres in its bearer at all times for which the function exists; however, the function need not be realized at all the times that the function exists.
a relation between a function and an independent continuant (the bearer), in which the function specifically depends on the bearer for its existence
function of
function_of
is function of
this catalysis function is a function of this enzyme
role of
A role inheres in its bearer at all times for which the role exists; however, the role need not be realized at all the times that the role exists.
a relation between a role and an independent continuant (the bearer), in which the role specifically depends on the bearer for its existence
is role of
role of
role_of
this investigator role is a role of this person
has function
A bearer can have many functions, and its functions can exist for different periods of time, but none of its functions can exist when the bearer does not exist. A function need not be realized at all the times that the function exists.
a relation between an independent continuant (the bearer) and a function, in which the function specifically depends on the bearer for its existence
has function
has_function
this enzyme has function this catalysis function (more colloquially: this enzyme has this catalysis function)
has role
A bearer can have many roles, and its roles can exist for different periods of time, but none of its roles can exist when the bearer does not exist. A role need not be realized at all the times that the role exists.
a relation between an independent continuant (the bearer) and a role, in which the role specifically depends on the bearer for its existence
has role
has_role
this person has role this investigator role (more colloquially: this person has this role of investigator)
location of
Most location relations will only hold at certain times, but this is difficult to specify in OWL. See https://code.google.com/p/obo-relations/wiki/ROAndTime
a relation between two independent continuants, the location and the target, in which the target is entirely within the location
location of
location_of
my head is the location of my brain
this cage is the location of this rat
located in
http://www.obofoundry.org/ro/#OBO_REL:located_in
Location as a relation between instances: The primitive instance-level relation c located_in r at t reflects the fact that each continuant is at any given time associated with exactly one spatial region, namely its exact location. Following we can use this relation to define a further instance-level location relation - not between a continuant and the region which it exactly occupies, but rather between one continuant and another. c is located in c1, in this sense, whenever the spatial region occupied by c is part_of the spatial region occupied by c1. Note that this relation comprehends both the relation of exact location between one continuant and another which obtains when r and r1 are identical (for example, when a portion of fluid exactly fills a cavity), as well as those sorts of inexact location relations which obtain, for example, between brain and head or between ovum and uterus
Most location relations will only hold at certain times, but this is difficult to specify in OWL. See https://code.google.com/p/obo-relations/wiki/ROAndTime
a relation between two independent continuants, the target and the location, in which the target is entirely within the location
located in
located_in
my brain is located in my head
this rat is located in this cage
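The definition above reduces located_in between two continuants to a part_of check on the spatial regions they occupy. The sketch below illustrates that reduction under simplifying assumptions: regions are approximated by axis-aligned boxes, time is ignored, and the `Box` class, `region_of` mapping, and coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box standing in for the region a continuant occupies."""
    lo: tuple
    hi: tuple

    def part_of(self, other):
        # Every coordinate range of self must lie within the range of other.
        return all(
            o_lo <= s_lo and s_hi <= o_hi
            for s_lo, s_hi, o_lo, o_hi in zip(self.lo, self.hi, other.lo, other.hi)
        )


def located_in(region_of, c, c1):
    """c is located in c1 when c's region is part of c1's region."""
    return region_of[c].part_of(region_of[c1])


# Hypothetical coordinates, chosen only to illustrate the two cases.
region_of = {
    "my brain": Box((2, 2, 2), (4, 4, 4)),
    "my head": Box((0, 0, 0), (6, 6, 6)),
    "this fluid": Box((1, 1, 1), (2, 2, 2)),
    "this cavity": Box((1, 1, 1), (2, 2, 2)),
}
```

With identical regions the check succeeds in both directions, which corresponds to the exact-location case (a portion of fluid exactly filling a cavity); the brain and head pair is the inexact case.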
immediately preceded by
starts_at_end_of
David Osumi-Sutherland
X immediately_preceded_by Y iff: start(X) simultaneous_with end(Y)
immediately preceded by
immediately precedes
David Osumi-Sutherland
ends_at_start_of
X immediately_precedes Y iff: end(X) simultaneous_with start(Y)
immediately precedes
meets
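As an illustration of the two relations above, here is a small sketch that reads "immediately precedes" as Allen's meets relation (X ends exactly when Y starts) and "immediately preceded by" as its converse, in line with the alternative labels ends_at_start_of and starts_at_end_of. The `Occurrent` class and the numeric time points are hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrent:
    """Minimal stand-in for an occurrent with numeric start and end times."""
    name: str
    start: float
    end: float


def immediately_precedes(x, y):
    # X ends exactly when Y starts (Allen's "meets").
    return x.end == y.start


def immediately_preceded_by(x, y):
    # Converse reading: X starts exactly when Y ends.
    return immediately_precedes(y, x)


# Hypothetical occurrents used only for illustration.
incubation = Occurrent("incubation", 0.0, 5.0)
washing = Occurrent("washing", 5.0, 8.0)
imaging = Occurrent("imaging", 9.0, 12.0)
```

Here `incubation` immediately precedes `washing`, while `washing` and `imaging` are separated by a gap and so do not stand in either relation.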
temporal relation
move to BFO?
A relation that holds between two occurrents. This is a grouping relation that collects together all the Allen relations.
Allen
temporal relation
starts
inverse of starts with
Chris Mungall
Allen
starts
has member
has member is a mereological relation between a collection and an item.
has measurement value
has measurement value
has x coordinate value
has x coordinate value
has z coordinate value
has z coordinate value
has y coordinate value
has y coordinate value
has_feature_value
James Malone
has_feature_value
The has_feature_value datatype property describes the feature values that a feature class can contain; for example, has_base can take nonNegativeInteger feature values.
has specified value
OBI
A relation between a value specification and a number that quantifies it.
PERSON: James A. Overton
A range of 'real' might be better than 'float'. For now we follow 'has measurement value' until we can consider technical issues with SPARQL queries and reasoning.
has specified value
entity
entity
An entity is anything that exists or has existed or will exist. (axiom label in BFO2 Reference: [001-001])
Entity
BFO 2 Reference: In all areas of empirical inquiry we encounter general terms of two sorts. First are general terms which refer to universals or types: animal; tuberculosis; surgical procedure; disease. Second are general terms used to refer to groups of entities which instantiate a given universal but do not correspond to the extension of any subuniversal of that universal because there is nothing intrinsic to the entities in question by virtue of which they – and only they – are counted as belonging to the given group. Examples are: animal purchased by the Emperor; tuberculosis diagnosed on a Wednesday; surgical procedure performed on a patient from Stockholm; person identified as candidate for clinical trial #2056-555; person who is signatory of Form 656-PPV; painting by Leonardo da Vinci. Such terms, which represent what are called ‘specializations’ in [81
Entity doesn't have a closure axiom because the subclasses don't necessarily exhaust all possibilities. For example, Werner Ceusters' 'portions of reality' include 4 sorts: entities (as BFO construes them), universals, configurations, and relations. It is an open question as to whether entities as construed in BFO will at some point also include these other portions of reality. See, for example, 'How to track absolutely everything' at http://www.referent-tracking.com/_RTU/papers/CeustersICbookRevised.pdf
Julius Caesar
Verdi’s Requiem
entity
the Second World War
your body mass index
continuant
(forall (x) (if (MaterialEntity x) (exists (t) (and (TemporalRegion t) (existsAt x t))))) // axiom label in BFO2 CLIF: [011-002]
(forall (x) (if (Continuant x) (Entity x))) // axiom label in BFO2 CLIF: [008-002]
(forall (x y) (if (and (Continuant x) (exists (t) (continuantPartOfAt y x t))) (Continuant y))) // axiom label in BFO2 CLIF: [009-002]
A continuant is an entity that persists, endures, or continues to exist through time while maintaining its identity. (axiom label in BFO2 Reference: [008-002])
Continuant
continuant
(forall (x y) (if (and (Continuant x) (exists (t) (hasContinuantPartOfAt y x t))) (Continuant y))) // axiom label in BFO2 CLIF: [126-001]
An entity that exists in full at any time in which it exists at all, persists through time while maintaining its identity and has no temporal parts.
BFO 2 Reference: Continuant entities are entities which can be sliced to yield parts only along the spatial dimension, yielding for example the parts of your table which we call its legs, its top, its nails. ‘My desk stretches from the window to the door. It has spatial parts, and can be sliced (in space) in two. With respect to time, however, a thing is a continuant.’ [60, p. 240
Continuant doesn't have a closure axiom because the subclasses don't necessarily exhaust all possibilities. For example, in an expansion involving bringing in some of Ceusters' other portions of reality, questions are raised as to whether universals are continuants.
continuant
if b is a continuant and if, for some t, c has_continuant_part b at t, then c is a continuant. (axiom label in BFO2 Reference: [126-001])
if b is a continuant and if, for some t, c is continuant_part_of b at t, then c is a continuant. (axiom label in BFO2 Reference: [009-002])
if b is a material entity, then there is some temporal interval (referred to below as a one-dimensional temporal region) during which b exists. (axiom label in BFO2 Reference: [011-002])
occurrent
Occurrent
(forall (x) (iff (Occurrent x) (and (Entity x) (exists (y) (temporalPartOf y x))))) // axiom label in BFO2 CLIF: [079-001]
occurrent
(forall (x) (if (Occurrent x) (exists (r) (and (SpatioTemporalRegion r) (occupiesSpatioTemporalRegion x r))))) // axiom label in BFO2 CLIF: [108-001]
An entity that has temporal parts and that happens, unfolds or develops through time.
An occurrent is an entity that unfolds itself in time or it is the instantaneous boundary of such an entity (for example a beginning or an ending) or it is a temporal or spatiotemporal region which such an entity occupies_temporal_region or occupies_spatiotemporal_region. (axiom label in BFO2 Reference: [077-002])
BFO 2 Reference: every occurrent that is not a temporal or spatiotemporal region is s-dependent on some independent continuant that is not a spatial region
BFO 2 Reference: s-dependence obtains between every process and its participants in the sense that, as a matter of necessity, this process could not have existed unless these or those participants existed also. A process may have a succession of participants at different phases of its unfolding. Thus there may be different players on the field at different times during the course of a football game; but the process which is the entire game s-depends_on all of these players nonetheless. Some temporal parts of this process will s-depend_on only some of the players.
Every occurrent occupies_spatiotemporal_region some spatiotemporal region. (axiom label in BFO2 Reference: [108-001])
Occurrent doesn't have a closure axiom because the subclasses don't necessarily exhaust all possibilities. An example would be the sum of a process and the process boundary of another process.
Simons uses different terminology for relations of occurrents to regions: Denote the spatio-temporal location of a given occurrent e by 'spn[e]' and call this region its span. We may say an occurrent is at its span, in any larger region, and covers any smaller region. Now suppose we have fixed a frame of reference so that we can speak not merely of spatio-temporal but also of spatial regions (places) and temporal regions (times). The spread of an occurrent (relative to a frame of reference) is the space it exactly occupies, and its spell is likewise the time it exactly occupies. We write 'spr[e]' and 'spl[e]' respectively for the spread and spell of e, omitting mention of the frame.
b is an occurrent entity iff b is an entity that has temporal parts. (axiom label in BFO2 Reference: [079-001])
occurrent
independent continuant
(forall (x t) (if (and (IndependentContinuant x) (existsAt x t)) (exists (y) (and (Entity y) (specificallyDependsOnAt y x t))))) // axiom label in BFO2 CLIF: [018-002]
(forall (x t) (if (IndependentContinuant x) (exists (r) (and (SpatialRegion r) (locatedInAt x r t))))) // axiom label in BFO2 CLIF: [134-001]
(iff (IndependentContinuant a) (and (Continuant a) (not (exists (b t) (specificallyDependsOnAt a b t))))) // axiom label in BFO2 CLIF: [017-002]
A continuant that is a bearer of qualities and realizable entities, in which other entities inhere and which itself cannot inhere in anything.
For any independent continuant b and any time t there is some spatial region r such that b is located_in r at t. (axiom label in BFO2 Reference: [134-001])
For every independent continuant b and time t during the region of time spanned by its life, there are entities which s-depend_on b during t. (axiom label in BFO2 Reference: [018-002])
ic
IndependentContinuant
a chair
a heart
a leg
a molecule
a spatial region
an atom
an orchestra.
an organism
b is an independent continuant = Def. b is a continuant which is such that there is no c and no t such that b s-depends_on c at t. (axiom label in BFO2 Reference: [017-002])
independent continuant
the bottom right portion of a human torso
the interior of your mouth
spatial region
(forall (x y t) (if (and (SpatialRegion x) (continuantPartOfAt y x t)) (SpatialRegion y))) // axiom label in BFO2 CLIF: [036-001]
SpatialRegion
(forall (x) (if (SpatialRegion x) (Continuant x))) // axiom label in BFO2 CLIF: [035-001]
s-region
A spatial region is a continuant entity that is a continuant_part_of spaceR as defined relative to some frame R. (axiom label in BFO2 Reference: [035-001])
All continuant parts of spatial regions are spatial regions. (axiom label in BFO2 Reference: [036-001])
BFO 2 Reference: Spatial regions do not participate in processes.
Spatial region doesn't have a closure axiom because the subclasses don't exhaust all possibilities. An example would be the union of a spatial point and a spatial line that doesn't overlap the point, or two spatial lines that intersect at a single point. In both cases the resultant spatial region is neither 0-dimensional, 1-dimensional, 2-dimensional, nor 3-dimensional.
spatial region
two-dimensional spatial region
2d-s-region
(forall (x) (if (TwoDimensionalSpatialRegion x) (SpatialRegion x))) // axiom label in BFO2 CLIF: [039-001]
TwoDimensionalSpatialRegion
A two-dimensional spatial region is a spatial region that is of two dimensions. (axiom label in BFO2 Reference: [039-001])
an infinitely thin plane in space.
the surface of a sphere-shaped part of space
two-dimensional spatial region
process
An occurrent that has temporal proper parts and for some time t, p s-depends_on some material entity at t.
BFO 2 Reference: The realm of occurrents is less pervasively marked by the presence of natural units than is the case in the realm of independent continuants. Thus there is here no counterpart of ‘object’. In BFO 1.0 ‘process’ served as such a counterpart. In BFO 2.0 ‘process’ is, rather, the occurrent counterpart of ‘material entity’. Those natural – as contrasted with engineered, which here means: deliberately executed – units which do exist in the realm of occurrents are typically either parasitic on the existence of natural units on the continuant side, or they are fiat in nature. Thus we can count lives; we can count football games; we can count chemical reactions performed in experiments or in chemical manufacturing. We cannot count the processes taking place, for instance, in an episode of insect mating behavior. Even where natural units are identifiable, for example cycles in a cyclical process such as the beating of a heart or an organism’s sleep/wake cycle, the processes in question form a sequence with no discontinuities (temporal gaps) of the sort that we find for instance where billiard balls or zebrafish or planets are separated by clear spatial gaps. Lives of organisms are process units, but they too unfold in a continuous series from other, prior processes such as fertilization, and they unfold in turn in continuous series of post-life processes such as post-mortem decay. Clear examples of boundaries of processes are almost always of the fiat sort (midnight, a time of death as declared in an operating theater or on a death certificate, the initiation of a state of war)
process
Process
(iff (Process a) (and (Occurrent a) (exists (b) (properTemporalPartOf b a)) (exists (c t) (and (MaterialEntity c) (specificallyDependsOnAt a c t))))) // axiom label in BFO2 CLIF: [083-003]
a process of cell-division, a beating of the heart
a process of meiosis
a process of sleeping
p is a process = Def. p is an occurrent that has temporal proper parts and for some time t, p s-depends_on some material entity at t. (axiom label in BFO2 Reference: [083-003])
process
the course of a disease
the flight of a bird
the life of an organism
your process of aging.
disposition
disposition
Disposition
(forall (x t) (if (and (RealizableEntity x) (existsAt x t)) (exists (y) (and (MaterialEntity y) (specificallyDependsOnAt x y t))))) // axiom label in BFO2 CLIF: [063-002]
(forall (x t) (if (Disposition x) (and (RealizableEntity x) (exists (y) (and (MaterialEntity y) (bearerOfAt y x t)))))) // axiom label in BFO2 CLIF: [062-002]
BFO 2 Reference: Dispositions exist along a strength continuum. Weaker forms of disposition are realized in only a fraction of triggering cases. These forms occur in a significant number of cases of a similar type.
If b is a realizable entity then for all t at which b exists, b s-depends_on some material entity at t. (axiom label in BFO2 Reference: [063-002])
an atom of element X has the disposition to decay to an atom of element Y
b is a disposition means: b is a realizable entity & b’s bearer is some material entity & b is such that if it ceases to exist, then its bearer is physically changed, & b’s realization occurs when and because this bearer is in some special physical circumstances, & this realization occurs in virtue of the bearer’s physical make-up. (axiom label in BFO2 Reference: [062-002])
certain people have a predisposition to colon cancer
children are innately disposed to categorize objects in certain ways.
disposition
the cell wall is disposed to filter chemicals in endocytosis and exocytosis
realizable entity
(forall (x t) (if (RealizableEntity x) (exists (y) (and (IndependentContinuant y) (not (SpatialRegion y)) (bearerOfAt y x t))))) // axiom label in BFO2 CLIF: [060-002]
(forall (x) (if (RealizableEntity x) (and (SpecificallyDependentContinuant x) (exists (y) (and (IndependentContinuant y) (not (SpatialRegion y)) (inheresIn x y)))))) // axiom label in BFO2 CLIF: [058-002]
RealizableEntity
realizable
A specifically dependent continuant that inheres in continuant entities and is not exhibited in full at every time in which it inheres in an entity or group of entities. The exhibition or actualization of a realizable entity is a particular manifestation, functioning or process that occurs under certain circumstances.
All realizable dependent continuants have independent continuants that are not spatial regions as their bearers. (axiom label in BFO2 Reference: [060-002])
To say that b is a realizable entity is to say that b is a specifically dependent continuant that inheres in some independent continuant which is not a spatial region and is of a type instances of which are realized in processes of a correlated type. (axiom label in BFO2 Reference: [058-002])
realizable entity
the disposition of this piece of metal to conduct electricity.
the disposition of your blood to coagulate
the function of your reproductive organs
the role of being a doctor
the role of this boundary to delineate where Utah and Colorado meet
zero-dimensional spatial region
ZeroDimensionalSpatialRegion
(forall (x) (if (ZeroDimensionalSpatialRegion x) (SpatialRegion x))) // axiom label in BFO2 CLIF: [037-001]
0d-s-region
A zero-dimensional spatial region is a point in space. (axiom label in BFO2 Reference: [037-001])
zero-dimensional spatial region
quality
(forall (x) (if (exists (t) (and (existsAt x t) (Quality x))) (forall (t_1) (if (existsAt x t_1) (Quality x))))) // axiom label in BFO2 CLIF: [105-001]
Quality
(forall (x) (if (Quality x) (SpecificallyDependentContinuant x))) // axiom label in BFO2 CLIF: [055-001]
quality
If an entity is a quality at any time that it exists, then it is a quality at every time that it exists. (axiom label in BFO2 Reference: [105-001])
a quality is a specifically dependent continuant that, in contrast to roles and dispositions, does not require any further process in order to be realized. (axiom label in BFO2 Reference: [055-001])
quality
the ambient temperature of this portion of air
the color of a tomato
the length of the circumference of your waist
the mass of this piece of gold.
the shape of your nose
the shape of your nostril
specifically dependent continuant
(iff (SpecificallyDependentContinuant a) (and (Continuant a) (forall (t) (if (existsAt a t) (exists (b) (and (IndependentContinuant b) (not (SpatialRegion b)) (specificallyDependsOnAt a b t))))))) // axiom label in BFO2 CLIF: [050-003]
sdc
(iff (RelationalSpecificallyDependentContinuant a) (and (SpecificallyDependentContinuant a) (forall (t) (exists (b c) (and (not (SpatialRegion b)) (not (SpatialRegion c)) (not (= b c)) (not (exists (d) (and (continuantPartOfAt d b t) (continuantPartOfAt d c t)))) (specificallyDependsOnAt a b t) (specificallyDependsOnAt a c t)))))) // axiom label in BFO2 CLIF: [131-004]
A continuant that inheres in or is borne by other entities. Every instance of A requires some specific instance of B which must always be the same.
Reciprocal specifically dependent continuants: the function of this key to open this lock and the mutually dependent disposition of this lock: to be opened by this key
Specifically dependent continuant doesn't have a closure axiom because the subclasses don't necessarily exhaust all possibilities. We're not sure what else will develop here, but for example there are questions such as what are promises, obligations, etc.
SpecificallyDependentContinuant
b is a relational specifically dependent continuant = Def. b is a specifically dependent continuant and there are n > 1 independent continuants c1, …, cn which are not spatial regions and are such that for all 1 ≤ i < j ≤ n, ci and cj share no common parts, and are such that for each 1 ≤ i ≤ n, b s-depends_on ci at every time t during the course of b’s existence (axiom label in BFO2 Reference: [131-004])
b is a specifically dependent continuant = Def. b is a continuant & there is some independent continuant c which is not a spatial region and which is such that b s-depends_on c at every time t during the course of b’s existence. (axiom label in BFO2 Reference: [050-003])
of one-sided specifically dependent continuants: the mass of this tomato
of relational dependent continuants (multiple bearers): John’s love for Mary, the ownership relation between John and this statue, the relation of authority between John and his subordinates.
specifically dependent continuant
the disposition of this fish to decay
the function of this heart: to pump blood
the mutual dependence of proton donors and acceptors in chemical reactions [79
the mutual dependence of the role predator and the role prey as played by two organisms in a given interaction
the pink color of a medium rare piece of grilled filet mignon at its center
the role of being a doctor
the shape of this hole.
the smell of this portion of mozzarella
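The relational definition above (axiom [131-004]) requires more than one bearer and that no two bearers share a common part. The sketch below checks only those two conditions; it ignores time and the "not a spatial region" condition. The `parts_of` mapping and the function name are hypothetical.

```python
from itertools import combinations


def bearers_fit_relational_pattern(bearers, parts_of):
    """Check n > 1 bearers whose parts are pairwise disjoint.

    parts_of maps each bearer to the set of its parts (itself included).
    """
    if len(bearers) < 2:
        return False
    return all(
        parts_of[a].isdisjoint(parts_of[b])
        for a, b in combinations(bearers, 2)
    )


# Hypothetical parthood data for illustration only.
parts_of = {
    "John": {"John", "John's arm"},
    "Mary": {"Mary"},
    "John's arm": {"John's arm"},
}
```

Under these assumptions, John and Mary can jointly bear a relational dependent continuant such as John's love for Mary, whereas John and John's arm cannot, because they share a part.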
role
role
(forall (x) (if (Role x) (RealizableEntity x))) // axiom label in BFO2 CLIF: [061-001]
A realizable entity the manifestation of which brings about some result or end that is not essential to a continuant in virtue of the kind of thing that it is but that can be served or participated in by that kind of continuant in some kinds of natural, social or institutional contexts.
BFO 2 Reference: One major family of examples of non-rigid universals involves roles, and ontologies developed for corresponding administrative purposes may consist entirely of representatives of entities of this sort. Thus ‘professor’, defined as follows (b instance_of professor at t =Def. there is some c, c instance_of professor role & c inheres_in b at t), denotes a non-rigid universal and so also do ‘nurse’, ‘student’, ‘colonel’, ‘taxpayer’, and so forth. (These terms are all, in the jargon of philosophy, phase sortals.) By using role terms in definitions, we can create a BFO conformant treatment of such entities drawing on the fact that, while an instance of professor may be simultaneously an instance of trade union member, no instance of the type professor role is also (at any time) an instance of the type trade union member role (any more than any instance of the type color is at any time an instance of the type length). If an ontology of employment positions should be defined in terms of roles following the above pattern, this enables the ontology to do justice to the fact that individuals instantiate the corresponding universals – professor, sergeant, nurse – only during certain phases in their lives.
John’s role of husband to Mary is dependent on Mary’s role of wife to John, and both are dependent on the object aggregate comprising John and Mary as member parts joined together through the relational quality of being married.
Role
b is a role means: b is a realizable entity & b exists because there is some single bearer that is in some special physical, social, or institutional set of circumstances in which this bearer does not have to be & b is not such that, if it ceases to exist, then the physical make-up of the bearer is thereby changed. (axiom label in BFO2 Reference: [061-001])
role
the priest role
the role of a boundary to demarcate two neighboring administrative territories
the role of a building in serving as a military target
the role of a stone in marking a property boundary
the role of subject in a clinical trial
the student role
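The ‘professor’ pattern quoted above (b instance_of professor at t iff some professor role inheres in b at t) can be sketched as a time-indexed lookup. This is only an illustration: the `RoleInstance` record, the integer years, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleInstance:
    """A role instance inhering in one bearer over a closed span of years."""
    role_type: str
    bearer: str
    first_year: int
    last_year: int


def instance_of_at(bearer, role_type, year, roles):
    """True if some role of the given type inheres in the bearer at that year."""
    return any(
        r.role_type == role_type
        and r.bearer == bearer
        and r.first_year <= year <= r.last_year
        for r in roles
    )


# Hypothetical data: one person bearing two distinct role instances.
roles = [
    RoleInstance("professor role", "person A", 2001, 2010),
    RoleInstance("trade union member role", "person A", 2005, 2015),
]
```

The person counts as a professor only during the phase in which the professor role inheres in them, and may be a professor and a trade union member at the same time even though the two role instances remain distinct.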
one-dimensional spatial region
(forall (x) (if (OneDimensionalSpatialRegion x) (SpatialRegion x))) // axiom label in BFO2 CLIF: [038-001]
OneDimensionalSpatialRegion
1d-s-region
A one-dimensional spatial region is a line or aggregate of lines stretching from one point in space to another. (axiom label in BFO2 Reference: [038-001])
an edge of a cube-shaped portion of space.
one-dimensional spatial region
three-dimensional spatial region
(forall (x) (if (ThreeDimensionalSpatialRegion x) (SpatialRegion x))) // axiom label in BFO2 CLIF: [040-001]
3d-s-region
ThreeDimensionalSpatialRegion
A three-dimensional spatial region is a spatial region that is of three dimensions. (axiom label in BFO2 Reference: [040-001])
a cube-shaped region of space
a sphere-shaped region of space
three-dimensional spatial region
generically dependent continuant
gdc
GenericallyDependentContinuant
(iff (GenericallyDependentContinuant a) (and (Continuant a) (exists (b t) (genericallyDependsOnAt a b t)))) // axiom label in BFO2 CLIF: [074-001]
A continuant that is dependent on one or other independent continuant bearers. Every instance of A requires some instance of (an independent continuant type) B, but which instance of B serves can change from time to time.
The entries in your database are patterns instantiated as quality instances in your hard drive. The database itself is an aggregate of such patterns. When you create the database you create a particular instance of the generically dependent continuant type database. Each entry in the database is an instance of the generically dependent continuant type IAO: information content entity.
b is a generically dependent continuant = Def. b is a continuant that g-depends_on one or more other entities. (axiom label in BFO2 Reference: [074-001])
generically dependent continuant
the pdf file on your laptop, the pdf file that is a copy thereof on my laptop
the sequence of this protein molecule; the sequence that is a copy thereof in that protein molecule.
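The contrast between generic and specific dependence can be illustrated with a small sketch: a generically dependent continuant needs some bearer at each time it exists, whereas a specifically dependent continuant needs the same bearer throughout. The timeline dictionaries and both function names are hypothetical, and treating the PDF copies as one pattern with changing bearers is an assumption made for illustration.

```python
def exists_at(bearers_at, t):
    """A generically dependent entity exists at t if it has any bearer at t."""
    return bool(bearers_at.get(t))


def has_constant_bearer(bearers_at):
    """True if one and the same bearer is present at every occupied time."""
    occupied = [set(b) for b in bearers_at.values() if b]
    return bool(occupied) and bool(set.intersection(*occupied))


# Hypothetical timeline: the PDF pattern migrates from one laptop to another.
pdf_bearers = {
    1: {"your laptop"},
    2: {"your laptop", "my laptop"},
    3: {"my laptop"},
    4: set(),
}
# Hypothetical timeline for a quality that never changes its bearer.
tomato_mass_bearers = {1: {"this tomato"}, 2: {"this tomato"}}
```

The PDF pattern persists through times 1 to 3 although no single bearer carries it throughout, which is what generic dependence permits and specific dependence rules out.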
function
function
(forall (x) (if (Function x) (Disposition x))) // axiom label in BFO2 CLIF: [064-001]
A function is a disposition that exists in virtue of the bearer’s physical make-up and this physical make-up is something the bearer possesses because it came into being, either through evolution (in the case of natural biological entities) or through intentional design (in the case of artifacts), in order to realize processes of a certain sort. (axiom label in BFO2 Reference: [064-001])
BFO 2 Reference: In the past, we have distinguished two varieties of function, artifactual function and biological function. These are not asserted subtypes of BFO:function however, since the same function – for example: to pump, to transport – can exist both in artifacts and in biological entities. The asserted subtypes of function that would be needed in order to yield a separate monohierarchy are not artifactual function, biological function, etc., but rather transporting function, pumping function, etc.
Function
function
the function of a hammer to drive in nails
the function of a heart pacemaker to regulate the beating of a heart through electricity
the function of amylase in saliva to break down starch into sugar
process boundary
(iff (ProcessBoundary a) (exists (p) (and (Process p) (temporalPartOf a p) (not (exists (b) (properTemporalPartOf b a)))))) // axiom label in BFO2 CLIF: [084-001]
(forall (x) (if (ProcessBoundary x) (exists (y) (and (ZeroDimensionalTemporalRegion y) (occupiesTemporalRegion x y))))) // axiom label in BFO2 CLIF: [085-002]
Every process boundary occupies_temporal_region a zero-dimensional temporal region. (axiom label in BFO2 Reference: [085-002])
ProcessBoundary
p is a process boundary =Def. p is a temporal part of a process & p has no proper temporal parts. (axiom label in BFO2 Reference: [084-001])
p-boundary
process boundary
the boundary between the 2nd and 3rd year of your life.
material entity
(forall (x) (if (MaterialEntity x) (IndependentContinuant x))) // axiom label in BFO2 CLIF: [019-002]
material
MaterialEntity
(forall (x) (if (and (Entity x) (exists (y t) (and (MaterialEntity y) (continuantPartOfAt x y t)))) (MaterialEntity x))) // axiom label in BFO2 CLIF: [021-002]
(forall (x) (if (and (Entity x) (exists (y t) (and (MaterialEntity y) (continuantPartOfAt y x t)))) (MaterialEntity x))) // axiom label in BFO2 CLIF: [020-002]
A material entity is an independent continuant that has some portion of matter as proper or improper continuant part. (axiom label in BFO2 Reference: [019-002])
An independent continuant that is spatially extended whose identity is independent of that of other entities and can be maintained through time.
BFO 2 Reference: Material entities (continuants) can preserve their identity even while gaining and losing material parts. Continuants are contrasted with occurrents, which unfold themselves in successive temporal parts or phases [60
BFO 2 Reference: Object, Fiat Object Part and Object Aggregate are not intended to be exhaustive of Material Entity. Users are invited to propose new subcategories of Material Entity.
BFO 2 Reference: ‘Matter’ is intended to encompass both mass and energy (we will address the ontological treatment of portions of energy in a later version of BFO). A portion of matter is anything that includes elementary particles among its proper or improper parts: quarks and leptons, including electrons, as the smallest particles thus far discovered; baryons (including protons and neutrons) at a higher level of granularity; atoms and molecules at still higher levels, forming the cells, organs, organisms and other material entities studied by biologists, the portions of rock studied by geologists, the fossils studied by paleontologists, and so on. Material entities are three-dimensional entities (entities extended in three spatial dimensions), as contrasted with the processes in which they participate, which are four-dimensional entities (entities extended also along the dimension of time). According to the FMA, material entities may have immaterial entities as parts – including the entities identified below as sites; for example the interior (or ‘lumen’) of your small intestine is a part of your body. BFO 2.0 embodies a decision to follow the FMA here.
Every entity which has a material entity as continuant part is a material entity. (axiom label in BFO2 Reference: [020-002])
a flame
a forest fire
a human being
a hurricane
a photon
a puff of smoke
a sea wave
a tornado
an aggregate of human beings.
an energy wave
an epidemic
every entity of which a material entity is continuant part is also a material entity. (axiom label in BFO2 Reference: [021-002])
material entity
the undetached arm of a human being
immaterial entity
ImmaterialEntity
immaterial
BFO 2 Reference: Immaterial entities are divided into two subgroups: boundaries and sites, which bound, or are demarcated in relation to, material entities, and which can thus change location, shape and size as their material hosts move or change shape or size (for example: your nasal passage; the hold of a ship; the boundary of Wales (which moves with the rotation of the Earth) [38, 7, 10
immaterial entity
peptide
Amide derived from two or more amino carboxylic acid molecules (the same or different) by formation of a covalent bond from the carbonyl carbon of one to the nitrogen atom of another with formal loss of water. The term is usually applied to structures formed from alpha-amino acids, but it includes those derived from any amino carboxylic acid. X = OH, OR, NH2, NHR, etc.
peptide
deoxyribonucleic acid
High molecular weight, linear polymers, composed of nucleotides containing deoxyribose and linked by phosphodiester bonds; DNA contains the genetic information of organisms.
deoxyribonucleic acid
molecular entity
Any constitutionally or isotopically distinct atom, molecule, ion, ion pair, radical, radical ion, complex, conformer etc., identifiable as a separately distinguishable entity.
molecular entity
We are assuming that every molecular entity has to be completely connected by chemical bonds. This excludes protein complexes, which comprise at least two separate molecular entities. We will follow up with ChEBI to ensure this is their understanding as well.
nucleic acid
A macromolecule made up of nucleotide units and hydrolysable into certain pyrimidine or purine bases (usually adenine, cytosine, guanine, thymine, uracil), D-ribose or 2-deoxy-D-ribose and phosphoric acid.
nucleic acid
ribonucleic acid
High molecular weight, linear polymers, composed of nucleotides containing ribose and linked by phosphodiester bonds; RNA is central to the synthesis of proteins.
ribonucleic acid
macromolecule
A macromolecule is a molecule of high relative molecular mass, the structure of which essentially comprises the multiple repetition of units derived, actually or conceptually, from molecules of low relative molecular mass.
macromolecule
polymer
double-stranded DNA
double-stranded DNA
cell
A material entity of anatomical origin (part of or deriving from an organism) that has as its parts a maximally connected cell compartment surrounded by a plasma membrane.
cell
PMID:18089833. Cancer Res. 2007 Dec 15;67(24):12018-25. "...Epithelial cells were harvested from histologically confirmed adenocarcinomas..."
biological_process
Any process specifically pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms. A process is a collection of molecular events with a defined beginning and end.
biological_process
response to stimulus
Any process that results in a change in state or activity of a cell or an organism (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of a stimulus. The process begins with detection of the stimulus and ends with a change in state or activity of the cell or organism.
response to stimulus
measurement unit label
2009-03-16: provenance: a term measurement unit was
proposed for OBI (OBI_0000176) , edited by Chris Stoeckert and
Cristian Cocos, and subsequently moved to IAO where the objective for
which the original term was defined was satisfied with the definition
of this, different, term.
2009-03-16: review of this term done during the OBI workshop winter 2009 and the current definition was considered acceptable for use in OBI. If there is a need to modify this definition please notify OBI.
A measurement unit label is a label that is part of a scalar measurement datum and denotes a unit of measure.
Examples of measurement unit labels are liters, inches, weight per volume.
PERSON: Alan Ruttenberg
PERSON: Melanie Courtot
measurement unit label
objective specification
2009-03-16: original definition when imported from OBI read: "objective is an non realizable information entity which can serve as that proper part of a plan towards which the realization of the plan is directed."
2014-03-31: In the example of usage ("In the protocol of a ChIP assay the objective specification says to identify protein and DNA interaction") there is a protocol which is the ChIP assay protocol. In addition to being concretized on paper, the protocol can be concretized as a realizable entity, such as a plan that inheres in a person. The objective specification is the part that says that some protein and DNA interactions are identified. This is a specification of a process endpoint: the boundary in the process before which they are not identified and after which they are. During the realization of the plan, the goal is to get to the point of having the interactions, and participants in the realization of the plan try to do that.
Answers the question, why did you do this experiment?
In the protocol of a ChIP assay the objective specification says to identify protein and DNA interaction.
OBI Plan and Planned Process/Roles Branch
OBI_0000217
PERSON: Alan Ruttenberg
PERSON: Barry Smith
PERSON: Bjoern Peters
PERSON: Jennifer Fostel
a directive information entity that describes an intended process endpoint. When part of a plan specification the concretization is realized in a planned process in which the bearer tries to effect the world so that the process endpoint is achieved.
goal specification
objective specification
action specification
Alan Ruttenberg
OBI Plan and Planned Process branch
Pour the contents of flask 1 into flask 2
a directive information entity that describes an action the bearer will take
action specification
datum label
9/22/11 BP: changed the rdfs:label for this class from 'label' to 'datum label' to convey that this class is not intended to cover all kinds of labels (stickers, radiolabels, etc.), and not even all kinds of textual labels, but rather the kind of labels occurring in a datum.
A label is a symbol that is part of some other datum and is used to either partially define the denotation of that datum or to provide a means for identifying the datum as a member of the set of data with the same label
GROUP: IAO
datum label
http://www.golovchenko.org/cgi-bin/wnsearch?q=label#4n
software
GROUP: OBI
PERSON: Alan Ruttenberg
PERSON: Bjoern Peters
PERSON: Chris Stoeckert
PERSON: Melanie Courtot
Software is a plan specification composed of a series of instructions that can be
interpreted by or directly executed by a processing unit.
see sourceforge tracker discussion at http://sourceforge.net/tracker/index.php?func=detail&aid=1958818&group_id=177891&atid=886178
software
data item
2/2/2009 Alan and Bjoern discussing FACS run output data. This is a data item because it is about the cell population. Each element records an event and is typically further composed of a set of measurement data items that record the fluorescent intensity stimulated by one of the lasers.
2014-03-31: See discussion at http://odontomachus.wordpress.com/2014/03/30/aboutness-objects-propositions/
2009-03-16: data item deliberately ambiguous: we merged data set and datum to be one entity, not knowing how to define singular versus plural. So data item is more general than datum.
2009-03-16: removed datum as alternative term as datum specifically refers to singular form, and is thus not an exact synonym.
Data items include counts of things, analyte concentrations, and statistical summaries.
JAR: datum -- well, this will be very tricky to define, but maybe some
information-like stuff that might be put into a computer and that is
meant, by someone, to denote and/or to be interpreted by some
process... I would include lists, tables, sentences... I think I might
defer to Barry, or to Brian Cantwell Smith
JAR: A data item is an approximately justified approximately true approximate belief
PERSON: Alan Ruttenberg
PERSON: Chris Stoeckert
PERSON: Jonathan Rees
a data item is an information content entity that is intended to be a truthful statement about something (modulo, e.g., measurement precision or other systematic errors) and is constructed/acquired by a method which reliably tends to produce (approximately) truthful statements.
data
data item
information content entity
2014-03-10: The use of "thing" is intended to be general enough to include universals and configurations (see https://groups.google.com/d/msg/information-ontology/GBxvYZCk1oc/-L6B5fSBBTQJ).
A generically dependent continuant that is about some thing.
Examples of information content entities include journal articles, data, graphical layouts, and graphs.
OBI_0000142
PERSON: Chris Stoeckert
information content entity
information_content_entity 'is_encoded_in' some digital_entity in obi before split (040907). information_content_entity 'is_encoded_in' some physical_document in obi before split (040907).
Previous. An information content entity is a non-realizable information entity that 'is encoded in' some digital or physical entity.
scalar measurement datum
1
1
10 feet. 3 ml.
2009-03-16: we decided to keep datum singular in scalar measurement datum, as in
this case we explicitly refer to the singular form
PERSON: Alan Ruttenberg
PERSON: Melanie Courtot
Would write this as: has_part some 'measurement unit label' and has_part some numeral and has_part exactly 2, except for the fact that this won't let us take advantage of OWL reasoning over the numbers. Instead use the has measurement value property to represent the same. Use has measurement unit label (subproperty of has_part) so we can easily say that there is only one of them.
a scalar measurement datum is a measurement datum that is composed of two parts, numerals and a unit label.
scalar measurement datum
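The two-part structure described for this term (a numeral plus exactly one unit label) can be sketched roughly as follows. The class and field names are hypothetical and are not part of IAO or OBCS; the two instances reuse the examples of usage "10 feet" and "3 ml".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalarMeasurementDatum:
    """Hypothetical container: one numeric value plus one unit label."""
    value: float      # stands in for 'has measurement value'
    unit_label: str   # stands in for 'has measurement unit label'

    def __str__(self) -> str:
        # ':g' drops a trailing '.0' so that 10 prints as '10'.
        return f"{self.value:g} {self.unit_label}"


length = ScalarMeasurementDatum(10, "feet")
volume = ScalarMeasurementDatum(3, "ml")
print(length, "|", volume)
```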
directive information entity
2009-03-16: provenance: a term realizable information entity was proposed for OBI (OBI_0000337) , edited by the PlanAndPlannedProcess branch. Original definition was "is the specification of a process that can be concretized and realized by an actor" with alternative term "instruction". It has been subsequently moved to IAO where the objective for which the original term was defined was satisfied with the definition of this, different, term.
2013-05-30 Alan Ruttenberg: What differentiates a directive information entity from an information concretization is that it can have concretizations that are either qualities or realizable entities. The concretizations that are realizable entities are created when an individual chooses to take up the direction, i.e. has the intention to (try to) realize it.
8/6/2009 Alan Ruttenberg: Changed label from "information entity about a realizable" after discussions at ICBO
An information content entity whose concretizations indicate to their bearer how to realize them in a process.
PERSON: Alan Ruttenberg
PERSON: Bjoern Peters
Werner pushed back on calling it realizable information entity as it isn't realizable. However this name isn't right either. An example would be a recipe. The realizable entity would be a plan, but the information entity isn't about the plan, it, once concretized, *is* the plan. -Alan
directive information entity
dot plot
A dot plot is a report graph which is a graphical representation of data where each data point is represented by a single dot placed on coordinates corresponding to data point values in particular dimensions.
Dot plot of SSC-H and FSC-H.
OBI_0000123
dot plot
group:OBI
person:Allyson Lister
person:Chris Stoeckert
graph
A diagram that presents one or more tuples of information by mapping those tuples into a two dimensional space in a non-arbitrary way.
OBI_0000240
PERSON: Lawrence Hunter
graph
group:OBI
person:Alan Ruttenberg
person:Allyson Lister
algorithm
A plan specification which describes the inputs and output of mathematical functions as well as workflow of execution for achieving a predefined objective. Algorithms are realized usually by means of implementation as computer programs for execution by automata.
OBI_0000270
PMID: 18378114. Genomics. 2008 Mar 28. LINKGEN: A new algorithm to process data in genetic linkage studies.
Philippe Rocca-Serra
PlanAndPlannedProcess Branch
adapted from discussion on OBI list (Matthew Pocock, Christian Cocos, Alan Ruttenberg)
algorithm
curation status specification
Better to represent curation as a process with parts and then relate labels to that process (in IAO meeting)
GROUP:OBI:<http://purl.obolibrary.org/obo/obi>
OBI_0000266
PERSON:Bill Bug
The curation status of the term. The allowed values come from an enumerated list of predefined terms. See the specification of these instances for more detailed definitions of each enumerated value.
curation status specification
density plot
A density plot is a report graph which is a graphical representation of data where the tint of a particular pixel corresponds to some kind of function of the number of data points relative to their distance from the pixel.
Density plot of SSC-H and FSC-H.
OBI_0000179
density plot
group:Flow Cytometry community
person:Allyson Lister
person:Chris Stoeckert
data format specification
2009-03-16: provenance: term imported from OBI_0000187, which had original definition "A data format specification is a plan which organizes
information. Example: The ISO document specifying what encompasses an
XML document; The instructions in a XSD file"
A data format specification is the information content borne by the document published defining the specification.
Example: The ISO document specifying what encompasses an XML document; The instructions in a XSD file
OBI branch derived
OBI_0000187
PERSON: Alan Ruttenberg
PlanAndPlannedProcess Branch
data format specification
data set
2009/10/23 Alan Ruttenberg. The intention is that this term represent collections of like data. So this isn't for, e.g. the whole contents of a cel file, which includes parameters, metadata etc. This is more like java arrays of a certain rather specific type
data list
2014-05-05: Data sets are aggregates and thus must include two or more data items. We have chosen not to add logical axioms to make this restriction.
A data item that is an aggregate of other data items of the same type that have something in common. Averages and distributions can be determined for data sets.
Intensity values in a CEL file or from multiple CEL files comprise a data set (as opposed to the CEL files themselves).
OBI_0000042
data set
group:OBI
person:Allyson Lister
person:Chris Stoeckert
image
An image is an affine projection to a two dimensional surface, of measurements of some quality of an entity or entities repeated at regular intervals across a spatial range, where the measurements are represented as color and luminosity on the projected surface.
OBI_0000030
group:OBI
image
person:Alan Ruttenberg
person:Allyson
person:Chris Stoeckert
data about an ontology part
Person:Alan Ruttenberg
data about an ontology part
data about an ontology part is a data item about a part of an ontology, for example a term
plan specification
2/3/2009 Comment from OBI review.
Action specification not well enough specified.
Conditional specification not well enough specified.
Question whether all plan specifications have objective specifications.
Request that IAO either clarify these or change definitions not to use them
2009-03-16: provenance: a term a plan was proposed for OBI (OBI_0000344) , edited by the PlanAndPlannedProcess branch. Original definition was " a plan is a specification of a process that is realized by an actor to achieve the objective specified as part of the plan". It has been subsequently moved to IAO where the objective for which the original term was defined was satisfied with the definition of this, different, term.
2014-03-31: A plan specification can have other parts, such as conditional specifications.
A directive information entity with action specifications and objective specifications as parts that, when concretized, is realized in a process in which the bearer tries to achieve the objectives by taking the actions specified.
Alan Ruttenberg
Alternative previous definition: a plan is a set of instructions that specify how an objective should be achieved
OBI Plan and Planned Process branch
OBI_0000344
PMID: 18323827. Nat Med. 2008 Mar;14(3):226. New plan proposed to help resolve conflicting medical advice.
plan specification
measurement datum
2/2/2009 is_specified_output of some assay?
A measurement datum is an information content entity that is a recording of the output of a measurement such as produced by a device.
Examples of measurement data are the recording of the weight of a mouse as {40,mass,"grams"}, the recording of an observation of the behavior of the mouse {,process,"agitated"}, the recording of the expression level of a gene as measured through the process of microarray experiment {3.4,luminosity,}.
OBI_0000305
group:OBI
measurement datum
person:Chris Stoeckert
setting datum
2/3/2009 Feedback from OBI
This should be a "setting specification". There is a question of whether it is information about a realizable or not.
Pro other specification are about realizables.
Cons sometimes specifies a quality which is not a realizable.
A setting datum is a datum that denotes some configuration of an instrument.
Alan grouped these in placeholder for the moment. Name by analogy to measurement datum.
setting datum
conclusion textual entity
2009/09/28 Alan Ruttenberg. Fucoidan-use-case
2009/10/23 Alan Ruttenberg: We need to work on the definition still
A textual entity that expresses the results of reasoning about a problem, for instance as typically found towards the end of scientific papers.
Person:Alan Ruttenberg
conclusion textual entity
that fucoidan has a small statistically significant effect on AT3 level but no useful clinical effect as in-vivo anticoagulant, a paraphrase of part of the last paragraph of the discussion section of the paper 'Pilot clinical study to evaluate the anticoagulant activity of fucoidan', by Lowenthal et al. PMID:19696660
material information bearer
A material entity in which a concretization of an information content entity inheres.
A page of a paperback novel with writing on it. The paper itself is a material information bearer, the pattern of ink is the information carrier.
GROUP: IAO
a brain
a hard drive
material information bearer
histogram
A histogram is a report graph which is a statistical description of a
distribution in terms of occurrence frequencies of different event classes.
GROUP:OBI
PERSON:Chris Stoeckert
PERSON:James Malone
PERSON:Melanie Courtot
histogram
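As a rough illustration of "occurrence frequencies of different event classes", the sketch below counts invented event labels and prints a text histogram. The data and variable names are assumptions made only for this example.

```python
from collections import Counter

# Invented observations, each tagged with its event class.
events = ["A", "B", "A", "C", "A", "B"]

# Occurrence frequency of each event class.
frequencies = Counter(events)

# One bar per event class, drawn with '#' characters.
for event_class, count in sorted(frequencies.items()):
    print(f"{event_class} {'#' * count} ({count})")
```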
heatmap
A heatmap is a report graph which is a graphical representation of data
where the values taken by a variable(s) are shown as colors in a
two-dimensional map.
GROUP:OBI
PERSON:Chris Stoeckert
PERSON:James Malone
PERSON:Melanie Courtot
heatmap
dendrogram
A dendrogram is a report graph which is a tree diagram
frequently used to illustrate the arrangement of the clusters produced by a
clustering algorithm.
Dendrograms are often used in computational biology to
illustrate the clustering of genes.
PERSON:Chris Stoeckert
PERSON:James Malone
PERSON:Melanie Courtot
WEB: http://en.wikipedia.org/wiki/Dendrogram
dendrogram
scatter plot
A scatterplot is a graph which uses Cartesian coordinates to display values for two variables for a set of data. The data is displayed as a collection of points, each having the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position on the vertical axis.
Comparison of gene expression values in two samples can be displayed in a scatter plot
PERSON:Chris Stoeckert
PERSON:James Malone
PERSON:Melanie Courtot
WEB: http://en.wikipedia.org/wiki/Scatterplot
scatter plot
scattergraph
obsolescence reason specification
PERSON: Alan Ruttenberg
PERSON: Melanie Courtot
The creation of this class has been inspired in part by Werner Ceusters' paper, Applying evolutionary terminology auditing to the Gene Ontology.
The reason for which a term has been deprecated. The allowed values come from an enumerated list of predefined terms. See the specification of these instances for more detailed definitions of each enumerated value.
obsolescence reason specification
textual entity
A textual entity is a part of a manifestation (FRBR sense), a generically dependent continuant whose concretizations are patterns of glyphs intended to be interpreted as words, formulas, etc.
AR, (IAO call 2009-09-01): a document as a whole is not typically a textual entity, because it has pictures in it - rather there are parts of it that are textual entities. Examples: The title, paragraph 2 sentence 7, etc.
MC, 2009-09-14 (following IAO call 2009-09-01): textual entities live at the FRBR (http://en.wikipedia.org/wiki/Functional_Requirements_for_Bibliographic_Records) manifestation level. Everything is significant: line break, pdf and html versions of same document are different textual entities.
PERSON: Lawrence Hunter
Words, sentences, paragraphs, and the written (non-figure) parts of publications are all textual entities
text
textual entity
table
A textual entity that contains a two-dimensional arrangement of texts repeated at regular intervals across a spatial range, such that the spatial relationships among the constituent texts express propositions
PERSON: Lawrence Hunter
table
  | T F
--+-----
T | T F
F | F F
figure
An information content entity consisting of a two dimensional arrangement of information content entities such that the arrangement itself is about something.
Any picture, diagram or table
PERSON: Lawrence Hunter
figure
diagram
A figure that expresses one or more propositions
A molecular structure ribbon cartoon showing helices, turns and sheets and their relations to each other in space.
PERSON: Lawrence Hunter
diagram
document
A collection of information content entities intended to be understood together as a whole
A journal article, patent application, laboratory notebook, or a book
PERSON: Lawrence Hunter
document
cartesian spatial coordinate datum
1
AR notes: We need to discuss whether it should include site.
2009-08-18 Alan Ruttenberg - question to BFO list about whether the BFO sense of the lower dimensional regions is that they are always part of actual space (the three dimensional sort) http://groups.google.com/group/bfo-discuss/browse_thread/thread/9d04e717e39fb617
A cartesian spatial coordinate datum is a representation of a point in a spatial region, in which equal changes in the magnitude of a coordinate value denote length qualities with the same magnitude
Alan Ruttenberg
cartesian spatial coordinate datum
http://groups.google.com/group/bfo-discuss/browse_thread/thread/9d04e717e39fb617
one dimensional cartesian spatial coordinate datum
1
A cartesian spatial coordinate datum that uses one value to specify a position along a one dimensional spatial region
Alan Ruttenberg
one dimensional cartesian spatial coordinate datum
two dimensional cartesian spatial coordinate datum
1
1
A cartesian spatial coordinate datum that uses two values to specify a position within a two dimensional spatial region
Alan Ruttenberg
two dimensional cartesian spatial coordinate datum
three dimensional cartesian spatial coordinate datum
1
1
1
A cartesian spatial coordinate datum that uses three values to specify a position within a three dimensional spatial region
Alan Ruttenberg
three dimensional cartesian spatial coordinate datum
length measurement datum
A scalar measurement datum that is the result of measurement of length quality
Alan Ruttenberg
length measurement datum
denotator type
A denotator type indicates how a term should be interpreted from an ontological perspective.
Alan Ruttenberg
Barry Smith, Werner Ceusters
The Basic Formal Ontology makes a distinction between Universals and defined classes, where the former are "natural kinds" and the latter arbitrary collections of entities.
denotator type
mass measurement datum
2009/09/28 Alan Ruttenberg. Fucoidan-use-case
A scalar measurement datum that is the result of measurement of mass quality
Person:Alan Ruttenberg
mass measurement datum
time measurement datum
2009/09/28 Alan Ruttenberg. Fucoidan-use-case
A scalar measurement datum that is the result of measuring a temporal interval
Person:Alan Ruttenberg
time measurement datum
documenting
6/11/9: Edited at OBI workshop. We need to be able to identify a child form of information artifact which corresponds to something enduring (not brain like). This used to be restricted to physical document or digital entity as the output, but that excludes e.g. an audio cassette tape
Bjoern Peters
Recording the current temperature in a laboratory notebook. Writing a journal article. Updating a patient record in a database.
a planned process in which a document is created or added to by including the specified input in it.
documenting
wikipedia http://en.wikipedia.org/wiki/Documenting
line graph
A line graph is a type of graph created by connecting a series of data
points together with a line.
GROUP:OBI
PERSON:Chris Stoeckert
PERSON:Melanie Courtot
WEB: http://en.wikipedia.org/wiki/Line_chart
line chart
line graph
centrally registered identifier registry
A CRID registry is a dataset of CRID records, each consisting of a CRID symbol and additional information which was recorded in the dataset through an 'assigning a centrally registered identifier' process.
CRID registry
Original proposal from Bjoern, discussions at IAO calls
PERSON: Alan Ruttenberg
PERSON: Bill Hogan
PERSON: Bjoern Peters
PERSON: Melanie Courtot
PubMed is a CRID registry. It has a dataset of PubMed identifiers associated with journal articles.
centrally registered identifier registry
time stamped measurement datum
time stamped measurement datum
time sampled measurement data set
A data set that is an aggregate of data recording some measurement at a number of time points. The time series data set is an ordered list of pairs of time measurement data and the corresponding measurement data acquired at that time.
Alan Ruttenberg
experimental time series
pmid:20604925 - time-lapse live cell microscopy
time sampled measurement data set
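The "ordered list of pairs" in this definition can be sketched as a list of (time, measurement) tuples kept in time order. The type alias, helper name, and values below are hypothetical and serve only to illustrate the structure.

```python
from typing import List, Tuple

# Each pair is (time measurement in seconds, measurement acquired at that time).
TimeSeries = List[Tuple[float, float]]


def make_time_series(pairs: TimeSeries) -> TimeSeries:
    """Return the pairs ordered by their time component."""
    return sorted(pairs, key=lambda pair: pair[0])


# Invented readings supplied out of order.
series = make_time_series([(60.0, 0.42), (0.0, 0.10), (30.0, 0.25)])
times = [t for t, _ in series]
print(series)
```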
software method
A software method (also called subroutine, subprogram, procedure, method, function, or routine) is software designed to execute a specific task.
PERSON: Melanie Courtot
PERSON: Michel Dumontier
http://code.google.com/p/information-artifact-ontology/issues/detail?id=80
software method
software module
A software module is software composed of a collection of software methods.
PERSON: Melanie Courtot
PERSON: Michel Dumontier
http://code.google.com/p/information-artifact-ontology/issues/detail?id=80
software module
software library
A software library is software composed of a collection of software modules and/or software methods in a form that can be statically or dynamically linked to some software application.
PERSON: Melanie Courtot
PERSON: Michel Dumontier
http://code.google.com/p/information-artifact-ontology/issues/detail?id=80
software library
software application
A software application is software that can be directly executed by some processing unit.
PERSON: Melanie Courtot
PERSON: Michel Dumontier
http://code.google.com/p/information-artifact-ontology/issues/detail?id=80
software application
software script
A software script is software whose instructions can be executed using a software
interpreter.
PERSON: Melanie Courtot
PERSON: Michel Dumontier
http://code.google.com/p/information-artifact-ontology/issues/detail?id=80
software script
Viruses
Viruses
Bacteria
Bacteria
eubacteria
Archaea
Archaea
Eukaryota
Eukaryota
eucaryotes
eukaryotes
statistical data analysis
WEB: https://en.wikipedia.org/wiki/Statistics
a data transformation that takes multiple data as input and reports the overall trend of the data.
Jie Zheng, Oliver He
data collection
a planned process that gathers and measures information on variables of interest, in an established systematic fashion that enables one to answer stated research questions, test hypotheses, and evaluate outcomes. Data collection results in a collection of data.
Jie Zheng, Oliver He
WEB: http://en.wikipedia.org/wiki/Data_collection
F distribution
Yongqun He
A continuous probability distribution that is associated with the F statistic.
WEB: http://stattrek.com/probability-distributions/f-distribution.aspx
Fisher-Snedecor distribution
gamma distribution
Yongqun He
A continuous probability distribution that is a two-parameter family of continuous probability distributions.
WEB: http://en.wikipedia.org/wiki/Gamma_distribution
normal distribution
Yongqun He
WEB: http://en.wikipedia.org/wiki/Normal_distribution
A continuous probability distribution that has a symmetrical curve, whose position and shape is determined by its location and scale parameters, the mean and standard deviation respectively.
Gaussian distribution
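A minimal sketch of the two properties named in the definition (symmetry, and a curve fixed by the mean and standard deviation), using Python's standard-library `statistics.NormalDist`. The parameter values 100 and 15 are arbitrary assumptions.

```python
from statistics import NormalDist

# Location (mean) and scale (standard deviation) chosen arbitrarily.
dist = NormalDist(mu=100.0, sigma=15.0)

# Symmetry: the density is equal at equal distances either side of the mean.
left = dist.pdf(85.0)
right = dist.pdf(115.0)

# Half of the probability mass lies below the mean.
half = dist.cdf(100.0)

print(left, right, half)
```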
Student's t distribution
Yongqun He
A continuous probability distribution that is used to estimate population parameters when the sample size is small and/or when the population variance is unknown.
WEB: http://stattrek.com/probability-distributions/t-distribution.aspx
t-distribution
bivariate normality
WEB: SAS users guide
Marcy Harris
a continuous probability distribution of two variables that has the traditional bell shape; the distribution of one variable is normal for each and every value of the other variable
log-normal distribution
log normal, lognormal
Marcy Harris
a continuous probability distribution that is the distribution of a random variable X if ln(X) is normally distributed
WEB: http://www.statistics.com
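The defining property, that ln(X) is normally distributed, can be checked empirically with a small simulation. This is only a sketch: the seed, sample size, and parameters (mean 0 and standard deviation 0.5 on the log scale) are assumptions chosen for illustration.

```python
import math
import random
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed so the run is repeatable

# Draw X from a log-normal whose underlying normal has mu=0.0, sigma=0.5.
samples = [rng.lognormvariate(0.0, 0.5) for _ in range(20000)]

# Taking logs should recover (approximately) the underlying normal.
log_samples = [math.log(x) for x in samples]
log_mean = mean(log_samples)
log_sd = stdev(log_samples)

print(round(log_mean, 3), round(log_sd, 3))
```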
point biserial correlation
A special case of the Pearson product-moment correlation; calculated when either the independent variable or dependent variable is dichotomous while the other variable is non-dichotomous
Marcy Harris
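To illustrate the "special case" relationship, the sketch below computes an ordinary Pearson correlation with a 0/1 variable and compares it with the usual point-biserial formula, r_pb = (M1 - M0) / s_n * sqrt(n1 * n0 / n^2), where s_n is the population standard deviation of the scores. The function names and data are invented for this example.

```python
import math
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def point_biserial(group, score):
    """Point-biserial formula for a 0/1 group variable and a numeric score."""
    ones = [s for g, s in zip(group, score) if g == 1]
    zeros = [s for g, s in zip(group, score) if g == 0]
    n = len(score)
    scale = math.sqrt(len(ones) * len(zeros) / n ** 2)
    return (mean(ones) - mean(zeros)) / pstdev(score) * scale


group = [0, 0, 0, 1, 1, 1, 1]                  # dichotomous variable
score = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 9.0]    # non-dichotomous variable

r_pearson = pearson(group, score)
r_pb = point_biserial(group, score)
print(round(r_pearson, 4), round(r_pb, 4))
```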
probability distribution
Yongqun He, Jie Zheng
an information content entity that refers to a distribution of a random variable that can be described using a mathematical formula.
WEB: http://en.wikipedia.org/wiki/Probability_distribution
measurement scale
WEB: http://en.wikipedia.org/wiki/Level_of_measurement
an information content entity that represents a type of scale on which a variable is measured, including nominal, ordinal, interval, ratio.
level of measurement
scale of measure
Marcy Harris
outlier
Marcy Harris, Yongqun He
a data item that is numerically distant from the rest of the data; often indicative of either measurement error or a population with high kurtosis
WEB: http://en.wikipedia.org/wiki/Outlier
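The definition does not fix a threshold for "numerically distant". One common convention, assumed here purely for illustration and not prescribed by OBCS, is the 1.5 x IQR rule: values more than 1.5 interquartile ranges outside the quartiles are flagged. The data are invented.

```python
from statistics import quantiles

data = [10, 12, 11, 13, 12, 11, 95]  # invented readings; 95 looks suspicious

# Quartiles using the statistics module's default ('exclusive') method.
q1, _median, q3 = quantiles(data, n=4)
iqr = q3 - q1

# Fences of the assumed 1.5 x IQR rule.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(outliers)
```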
test statistic
Marcy Harris, Yongqun He
a statistical measure that is a function of the samples and considered as a numerical summary of a data set that reduces the data to one value that can be used to perform a hypothesis test. It can be used to test a finding for statistical significance
WEB: http://en.wikipedia.org/wiki/Test_statistic
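As one concrete instance of reducing a data set to a single value, the sketch below computes a one-sample t statistic, t = (mean - mu0) / (s / sqrt(n)). The choice of the t statistic, the data, and the hypothesized mean `mu0` are assumptions for illustration only.

```python
import math
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """t statistic for testing whether the population mean equals mu0."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)  # stdev is the sample SD
    return (mean(sample) - mu0) / standard_error


sample = [5.1, 4.9, 5.3, 5.0, 5.2]  # invented measurements
t_value = one_sample_t(sample, mu0=5.0)
print(round(t_value, 4))
```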
weighted data
Marcy Harris
WEB: SAS users guide
weights are applied when one wants to adjust the impact of cases in the analysis
central tendency
Marcy Harris, Yongqun He
a data item that represents a typical value of a set of values. This term relates to the way in which quantitative data tend to cluster around some value
WEB: http://en.wikipedia.org/wiki/Central_tendency
mode
WEB: http://en.wikipedia.org/wiki/Mode_%28statistics%29
Marcy Harris
a data item that is the value that appears most frequently in a set of data. In a normal distribution the numerical value of the mode is the same as that of the mean and median
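A short sketch using Python's standard-library `statistics` module. The data are invented; `multimode` is shown because a data set can have more than one most frequent value.

```python
from statistics import mode, multimode

data = [1, 2, 2, 3, 3, 3, 4]   # 3 appears most often
most_frequent = mode(data)

tied = [1, 1, 2, 2]            # two values tie for most frequent
all_modes = multimode(tied)

print(most_frequent, all_modes)
```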
cohen's kappa measurement
a statistical measure of agreement for categorical data; a measure of inter-rater agreement or inter-annotator agreement
Marcy Harris
inter-rater agreement, inter-annotator agreement
WEB: http://en.wikipedia.org/wiki/Cohen%27s_kappa
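Cohen's kappa is commonly computed as kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each rater's marginal frequencies. The sketch below follows that formula; the function name and the ratings are invented for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category counts.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)


# 50 invented items: 20 yes/yes, 5 yes/no, 10 no/yes, 15 no/no.
rater_a = ["yes"] * 20 + ["yes"] * 5 + ["no"] * 10 + ["no"] * 15
rater_b = ["yes"] * 20 + ["no"] * 5 + ["yes"] * 10 + ["no"] * 15

kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 3))
```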
causal model
WEB: http://en.wikipedia.org/wiki/Causal_model
Marcy Harris
an abstract, quantitative model of the causal dependencies and other interrelationships among observed or hypothetical models; an ordered triple (U, V, E), where U is a set of exogenous variables whose values are determined by factors outside the model; V is a set of endogenous variables whose values are determined by factors within the model; and E is a set of structural equations that express the value of each endogenous variable as a function of the values of the other variables in U and V.
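The triple of exogenous variables U, endogenous variables V, and structural equations E can be sketched as plain Python dictionaries. Everything here (variable names, coefficients, the `solve` helper) is hypothetical and only illustrates how E determines V from U.

```python
# U: exogenous variables, with values set outside the model.
exogenous = {"U_x": 1.0, "U_y": 0.5}

# E: one structural equation per endogenous variable in V = {X, Y}.
# Each equation reads the values computed so far.
structural_equations = {
    "X": lambda v: v["U_x"],
    "Y": lambda v: 2.0 * v["X"] + v["U_y"],
}


def solve(exogenous_values, equations):
    """Evaluate the equations in insertion order (assumed to be causal order)."""
    values = dict(exogenous_values)
    for name, equation in equations.items():
        values[name] = equation(values)
    return values


solution = solve(exogenous, structural_equations)
print(solution)
```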
correlation statistical analysis
WEB: http://www.statistics.com
a measure of the linear association between two variables that are measured on ordinal, interval or ratio scales
Marcy Harris
hierarchical linear model
Marcy Harris
Mendelian randomization
An inferential statistical data analysis that uses measured variation in genes of known function to examine the causal effect of a modifiable exposure on disease in non-experimental studies.
Yongqun He, Jie Zheng, Asiyah Yu Lin
URL: https://en.wikipedia.org/wiki/Mendelian_randomization
multilevel model
hierarchical models
statistical models of parameters that vary at more than one level; a type of regression model that explicitly takes into account structured/nested data
Marcy Harris SAS social science
multivariate analysis
MVA
WEB: http://www.camo.com/multivariate_analysis.html
an inferential statistical data analysis that is used to analyze data that arises from more than one variable
Yongqun He, Jie Zheng
power calculation
a data transformation that is used to calculate the power of a statistical analysis.
WEB: http://www.uptodate.com/contents/glossary-of-common-biostatistical-and-epidemiological-terms
Yongqun He
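A minimal sketch of one kind of power calculation: the power of a two-sided one-sample z test with known standard deviation. The choice of test, the function name, and the numbers (effect 0.5, sigma 1.0, n 32, alpha 0.05) are assumptions for illustration; other designs need other formulas.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z test for a true mean shift `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # two-sided critical value
    shift = effect * sqrt(n) / sigma             # mean of the z statistic under H1
    # Probability that the statistic falls in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


power = z_test_power(effect=0.5, sigma=1.0, n=32)
print(round(power, 3))
```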
univariate analysis
an inferential statistical data analysis that has only one independent variable
Yongqun He
statistical association
Marcy Harris
WEB: http://en.wikipedia.org/wiki/Association_%28statistics%29
any relationship between two measured quantities that renders them statistically dependent
cochran-armitage test
Marcy Harris
WEB: http://en.wikipedia.org/wiki/Cochran%E2%80%93Armitage_test_for_trend
a data transformation used in categorical data analysis when the aim is to assess for the presence of an association between a variable with two categories and a variable with k categories
data collapsing
Marcy Harris, Yongqun He
bracketing, grouping
a data transformation that combines categories or ranges of values to produce a smaller number of categories
WEB: SAS users guide
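Collapsing can be sketched as mapping each raw value to one of a smaller number of categories. The age bands and the helper name below are invented for illustration.

```python
from collections import Counter


def collapse_age(age):
    """Map an age in years to one of three hypothetical age bands."""
    if age < 18:
        return "0-17"
    if age < 65:
        return "18-64"
    return "65+"


ages = [4, 17, 18, 40, 64, 65, 90]   # seven distinct raw values
collapsed = Counter(collapse_age(a) for a in ages)
print(dict(collapsed))
```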
covariation
Marcy Harris
WEB: http://en.wikipedia.org/wiki/Covariation
a measure of the extent to which two variables are associated; the extent to which two random variables vary together
goodness of fit
WEB: http://en.wikipedia.org/wiki/Goodness_of_fit
Marcy Harris
describes how well a statistical model fits a set of observations; measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question.
kolmogorov-smirnov two sample test
Marcy Harris
WEB: http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
Kolmogorov-Smirnov
a univariate nonparametric test for the equality of continuous, one-dimensional probability distributions that can be used to compare a sample with a reference probability distribution (one-sample K–S test), or to compare two samples (two-sample K–S test).
levene's test
WEB: http://en.wikipedia.org/wiki/Levene%27s_test
Marcy Harris
a data transformation that is specifically an inferential statistic to assess the equality of variances in different samples; tests the null hypothesis that the population variances are equal (called homogeneity of variance or homoscedasticity).
loglinear analysis
a data transformation used for both hypothesis testing and model building to examine the relationship between more than two categorical variables; uses a likelihood ratio statistic that has an approximate chi-square distribution when the sample size is large
WEB: http://en.wikipedia.org/wiki/Loglinear_analysis
Marcy Harris
mcnemar's test
a normal approximation used on nominal data; applied to 2x2 contingency tables to determine whether the row and column marginal frequencies are equal ("marginal homogeneity").
WEB: http://en.wikipedia.org/wiki/McNemar%27s_test
Marcy Harris
statistical model
model
a directive information entity that represents a mathematical relationship which relates changes in a given response to changes in one or more factors. A statistical model is a formalization of relationships between variables in the form of mathematical equations.
WEB: http://www.itl.nist.gov/div898/handbook/pri/section7/pri7.htm; http://en.wikipedia.org/wiki/Statistical_model
Marcy Harris, Yongqun He
partitioning of variance components
WEB: http://www.itl.nist.gov/div898/handbook/pri/section7/pri7.htm
Partitioning of the overall variation into assignable components
variance components
statistical effect
Marcy Harris, Yongqun He
WEB: http://www.itl.nist.gov/div898/handbook/pri/section7/pri7.htm
a directive information entity that shows how changing the settings of a factor changes the response. The effect of a single factor is also called a main effect.
effect
statistical variable
variable
a directive information entity that specifies a statistical variable whose value may change within the scope of a given problem or set of operations
WEB: http://en.wikipedia.org/wiki/Variable_%28mathematics%29
Yongqun He
confounding variable specification
confounders
Marcy Harris
WEB: http://en.wikipedia.org/wiki/Mediation_%28statistics%29#Mediator_Variable
a variable specification that specifies a variable that may have a causal impact on both the independent variable and dependent variable; ignoring a confounding variable may bias empirical estimates of the causal effect of the independent variable.
covariate specification
WEB: http://www.statistics.com
a variable specification that specifies a variable used in statistical analysis to correct, adjust, or modify the values of a dependent variable; an independent variable not manipulated by the investigator
Marcy Harris
covariate
dichotomous variable specification
a variable that has only two categories
Marcy Harris
dichotomous variable
WEB: SAS users guide
dummy variable
Marcy Harris
a statistical variable with only two categories that reflect only part of the information available in a more comprehensive variable
WEB: SAS users guide
intervening variable specification
A variable postulated to be a predictor of one or more dependent variables, and simultaneously predicted by one or more independent variables
WEB: SAS Users Guide
Marcy Harris
intervening variable, mediating variable
binomial distribution
WEB: http://en.wikipedia.org/wiki/Binomial_distribution
A discrete probability distribution that has the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p. Such a success/failure experiment is also called a Bernoulli experiment or Bernoulli trial.
Yongqun He
Poisson distribution
Yongqun He
WEB: http://en.wikipedia.org/wiki/Poisson_distribution
A discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event. The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume.
negative likelihood ratio
Yongqun He
WEB: http://www.uptodate.com/contents/glossary-of-common-biostatistical-and-epidemiological-terms
a likelihood ratio that is calculated by dividing 1 minus sensitivity by specificity ((1-sensitivity)/specificity).
positive likelihood ratio
WEB: http://www.uptodate.com/contents/glossary-of-common-biostatistical-and-epidemiological-terms
Yongqun He
a likelihood ratio that is calculated by dividing sensitivity by 1 minus specificity (sensitivity/(1-specificity)).
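The two likelihood ratio definitions above can be sketched directly from their formulas. This is a minimal illustration, not part of OBCS; the function names and the input checks are assumptions made for the example.

```python
def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """Return sensitivity / (1 - specificity), as in the definition above."""
    if not 0.0 <= sensitivity <= 1.0 or not 0.0 <= specificity < 1.0:
        # specificity == 1 would divide by zero
        raise ValueError("need 0 <= sensitivity <= 1 and 0 <= specificity < 1")
    return sensitivity / (1.0 - specificity)


def negative_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """Return (1 - sensitivity) / specificity, as in the definition above."""
    if not 0.0 <= sensitivity <= 1.0 or not 0.0 < specificity <= 1.0:
        # specificity == 0 would divide by zero
        raise ValueError("need 0 <= sensitivity <= 1 and 0 < specificity <= 1")
    return (1.0 - sensitivity) / specificity


if __name__ == "__main__":
    # Hypothetical test characteristics, chosen only for illustration.
    print(positive_likelihood_ratio(0.9, 0.8))
    print(negative_likelihood_ratio(0.9, 0.8))
```

With a sensitivity of 0.9 and a specificity of 0.8, the positive ratio is about 4.5 and the negative ratio about 0.125.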
absolute risk
Yongqun He
WEB: http://medical-dictionary.thefreedictionary.com/absolute+risk
A data item of an observed or calculated probability of occurrence of an event, X, in a population related to exposure to a specific hazard, infection, trauma; the number of persons suffering from a disease when the exposed population is known with certainty.
accuracy
Yongqun He
WEB: http://www.uptodate.com/contents/glossary-of-common-biostatistical-and-epidemiological-terms
a data item that refers to the number of true positives and true negatives divided by the total number of observations.
censored data
censoring
WEB: SAS users guide
occurs when certain values of a measurement or observation are only partially known or cannot be observed
Marcy Harris
normal distribution probability density function
A probability density function of the normal distribution
Jie Zheng, Asiyah Yu Lin, Yongqun He
f(x) = 1/(√(2 π) σ) e^-((x - μ)^2/(2 σ^2))
WEB: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Normal.html
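The density formula given for this term can be evaluated as follows. This is a minimal sketch; the function name `normal_pdf` and the default parameters (standard normal) are assumptions made for the example.

```python
import math


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Evaluate f(x) = 1/(sqrt(2*pi)*sigma) * exp(-(x - mu)^2 / (2*sigma^2))."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


if __name__ == "__main__":
    # Density of the standard normal distribution at its mean.
    print(normal_pdf(0.0))
```

The result can be cross-checked against `statistics.NormalDist(mu, sigma).pdf(x)` from the Python standard library.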
Incidence rate
WEB: http://www.uptodate.com/contents/glossary-of-common-biostatistical-and-epidemiological-terms
incidence
A data item that refers to the number of new events that have occurred in a specific time interval divided by the population at risk at the beginning of the time interval. The result gives the likelihood of developing an event in that time interval.
Yongqun He
test validity
Yongqun He
http://en.wikipedia.org/wiki/Validity_(statistics)
a validity that refers to the degree to which evidence and theory support the interpretations of test scores (as entailed by proposed uses of tests).
odds ratio
WEB: http://www.uptodate.com/contents/glossary-of-common-biostatistical-and-epidemiological-terms
Yongqun He
A data item that refers to the odds that an individual with a specific condition has been exposed to a risk factor divided by the odds that a control has been exposed. The odds ratio is used in case-control studies. The odds ratio provides a reasonable estimate of the relative risk for uncommon conditions.
prevalence rate
Yongqun He
prevalence
A data item that refers to the number of individuals with a given disease at a given point in time divided by the population at risk at that point in time.
WEB: http://www.uptodate.com/contents/glossary-of-common-biostatistical-and-epidemiological-terms
relative risk
WEB: http://www.uptodate.com/contents/glossary-of-common-biostatistical-and-epidemiological-terms
Yongqun He
A data item that equals the incidence in exposed individuals divided by the incidence in unexposed individuals. The relative risk can be calculated from studies in which the proportion of patients exposed and unexposed to a risk is known, such as a cohort study.
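As a rough illustration of the relative risk and odds ratio definitions, both can be computed from the four cell counts of an exposure-by-outcome table. The function names, argument names and counts below are hypothetical and chosen only for the example.

```python
def relative_risk(exposed_events: int, exposed_no_events: int,
                  unexposed_events: int, unexposed_no_events: int) -> float:
    """Incidence in exposed individuals divided by incidence in unexposed individuals."""
    incidence_exposed = exposed_events / (exposed_events + exposed_no_events)
    incidence_unexposed = unexposed_events / (unexposed_events + unexposed_no_events)
    return incidence_exposed / incidence_unexposed


def odds_ratio(exposed_cases: int, exposed_controls: int,
               unexposed_cases: int, unexposed_controls: int) -> float:
    """Odds of exposure among cases divided by odds of exposure among controls."""
    # (a / c) / (b / d) simplifies to (a * d) / (b * c).
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)


if __name__ == "__main__":
    # Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed have the event.
    print(relative_risk(20, 80, 10, 90))
    print(odds_ratio(20, 80, 10, 90))
```

For these counts the relative risk is 2.0 and the odds ratio is 2.25. The two measures move closer together as the condition becomes rarer, which is the sense in which the odds ratio estimates the relative risk for uncommon conditions.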
reliability
Yongqun He
a data item that refers to the extent to which repeated measurements of a relatively stable phenomenon fall close to each other.
WEB: http://www.uptodate.com/contents/glossary-of-common-biostatistical-and-epidemiological-terms
sensitivity
WEB: http://en.wikipedia.org/wiki/Sensitivity_and_specificity
Yongqun He
a data item that measures the proportion of actual positives which are correctly identified as such (e.g. the percentage of sick people who are correctly identified as having the condition).
true positive rate, recall
specificity
true negative rate
WEB: http://en.wikipedia.org/wiki/Sensitivity_and_specificity
a data item that refers to the proportion of negatives in a binary classification test which are correctly identified
Yongqun He
validity
http://en.wikipedia.org/wiki/Validity_(statistics)
WEB: http://www.uptodate.com/contents/glossary-of-common-biostatistical-and-epidemiological-terms
Here is another definition: Validity is the extent to which an observation reflects the "truth" of the phenomenon being measured. Reference: http://www.uptodate.com/contents/glossary-of-common-biostatistical-and-epidemiological-terms
a data item that refers to the extent to which a concept, conclusion or measurement is well-founded and corresponds accurately to the real world.
Yongqun He
polynomial regression
WEB: http://en.wikipedia.org/wiki/Polynomial_regression
A special case of multiple linear regression in which the relationship between the independent variable x and the dependent variable y is modelled as an nth order polynomial
Marcy Harris
data sampling design
WEB: http://www.itl.nist.gov/div898/handbook/ppc/section3/ppc33.htm
sampling plan
a plan specification that provides a detailed outline of which measurements will be taken at what times, on which material, in what manner, and by whom. Sampling plans should be designed in such a way that the resulting data will contain a representative sample of the parameters of interest and allow for all questions, as stated in the goals, to be answered.
Marcy Harris
sampling design
random selection
Any method of sampling that uses some form of random selection, that is, one that will ensure that all units in the population have an equal probability or chance of being selected.
WEB: http://srmo.sagepub.com/view/the-sage-dictionary-of-social-research-methods/SAGE.xml
Marcy Harris
period prevalence rate
a prevalence rate that occurs at a specific period of time
Yongqun He
WEB: http://www.uptodate.com/contents/glossary-of-common-biostatistical-and-epidemiological-terms
point prevalence rate
WEB: http://www.uptodate.com/contents/glossary-of-common-biostatistical-and-epidemiological-terms
a prevalence rate that occurs at a specific point of time
Yongqun He
continuous probability distribution
Yongqun He
A probability distribution that is associated with continuous variables and has a probability density function.
WEB: http://en.wikipedia.org/wiki/Continuous_probability_distribution#Continuous_probability_distribution
discrete probability distribution
WEB: http://en.wikipedia.org/wiki/Continuous_probability_distribution#Discrete_probability_distribution
Yongqun He
A probability distribution that is associated with discrete variables and is characterized by a probability mass function.
frequency distribution
WEB: http://statistics.com
Marcy Harris
a tabular summary of a set of data showing the number of items in each of several non-overlapping classes or groupings
cohen's kappa coefficient
WEB: http://en.wikipedia.org/wiki/Cohen%27s_kappa
a statistical measure of inter-rater agreement or inter-annotator agreement for qualitative (categorical) items.
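The definition above does not give a formula. A common formulation, used here as an assumption, is kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e the agreement expected by chance from the raters' marginal totals. The function name and the table layout (rows for rater A, columns for rater B) are illustrative only.

```python
from typing import Sequence


def cohens_kappa(table: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa for a square agreement table of two raters.

    table[i][j] counts items that rater A put in category i and rater B in category j.
    """
    k = len(table)
    if k == 0 or any(len(row) != k for row in table):
        raise ValueError("table must be square and non-empty")
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("table must contain at least one rated item")
    # Observed agreement: proportion of items on the diagonal.
    p_observed = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Hypothetical ratings of 50 items into two categories.
    print(cohens_kappa([[20, 5], [10, 15]]))
```

For this table the observed agreement is 0.7 and the chance agreement 0.5, giving a kappa of about 0.4.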
confidence interval
Yongqun He
WEB: http://www.uptodate.com/contents/glossary-of-common-biostatistical-and-epidemiological-terms
A quantitative confidence value that refers to an interval whose boundaries give values within which there is a high probability (95 percent by convention) that the true population value can be found. The calculation of a confidence interval considers the standard deviation of the data and the number of observations. Thus, a confidence interval narrows as the number of observations increases, or as the variance (dispersion) decreases.
credible interval
WEB: http://www.uptodate.com/contents/glossary-of-common-biostatistical-and-epidemiological-terms
Yongqun He
A quantitative confidence value that refers to
interquartile range
WEB: http://www.uptodate.com/contents/glossary-of-common-biostatistical-and-epidemiological-terms
A quantitative confidence value that refers to the upper and lower values defining the central 50 percent of observations. The boundaries are equal to the 25th and 75th percentiles. The interquartile range can be depicted in a box and whiskers plot.
Yongqun He
percentile
Yongqun He
WEB: http://www.uptodate.com/contents/glossary-of-common-biostatistical-and-epidemiological-terms
A quantitative confidence value that equals the percentage of a distribution that is below a specific value. As an example, a child is in the 90th percentile for weight if only 10 percent of children the same age weigh more than she does.
power
A quantitative confidence value that refers to the ability of a study to detect a true difference. Negative findings may reflect that the study was underpowered to detect a difference.
Yongqun He
WEB: http://www.uptodate.com/contents/glossary-of-common-biostatistical-and-epidemiological-terms
random error
WEB: http://www.itl.nist.gov/div898/handbook/pri/section7/pri7.htm
A component of experimental error that occurs due to natural variation in the process.
Marcy Harris
range
Yongqun He
A quantitative confidence value that equals the difference between the largest and smallest observation.
WEB: http://www.uptodate.com/contents/glossary-of-common-biostatistical-and-epidemiological-terms
standard deviation
Yongqun He
A quantitative confidence value that measures the variability of data around the mean.
WEB: http://www.uptodate.com/contents/glossary-of-common-biostatistical-and-epidemiological-terms
type 1 error rate
alpha error
WEB: http://www.uptodate.com/contents/glossary-of-common-biostatistical-and-epidemiological-terms
A quantitative confidence value that refers to the probability of incorrectly concluding that there is a statistically significant difference in a dataset. Alpha is the number after a p-value. Thus, a statistically significant difference reported as p<0.05 means that there is less than a 5 percent chance that the difference could have occurred by chance.
Yongqun He
type 2 error rate
beta error
WEB: http://www.uptodate.com/contents/glossary-of-common-biostatistical-and-epidemiological-terms
Yongqun He
A quantitative confidence value that refers to the probability of incorrectly concluding that there was no statistically significant difference in a dataset. This error often reflects insufficient power of the study.
bias
Marcy Harris
a quantitative confidence value that is a general statistical term meaning a systematic (not random) deviation from the true value
WEB: http://www.statistics.com
coefficient of variation
WEB: http://www.statistics.com
Marcy Harris, Yongqun He
A quantitative confidence value that is the standard deviation of a data set divided by the mean of the same data set; a normalized measure of dispersion of a probability distribution
variation coefficient
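A minimal sketch of the coefficient of variation as defined above (standard deviation divided by mean). The definition does not say whether the sample or the population standard deviation is meant, so the `sample` flag is an assumption of this example, as is the function name.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(values: Sequence[float], sample: bool = True) -> float:
    """Standard deviation of a data set divided by the mean of the same data set."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a mean of zero")
    # stdev divides by n - 1 (sample); pstdev divides by n (population).
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return sd / mean


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only
    print(coefficient_of_variation(data, sample=False))
```

For the illustrative data the mean is 5 and the population standard deviation is 2, so the population form gives about 0.4.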
statistical error
WEB: http://www.itl.nist.gov/div898/handbook/pri/section7/pri7.htm
error
Marcy Harris, Yongqun He
a quantitative confidence value that represents unexplained variation in a collection of observations; components of error include random error and lack of fit error
expected value
Marcy Harris
A quantitative confidence value that represents theoretical average value of a statistic over an infinite number of samples from the same population; the weights correspond to the probabilities in the case of a discrete random variable or densities in the case of a continuous random variable.
WEB: http://en.wikipedia.org/wiki/Expected_value
intraclass correlation coefficient
ICC
WEB: http://en.wikipedia.org/wiki/Intraclass_correlation
a quantitative confidence value that is a descriptive statistic and can be used to describe how strongly units in the same group resemble each other; unlike other correlation measures it operates on data structured as groups, rather than data structured as paired observations.
Marcy Harris, Yongqun He
standardized coefficient
Marcy Harris
WEB: SAS users guide
a quantitative confidence value that is a regression coefficient estimated after the variables have been standardized to have variances of 1.0; also called a standardized regression coefficient (beta)
logit regression
a type of regression analysis used for predicting the outcome of a categorical dependent variable
logistic regression
WEB: http://en.wikipedia.org/wiki/Logistic_regression
Marcy Harris
nonlinear regression
Marcy Harris
a form of regression analysis in which observational data are modeled by a function which is a nonlinear combination of the model parameters and depends on one or more independent variables
WEB: http://en.wikipedia.org/wiki/Nonlinear_regression
non-randomization sampling design
non-randomization sampling plan
a data sampling design that does not use randomization for sample selection
Marcy Harris, Jie Zheng, Yongqun He
randomization sampling design
randomization sampling plan
a data sampling design that uses randomization for sample selection
Marcy Harris, Jie Zheng, Yongqun He
interval scale
Marcy Harris
WEB: SAS Users Guide
A measurement scale consisting of equal-sized units; the distance between any two positions is of known size.
WEB: http://en.wikipedia.org/wiki/Level_of_measurement
nominal scale
Marcy Harris
http://en.wikipedia.org/wiki/Nominal_scale#Nominal_scale
A measurement scale that places data into categories, without any order or structure (see related OBI term of categorical measurement datum).
ordinal scale
WEB: http://en.wikipedia.org/wiki/Nominal_scale#Ordinal_scale
Marcy Harris
A measurement scale that consists of rankings by which data can be sorted; the size or magnitude of differences between data points is unknown, only that one ranking is greater than another
ratio scale
WEB: http://en.wikipedia.org/wiki/Nominal_scale#Ratio_scale
A measurement scale that is similar to an interval scale, i.e. a magnitude of a continuous quantity and a unit magnitude of the same kind; the distinguishing feature of a ratio scale is a meaningful zero value that means the absence of whatever is measured.
Marcy Harris
disease test sensitivity
a sensitivity that refers to the number of patients with a positive test who have a disease divided by all patients who have the disease. A test with high sensitivity will not miss many patients who have the disease (i.e., few false negative results).
WEB: http://www.uptodate.com/contents/glossary-of-common-biostatistical-and-epidemiological-terms
Yongqun He
disease test specificity
a specificity that refers to the number of patients who have a negative test and do not have the disease divided by the number of patients who do not have the disease. A test with high specificity will infrequently identify patients as having a disease when they do not (i.e., few false positive results).
Yongqun He
WEB: http://www.uptodate.com/contents/glossary-of-common-biostatistical-and-epidemiological-terms
fixed effect
Marcy Harris
A statistical effect that is associated with an input variable that has a limited number of levels or in which only a limited number of levels are of interest to the experimenter.
WEB: http://www.itl.nist.gov/div898/handbook/pri/section7/pri7.htm
interaction effect
Marcy Harris
WEB: http://www.itl.nist.gov/div898/handbook/pri/section7/pri7.htm
moderation
a statistical effect that represents the role of a variable in an estimated model (most often a regression model) and its effect on the dependent variable. A variable that has an interaction effect will have a different effect on the dependent variable, depending on the level of some third variable.
random effect
Marcy Harris
WEB: http://www.itl.nist.gov/div898/handbook/pri/section7/pri7.htm
A statistical effect that is associated with input variables chosen at random from a population having a large or infinite number of possible values.
lack of fit error
WEB: http://www.itl.nist.gov/div898/handbook/pri/section7/pri7.htm
Marcy Harris
A statistical error that occurs when the analysis omits one or more important terms or factors from the model
type I error
WEB: http://en.wikipedia.org/wiki/Type_I_and_type_II_errors
Marcy Harris
the null hypothesis is true but has been rejected; a test result that indicates a given condition has been fulfilled, when it actually has not been fulfilled
false positive
type II error
WEB: http://en.wikipedia.org/wiki/Type_I_and_type_II_errors
false negative
the null hypothesis is false but has been accepted; a test result indicates that a condition failed, while it actually was successful.
Marcy Harris
F test
any statistical test in which the test statistic has an F-distribution under the null hypothesis
Marcy Harris
WEB: http://en.wikipedia.org/wiki/F_test
mann-whitney U test
Marcy Harris
Wilcoxon rank-sum test
WEB: http://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U
a non-parametric test of the null hypothesis that two populations are the same against an alternative hypothesis
balanced design
WEB: http://www.itl.nist.gov/div898/handbook/pri/section7/pri7.htm
Marcy Harris
An experimental design where all cells (i.e. treatment combinations) have the same number of observations
case-control study design
a study design that starts with the outcome of interest and works backward to the exposure. For instance, patients with a disease are identified and compared with controls for exposure to a risk factor.
WEB: http://www.uptodate.com/contents/glossary-of-common-biostatistical-and-epidemiological-terms
Yongqun He
cohort study design
a study design that starts with an exposure and moves forward to the outcome of interest, even if the data are collected retrospectively. As an example, a group of patients who have variable exposure to a risk factor of interest can be followed over time for an outcome.
WEB: http://www.uptodate.com/contents/glossary-of-common-biostatistical-and-epidemiological-terms
Yongqun He
randomized controlled trial design
a study design in which patients are randomly assigned to two or more interventions.
WEB: http://www.uptodate.com/contents/glossary-of-common-biostatistical-and-epidemiological-terms
Yongqun He
complex sample design
A data sampling design that uses something other than simple random selection
Marcy Harris
WEB: SAS users guide
kappa statistic
a generic term for several similar measures of agreement used with categorical data; typically used in assessing the degree to which two or more raters, examining the same data, agree on assigning data to categories
Marcy Harris
WEB: http://www.statistics.com
bartlett's test
WEB: http://en.wikipedia.org/wiki/Bartlett%27s_test
Marcy Harris
a statistical test of whether k samples are from populations with equal variances.
mediating variable
a statistical variable that specifies a variable describing how, rather than when, effects will occur by accounting for the relationship between the independent and dependent variables. A mediating relationship is one in which the path relating A to C is mediated by a third variable (B).
mediating variable, mediation
WEB: http://en.wikipedia.org/wiki/Mediation_%28statistics%29#Mediator_Variable
Marcy Harris, Yongqun He
intervening variable
moderator variable
Marcy Harris
a statistical variable that specifies a variable affecting the direction and/or strength of the relation between dependent and independent variables; occurs when the relationship between two variables depends on a third variable
moderator, interaction
WEB: http://en.wikipedia.org/wiki/Moderator_variable
weighted kappa
Marcy Harris
a weighted data item that measures the agreement for categorical data; a generalization of the kappa statistic to situations in which the categories are not equal in some respect and are therefore weighted by an objective or subjective function
WEB: http://www.statistics.com
median
Marcy Harris
WEB: http://en.wikipedia.org/wiki/Median
the middle value that separates the higher half from the lower half of the data sample, population, or probability distribution
mixed model
WEB: http://en.wikipedia.org/wiki/Mixed_model
a statistical model containing both fixed effects and random effects, that is, mixed effects
Marcy Harris
probability density function
Marcy Harris, Yongqun He
density function
WEB: http://en.wikipedia.org/wiki/Probability_density_function
A data transformation that represents a mathematical function describing the relative likelihood of a continuous random variable to take on a value
rank order
WEB: http://www.merriam-webster.com/dictionary/rank%20order
a data item that represents an arrangement according to a rank, i.e., the position of a particular case relative to other cases on a defined scale
Marcy Harris, Yongqun He
likelihood ratio
a quantitative confidence value that expresses how many times more likely the data are under one model than the other.
WEB: http://en.wikipedia.org/wiki/Likelihood-ratio_test
Yongqun He
chi-square distribution
WEB: http://en.wikipedia.org/wiki/Chi-square_distribution
Yongqun He
A probability distribution that, with k degrees of freedom, is the distribution of a sum of the squares of k independent standard normal random variables.
data matrix
2
Person: Oliver He, Jie Zheng
A data item that consists of two or more data sets.
set of data sets
inferential statistical data analysis
Jie Zheng, Oliver He
a statistical data analysis that uses patterns in the sample data to draw inferences about the population represented, accounting for randomness.
significantly statistical data analysis
WEB: http://en.wikipedia.org/wiki/Statistics
transformed data matrix
2
A data item that consists of two or more data sets that are produced as the output of a data transformation.
Person: Oliver He, Jie Zheng
set of transformed data sets
transformed data set
A data set that is produced as the output of a data transformation.
Person: Jie Zheng, Oliver He
inferential statistical data analysis objective
Jie Zheng, Oliver He
a statistical data analysis objective where the aim is to make inference using population sample data.
exponential distribution
Yongqun He
WEB: http://en.wikipedia.org/wiki/Exponential_distribution
A continuous probability distribution that describes the time between events in a Poisson process, i.e. a process in which events occur continuously and independently at a constant average rate.
random variable
A statistical variable whose value is subject to variations due to chance (i.e. randomness, in a mathematical sense).
WEB: http://en.wikipedia.org/wiki/Random_variable
Jie Zheng, Yongqun He
data collection objective
an objective specification where the aim is to collect data.
Jie Zheng, Oliver He
data collection from experiment
Jie Zheng, Oliver He
a data collection process that results in a collection of data generated from an experiment(s).
data collection from survey
a data collection by sampling process that results in a collection of data generated from a survey(s).
Jie Zheng, Oliver He
data collection from literature
Jie Zheng, Oliver He
a data collection process that results in a collection of data from the literature.
independent variable
Jie Zheng, Yongqun He
A statistical variable that represents the inputs or causes, or is tested to see if it is the cause, in an experiment or a model.
WEB: http://en.wikipedia.org/wiki/Dependent_and_independent_variables
data collection by sampling
data sampling
Jie Zheng, Oliver He
a data collection process that results in a collection of data from a sampling process
dependent variable
WEB: http://en.wikipedia.org/wiki/Dependent_and_independent_variables
dependent variable
A statistical variable that represents the output or effect, or is tested to see if it is the effect, in an experiment or a model.
Jie Zheng, Yongqun He
continuous random variable
Jie Zheng, Yongqun He
A random variable which can take an infinite number of possible values
WEB: http://www.stats.gla.ac.uk/steps/glossary/probability_distributions.html#contvar
discrete random variable
A random variable which may take on only a countable number of distinct values such as 0, 1, 2, 3, 4, ...
Jie Zheng, Yongqun He
WEB: http://www.stats.gla.ac.uk/steps/glossary/probability_distributions.html#contvar
Examples of discrete random variables include the number of children in a family, the Friday night attendance at a cinema, the number of patients in a doctor's surgery, the number of defective light bulbs in a box of ten.
numerical data item
Jie Zheng, Yongqun He
A data item which consists of digits as opposed to letters of the alphabet or special characters
WEB: http://www.ask.com/question/what-is-numerical-data
numeric data
data collection design
Jie Zheng, Yongqun He
A plan specification that provides a detailed outline of how data is collected.
diagnostic validity
http://en.wikipedia.org/wiki/Validity_(statistics)
Yongqun He
a validity that refers to the validity of a diagnosis, and associated diagnostic tests or screening tests in a clinical field such as medicine.
data collection by censoring
Jie Zheng, Oliver He
a data collection by sampling process that results in a collection of data generated from censoring.
robust multi-array average normalization
http://www.ncbi.nlm.nih.gov/pubmed/12925520
http://www.molmine.com/magma/loading/rma.htm
Person:Jie Zheng, Oliver He
RMA
A normalization data transformation that is used to create normalized gene expression levels from raw microarray data. The raw intensity values are background corrected, log2 transformed and then quantile normalized in the RMA normalization process.
significance analysis of microarrays
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC33173/
http://en.wikipedia.org/wiki/Significance_analysis_of_microarrays
Person: Jie Zheng, Oliver He
An inferential statistical data analysis that was established in 2001 by Virginia Tusher, Robert Tibshirani and Gilbert Chu, for determining whether changes in gene expression are statistically significant.
SAM
signal-2-noise statistical analysis
S2N
http://en.wikipedia.org/wiki/Signal-to-noise_ratio
http://www.ncbi.nlm.nih.gov/pubmed/10521349
signal-2-noise statistic
A univariate analysis that calculates the ratio of the level of a desired signal to the level of background noise to identify which detected signal is more signal than noise. It can be used to identify differentially expressed genes.
Person: Jie Zheng, Oliver He
experimental validity
a validity that refers to whether an experiment (or a study) is able to scientifically answer the questions it is intended to answer.
http://en.wikipedia.org/wiki/Validity_(statistics)
Yongqun He
R squared value
http://en.wikipedia.org/wiki/Coefficient_of_determination
A quantitative confidence value that indicates how well data points fit a statistical model - sometimes simply a line or curve. It is a statistic used in the context of statistical models whose main purpose is either the prediction of future outcomes or the testing of hypotheses, on the basis of other related information. It provides a measure of how well observed outcomes are replicated by the model, as the proportion of total variation of outcomes explained by the model.
R squared
coefficient of determination
Person: Jie Zheng, Oliver He
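One common way to compute the proportion of total variation explained by a model is 1 - SS_res / SS_tot, where SS_res is the residual sum of squares and SS_tot the total sum of squares about the mean. The sketch below assumes that formulation; the function name and the sample values are illustrative only.

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Return 1 - SS_res / SS_tot for paired observed and model-predicted values."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally long sequences with at least two values")
    mean_observed = sum(observed) / len(observed)
    # Total variation of the outcomes around their mean.
    ss_total = sum((y - mean_observed) ** 2 for y in observed)
    # Variation left unexplained by the model.
    ss_residual = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    if ss_total == 0:
        raise ValueError("R squared is undefined when all observed values are equal")
    return 1.0 - ss_residual / ss_total


if __name__ == "__main__":
    # Hypothetical outcomes and model predictions.
    print(r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]))
```

For the sample values SS_tot is 5.0 and SS_res is about 0.10, so the result is about 0.98. For models not fitted by least squares with an intercept, this quantity can fall below zero and no longer reads as a proportion.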
contingency table
http://en.wikipedia.org/wiki/Confusion_matrix
Person: Jie Zheng, Oliver He
A data item with a specific table layout that allows visualization of the performance of an algorithm. Each column of the matrix represents the instances in a predicted class, while each row represents the instances in an actual class.
confusion matrix
error matrix
2x2 contingency table
A contingency table with two rows and two columns that reports the number of false positives, false negatives, true positives, and true negatives. This allows more detailed analysis than mere proportion of correct guesses (accuracy).
http://en.wikipedia.org/wiki/Confusion_matrix
Person: Jie Zheng, Oliver He
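A minimal sketch of how the four cells of a 2x2 contingency table can be counted from paired actual and predicted labels, and how accuracy follows from them. The function names, the dictionary keys and the choice of 1 as the positive label are assumptions for illustration only.

```python
def two_by_two_table(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classification."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts


def accuracy(counts):
    """Proportion of correct guesses: (TP + TN) / total."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())


# Hypothetical labels for eight instances.
table = two_by_two_table([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 1, 0, 1, 0, 0, 1])
```

Keeping the four counts separate is what allows the more detailed analysis mentioned above, since two classifiers with the same accuracy can differ in their false positive and false negative counts.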
Gene Set Enrichment Analysis
GSEA
Person: Jie Zheng, Oliver He
http://www.pnas.org/content/102/43/15545.abstract
An inferential statistical data analysis that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (e.g. phenotypes).
Wilcoxon signed-rank test
Person: Jie Zheng, Oliver He
http://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test
An inferential statistical data analysis used when comparing two related samples, matched samples, or repeated measurements on a single sample to assess whether their population mean ranks differ (i.e. it is a paired difference test). It can be used as an alternative to the paired Student's t-test, t-test for matched pairs, or the t-test for dependent samples when the population cannot be assumed to be normally distributed.
data collection from observation
A data collection process that results in a collection of data generated from observation(s).
Person: Jie Zheng, Oliver He
standardized mortality ratio
A data item which is the ratio of observed deaths in the study group to expected deaths in the general population.
http://en.wikipedia.org/wiki/Standardized_mortality_ratio
Person: Jie Zheng, Oliver He
SMR
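The ratio described above can be sketched as follows. The way the expected deaths are obtained here (stratum person-years multiplied by reference population death rates) is an assumption for illustration; the definition itself only requires an observed and an expected count. All names and numbers are hypothetical.

```python
def expected_deaths(person_years, reference_rates):
    """Sum of stratum person-years times the reference death rate per person-year."""
    return sum(py * rate for py, rate in zip(person_years, reference_rates))


def standardized_mortality_ratio(observed, expected):
    """Observed deaths in the study group divided by expected deaths."""
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return observed / expected


# Two hypothetical age strata of the study group.
expected = expected_deaths([1000, 2000], [0.01, 0.02])
smr = standardized_mortality_ratio(60, expected)
```

A value above 1 indicates more deaths in the study group than the reference rates would predict; a value below 1 indicates fewer.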
statistical conclusion validity
http://en.wikipedia.org/wiki/Validity_(statistics)
an experimental validity that refers to the degree to which conclusions about the relationship among variables based on the data are correct or ‘reasonable’.
Yongqun He
internal validity
http://en.wikipedia.org/wiki/Validity_(statistics)
Yongqun He
an experimental validity that refers to the degree to which conclusions about causal relationships can be made (e.g. cause and effect), based on the measures used, the research setting, and the whole research design.
external validity
http://en.wikipedia.org/wiki/Validity_(statistics)
Yongqun He
an experimental validity that refers to the extent to which the (internally valid) results of a study can be held to be true for other cases, for example to different people, places or times. In other words, it is about whether findings can be validly generalized. If the same research study was conducted in those other cases, would it get the same results?
ecological validity
Yongqun He
an external validity that refers to the extent to which research results can be applied to real life situations outside of research settings.
http://en.wikipedia.org/wiki/Validity_(statistics)
construct validity
Yongqun He
http://en.wikipedia.org/wiki/Validity_(statistics)
a test validity that refers to the extent to which operationalizations of a construct (i.e., practical tests developed from a theory) do actually measure what the theory says they do. For example, to what extent is a questionnaire actually measuring "intelligence"?
convergent validity
http://en.wikipedia.org/wiki/Validity_(statistics)
a construct validity that refers to the degree to which a measure is correlated with other measures that it is theoretically predicted to correlate with.
Yongqun He
discriminant validity
a construct validity that tests whether concepts or measurements that are supposed to be unrelated are, in fact, unrelated.
Yongqun He
http://en.wikipedia.org/wiki/Validity_(statistics)
content validity
Yongqun He
a test validity that refers to whether a test covers a representative sample of the behavior domain to be measured, as determined through systematic examination of the test content. For example, does an IQ questionnaire have items covering all areas of intelligence discussed in the scientific literature?
http://en.wikipedia.org/wiki/Validity_(statistics)
criterion validity
If the test data and criterion data are collected at the same time, this is referred to as concurrent validity evidence. If the test data are collected first in order to predict criterion data collected at a later point in time, then this is referred to as predictive validity evidence.
http://en.wikipedia.org/wiki/Validity_(statistics)
a test validity that refers to the correlation between the test and a criterion variable (or variables) taken as representative of the construct. In other words, it compares the test with other measures or outcomes (the criteria) already held to be valid. For example, employee selection tests are often validated against measures of job performance (the criterion), and IQ tests are often validated against measures of academic performance (the criterion).
Yongqun He
concurrent validity
a criterion validity that refers to the degree to which the operationalization correlates with other measures of the same construct that are measured at the same time. When the measure is compared to another measure of the same type, they will be related (or correlated). In the selection test example, this would mean that the tests are administered to current employees and then correlated with their scores on performance reviews.
http://en.wikipedia.org/wiki/Validity_(statistics)
Yongqun He
predictive validity
http://en.wikipedia.org/wiki/Validity_(statistics)
Yongqun He
a criterion validity that refers to the degree to which the operationalization can predict (or correlate with) other measures of the same construct that are measured at some time in the future. With the selection test example, this would mean that the tests are administered to applicants, all applicants are hired, their performance is reviewed at a later time, and then their scores on the two measures are correlated.
representation validity
http://en.wikipedia.org/wiki/Validity_(statistics)
a content validity that refers to the extent to which an abstract theoretical construct can be turned into a specific practical test.
Yongqun He
translation validity
face validity
Yongqun He
a content validity that estimates whether a test appears to measure a certain criterion. It is not guaranteed that the test actually measures phenomena in that domain. When the test does not appear to be measuring what it is, it has low face validity.
http://en.wikipedia.org/wiki/Validity_(statistics)
translation validity
Extraction of Differential Gene Expression software
Storey J.D. (2007) The optimal discovery procedure: A new approach to simultaneous significance testing, Journal of the Royal Statistical Society, Series B, 69: 347-368.
Storey J.D., Dai J.Y., and Leek J.T. (2007) The optimal discovery procedure for large-scale significance testing, with applications to comparative microarray experiments, Biostatistics, 8: 414-432.
Leek J.T., Monsen E.C., Dabney A.R., and Storey J.D. (2006) EDGE: Extraction and analysis of differential gene expression, Bioinformatics, 22: 507-508.
Storey J.D., Xiao W., Leek J.T., Tompkins R.G., and Davis R.W. (2005) Significance analysis of time course microarray experiments, Proceedings of the National Academy of Sciences, 102: 12837-12842.
EDGE
Beta Cell Biology Consortium
Elisabetta Manduchi
A software that is used for the differential gene expression significance analysis of DNA microarray experiments for both standard and time course experiments. The outputs consist of both p-values and q-values.
Patterns from Gene Expression software
The input consists of (replicated) intensities from a collection of array experiments from two or more conditions (or from a collection of direct comparisons on 2-channel arrays). The output consists of patterns, one for each row identifier in the data file. One condition is used as a reference to which the other types are compared. The length of a pattern equals the number of non-reference sample types. The symbols in the patterns are integers, where positive integers represent up-regulation as compared to the reference sample type and negative integers represent down-regulation. The patterns are based on the false discovery rates for each position in the pattern, so that the number of positive and negative symbols that appear in each position of the pattern is as descriptive as the data variability allows. The patterns generated are easily interpretable in that integers are used to represent different levels of up- or down-regulation as compared to the reference sample type.
A software that can be used to produce lists of differentially expressed genes with confidence measures attached. These lists are generated via a False Discovery Rate (FDR) method of controlling the false positives. Patterns from Gene Expression (PaGE) is more than a differential expression analysis tool. PaGE is a tool to attach descriptive, dependable, and easily interpretable expression patterns to genes across multiple conditions, each represented by a set of replicated array experiments.
Elisabetta Manduchi
PaGE
PMID:15797908
Grant G.R., Liu J., Stoeckert C.J.Jr. (2005) A practical false discovery rate approach to identifying patterns of differential expression in microarray data, Bioinformatics, 21(11): 2684-2690.
Beta Cell Biology Consortium
GLobal Identifier of Target Regions software
Beta Cell Biology Consortium
GLITR
Elisabetta Manduchi, Jie Zheng
A software that is used to identify transcription factor binding sites in a genome that have been enriched with aligned reads generated from ChIP-Seq technology.
PMID:19553195
Tuteja G., White P., Schug J., Kaestner K.H. (2009) Extracting transcription factor targets from ChIP-Seq data. Nucleic Acids Res., 37(17): e113. doi: 10.1093/nar/gkp536.
peak calling
http://en.wikipedia.org/wiki/Peak_calling
Jie Zheng
An inferential statistical data analysis to identify protein-binding regions in a genome sequence from the data generated from a ChIP-sequencing or ChIP-chip experiment. When the protein is a transcription factor, the region is a transcription factor binding site (TFBS).
differential expression analysis using LIMMA linear models for microarray data
WEB: http://www.bioconductor.org/packages/release/bioc/html/limma.html
Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W and Smyth GK (2015). “limma powers differential expression analyses for RNA-sequencing and microarray studies.” Nucleic Acids Research, 43, pp. doi: 10.1093/nar/gkv007.
LIMMA data analysis
Person: Oliver He, Jie Zheng
LIMMA
A differential expression analysis that uses LIMMA linear models to identify differential expression in microarray data.
summing data values
tracker: https://github.com/obcs/obcs/issues/7
http://dictionary.reference.com/browse/summing
PRISM
Person: Jie Zheng
A data transformation that adds two or more numbers, magnitudes, quantities, or particulars as determined by or as if by the mathematical process of addition.
summing data transformation
sum value
Tracker:https://github.com/obcs/obcs/issues/7
Jie Zheng
Person: Jie Zheng
A data item that is produced as the output of a summing data transformation and represents the total value of the input data.
PRISM
data collection from online resource
a data collection process that is conducted through an online process.
Jie Zheng, Oliver He
data collection from online database
WEB: https://en.wikipedia.org/wiki/Online_database
Jie Zheng, Oliver He
an online data collection process that extracts data from an online database
data collection by web crawling
data collection using Web spider
https://en.wikipedia.org/wiki/Web_crawler
Jie Zheng, Oliver He
an online data collection process that extracts data from online sources using a web crawler. A web crawler is an Internet bot that systematically browses the World Wide Web and retrieves data from it.
collection of incommensurate data
Jie Zheng, Oliver He
WEB: http://www.ncbi.nlm.nih.gov/pubmed/26428398
a data collection process that deals with incommensurate fields in the data.
web crawling software
web spider
URL: https://en.wikipedia.org/wiki/Web_crawler
web crawler
Jie Zheng, Yongqun He
web scutter
URL: http://www.sciencedaily.com/terms/web_crawler.htm
web robot
A software that is used to browse the World Wide Web in a methodical, automated manner.
permutation
Jie Zheng, Oliver He
WEB: http://stattrek.com/statistics/dictionary.aspx?definition=Permutation
a data transformation process that rearranges the order of all or part of a set of data items.
WEB: https://en.wikipedia.org/wiki/Permutation
generation of missing data
Jie Zheng, Oliver He
WEB: https://www.niehs.nih.gov/about/visiting/events/pastmtg/assets/docs_k_m/missingdatar.r
a planned process that generates possible values of missing data
processing incompatible data
WEB: http://en.wikipedia.org/wiki/Loglinear_analysis
a data transformation process that transforms incompatible data to compatible data.
Jie Zheng, Yongqun He
random permutation
WEB: https://en.wikipedia.org/wiki/Random_permutation
a permutation process that randomly orders a set of data items.
Jie Zheng, Oliver He
factor analysis
WEB: https://en.wikipedia.org/wiki/Multivariate_analysis
a multivariate analysis that uncovers the latent structure (dimensions) of a set of variables. It reduces attribute space from a larger number of variables to a smaller number of factors.
WEB: https://en.wikipedia.org/wiki/Factor_analysis
Yongqun He, Jie Zheng
MANOVA
a multivariate analysis that compares multivariate sample means. As a multivariate procedure, it is used when there are two or more dependent variables.
multivariate analysis of variance
WEB: https://en.wikipedia.org/wiki/Multivariate_analysis_of_variance
Yongqun He, Jie Zheng
Kriging
WEB: https://en.wikipedia.org/wiki/Kriging
a regression method of interpolation for which the interpolated values are modeled by a Gaussian process governed by prior covariances, as opposed to a piecewise-polynomial spline chosen to optimize smoothness of the fitted values.
Gaussian process regression
Jie Zheng, Yongqun He
Bayesian analysis
An inferential statistical data analysis that estimates parameters of an underlying distribution based on the observed distribution
WEB: http://mathworld.wolfram.com/BayesianAnalysis.html
Jie Zheng, Oliver He
knuth shuffle
a random permutation process that generates a permutation of n items uniformly at random without retries. Specifically, it is to start with any permutation (for example, the identity permutation), and then go through the positions 1 through n − 1, and for each position i swap the element currently there with a randomly chosen element from positions i through n, inclusive.
Jie Zheng, Oliver He
WEB: https://en.wikipedia.org/wiki/Random_permutation
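The procedure in the knuth shuffle definition can be sketched as below. The definition counts positions from 1 to n − 1; the sketch uses 0-based indices 0 to n − 2, which visits the same positions. The function name and the optional rng parameter (a random.Random instance, added so results can be reproduced) are assumptions for illustration.

```python
import random


def knuth_shuffle(items, rng=None):
    """Return a uniformly random permutation of items without retries."""
    rng = rng or random.Random()
    result = list(items)  # start from the identity permutation of the input
    n = len(result)
    for i in range(n - 1):
        # Pick a position from i through the last index, inclusive.
        j = rng.randint(i, n - 1)
        result[i], result[j] = result[j], result[i]
    return result


# Seeded generator so repeated runs give the same permutation.
shuffled = knuth_shuffle(range(10), random.Random(0))
```

Because each position is swapped exactly once with an element chosen from the not-yet-fixed remainder, no candidate permutation is ever rejected and regenerated.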
sorting
WEB: https://en.wikipedia.org/wiki/Sorting_algorithm
a data transformation process that puts the data items in a set in a certain order.
Jie Zheng, Oliver He
sorting based on numerical order
Jie Zheng, Oliver He
WEB: https://en.wikipedia.org/wiki/Sorting_algorithm
a sorting process that sorts a set of data items based on numerical order.
sorting based on lexicographic order
a sorting process that sorts a set of data items based on a lexicographic order
Jie Zheng, Oliver He
WEB: https://en.wikipedia.org/wiki/Sorting_algorithm
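The difference between the two sorting processes above shows up when the same data items are compared as text versus as numbers. The sketch below is illustrative only; the sample values are hypothetical.

```python
values = ["10", "9", "100", "2"]

# Lexicographic order compares the strings character by character.
lexicographic = sorted(values)

# Numerical order compares the integer each string represents.
numerical = sorted(values, key=int)

print(lexicographic)  # ['10', '100', '2', '9']
print(numerical)      # ['2', '9', '10', '100']
```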
data transformation in statistics
a data processing that applies a deterministic mathematical function to each point in a data set; that is, each data point zi is replaced with the transformed value yi = f(zi), where f is a function.
WEB: https://en.wikipedia.org/wiki/Data_transformation_(statistics)
Jie Zheng, Yongqun He
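The replacement of each data point zi by yi = f(zi) can be sketched as an elementwise mapping. The helper name transform is hypothetical; the base-2 logarithm and the square root are used as two example choices of f.

```python
import math


def transform(data, f):
    """Return a new list in which each point z_i is replaced by y_i = f(z_i)."""
    return [f(z) for z in data]


log2_values = transform([1, 2, 4, 8], math.log2)
sqrt_values = transform([1, 4, 9], math.sqrt)
```

The function is applied to each point independently, so the output has the same length and order as the input.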
log2 transformation
A logarithmic transformation that uses the base-2 logarithm.
Jie Zheng, Yongqun He
WEB: http://en.wikipedia.org/wiki/Logarithm
square root transformation
A data transformation that performs an operation of square root.
Jie Zheng, Yongqun He
WEB: https://en.wikipedia.org/wiki/Square_root
Anscombe transformation
Jie Zheng, Yongqun He
WEB: https://en.wikipedia.org/wiki/Anscombe_transform
a variance-stabilizing transformation that transforms a random variable with a Poisson distribution into one with an approximately standard Gaussian distribution.
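A minimal sketch of the Anscombe transformation, using its usual closed form 2·sqrt(x + 3/8) applied to a Poisson count x. The function name and the input check are assumptions for illustration.

```python
import math


def anscombe(count):
    """Map a non-negative Poisson count x to 2 * sqrt(x + 3/8)."""
    if count < 0:
        raise ValueError("counts must be non-negative")
    return 2.0 * math.sqrt(count + 3.0 / 8.0)


# Hypothetical photon or read counts.
stabilized = [anscombe(x) for x in [0, 1, 5, 20]]
```

After the transformation the variance is approximately 1 for counts that are not too small, which is why methods that assume constant-variance Gaussian noise can then be applied.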
rank-order transformation
A data transformation that lists data items in a sequential arrangement
Jie Zheng, Yongqun He
numerical data value
mathematical value
Yongqun He, Jie Zheng
mathematical data value
an information content entity that refers to a number value of a data item.
continuous data value
a data value that is continuous, i.e., measured.
WEB: https://www.mathsisfun.com/data/data-discrete-continuous.html
Yongqun He, Jie Zheng
discrete data value
a data value that is discrete, i.e., counted.
Yongqun He, Jie Zheng
WEB: https://www.mathsisfun.com/data/data-discrete-continuous.html
ordinal data value
Yongqun He, Jie Zheng
a data value that is an ordinal number, i.e., a number that tells the position of something in a list.
cardinal data value
WEB: https://www.mathsisfun.com/numbers/cardinal-ordinal-nominal.html
a data value that is a cardinal number, i.e., a number that says how many of something there are, such as one, two, three, four, five.
Yongqun He, Jie Zheng
undirected graph
undirected network
a mathematical graph in which a set of objects (called vertices or nodes) are connected together and all the edges are bidirectional.
WEB: http://mathinsight.org/definition/undirected_graph
Yongqun He, Jie Zheng
nominal data value
WEB: https://www.mathsisfun.com/numbers/cardinal-ordinal-nominal.html
a data value that is a nominal number, i.e., a number used only as a name, or to identify something (not as an actual value or position).
Yongqun He, Jie Zheng
continuous variable
We will link continuous variable to measurement scale.
Jie Zheng, Yongqun He
A continuous variable over a particular range of the real numbers is one whose value in that range must be such that, if the variable can take values a and b in that range, then it can also take any value between a and b.
WEB: https://en.wikipedia.org/wiki/Continuous_and_discrete_variables
discrete variable
WEB: http://www.unesco.org/webworld/idams/advguide/Chapt1_3.htm
a discrete variable over a particular range of real values is one for which, for any value in the range that the variable is permitted to take on, there is a positive minimum distance to the nearest other permissible value.
We will link discrete variable to measurement scale.
WEB: https://en.wikipedia.org/wiki/Continuous_and_discrete_variables
Jie Zheng, Yongqun He
categorical variable
interval-scale variable
A continuous variable that has order and equal intervals.
Jie Zheng, Yongqun He
WEB: http://www.unesco.org/webworld/idams/advguide/Chapt1_3.htm
continuous ordinal variable
WEB: http://www.unesco.org/webworld/idams/advguide/Chapt1_3.htm
Jie Zheng, Yongqun He
A continuous variable that occurs when the measurement is continuous.
ratio-scale variable
Jie Zheng, Yongqun He
A continuous variable that is a continuous positive measurement on a nonlinear scale
WEB: http://www.unesco.org/webworld/idams/advguide/Chapt1_3.htm
nominal variable
A discrete variable that allows for only qualitative classification
WEB: http://www.unesco.org/webworld/idams/advguide/Chapt1_3.htm
Jie Zheng, Yongqun He
discrete ordinal variable
Jie Zheng, Yongqun He
A discrete ordinal variable is a nominal variable, but its different states are ordered in a meaningful sequence
WEB: http://www.unesco.org/webworld/idams/advguide/Chapt1_3.htm
dummy variable
Jie Zheng, Yongqun He
A quantitative variable can be transformed into a categorical variable, called a dummy variable, by recoding the values.
WEB: http://www.unesco.org/webworld/idams/advguide/Chapt1_3.htm
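Recoding a quantitative variable into a dummy variable can be sketched with a simple cut-off rule. The threshold-based rule, the function name to_dummy and the sample values are assumptions for illustration; other recoding schemes are equally possible.

```python
def to_dummy(values, threshold):
    """Recode each quantitative value as 1 if it is >= threshold, else 0."""
    return [1 if v >= threshold else 0 for v in values]


# Hypothetical ages recoded into an "adult" indicator.
ages = [12, 35, 67, 18]
is_adult = to_dummy(ages, threshold=18)
```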
preference variable
Jie Zheng, Yongqun He
A preference variable is a specific discrete variable, whose value is either in a decreasing or increasing order.
WEB: http://www.unesco.org/webworld/idams/advguide/Chapt1_3.htm
multiple response variable
Jie Zheng, Yongqun He
WEB: http://www.unesco.org/webworld/idams/advguide/Chapt1_3.htm
Multiple response variables are those that can assume more than one value. A typical example is a survey questionnaire about the use of computers in research. The respondents were asked to indicate the purpose(s) for which they use computers in their research work. The respondents could score more than one category.
histogram distribution
Yongqun He, Jie Zheng
A probability distribution that corresponds to a histogram of data values.
WEB: https://reference.wolfram.com/language/ref/HistogramDistribution.html
Box-Cox distribution
A normal distribution that is the distribution of a random variable X for which the Box–Cox transformation on X follows a truncated normal distribution.
WEB: http://en.wikipedia.org/wiki/Normal_distribution
Yongqun He
Gaussian distribution
power-normal distribution
directed graph
WEB: http://mathinsight.org/definition/directed_graph
a mathematical graph in which the edges point in a direction.
Yongqun He, Jie Zheng
relational graph
WEB: http://arxiv.org/abs/1412.2378
a mathematical graph where nodes are inter-connected via semantic relations.
Yongqun He, Jie Zheng
relational undirected graph
a mathematical graph in which a set of objects (called vertices or nodes) are connected together and all the edges are bidirectional.
WEB: http://mathinsight.org/definition/undirected_graph
Yongqun He, Jie Zheng
undirected network
bipartite graph
bigraph
WEB: http://mathworld.wolfram.com/BipartiteGraph.html
Yongqun He, Jie Zheng
a mathematical graph whose set of graph vertices can be decomposed into two disjoint sets such that no two graph vertices within the same set are adjacent.
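Whether an undirected graph admits the two disjoint vertex sets described above can be checked by trying to two-colour it with a breadth-first search. This is one standard way to test the property, shown here as a sketch; the function name and the adjacency-dictionary representation are assumptions for illustration.

```python
from collections import deque


def is_bipartite(adjacency):
    """adjacency maps each vertex to an iterable of its neighbours (undirected)."""
    colour = {}
    for start in adjacency:
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency.get(node, ()):
                if neighbour not in colour:
                    # Put the neighbour in the opposite set.
                    colour[neighbour] = 1 - colour[node]
                    queue.append(neighbour)
                elif colour[neighbour] == colour[node]:
                    # Two adjacent vertices would land in the same set.
                    return False
    return True


square = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
triangle = {"x": ["y", "z"], "y": ["x", "z"], "z": ["x", "y"]}
```

The four-vertex cycle splits into the sets {a, c} and {b, d}, whereas the triangle cannot be split because any two of its vertices are adjacent.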
mathematical graph
a graph that is a representation of a set of objects where some pairs of objects are connected by links.
Yongqun He, Jie Zheng
WEB: https://en.wikipedia.org/wiki/Graph_(mathematics)
hierarchical tree
a mathematical graph that represents the hierarchical nature of a structure in a graphical form.
Yongqun He, Jie Zheng
WEB: https://en.wikipedia.org/wiki/Tree_structure
derived data from statistical analysis
WEB: http://en.wikipedia.org/wiki/Statistic
statistic
Jie Zheng, Yongqun He, Marcy Harris, Asiyah Yu Lin
WEB: http://www.ask.com/question/what-is-numerical-data
A data item that is derived from a statistical data analysis.
derived data from descriptive statistical analysis
WEB: https://en.wikipedia.org/wiki/Descriptive_statistics
Jie Zheng, Yongqun He, Asiyah Yu Lin
descriptive statistic
A data item that is derived from a descriptive statistical data analysis.
WEB: http://www.ask.com/question/what-is-numerical-data
derived data from inferential statistical analysis
A data item that is derived from an inferential statistical data analysis.
Jie Zheng, Yongqun He
inferential statistic
WEB: http://psc.dss.ucdavis.edu/sommerb/sommerdemo/stat_inf/intro.htm
WEB: http://www.ask.com/question/what-is-numerical-data
data set following probability distribution
A data set that follows a probability distribution
Yongqun He, Jie Zheng, Yu Lin
normally distributed data set
A data set that follows a normal probability distribution
Yongqun He, Jie Zheng, Yu Lin
fluorescent reporter intensity
group:OBI
A measurement datum that represents the output of a scanner measuring the intensity value for each fluorescent reporter.
From the DT branch: This term and definition were originally submitted by the community to our branch, but we thought they best fit DENRIE. However we see several issues with this. First of all the name 'probe' might not be used in OBI. Instead we have a 'reporter' role. Also, albeit the term 'probe intensity' is often used in communities such as the microarray one, the name 'probe' is ambiguous (some use it to refer to what's on the array, some use it to refer to what's hybed to the array). Furthermore, this concept could possibly be encompassed by combining different OBI terms, such as the roles of analyte, detector and reporter (you need something hybed to a probe on the array to get an intensity) and maybe a more general term for 'measuring intensities'. We need to find the right balance between what is consistent with OBI and combinations of its terms and what is user-friendly. Finally, note that 'intensity' is already in the OBI .owl file and is also in PATO. Why didn't OBI import it from PATO? This might be a problem.
fluorescent reporter intensity
person:Chris Stoeckert
planned process
'Plan' includes a future direction sense. That can be problematic if plans are changed during their execution. There are however implicit contingencies for protocols that an agent has in his mind that can be considered part of the plan, even if the agent didn't have them in mind before. Therefore, a planned process can diverge from what the agent would have said the plan was before executing it, by adjusting to problems encountered during execution (e.g. choosing another reagent with equivalent properties, if the originally planned one has run out.)
6/11/9: Edited at workshop. Used to include: is initiated by an agent
Bjoern Peters
Injecting mice with a vaccine in order to test its efficacy
We are only considering successfully completed planned processes. A plan may be modified, and details added during execution. For a given planned process, the associated realized plan specification is the one encompassing all changes made during execution. This means that all processes in which an agent acts towards achieving some objectives are planned processes.
branch derived
A processual entity that realizes a plan which is the concretization of a plan specification.
This class merges the previously separated objective driven process and planned process, as the separation proved hard to maintain. (1/22/09, branch call)
planned process
biological feature identification objective
Biological_feature_identification_objective is an objective role carried out by the proposition defining the aim of a study designed to examine or characterize a particular biological feature.
Jennifer Fostel
biological feature identification objective
classified data set
PERSON: James Malone
PERSON: Monnie McGee
data set with assigned class labels
A data set that is produced as the output of a class prediction data transformation and consists of a data set with assigned class labels.
classified data set
processed material
Examples include gel matrices, filter paper, parafilm and buffer solutions, mass spectrometer, tissue samples
A material entity that is created or changed during material processing.
PERSON: Alan Ruttenberg
processed material
ratio of collected to emitted light
Submitted by the Flow Cytometry community in DigitalEntity-FlowCytometry-2007-03-30.txt
10%
A measurement datum measuring the amount of light collected as compared to the total amount of emitted light in the detector component of a flow cytometer instrument. The datum has a qualitative role.
person:Chris Stoeckert
person:Kevin Clancy
ratio of collected to emitted light
investigation
Could add specific objective specification
Lung cancer investigation using expression profiling, a stem cell transplant investigation, biobanking is not an investigation, though it may be part of an investigation
study
Bjoern Peters
Following the OBI call of November 26, 2012, it was decided that there was no need to add "achieves objective of drawing conclusion", as existing relations provided equivalent ability. This note closes the issue and validates the class definition as part of the OBI core.
editor = PRS
OBI branch derived
a planned process that consists of the parts planning, study design execution, and documentation, and that produces conclusion(s).
investigation
evaluant role
Feb 10, 2009. changes after discussion at OBI Consortium Workshop Feb 2-6, 2009. accepted as core term.
GROUP: Role Branch
OBI
Role call - 17nov-08: JF and MC think an evaluant role is always specified input of a process. Even in the case where we have an assay taking blood as evaluant and outputting blood, the blood is not the specified output at the end of the assay (the concentration of glucose in the blood is)
When a specimen of blood is assayed for glucose concentration, the blood has the evaluant role. When measuring the mass of a mouse, the evaluant is the mouse. When measuring the time of DNA replication, the evaluant is the DNA. When measuring the intensity of light on a surface, the evaluant is the light source.
a role that inheres in a material entity that is realized in an assay in which data is generated about the bearer of the evaluant role
evaluant role
examples of features that could be described in an evaluant: quality (e.g. "contains 10 pg/ml IL2", or "no glucose detected")
assay
Assay the wavelength of light emitted by excited Neon atoms. Count of geese flying over a house.
any method
study assay
12/3/12: BP: the reference to the 'physical examination' is included to point out that a prediction is not an assay, as that does not require physical examination.
A planned process with the objective to produce information about the material entity that is the evaluant, by physically examining it or its proxies.
OBI branch derived
PlanAndPlannedProcess Branch
assay
measuring
scientific observation
quantitative confidence value
group:OBI
A data item which is used to indicate the degree of uncertainty about a measurement.
person:Chris Stoeckert
quantitative confidence value
diagnosis textual entity
Jennifer Fostel
diagnosis is an assessment of a disease or injury, its likely prognosis and treatment.
diagnosis textual entity
reagent role
Feb 10, 2009. changes after discussion at OBI Consortium Workshop Feb 2-6, 2009. accepted as core term.
May 28 2013. Updated definition taken from ReO based on discussions initiated in Philly 2011 workshop. Former definition described a narrower view of reagents in chemistry that restricts bearers of the role to be chemical entities ("a role played by a molecular entity used to produce a chemical reaction to detect, measure, or produce other substances"). Updated definition allows for broader view of reagents in the domain of biomedical research to include larger materials that have parts that participate chemically in a molecular reaction or interaction.
PERSON:Matthew Brush
reagent
(copied from ReO)
Reagents are distinguished from instruments or devices that also participate in scientific techniques by the fact that reagents are chemical or biological in nature and necessarily participate in or have parts that participate in some chemical interaction or reaction during their intended participation in some technique. By contrast, instruments do not participate in a chemical reaction/interaction during the technique.
Reagents are distinguished from study subjects/evaluants in that study subjects and evaluants are that about which conclusions are drawn and knowledge is sought in an investigation - while reagents, by definition, are not. It should be noted, however, that reagent and study subject/evaluant roles can be borne by instances of the same type of material entity - but a given instance will realize only one of these roles in the execution of a given assay or technique. For example, taq polymerase can bear a reagent role or an evaluant role. In a DNA sequencing assay aimed at generating sequence data about some plasmid, the reagent role of the taq polymerase is realized. In an assay to evaluate the quality of the taq polymerase itself, the evaluant/study subject role of the taq is realized, but not the reagent role since the taq is the subject about which data is generated.
In regard to the statement that reagents are 'distinct' from the specified outputs of a technique, note that a reagent may be incorporated into a material output of a technique, as long as the IDENTITY of this output is distinct from that of the bearer of the reagent role. For example, dNTPs input into a PCR are reagents that become part of the material output of this technique, but this output has a new identity (ie that of a 'nucleic acid molecule') that is distinct from the identity of the dNTPs that comprise it. Similarly, biotin molecules input into a cell labeling technique are reagents that become part of the specified output, but the identity of the output is that of some modified cell specimen which shares identity with the input unmodified cell specimen, and not with the biotin label. Thus, we see that an important criterion of 'reagent-ness' is that it is a facilitator, and not the primary focus of an investigation or material processing technique (ie not the specified subject/evaluant about which knowledge is sought, or the specified output material of the technique).
A role inhering in a biological or chemical entity that is intended to be applied in a scientific technique to participate (or have molecular components that participate) in a chemical reaction that facilitates the generation of data about some entity distinct from the bearer, or the generation of some specified material output distinct from the bearer.
Buffer, dye, a catalyst, a solvating agent.
PERSON:Matthew Brush
reagent role
material processing
A cell lysis, production of a cloning vector, creating a buffer.
PERSON: Frank Gibson
PERSON: Jennifer Fostel
PERSON: Melanie Courtot
PERSON: Philippe Rocca Serra
A planned process which results in physical changes in a specified input material
OBI branch derived
PERSON: Bjoern Peters
material processing
material transformation
measured expression level
OBI Data Transformation branch
A measurement datum that is the outcome of the quantification of an assay for the activity of a gene, or the number of RNA transcripts.
Examples are quantified data from an expression microarray experiment, PCR measurements, etc.
measured expression level
person:Chris Stoeckert
specimen role
22Jun09. The definition includes whole organisms, and can include a human. The link between specimen role and study subject role has been removed. A specimen taken as part of a case study is not considered to be a population representative, while a specimen taken as representing a population (e.g. person taken from a cohort, blood specimen taken from an animal) would be considered a population representative and would also bear material sample role.
GROUP: Role Branch
Note: definition is in specimen creation objective which is defined as an objective to obtain and store a material entity for potential use as an input during an investigation.
OBI
liver section; a portion of a culture of cells; a nematode or other animal once no longer a subject (generally killed); portion of blood from a patient.
a role borne by a material entity that is gained during a specimen collection process and that can be realized by use of the specimen in an investigation
blood taken from animal: animal continues in study, whereas blood has role specimen.
something taken from study subject, leaves the study and becomes the specimen.
parasite example
- when parasite in people we study people, people are subjects and parasites are specimen
- when parasite extracted, they become subject in the following study
specimen can later be subject.
specimen role
intervention design
An intervention design is a study design in which a controlled process applied to the subjects (the intervention) serves as the independent variable manipulated by the experimentalist. The treatment (perturbation or intervention) can be defined as a combination of values taken by the independent variables manipulated by the experimentalists; these values are applied to the recruited subjects assigned (possibly by applying specific methods) to treatment groups. The specificity of intervention design is the fact that independent variables are being manipulated and a response of the biological system is evaluated via response variables, possibly monitored by a series of assays.
OBI branch derived
PMID: 18208636. Br J Nutr. 2008 Jan 22;:1-11. Effect of vitamin D supplementation on bone and vitamin D status among Pakistani immigrants in Denmark: a randomised double-blinded placebo-controlled intervention study.
Philippe Rocca-Serra
intervention design
gene list
group:OBI
A data set of the names or identifiers of genes that are the outcome of an analysis or have been put together for the purpose of an analysis.
Gene lists may arise from analysis to determine differentially expressed genes, may be collected from the literature for involvement in a particular process or pathway (e.g., inflammation), or may be the input for gene set enrichment analysis.
gene list
kind of report. (alan) need to be careful to distinguish from output of a data transformation or calculation. A gene list is a report when it is published as such? Relates to question of whether report is a whole, or whether it can be a part of some other narrative object.
person:Chris Stoeckert
number of particles in subset
Submitted by the Flow Cytometry community in DigitalEntity-FlowCytometry-2007-03-30.txt
500, 200, 0
A measurement datum measuring the number of subjects in a defined subset in a flow cytometer instrument. The datum has a qualitative role
number of particles in subset
person:Kevin Clancy
number of lost events electronic
Submitted by the Flow Cytometry community in DigitalEntity-FlowCytometry-2007-03-30.txt
74, 0, 14 events lost due to data acquisition electronic coincidence.
A measurement datum measuring the number of analysis events lost due to errors in data acquisition electronic coincidence in a flow cytometer instrument. The datum has a qualitative role.
number of lost events electronic
person:Kevin Clancy
cDNA library
GROUP: PSI
Mixed population of cDNAs (complementary DNA) made from mRNA from a defined source, usually a specific cell type. This term should be associated only with nucleic acid interactors, not with their protein products. For instance in 2h screening use living cells (MI:0349) as sample process.
ALT DEF (PRS): a cDNA library is a collection of host cells, typically but not exclusively E. coli cells, modified by transfer of a plasmid DNA molecule used as a vector containing a fragment or the totality of a cDNA molecule (the insert). A cDNA library may have an array of roles and applications.
PERSON: Luisa Montecchi
PERSON: Philippe Rocca-Serra
PMID:6110205. collection of cDNA derived from mouse splenocytes.
PRS: 22022008. class moved under population,
modification of definition and replacement of biomaterials in previous definition with 'material'
addition of has_role restriction
cDNA library
parameter threshold
Submitted by the Flow Cytometry community in DigitalEntity-FlowCytometry-2007-03-30.txt
0.01, 0.03
A measurement datum measuring the minimal signal that must be detected to generate an electrical event, as compared to the maximal detected signal in a flow cytometer instrument. The datum has a qualitative role
parameter threshold
person:Kevin Clancy
p-value
May be outside the scope of OBI long term, is needed so is retained
PMID:19696660
in contrast to the in-vivo data AT-III increased significantly from
113.5% at baseline to 117% after 4 days (n = 10, P-value= 0.02; Table 2).
WEB: http://en.wikipedia.org/wiki/P-value
A quantitative confidence value that represents the probability of obtaining a result at least as extreme as that actually obtained, assuming that the actual value was the result of chance alone.
PERSON:Chris Stoeckert
p-value
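The definition above can be illustrated with a minimal sketch of an exact permutation test, assuming a two-group comparison of means. The function name `permutation_p_value` and the choice of test statistic (absolute difference in means) are illustrative assumptions, not part of OBI or OBCS. The sketch counts how many relabellings of the pooled data give a result at least as extreme as the observed one.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in means.

    Hypothetical helper for illustration only: it enumerates every way of
    splitting the pooled observations into groups of the original sizes.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))

    at_least_as_extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance guards against floating-point rounding in the means.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            at_least_as_extreme += 1
        total += 1

    # Fraction of relabellings at least as extreme as the observed split.
    return at_least_as_extreme / total
```

Exhaustive enumeration is only practical for small samples; for larger data sets a random sample of relabellings is the usual substitute.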
methodology testing objective
Jennifer Fostel
Methodology_testing_objective is an objective role carried out by a proposition defining that the aim of the study is to examine the effect of using different methodologies.
methodology testing objective
standard error
group:OBI
A quantitative confidence value which is the standard deviation of the sample mean in a frequency distribution, obtained by dividing the standard deviation by the square root of the total number of cases in the frequency distribution.
person:Chris Stoeckert
see P-Value
standard error
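A minimal sketch of the standard error of the mean, assuming the conventional formula (sample standard deviation divided by the square root of the number of observations). The function name `standard_error_of_mean` is an illustrative assumption.

```python
from math import sqrt
from statistics import stdev


def standard_error_of_mean(values):
    """Sample standard deviation divided by sqrt(n).

    Illustrative helper; uses the n-1 (sample) standard deviation.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    return stdev(values) / sqrt(n)
```

For the observations 1, 2, 3, 4 the sample variance is 5/3, so the standard error is sqrt(5/12), roughly 0.645.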
software testing objective
Jennifer Fostel
Software_testing_objective is a hardware_optimization role describing a study designed to examine the effects of using different software or software parameters, e.g. data processing software.
software testing objective
organization
GROUP: OBI
PERSON: Alan Ruttenberg
PERSON: Bjoern Peters
PERSON: Philippe Rocca-Serra
PERSON: Susanna Sansone
An entity that can bear roles, has members, and has a set of organization rules. Members of organizations are either organizations themselves or individual people. Members can bear specific organization member roles that are determined in the organization rules. The organization rules also determine how decisions are made on behalf of the organization by the organization members.
BP: The definition summarizes long email discussions on the OBI developer, roles, biomaterial and denrie branches. It leaves open if an organization is a material entity or a dependent continuant, as no consensus was reached on that. The current placement as material is therefore temporary, in order to move forward with development. Here is the entire email summary, on which the definition is based:
1) there are organization_member_roles (president, treasurer, branch
editor), with individual persons as bearers
2) there are organization_roles (employer, owner, vendor, patent holder)
3) an organization has a charter / rules / bylaws, which specify what roles
there are, how they should be realized, and how to modify the
charter/rules/bylaws themselves.
It is debatable what the organization itself is (some kind of dependent
continuant or an aggregate of people). This also determines who/what the
bearers of organization_roles are. My personal favorite is still to define
organization as a kind of 'legal entity', but thinking it through leads to
all kinds of questions that are clearly outside the scope of OBI.
Interestingly enough, it does not seem to matter much where we place
organization itself, as long as we can subclass it (University, Corporation,
Government Agency, Hospital), instantiate it (Affymetrix, NCBI, NIH, ISO,
W3C, University of Oklahoma), and have it play roles.
This leads to my proposal: We define organization through the statements 1 -
3 above, but without an 'is a' statement for now. We can leave it in its
current place in the is_a hierarchy (material entity) or move it up to
'continuant'. We leave further clarifications to BFO, and close this issue
for now.
PMID: 16353909. AAPS J. 2005 Sep 22;7(2):E274-80. Review. The Joint Food and Agriculture Organization of the United Nations/World Health Organization Expert Committee on Food Additives and its role in the evaluation of the safety of veterinary drug residues in foods.
organization
cluster
group:OBI
A data set which is a subset of data that are similar to each other in some way.
Cluster of the lymphocytes population.
cluster
person:Allyson
person:Chris Stoeckert
organism feature identification objective
Jennifer Fostel
Organism_feature_identification_objective is a biological_feature_identification_objective role describing a study designed to examine or characterize a biological feature monitored at the level of the organism, e.g. height, weight, stage of development, stage of life cycle.
organism feature identification objective
number of lost events computer
Submitted by the Flow Cytometry community in DigitalEntity-FlowCytometry-2007-03-30.txt
0, 125, 787 events lost due to computer busy.
A measurement datum recording the number of measurement events lost due to overloading of the analysis chip in a flow cytometer instrument. The datum has a qualitative role
number of lost events computer
person:Kevin Clancy
protocol
study protocol
A plan specification which has sufficient level of detail and quantitative information to communicate it between investigation agents, so that different investigation agents will reliably be able to independently reproduce the process.
OBI branch derived + wikipedia (http://en.wikipedia.org/wiki/Protocol_%28natural_sciences%29)
PCR protocol, has objective specification, amplify DNA fragment of interest, and has action specification that describes the amounts of experimental reagents used (e.g. buffers, dNTPs, enzyme), and the temperature and cycle time settings for running the PCR.
PlanAndPlannedProcess Branch
protocol
adding a material entity into a target
BP
Class was renamed from 'administering substance', as this is commonly used only for additions into organisms.
Injecting a drug into a mouse. Adding IL-2 to a cell culture. Adding NaCl into water.
branch derived
adding a material entity into a target
is a process with the objective to place a material entity bearing the 'material to be added role' into a material bearing the 'target of material addition role'.
material to be added role
9 March 09 from discussion with PA branch
OBI
Role Branch
drug added to a buffer contained in a tube; substance injected into an animal;
material to be added role
material to be added role is a protocol participant role realized by a material which is added into a material bearing the target of material addition role in a material addition process
drawing a conclusion based on data
Bjoern Peters
Concluding that a gene is upregulated in a tissue sample based on the band intensity in a western blot. Concluding that a patient has an infection based on measurement of an elevated body temperature and reported headache. Concluding that there were problems in an investigation because data from PCR and microarray are conflicting. Concluding that 'defects in gene XYZ cause cancer due to improper DNA repair' based on data from experiments in that study that gene XYZ is involved in DNA repair, and the conclusion of a previous study that cancer patients have an increased number of mutations in this gene.
PERSON: Bjoern Peters
PERSON: Jennifer Fostel
A planned process in which data gathered in an investigation is evaluated in the context of existing knowledge with the objective to generate more general conclusions or to conclude that the data does not allow one to draw general conclusions.
drawing a conclusion based on data
planning
7/18/2011 BP: planning used to itself be a planned process. Barry Smith pointed out that this would lead to an infinite regression, as there would have to be a plan to conduct a planning process, which in itself would be the result of planning etc. Therefore, the restrictions on 'planning' were loosened to allow for informal processes that result in an 'ad hoc plan'. This required changing from 'has_specified_output some plan specification' to 'has_participant some plan specification'.
Bjoern Peters
Bjoern Peters
Plans and Planned Processes Branch
The process of a scientist thinking about and deciding what reagents to use as part of a protocol for an experiment. Note that the scientist could be human or a "robot scientist" executing software.
a process of creating or modifying a plan specification
planning
inductive reasoning
Bjoern Peters
wikipedia: http://en.wikipedia.org/wiki/Inductive_reasoning
BP: 10/22/122: After changing the parent class to drawing a conclusion *based on data* it is no longer clear that this class is needed; minimally it needs a better definition to distinguish it.
Proposal is to obsolete.
Based on the observation that all lung cancer patients treated with aspirin in our clinical trial survived longer than the control group, we conclude by inductive reasoning that aspirin has a therapeutic effect on lung cancer.
a process of interpreting data that is used to ascribe properties or relations to types based on an observation instance (i.e., on a number of observations or experiences); or to formulate laws based on limited observations of recurring phenomenal patterns.
inductive reasoning
hypothesis driven investigation
OBI branch derived
PlanAndPlannedProcess Branch
hypothesis driven investigation
is an investigation with the goal to test one or more hypotheses
hypothesis generating investigation
OBI branch derived
PlanAndPlannedProcess Branch
hypothesis generating investigation
is an investigation in which data is generated and analyzed with the purpose of generating new hypotheses
information processor function
Frank Gibson
An information processor function is a function that converts information from one form to another, by a lossless process or an extraction process.
data processor function
information processor function
cloning vector role
PERSON: Helen Parkinson
pBluescript plays the role of a cloning vector
JZ: related tracker: https://sourceforge.net/p/obi/obi-terms/102/
A material to be added role played by a small, self-replicating DNA or RNA molecule - usually a plasmid or chromosome - and realized in a process whereby foreign DNA or RNA is inserted into the vector during the process of cloning.
cloning vector role
cloning insert role
Feb 20, 2009. from Wikipedia: cloning of any DNA fragment essentially involves four steps: DNA fragmentation with restriction endonucleases, ligation of DNA fragments to a vector, transfection, and screening/selection. There are multiple processes involved, it is not just "cloning process"
GROUP: Role branch
OBI and Wikipedia
cloning insert role
cloning insert role is a role which inheres in DNA or RNA and is realized by the process of being inserted into a cloning vector in a cloning process.
averaging objective
Elisabetta Manduchi
PERSON: Elisabetta Manduchi
A mean calculation which has averaging objective is a descriptive statistics calculation in which the mean is calculated by taking the sum of all of the observations in a data set divided by the total number of observations. It gives a measure of the 'center of gravity' for the data set. It is also known as the first moment.
An averaging objective is a data transformation objective where the aim is to perform mean calculations on the input of the data transformation.
James Malone
averaging objective
adding material objective
BP
creating a mouse infected with LCM virus
adding material objective
is the specification of an objective to add a material into a target material. The adding is asymmetric in the sense that the target material largely retains its identity
assay objective
PPPB branch
PPPB branch
the objective to determine the weight of a mouse.
an objective specification to determine a specified type of information about an evaluated entity (the material entity bearing evaluant role)
assay objective
target of material addition role
From Branch discussion with BP, AR, MC -- there is a need for the recipient to interact with the administered material. for example, a tooth receiving a filling was not considered to be a target role.
GROUP: Role Branch
OBI
peritoneum of an animal receiving an interperitoneal injection; solution in a tube receiving additional material; location of absorbed material following a dermal application.
target of material addition role is a role realized by an entity into which a material is added in a material addition process
target of material addition role
normalized data set
PERSON: James Malone
PERSON: Melanie Courtot
A data set that is produced as the output of a normalization data transformation.
normalized data set
measure function
A glucometer measures blood glucose concentration, the glucometer has a measure function.
PERSON: Daniel Schober
PERSON: Helen Parkinson
PERSON: Melanie Courtot
PERSON:Frank Gibson
Measure function is a function that is borne by a processed material and realized in a process in which information about some entity is expressed relative to some reference.
measure function
consume data function
PERSON: Daniel Schober
PERSON: Frank Gibson
PERSON: Melanie Courtot
Process data function is a function that is borne by a material entity by virtue of its structure. When realized the material entity consumes data.
consume data function
material transformation objective
GROUP: OBI PlanAndPlannedProcess Branch
PERSON: Bjoern Peters
PERSON: Frank Gibson
PERSON: Jennifer Fostel
PERSON: Melanie Courtot
PERSON: Philippe Rocca-Serra
The objective to create a mouse infected with LCM virus. The objective to create a defined solution of PBS.
an objective specification to create a specific output object from input materials.
artifact creation objective
material transformation objective
study design execution
6/11/9: edited at workshop. Used to be: study design execution is a process with the objective to generate data according to a concretized study design. The execution of a study design is part of an investigation, and minimally consists of an assay or data transformation.
a planned process that realizes the concretization of a study design
branch derived
injecting a mouse with PBS solution, weighing it, and recording the weight according to a study design.
removed axiom has_part some (assay or 'data transformation') per discussion on protocol application mailing list to improve reasoner performance. The axiom is still desired.
study design execution
DNA sequencing
DNA sequencing
DNA sequencing is a sequencing process which uses deoxyribonucleic acid as input and results in the creation of a DNA sequence information artifact using a DNA sequencer instrument.
Genomic deletions of OFD1 account for 23% of oral-facial-digital type 1 syndrome after negative DNA sequencing. Thauvin-Robinet C, Franco B, Saugier-Veber P, Aral B, Gigot N, Donzel A, Van Maldergem L, Bieth E, Layet V, Mathieu M, Teebi A, Lespinasse J, Callier P, Mugneret F, Masurel-Paulet A, Gautier E, Huet F, Teyssier JR, Tosi M, Frébourg T, Faivre L. Hum Mutat. 2008 Nov 19. PMID: 19023858
OBI Branch derived
Philippe Rocca-Serra
nucleotide sequencing
clustered data set
A clustered data set is the output of a K means clustering data transformation
AR thinks could be a data item instead
PERSON: James Malone
PERSON: Monnie McGee
data set with assigned discovered class labels
A data set that is produced as the output of a class discovery data transformation and consists of a data set with assigned discovered class labels.
clustered data set
data set of features
PERSON: James Malone
PERSON: Monnie McGee
A data set that is produced as the output of a descriptive statistical calculation data transformation and consists of producing a data set that represents one or more features of interest about the input data set.
data set of features
material combination
Mixing two fluids. Adding salt into water. Injecting a mouse with PBS.
bp
bp
created at workshop as parent class for 'adding material into target', which is asymmetric, while combination encompasses all addition processes.
is a material processing with the objective to combine two or more material entities as input into a single material entity as output.
material combination
fuzzy clustering objective
PERSON: James Malone
PERSON: Ryan Brinkman
A fuzzy clustering objective is a data transformation objective where the aim is to assign input objects (typically vectors of attributes) a probability that a point belongs to a class, where the number of classes and their specifications are not known a priori.
James Malone
fuzzy clustering objective
data set of predicted values according to fitted curve
PERSON: James Malone
PERSON: Monnie McGee
A data set which is the output of a curve fitting data transformation in which the aim is to find a curve which matches a series of data points and possibly other constraints.
data set of predicted values according to fitted curve
data representational model
2009-02-28: work on this term has been finalized during the OBI workshop winter 2009
Data representational model is an information content entity of the relationships between data items. A data representational model is encoded in a data format specification such as for cytoscape or biopax.
GROUP: OBI
Melanie Courtot
data representational model
data structure
data structure specification
gene regulatory graph model
phylogenetic tree
protein interaction network
specimen collection process
5/31/2012: This process is not necessarily an acquisition, as specimens may be collected from materials already in possession
6/9/09: used at workshop
A planned process with the objective of collecting a specimen.
Bjoern Peters
Note: definition is in specimen creation objective which is defined as an objective to obtain and store a material entity for potential use as an input during an investigation.
specimen collection process
drawing blood from a patient for analysis, collecting a piece of a plant for depositing in a herbarium, buying meat from a butcher in order to measure its protein content in an investigation
specimen collection
label changed to 'specimen collection process' on 10/27/2014, details see tracker:
http://sourceforge.net/p/obi/obi-terms/716/
Philly2013: A specimen collection can have as part a material entity acquisition, such as ordering from a bank. The distinction is that specimen collection necessarily involves the creation of a specimen role. However, ordering cell line cells from ATCC for use in an investigation is NOT a specimen collection, because the cell lines already have a specimen role.
Philly2013: The specimen_role for the specimen is created during the specimen collection process.
background corrected data set
PERSON: James Malone
PERSON: Melanie Courtot
A data set that is produced as the output of a background correction data transformation.
background corrected data set
error corrected data set
PERSON: James Malone
PERSON: Monnie McGee
A data set that is produced as the output of an error correction data transformation and consists of producing a data set which has had erroneous contributions from the input to the data transformation removed (corrected for).
error corrected data set
class prediction data transformation
James Malone
supervised classification data transformation
A class prediction data transformation (sometimes called supervised classification) is a data transformation that has objective class prediction.
PERSON: James Malone
class prediction data transformation
background correction data transformation
James Malone
A background correction data transformation is a data transformation that has the objective background correction.
PERSON: James Malone
background correction data transformation
error correction data transformation
EDITORS
Monnie McGee
An error correction data transformation is a data transformation that has the objective of error correction, where the aim is to remove (correct for) erroneous contributions from the input to the data transformation.
James Malone
error correction data transformation
statistical hypothesis test
James Malone
A statistical hypothesis test data transformation is a data transformation that has objective statistical hypothesis test.
PERSON: James Malone
statistical hypothesis test
center value
PERSON: James Malone
PERSON: Monnie McGee
median
A data item that is produced as the output of a center calculation data transformation and represents the center value of the input data.
center value
statistical hypothesis test objective
Person:Helen Parkinson
WEB: http://en.wikipedia.org/wiki/Statistical_hypothesis_testing
hypothesis test objective
is a data transformation objective where the aim is to estimate statistical significance with the aim of proving or disproving a hypothesis by means of some data transformation
James Malone
statistical hypothesis test objective
reduced dimension data set
PERSON: James Malone
PERSON: Monnie McGee
A data set that is produced as the output of a data vector reduction data transformation and consists of producing a data set which has fewer vectors than the input data set.
reduced dimension data set
average value
PERSON: James Malone
PERSON: Monnie McGee
arithmetic mean
mean
A data item that is produced as the output of an averaging data transformation and represents the average value of the input data.
average value
specimen collection objective
An objective specification to obtain a material entity for potential use as an input during an investigation.
Bjoern Peters
Bjoern Peters
The objective to collect bits of excrement in the rainforest. The objective to obtain a blood sample from a patient.
specimen collection objective
material combination objective
PPPB branch
bp
is an objective to obtain an output material that contains several input materials.
material combination objective
support vector machine
A support vector machine is a data transformation with a class prediction objective based on the construction of a separating hyperplane that maximizes the margin between two data sets of vectors in n-dimensional space.
James Malone
PERSON: Ryan Brinkman
Ryan Brinkman
SVM
support vector machine
self-organizing map
A self-organizing map (SOM) is an artificial neural network with objective class discovery that uses a neighborhood function to preserve the topological properties of a dataset to produce a low-dimensional (typically 2) discretized representation of the training data set. A set of artificial neurons learn to map points in an input space to coordinates in an output space. The input space can have different dimensions and topology from the output space, and the SOM will attempt to preserve these.
James Malone
PERSON: Ryan Brinkman
Ryan Brinkman
SOM
self-organizing map
decision tree induction objective
A decision tree induction objective is a data transformation objective in which a tree-like graph of edges and nodes is created and from which the selection of each branch requires that some type of logical decision is made.
James Malone
decision tree induction objective
decision tree building data transformation
James Malone
A decision tree building data transformation is a data transformation that has objective decision tree induction.
PERSON: James Malone
decision tree building data transformation
library preparation
PMID: 19570239. Construction and analysis of cotton (Gossypium arboreum L.) drought-related cDNA library. Zhang L, Li FG, Liu CL, Zhang CJ, Zhang XY. BMC Res Notes. 2009 Jul 2;2:120.
Philippe Rocca-Serra
is a process which results in the creation of a library from fragments of DNA using cloning vectors or oligonucleotides with the role of adaptors.
library construction
library preparation
GenePattern software
GenePattern software
James Malone
Person:Helen Parkinson
WEB: http://www.broadinstitute.org/cancer/software/genepattern/
a software that provides access to more than 100 tools for gene expression analysis, proteomics, SNP analysis and common data processing tasks.
paired-end library
PMID: 19339662. Next-generation DNA sequencing of paired-end tags (PET) for transcriptome and genome analyses. Genome Res. 2009 Apr;19(4):521-32. Fullwood MJ, Wei CL, Liu ET, Ruan Y.
Philippe Rocca-Serra
adapted from information provided by Solid web site
is a collection of short paired tags that are extracted from the two ends of DNA fragments and covalently linked as ditag constructs
mate-paired library
paired-end library
paired-end tag (PET) library
peak matching
James Malone
PERSON: Ryan Brinkman
Peak matching is a data transformation performed on a dataset of a graph of ordered data points (e.g. a spectrum) with the objective of pattern matching local maxima above a noise threshold
Ryan Brinkman
peak matching
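A minimal sketch of the idea in the peak matching definition, assuming the ordered data points are a plain list of intensities. Both function names, the `tolerance` parameter, and the nearest-position matching rule are illustrative assumptions, not an OBI-specified algorithm. The first step finds local maxima above a noise threshold; the second pairs them with expected reference positions.

```python
def find_peaks(signal, noise_threshold):
    """Return indices of strict local maxima whose height exceeds the threshold."""
    return [
        i
        for i in range(1, len(signal) - 1)
        if signal[i] > noise_threshold
        and signal[i - 1] < signal[i] > signal[i + 1]
    ]


def match_peaks(peaks, reference, tolerance=1):
    """Pair each reference position with the closest detected peak.

    Hypothetical matching rule: a peak matches if it lies within
    `tolerance` index positions of the reference position.
    """
    matches = {}
    for ref in reference:
        candidates = [p for p in peaks if abs(p - ref) <= tolerance]
        if candidates:
            matches[ref] = min(candidates, key=lambda p: abs(p - ref))
    return matches
```

For example, with the signal `[0, 1, 5, 1, 0, 2, 0, 7, 3, 0]` and a threshold of 3, the small bump at index 5 is treated as noise and only indices 2 and 7 are reported as peaks.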
k-nearest neighbors
k-NN
k-nearest neighbors is a data transformation which achieves a class discovery or partitioning objective, in which an input data object with vector y is assigned to a class label based upon the k closest training data set points to y; the assigned class label is the one held by the largest number of those k points.
James Malone
PERSON: James Malone
k-nearest neighbors
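A minimal sketch of k-nearest neighbors classification, assuming Euclidean distance and a majority vote among the k closest training points, which is the usual reading of the method. The function name `knn_predict` and the `(vector, label)` input layout are illustrative assumptions.

```python
from collections import Counter
from math import dist


def knn_predict(training, y, k=3):
    """Assign a class label to vector y by majority vote of its k nearest neighbors.

    training: list of (vector, label) pairs (hypothetical layout).
    Ties are broken in favor of the label seen first among the nearest points.
    """
    if not 1 <= k <= len(training):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort training points by Euclidean distance to y and keep the k closest.
    nearest = sorted(training, key=lambda pair: dist(pair[0], y))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

An odd k is a common choice for two-class problems because it avoids tied votes.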
recombinant vector
A recombinant vector is created by a recombinant vector cloning process, and contains nucleic acids that can be amplified. It retains functions of the original cloning vector.
recombinant vector
single fragment library
Philippe Rocca-Serra
fragment library
is a collection of short tags that are extracted from DNA fragments and covalently linked as single tag constructs
single fragment library
cloning vector
A cloning vector is an engineered material that is used as an input material for a recombinant vector cloning process to carry inserted nucleic acids. It contains an origin of replication for a specific destination host organism, encodes a selectable gene product and contains a cloning site.
cloning vector
Student's t-test
James Malone
Student's t-test is a data transformation with the objective of a statistical hypothesis test in which the test statistic has a Student's t distribution if the null hypothesis is true. It is applied when the population is assumed to be normally distributed but the sample sizes are small enough that the statistic on which inference is based is not normally distributed because it relies on an uncertain estimate of standard deviation rather than on a precisely known value.
Student's t-test
WEB: http://en.wikipedia.org/wiki/T-test
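As an illustration of the test statistic described for Student's t-test, the sketch below computes a one-sample t statistic, where the standard deviation is estimated from the sample rather than known. The function name one_sample_t and the example values are hypothetical; obtaining a p-value would additionally require the t distribution with the returned degrees of freedom.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    # Sample standard deviation (n - 1 denominator): the uncertain estimate
    # that makes the statistic t-distributed rather than normal.
    sd = statistics.stdev(sample)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical small sample.
t_stat, dof = one_sample_t([2.0, 4.0, 4.0, 4.0, 6.0], mu0=3.0)
```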
topologically preserved clustered data set
A clustered data set in which the topology, i.e. the spatial properties between data points, is preserved from the original input data from which it was derived.
James Malone
PERSON: James Malone
the output data set generated from a self-organizing map.
topologically preserved clustered data set
CART
classification and regression trees
A CART (classification and regression trees) is a data transformation method for producing a classification or regression model with a tree-based structure.
BOOK: David J. Hand, Heikki Mannila and Padhraic Smyth (2001) Principles of Data Mining.
CART
James Malone
study design independent variable
2009-03-16: work has been done on this term during the OBI workshop winter 2009 and the current definition was considered acceptable for use in OBI. If there is a need to modify this definition please notify OBI.
PERSON: Alan Ruttenberg
PERSON: Bjoern Peters
PERSON: Chris Stoeckert
Web: http://en.wikipedia.org/wiki/Dependent_and_independent_variables
study factor
In a study in which gene expression is measured in patients between 8 months and 4 years old that have mild or severe malaria and in which the hypothesis is that gene expression in that age group is a function of disease status, disease status is the independent variable.
2/2/2009 Original definition - In the design of experiments, independent variables are those whose values are controlled or selected by the person experimenting (experimenter) to determine its relationship to an observed phenomenon (the dependent variable). In such an experiment, an attempt is made to find evidence that the values of the independent variable determine the values of the dependent variable (that which is being measured). The independent variable can be changed as required, and its values do not represent a problem requiring explanation in an analysis, but are taken simply as given. The dependent variable on the other hand, usually cannot be directly controlled.
independent variable
In the Philly 2013 workshop the label was chosen to distinguish it from "dependent variable" as used in statistical modelling. See: http://en.wikipedia.org/wiki/Statistical_modeling
a directive information entity that is part of a study design. Independent variables are entities whose values are selected to determine their relationship to an observed phenomenon (the dependent variable). In such an experiment, an attempt is made to find evidence that the values of the independent variable determine the values of the dependent variable (that which is being measured). The independent variable can be changed as required, and its values do not represent a problem requiring explanation in an analysis, but are taken simply as given. The dependent variable, on the other hand, usually cannot be directly controlled.
experimental factor
study design independent variable
study design dependent variable
2009-03-16: work has been done on this term during the OBI workshop winter 2009 and the current definition was considered acceptable for use in OBI. If there is a need to modify this definition please notify OBI.
PERSON: Alan Ruttenberg
PERSON: Bjoern Peters
PERSON: Chris Stoeckert
WEB: http://en.wikipedia.org/wiki/Dependent_and_independent_variables
In a study in which gene expression is measured in patients between 8 months and 4 years old that have mild or severe malaria and in which the hypothesis is that gene expression in that age group is a function of disease status, the gene expression is the dependent variable.
2/2/2009 In the design of experiments, independent variables are those whose values are controlled or selected by the person experimenting (experimenter) to determine its relationship to an observed phenomenon (the dependent variable). In such an experiment, an attempt is made to find evidence that the values of the independent variable determine the values of the dependent variable (that which is being measured). The independent variable can be changed as required, and its values do not represent a problem requiring explanation in an analysis, but are taken simply as given. The dependent variable on the other hand, usually cannot be directly controlled.
In the Philly 2013 workshop the label was chosen to distinguish it from "dependent variable" as used in statistical modelling. See: http://en.wikipedia.org/wiki/Statistical_modeling
dependent variable
dependent variable specification is part of a study design. The dependent variable is the event studied and expected to change when the independent variable varies.
study design dependent variable
survival rate
A measurement datum that represents the percentage of people or animals in a study or treatment group who are alive for a given period of time after diagnosis or initiation of monitoring.
Oliver He
adapted from wikipedia
http://en.wikipedia.org/wiki/Survival_rate
survival rate
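The percentage in the survival rate definition is a simple ratio. A minimal sketch, assuming the counts of group members and of those still alive at the end of the period are already known; the function name and the numbers are hypothetical.

```python
def survival_rate(alive_at_end, group_size):
    """Return the percentage of the group that is alive at the end of the period."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    if not 0 <= alive_at_end <= group_size:
        raise ValueError("alive_at_end must be between 0 and group_size")
    return 100.0 * alive_at_end / group_size


# Hypothetical example: 45 of 60 study subjects alive after the monitoring period.
rate = survival_rate(45, 60)
```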
multiple testing correction objective
A multiple testing correction objective is a data transformation objective where the aim is to correct for a set of statistical inferences considered simultaneously.
Application of the Bonferroni correction
http://en.wikipedia.org/wiki/Multiple_Testing_Correction
multiple comparison correction objective
multiple testing correction objective
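The Bonferroni correction named as the example for this objective can be sketched as below: each raw p-value is multiplied by the number of tests considered simultaneously and capped at 1. The function name and input values are hypothetical.

```python
def bonferroni_adjust(p_values):
    """Return Bonferroni-adjusted p-values: min(1, m * p) for m simultaneous tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values from three simultaneous tests.
adjusted = bonferroni_adjust([0.01, 0.04, 0.5])
```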
statistical model validation
A data transformation which assesses how the results of a statistical analysis will generalize to an independent data set.
Helen Parkinson
Using the expression levels of 20 proteins to predict whether a cancer patient will respond to a drug. A practical goal would be to determine which subset of the 20 features should be used to produce the best predictive model. - wikipedia
http://en.wikipedia.org/wiki/Cross-validation_%28statistics%29
statistical model validation
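Cross-validation, the source cited for this term, is one way to assess how results generalize to independent data. The sketch below only shows how sample indices might be split into training and held-out folds; the function name k_fold_indices and the interleaved fold assignment are assumptions made for illustration, and model fitting and scoring are left out.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, held_out_indices) pairs, one pair per fold.

    Samples are assigned to folds in an interleaved fashion, so every sample
    is held out exactly once.
    """
    folds = [list(range(start, n_samples, n_folds)) for start in range(n_folds)]
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n_samples) if i not in held]
        yield train, held_out


# Hypothetical example: 6 samples split into 3 folds.
splits = list(k_fold_indices(6, 3))
```

A model would be fitted on each train list and evaluated on the matching held-out list, and the scores then averaged.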
spike train datum
A measurement datum which represents information about an ordered series of action potentials in an organism's CNS measured over time.
Helen Parkinson, Alan Ruttenberg
Jessica Turner, NIF
Measurement of temporal regularity of spike train responses in auditory nerve fibers of the green treefrog
needs more work to see exactly what the data set looks like - HP
spike train datum
spike train measurement
primary structure of DNA macromolecule
BP et al
a quality of a DNA molecule that inheres in its bearer due to the order of its DNA nucleotide residues.
placeholder for SO
primary structure of DNA macromolecule
measurement device
A device in which a measure function inheres.
A ruler, a microarray scanner, a Geiger counter.
GROUP:OBI Philly workshop
OBI
measurement device
Likelihood-ratio test
A likelihood-ratio test is a data transformation which tests whether there is evidence of the need to move from a simple model to a more complicated one (where the simple model is nested within the complicated one); it tests the goodness-of-fit between two models.
Likelihood-ratio test
Tina Boussard
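For nested models, the likelihood-ratio test statistic is commonly computed as twice the difference of the maximized log-likelihoods and compared against a chi-square distribution. The sketch below assumes the complicated model has exactly one more free parameter than the simple one, so that one degree of freedom applies; the function names and log-likelihood values are hypothetical.

```python
import math


def likelihood_ratio_statistic(loglik_simple, loglik_complex):
    """Return 2 * (log-likelihood of complex model - log-likelihood of simple model)."""
    return 2.0 * (loglik_complex - loglik_simple)


def chi2_sf_one_df(x):
    """Upper tail probability of a chi-square distribution with 1 degree of freedom."""
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(max(x, 0.0) / 2.0))


# Hypothetical maximized log-likelihoods of two nested models.
statistic = likelihood_ratio_statistic(-105.0, -102.0)
p_value = chi2_sf_one_df(statistic)
```

A small p-value suggests that the extra parameter of the complicated model improves the fit by more than chance alone would explain.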
pattern matching objective
A pattern matching objective aims to detect the presence of the constituents of a given pattern. In contrast to pattern recognition, the pattern is rigidly specified. Patterns are typically sequences or trees.
Tina Boussard
http://en.wikipedia.org/wiki/Pattern_matching
pattern matching objective
study intervention
GROUP: OBI
PERSON: Bjoern Peters
study intervention
the part of the execution of an intervention design study which is varied between two or more subjects in the study
categorical measurement datum
A measurement datum that is reported on a categorical scale
Bjoern Peters
Bjoern Peters
categorical measurement datum
nominal measurement datum
handedness assay
A handedness assay measures the unequal distribution of fine motor skill between the left and right hands typically in human subjects by means of some questionnaire and scoring procedure.
Helen Parkinson
The Edinburgh handedness assay is a specific method of determining handedness
handedness assay
handedness test
http://en.wikipedia.org/wiki/Handedness
compound treatment design
MO_555 compound_treatment_design
PERSON: Bjoern Peters
This is meant to include all kinds of material administrations, including vaccinations, chemical compounds etc.
an intervention design in which the treatment is the administration of a compound
compound treatment design
categorical label
A label that is part of a categorical datum and that indicates the value of the data item on the categorical scale.
Bjoern Peters
Bjoern Peters
The labels 'positive' vs. 'negative', or 'left handed', 'right handed', 'ambidextrous', or 'strongly binding', 'weakly binding', 'not binding', or '+++', '++', '+', '-' etc. form scales of categorical labels.
categorical label
device
2012-12-17 JAO: In common lab usage, there is a distinction made between devices and reagents that is difficult to model. Therefore we have chosen to specifically exclude reagents from the definition of "device", and are enumerating the types of roles that a reagent can perform.
2013-6-5 MHB: The following clarifications are outcomes of the May 2013 Philly Workshop. Reagents are distinguished from devices that also participate in scientific techniques by the fact that reagents are chemical or biological in nature and necessarily participate in some chemical interaction or reaction during the realization of their experimental role. By contrast, devices do not participate in such chemical reactions/interactions. Note that there are cases where devices use reagent components during their operation, where the reagent-device distinction is less clear. For example:
(1) An HPLC machine is considered a device, but has a column that holds a stationary phase resin as an operational component. This resin qualifies as a device if it participates purely in size exclusion, but bears a reagent role that is realized in the running of a column if it interacts electrostatically or chemically with the evaluant. The container the resin is in (“the column”) considered alone is a device. So the entire column as well as the entire HPLC machine are devices that have a reagent as an operating part.
(2) A pH meter is a device, but its electrode component bears a reagent role in virtue of its interacting directly with the evaluant in execution of an assay.
(3) A gel running box is a device that has a metallic lead as a component that participates in a chemical reaction with the running buffer when a charge is passed through it. This metallic lead is considered to have a reagent role as a component of this device realized in the running of a gel.
In the examples above, a reagent is an operational component of a device, but the device itself does not realize a reagent role (as bearing a reagent role is not transitive across the part_of relation). In this way, the asserted disjointness between a reagent and device holds, as both roles are never realized in the same bearer during execution of an assay.
A material entity that is designed to perform a function in a scientific investigation, but is not a reagent.
A voltmeter is a measurement device which is intended to perform some measure function.
An autoclave is a device that sterilizes instruments or contaminated waste by applying high temperature and pressure.
OBI development call 2012-12-17.
PERSON: Helen Parkinson
device
instrument
dose specification
a directive information entity that describes the dose that will be administered to a target
a protocol specifying to administer 1 ml of vaccine to a mouse
dose specification
scalar score from composite inputs
1
questionnaire score
scalar score from composite inputs
A measurement datum which is the result of combining multiple data. For example, a mean or summary score.
JT: We included this because we wanted to talk about an output from a questionnaire that summarized the answers to the questionnaire, but which was not actually the answer to any single question.
JZ: can we define it logically as the output of some data transformation, like an aggregate data transformation?
Person: Jessica Turner
Person: Jessica Turner
sequence data
A measurement datum that represents the primary structure of a macromolecule (its sequence), sometimes associated with an indicator of confidence of that measurement.
GROUP: OBI
Person:Chris Stoeckert
example of usage: the representation of a nucleotide sequence in FASTA format used for a sequence similarity search.
sequence data
handedness categorical measurement datum
A datum used to record the answer to a self assessment of whether a person primarily uses their left hand, their right hand, or each hand equally
PERSON:Alan Ruttenberg
PERSON:Jessica Turner
handedness categorical measurement datum
dose
A measurement datum that measures the quantity of something that may be administered to an organism or that an organism may be exposed to. Quantities of nutrients, drugs, vaccines and toxins are referred to as doses.
An organism has been injected with 1 ml of vaccine
dose
growth condition intervention design
A study design in which the independent variable is the environmental condition in which the specimen is growing
MO_588 growth_condition_design
PERSON: Bjoern Peters
growth condition intervention design
performing a diagnosis
Diagnosing that a patient has pneumonia based on information on measurements of temperature, sound of breathing, and patient complaining about a headache.
The interpretation of the information available about bodily features (clinical picture) of a patient resulting in a diagnosis
performing a diagnosis
Edinburgh score
1
Edinburgh score
PMID:5146491#Oldfield, R.C. (1971). The assessment and analysis of handedness: The Edinburgh inventory. Neuropsychologia, 9, 97-113
Person: Alan Ruttenberg
Person:Jessica Turner
WEB:http://www.cse.yorku.ca/course_archive/2006-07/W/4441/EdinburghInventory.html
A score that measures the dominance of a person's right or left hand in everyday activities.
administration of material to specimen
Bjoern Peters
Bjoern Peters
Staining cells in a tissue slice with a dye.
The directed combination of a material entity with a specimen.
administration of material to specimen
growth environment
OBI group
PERSON:Richard Scheuermann, Jie Zheng, Bjoern Peters
Right now this may be incomplete. Should also cover e.g. sound, light as well.
The collection of material entities and their qualities that are located near a live organism, tissue or cell and can influence its growth.
growth environment
questionnaire
Need to clarify if this is a document or a directive information entity (or what their connection is)
questionnaire
A document with a set of printed or written questions with a choice of answers, devised for the purposes of a survey or statistical study.
JT: It plays a role in collecting data that could be fleshed out more; but I'm thinking it is, in itself, an edited document.
JZ: based on textual definition of edited document, it can be defined as N&S. I prefer to leave questionnaire as a document now. We can add more restrictions in the future and use that to determine it is an edited document or not.
Merriam-Webster
PERSON: Jessica Turner
Edinburgh handedness assay
Edinburgh handedness assay
PERSON:Jessica Turner
Person:Alan Ruttenberg
The Edinburgh Handedness assay is an assay in which a set of questions (the Edinburgh Handedness inventory) is asked and the answers to these questions are turned into a score, used to assess the dominance of a person's right or left hand in everyday activities. The inventory can be used by an observer assessing the person, or by a person self-reporting hand use. The latter method tends to be less reliable due to a person over-attributing tasks to the dominant hand.
WEB:http://en.wikipedia.org/wiki/Edinburgh_Handedness_Inventory
feature extraction
A planned process with the objective of obtaining quantified values from an image.
MO_928: feature_extraction
PERSON: Jie Zheng
feature extraction
binding constant
10/6/11 BP: The distinction between binding datum and binding constant is based on the latter being part of an equation. That should be captured in the logical definition here, and used to make it a defined class.
A binding datum about the disposition of two or more material entities to form complexes which comes in the form of a scalar and unit that are utilized in equations that model the binding process
PERSON: Bjoern Peters, Randi Vita, Jason Greenbaum
The predicted or measured binding affinity of a peptide to a MHC molecule can be captured in the binding constants "IC50 = 12 nM" or "t 1/2 = 30 minutes".
binding constant
genetically modified material
GROUP: OBI
PERSON: Jie Zheng
a material entity, organism or cell, that is the output of a genetic transformation process.
genetically modified material
term is proposed by BP on Oct 25, 2010 dev call
genetic transformation objective
suggested to be added by BP and AR during Oct 25, 2010 dev call
Person: Jie Zheng
Person: Jie Zheng
a material transformation objective that aims to create a genetically modified organism or cell
genetic transformation objective
3D structural organization datum
3D structural organization datum
A measurement datum that describes the structural orientation of a material entity in 3D space.
PERSON: Jason Greenbaum, Randi Vita, Bjoern Peters
The atom coordinates found in a PDB (Protein Data Bank) file, generated by X Ray crystallography or NMR.
age since planting measurement datum
An age measurement datum that is the result of the measurement of the age of an organism since planting, the process of placing a plant in media (e.g. soil) to allow it to grow, which excludes sowing.
Discussed by Jie and Chris on the Nov 15, 2010 dev call. It was proposed to combine the term with different kinds of processes as the initial time point: the proposed 'age measurement assay' is preceded by some process, which can be any kind of process defined in OBI. This is thought to be more flexible, but it is hard to model due to the lack of temporal predicates.
Term proposed by Bjoern on Nov 8, 2010 dev call
Supported by Alan on Nov 15, 2010 dev call
MO_495 planting
PERSON:Chris Stoeckert, Jie Zheng
age since planting measurement datum
age since hatching measurement datum
An age measurement datum that is the result of the measurement of the age of an organism since hatching, the process of emergence from an egg.
MO_745 hatching
PERSON:Chris Stoeckert, Jie Zheng
age since hatching measurement datum
age measurement assay
An assay that measures the duration of temporal interval of a process that is part of the life of the bearer, where the initial time point of the measured process is the beginning of some transitional state of the bearer such as birth or when planted.
OBI group
PERSON: Alan Ruttenberg
This assay measures time, not developmental stage. We recognize that development takes different time periods under different conditions such as media / temperature. For example, for an age measurement assay of fly age, the output is something like '28 days', not 'mid-life at room temperature'.
age measurement assay
age since egg laying measurement datum
An age measurement datum that is the result of the measurement of the age of an organism since egg laying, the process of the production of egg(s) by an organism.
MO_767 egg laying
PERSON:Chris Stoeckert, Jie Zheng
age since egg laying measurement datum
age since germination measurement datum
An age measurement datum that is the result of the measurement of the age of an organism since germination, the process consisting of physiological and developmental changes by a seed, spore, pollen grain (microspore), or zygote that occur after release from dormancy, and encompassing events prior to and including the first visible indications of growth.
Definition of germination comes from GO. However, the term is deprecated from GO now because it is a grouping term without biological significance.
MO_590 germination
PERSON:Chris Stoeckert, Jie Zheng
age since germination measurement datum
age since eclosion measurement datum
An age measurement datum that is the result of the measurement of the age of an organism since eclosion, the process of emergence of an adult insect from its pupa or cocoon.
MO_876 eclosion
PERSON:Chris Stoeckert, Jie Zheng
age since eclosion measurement datum
age since sowing measurement datum
An age measurement datum that is the result of the measurement of the age of an organism since sowing, the process of placing a seed or spore in some media with the intention to invoke germination.
MO_748 sowing
PERSON:Chris Stoeckert, Jie Zheng
age since sowing measurement datum
age since coitus measurement datum
An age measurement datum that is the result of the measurement of the age of an organism since coitus, the process of copulation that occurs during the process of sexual reproduction.
MO_783 coitus
PERSON:Chris Stoeckert, Jie Zheng
age since coitus measurement datum
age measurement datum
A time measurement datum that is the result of measurement of age of an organism
In a MAGE-TAB file, we use
initialTimePoint (a process) + age (a number expected) + TimeUnit (defined in UO, such as year, hour, day, etc.)
Now the term label indicates the start time point for measuring the age; (number + TimeUnit) values are the expected instances of the class
MO_178 Age
PERSON: Alan Ruttenberg, Chris Stoeckert, Jie Zheng
discussed on Nov 15, dev call
All subtypes will be defined by textual definition now.
note that we are currently defining subtypes of age measurement datum that specify when the age is relative to, e.g. planting, as we don't have adequate temporal predicates yet.
life of bearer doesn't imply organism
this assay measures time not developmental stage. we recognize that development can take different time periods under different conditions such as media / temperature
age as a quality is dubious; we plan to revisit
stages in development are currently handled with controlled vocabulary, such as 2-somite stage
age measurement datum
age since fertilization measurement datum
An age measurement datum that is the result of the measurement of the age of an organism since fertilization, the process of the union of gametes of opposite sexes during the process of sexual reproduction to form a zygote.
Definition of fertilization comes from GO.
MO_701 fertilization
PERSON:Chris Stoeckert, Jie Zheng
age since fertilization measurement datum
age since birth measurement datum
An age measurement datum that is the result of the measurement of the age of an organism since birth, the process of emergence and separation of offspring from the mother.
MO_710 birth
PERSON:Chris Stoeckert, Jie Zheng
age since birth measurement datum
half life datum (t 1/2)
Bjoern Peters
Bjoern Peters
The time it takes for 50% of a class of stochastic processes to occur.
half life datum (t 1/2)
t 1/2
dose response curve
A data item of paired values, one indicating the dose of a material, the other quantitating a measured effect at that dose. The dosing intervals are chosen so that effect values can be interpolated by plotting a curve.
Bjoern Peters; Randi Vita
dose response curve
half maximal effective concentration (EC50)
Bjoern Peters; Randi Vita
Determining the potency of a drug / antibody / toxicant by measuring a graded dose response curve, and determining the concentration of the compound where 50% of its maximal effect is observed.
half maximal effective concentration (EC50)
half maximal effective concentration (EC50) is a scalar measurement datum corresponding to the concentration of a compound which induces a response halfway between the baseline and maximum after some specified exposure time.
wikipedia
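Reading an EC50 off a dose response curve can be sketched as an interpolation for the concentration at which the effect is halfway between baseline and maximum. The sketch below makes several assumptions that are not stated in the definitions: effects increase monotonically with dose, the first and last measurements are taken as baseline and maximum, and interpolation is linear on a log10 concentration scale. The function name and data are hypothetical; in practice a sigmoidal model is often fitted instead.

```python
import math


def ec50_by_interpolation(doses, effects):
    """Interpolate the dose giving an effect halfway between baseline and maximum.

    doses must be positive and increasing; effects must be non-decreasing.
    """
    baseline, top = effects[0], effects[-1]
    half = baseline + (top - baseline) / 2.0
    points = list(zip(doses, effects))
    for (d0, e0), (d1, e1) in zip(points, points[1:]):
        if e0 <= half <= e1:
            if e1 == e0:
                return d0
            # Linear interpolation of the effect on a log10 dose axis.
            frac = (half - e0) / (e1 - e0)
            log_dose = math.log10(d0) + frac * (math.log10(d1) - math.log10(d0))
            return 10 ** log_dose
    raise ValueError("half-maximal effect is not bracketed by the data")


# Hypothetical paired values: concentration (nM) and measured effect (%).
ec50 = ec50_by_interpolation([1.0, 10.0, 100.0, 1000.0], [0.0, 20.0, 80.0, 100.0])
```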
binding datum
A data item that states if two or more material entities have the disposition to form a complex, and if so, how strong that disposition is.
Bjoern Peters; Randi Vita
binding datum
negative binding datum
A binding datum that states that there is no significant disposition of two or more entities to form a complex
negative binding datum
half maximal inhibitory concentration (IC50)
Bjoern Peters; Randi Vita
Half maximal inhibitory concentration (IC50) is a scalar measurement datum that measures the effectiveness of a compound to competitively inhibit a given process, and corresponds to the concentration of the compound at which it reaches half of its maximum inhibitory effect.
Interpolating that at a dose of IC50=12 nM, half of the binding of a competitive ligand is inhibited.
half maximal inhibitory concentration (IC50)
wikipedia
normalization testing design
Person: Chris Stoeckert, Jie Zheng
A study design that tests different normalization procedures.
MO_729 normalization_testing_design
normalization testing design
genetic population background information
Group: OBI group
Group: OBI group
a genetic characteristics information which is a part of genotype information that identifies the population of organisms
genetic population background information
In the genotype information 'C57BL/6J Hnf1a+/-', C57BL/6J is the genetic population background information
proposed and discussed on San Diego OBI workshop, March 2011
FWER adjusted p-value
FWER adjusted p-value
A quantitative confidence value resulting from a multiple testing error correction method which adjusts the p-value used as input to control for Type I error in the context of multiple pairwise tests
PERS:Philippe Rocca-Serra
adapted from wikipedia (http://en.wikipedia.org/wiki/Familywise_error_rate)
http://ugrad.stat.ubc.ca/R/library/LPE/html/mt.rawp2adjp.html
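One common multiple testing error correction method that controls the familywise error rate is Holm's step-down procedure, sketched below as an illustration of how raw p-values become FWER adjusted p-values. The choice of Holm's method is an assumption made for this example (the definition does not name a specific method), and the function name and inputs are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity: adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical raw p-values.
fwer_adjusted = holm_adjust([0.01, 0.04, 0.03])
```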
wild type organism genotype information
C57BL/6J wild type
Group: OBI group
Group: OBI group
a genotype information about an organism and includes information that there are no known modifications to the genetic background. Generally it is the genotype information of a representative individual from a class of organisms.
proposed and discussed on San Diego OBI workshop, March 2011
wild type organism genotype information
genotype information
Genotype information can be: Mus musculus wild type (in this case the genetic population background information is Mus musculus), C57BL/6J Hnf1a+/- (in this case, C57BL/6J is the genetic population background information and Hnf1a+/- is the allele information)
Group: OBI group
Group: OBI group
a genetic characteristics information that is about the genetic material of an organism and minimally includes information about the genetic background and can in addition contain information about specific alleles, genetic modifications, etc.
discussed on San Diego OBI workshop, March 2011
genotype information
allele information
MO_58 Allele
Person: Chris Stoeckert, Jie Zheng
a genetic alteration information that is about one of two or more alternative forms of a gene or marker sequence and differing from other alleles at one or more mutational sites based on sequence. Polymorphisms are included in this definition.
allele information
discussed on San Diego OBI workshop, March 2011
In the genotype information 'C57BL/6J Hnf1a+/-', Hnf1a+/- is the allele information
post-transcriptional modification design
Person: Chris Stoeckert, Jie Zheng
A study design in which a modification of the transcriptome, proteome (not genome) is made, for example RNAi, antibody targeting.
MO_392 cellular_modification_design
post transcription modification design?
or more clear RNAi design / antibody targeting design?
need to check the use cases
post-transcriptional modification design
genetic alteration information
Group: OBI group
Group: OBI group
a genetic characteristics information that is about known changes or the lack thereof from the genetic background, including allele information, duplication, insertion, deletion, etc.
genetic alteration information
proposed and discussed on San Diego OBI workshop, March 2011
wild type allele information
MO_605 genotype
Person: Chris Stoeckert, Jie Zheng
an allele information that is about the allele found most frequently in natural populations, or in standard laboratory stocks for a given organism.
discussed on San Diego OBI workshop, March 2011
wild type allele information
stimulus or stress design
Person: Chris Stoeckert, Jie Zheng
A study design in which the response of an organism(s) to the stress or stimulus is studied, e.g. osmotic stress, heat shock, radiation exposure, behavioral treatment etc.
MO_568 stimulus_or_stress_design
stimulus or stress design
genetic characteristics information
MO definition:
The genotype of the individual organism from which the biomaterial was derived. Individual genetic characteristics include polymorphisms, disease alleles, and haplotypes.
examples in ArrayExpress
wild_type
MutaMouse (CD2F1 mice with lambda-gt10LacZ integration)
AlfpCre; SNF5 flox/knockout
p53 knock out
C57Bl/6 gp130lox/lox MLC2vCRE/+
fer-15; fem-1
df/df
pat1-114/pat1-114 ade6-M210/ade6-M216 h+/h+ (cells are diploid)
MO_66 IndividualGeneticCharacteristics
Person: Chris Stoeckert, Jie Zheng
a data item that is about genetic material including polymorphisms, disease alleles, and haplotypes.
genetic characteristics information
dose response design
Person: Chris Stoeckert, Jie Zheng
A study design that examines the relationship between the size of the administered dose and the extent of the response.
MO_485 dose_response_design
dose response design
q-value
PMID: 20483222. Comp Biochem Physiol Part D Genomics Proteomics. 2008 Sep;3(3):234-42. Analysis of Sus scrofa liver proteome and identification of proteins differentially expressed between genders, and conventional and genetically enhanced lines.
"After controlling the false discovery rate (FDR</=0.1) using the Storey q value only four proteins (EPHX1, CAT, PAH, ST13) were shown to be differentially expressed between genders (Males/Females) and two proteins (SELENBP2, TAGLN) were differentially expressed between two lines (Transgenic/Conventional pigs)"
q-value
A quantitative confidence value that measures the minimum false discovery rate that is incurred when calling that test significant.
To compute q-values, it is necessary to know the p-value produced by a test and possibly set a false discovery rate level.
Adapted from several sources, including
http://en.wikipedia.org/wiki/False_discovery_rate
http://svitsrv25.epfl.ch/R-doc/library/qvalue.html
FDR adjusted p-value
PERS:Philippe Rocca-Serra
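Since 'FDR adjusted p-value' is listed as an alternative term, the sketch below shows the Benjamini-Hochberg adjustment, which turns raw p-values into values that can be compared directly against a false discovery rate level. This is a stand-in chosen for illustration: the Storey q-value mentioned in the example additionally estimates the proportion of true null hypotheses, which this sketch does not do. The function name and inputs are hypothetical.

```python
def benjamini_hochberg_adjust(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    # Walk from the largest p-value to the smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank of this p-value in ascending order
        candidate = p_values[i] * m / rank
        # Enforce monotonicity: a smaller p-value never gets a larger adjusted value.
        running_min = min(running_min, candidate)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four tests.
fdr_adjusted = benjamini_hochberg_adjust([0.01, 0.04, 0.03, 0.5])
```

A test is then called significant at FDR level 0.1, for example, when its adjusted value is at most 0.1.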
genetic modification design
Person: Chris Stoeckert, Jie Zheng
A study design in which an organism(s) is studied that has had genetic material removed, rearranged, mutagenized or added, such as in a knock out.
MO_447 genetic_modification_design
genetic modification design
lowess group transformation
A lowess transformation where a potentially different normalization curve is generated and used for two or more groups (delineated by some criteria); criteria could include blocks (e.g. print-tip groups) on an array, or the day on which mass spectrometry was performed.
MO_861 lowess_group_normalization
Person: Elisabetta Manduchi
lowess group transformation
lowess transformation
A data transformation of normalizing ratio data by using a locally weighted polynomial regression (typically after a log transformation). The regression can be performed on log ratios resulting from the relation of two data sets versus the average log intensity data from the same two data sets or it can be performed on raw or log transformed values from one data set versus values from another. The goal could be to remove intensity-dependent dye-specific effects from the set of pair wise ratios. This method can be applied globally, or limited by one or more specified criteria.
MO_720 lowess_normalization
Person: Elisabetta Manduchi
lowess transformation
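The locally weighted regression behind a lowess transformation can be sketched as below: for each point, a straight line is fitted to its nearest neighbors with tricube weights, and the fitted value is later subtracted from the log ratio. This is a simplified illustration that assumes distinct x values, local linear fits, and no robustness iterations; the function name, the frac parameter, and the data are hypothetical.

```python
def lowess_fit(xs, ys, frac=0.6):
    """Return locally weighted linear fits of ys at each x in xs.

    frac is the fraction of points used in each local fit.
    """
    n = len(xs)
    k = min(n, max(3, int(round(frac * n))))
    fitted = []
    for x0 in xs:
        # Bandwidth: distance to the k-th nearest point (x0 itself included).
        h = sorted(abs(x - x0) for x in xs)[k - 1]
        # Tricube weights: close to 1 near x0, falling to 0 at distance h.
        weights = []
        for x in xs:
            u = abs(x - x0) / h
            weights.append((1.0 - u ** 3) ** 3 if u < 1.0 else 0.0)
        total = sum(weights)
        mean_x = sum(w * x for w, x in zip(weights, xs)) / total
        mean_y = sum(w * y for w, y in zip(weights, ys)) / total
        sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
        sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(mean_y + slope * (x0 - mean_x))
    return fitted


# Hypothetical data: average log intensity (A) and log ratio (M) with a linear,
# intensity-dependent trend that normalization should remove.
a_values = [float(i) for i in range(10)]
m_values = [2.0 * a + 1.0 for a in a_values]
trend = lowess_fit(a_values, m_values, frac=0.6)
normalized = [m - t for m, t in zip(m_values, trend)]
```

Applying one such curve to all members of the data set corresponds to the global variant; fitting a separate curve per group corresponds to the group variant.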
lowess global transformation
A lowess transformation where the same normalization curve is used for all members of the data set; e.g. Features on an array, picked spots on a gel, or measured metabolites in a sample.
MO_692 lowess_global_normalization
Person: Elisabetta Manduchi
lowess global transformation
sampling time measurement datum
A time measurement datum when an observation is made or a sample is taken from a material as measured from some reference point.
MO_738 timepoint
Person: Chris Stoeckert
sampling time measurement datum
time point
minimal inhibitory concentration
A scalar measurement datum that indicates the lowest concentration at which a specific compound significantly inhibits a process from occurring compared to in the absence of the compound.
Bjoern Peters, coordinated with Albert Goldfain
Created following request by Albert Goldfain
PERSON:Bjoern Peters
minimal inhibitory concentration
sequence assembly algorithm
NIAID GSCID-BRC
An algorithm used to assemble individual sequence reads into larger contiguous sequences (contigs). Assembly details include but are not limited to assembler type (overlap-layout-consensus, deBruijn), assembler version, and any relevant quality control information such as per cent known genes/ESTs captured.
Assembly Algorithm
NIAID GSCID-BRC metadata working group
Person: Chris Stoeckert, Jie Zheng
sequence assembly algorithm
PDB file
A 3d structural organization datum capturing the results of X-ray crystallography or NMR experiment that is formatted as specified by the Protein Databank (http://www.wwpdb.org/docs.html). A PDB file can describe the structure of multiple molecules, each of which has a different chain identifier assigned.
PDB file
PERSON: Bjoern Peters, Dorjee Tamang, Jason Greenbaum
The file found in the pdb with the identifier 3pe4
http://www.pdb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=3pe4
equilibrium dissociation constant (KD)
A binding constant defined as the ratio of koff over kon (off-rate of binding divided by on-rate)
IEDB
KD = 32 nM is the equilibrium dissociation constant found for peptide SIINFEKL binding to H-2 Kb
PERSON: Bjoern Peters, Randi Vita
equilibrium dissociation constant (KD)
comparative phenotypic assessment
6/1/2012: We will utilize 'comparative qualities' once they are available in BFO2
Interpreting data from assays that evaluate the qualities or dispositions inhering in an organism or organism part and comparing it to data from other organisms to make a conclusion about a phenotypic difference
Philly workshop 2011
Philly workshop 2011
comparative phenotypic assessment
equilibrium association constant (KA)
A binding constant defined as the ratio of kon over koff (on-rate of binding divided by off-rate)
IEDB
KA = 10^-12 M^-1 is the equilibrium association constant maximally found for antibody binding to haptens.
PERSON: Bjoern Peters, Randi Vita
equilibrium association constant (KA)
rate measurement datum
A scalar measurement datum that represents the number of events occurring over a time interval
IEDB
PERSON: Bjoern Peters, Randi Vita
The rate of disassociation of a peptide from a complex with an MHC molecule measured by the ratio of bound and unbound peptide per unit of time.
rate measurement datum
50% dissociation of binding temperature (Tm)
50% dissociation of binding temperature (Tm)
A binding datum that specifies the temperature at which half of the binding partners are forming a complex and the other half are unbound.
IEDB
PERSON: Bjoern Peters, Randi Vita
Preparing a complex of a purified HLA-A*02:01 bound to a specific peptide ligand, varying the temperature while detecting the fraction of bound complexes with a complex conformation specific antibody, and interpolating the temperature at which 50% of complexes are dissociated.
melting temperature (Tm)
equilibrium dissociation constant (KD) approximated by IC50
A measurement of an IC50 value approximates KD under specific assay conditions, namely that the binding reaction is at equilibrium, that there is a single population of sites on the receptor to which competitor and ligand bind, and that the concentration of the receptor is much less than the KD for the competitor and the ligand. In this case, according to Cheng and Prusoff, KD = IC50 / (1 + Lstot / KDs), in which Lstot is the total concentration of the labeled competitor and KDs is the KD value of that competitor.
PERSON: Bjoern Peters, Randi Vita
equilibrium dissociation constant (KD) approximated by IC50
http://dx.doi.org/10.1016/0006-2952(73)90196-2
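The Cheng and Prusoff relation quoted in the definition above, KD = IC50 / (1 + Lstot / KDs), can be sketched as a small function. This is a minimal illustration only: the function name and argument names are hypothetical, and the numbers in the usage line are made up, not measured values. All three quantities are assumed to share one concentration unit.

```python
def kd_from_ic50(ic50, labeled_total, kd_labeled):
    """Approximate KD as IC50 / (1 + Lstot / KDs).

    ic50          -- measured IC50 of the unlabeled ligand
    labeled_total -- Lstot, total concentration of the labeled competitor
    kd_labeled    -- KDs, the KD of that labeled competitor
    """
    if ic50 <= 0 or labeled_total < 0 or kd_labeled <= 0:
        raise ValueError("concentrations must be positive (Lstot may be zero)")
    return ic50 / (1.0 + labeled_total / kd_labeled)


# Illustrative values in nM: IC50 = 100, Lstot = 10, KDs = 10
print(kd_from_ic50(100.0, 10.0, 10.0))  # 50.0
```

When Lstot is much smaller than KDs the correction factor approaches 1 and the IC50 itself approximates KD.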
DNA sequence data
8/29/11 call: This is added after a request from Melanie and Yu. They should review it further. This should be a child of 'sequence data', and as of the current definition will infer there.
A sequence data item that is about the primary structure of DNA
DNA sequence data
OBI call; Bjoern Peters
OBI call; Melanie Courtot
The part of a FASTA file that contains the letters ACTGGGAA
assigning gene property based on phenotypic assessment
Interpreting data from assays that evaluate the qualities or dispositions inhering in an organism or organism part and comparing it to data from other organisms that have a defined genetic difference, and assigning a property to the product of the targeted gene as a result.
Philly workshop 2011
Philly workshop 2011
assigning gene property based on phenotypic assessment
equilibrium dissociation constant (KD) approximated by EC50
A measurement of an EC50 value approximates KD under specific assay conditions, namely that the binding reaction is at equilibrium and that the concentration of the receptor is much less than the KD for the ligand.
Assay Development: Fundamentals and Practices, By Ge Wu, page 74
PERSON: Bjoern Peters, Randi Vita
equilibrium dissociation constant (KD) approximated by EC50
half life of binding datum
A half life datum of the time it takes for 50% of bound complexes in an ensemble to disassociate in absence of re-association.
IEDB
PERSON: Bjoern Peters, Randi Vita
The 45 minute period in which one half of the complexes formed by peptide ligand bound to an HLA-A*0201 molecule disassociate.
half life of binding datum
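If dissociation is assumed to follow first-order kinetics with an off rate koff and no re-association (the kinetic model is an assumption here; the definition above does not state one), the half life is ln(2) / koff. A minimal sketch with hypothetical function names:

```python
import math


def half_life_from_koff(koff):
    """Half life of a first-order dissociation, in the inverse unit of koff."""
    if koff <= 0:
        raise ValueError("koff must be positive")
    return math.log(2.0) / koff


def fraction_bound(koff, elapsed):
    """Fraction of complexes still bound after `elapsed` time, no re-association."""
    return math.exp(-koff * elapsed)


# A 45 minute half life corresponds to koff = ln(2) / 45 per minute.
koff_per_min = math.log(2.0) / 45.0
print(half_life_from_koff(koff_per_min))
print(fraction_bound(koff_per_min, 45.0))
```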
binding
9/28/11 BP: The disposition referenced is the one of the ligand to bind the molecule. This along with binding as a function / process needs to be figured out with GO which is inconsistent at this point.
A peptide binding to an MHC molecule to form a complex.
IEDB
PERSON: Bjoern Peters, Randi Vita
The process of material entities forming complexes.
binding
PDB file chain
A 3D structural organization datum that is part of a PDB file and has a specific chain identifier that identifies the entire information on a subset of the material entities
IEDB
PDB file chain
PERSON: Bjoern Peters, Dorjee Tamang, Jason Greenbaum
The 'D' chain in the PDB file 2BSE identifies the heavy chain of the antibody in the protein:antibody complex
binding off rate measurement datum (koff)
A rate measurement datum of how quickly bound complexes disassociate
IEDB
PERSON: Bjoern Peters, Randi Vita
binding off rate measurement datum (koff)
binding on rate measurement datum (kon)
A rate measurement datum of how quickly bound complexes form
IEDB
PERSON: Bjoern Peters, Randi Vita
binding on rate measurement datum (kon)
average depth of sequence coverage
NIAID GSCID-BRC
Depth of Coverage - Average
An average value of the depth of sequence coverage based both on external (e.g. Cot-based size estimates) and internal (average coverage in the assembly) measures of genome size.
NIAID GSCID-BRC metadata working group
Person: Chris Stoeckert, Jie Zheng
average depth of sequence coverage
specimen collection time measurement datum
NIAID GSCID-BRC
Specimen Collection Date
A time measurement datum that is the measure of the time when the specimens are collected.
NIAID GSCID-BRC metadata working group
Person: Chris Stoeckert, Jie Zheng
collection date
specimen collection time measurement datum
latitude coordinate measurement datum
NIAID GSCID-BRC
NIAID GSCID-BRC metadata working group
Person: Chris Stoeckert, Jie Zheng
Specimen Collection Location - Latitude
A measurement datum that is the measure of the latitude coordinate of a site.
latitude
latitude coordinate measurement datum
longitude coordinate measurement datum
A measurement datum that is the measure of the longitude coordinate of a site.
NIAID GSCID-BRC
NIAID GSCID-BRC metadata working group
Person: Chris Stoeckert, Jie Zheng
Specimen Collection Location - Longitude
longitude
longitude coordinate measurement datum
drawing a conclusion
Concluding that the length of the hypotenuse is equal to the square root of the sum of squares of the other two sides in a right-triangle.
Concluding that a gene is upregulated in a tissue sample based on the band intensity in a western blot. Concluding that a patient has an infection based on measurement of an elevated body temperature and reported headache. Concluding that there were problems in an investigation because data from PCR and microarray are conflicting.
A planned process in which new information is inferred from existing information.
drawing a conclusion
sequence assembly process
NIAID GSCID-BRC metadata working group
A data transformation that assembles two or more individual sequence reads into contiguous sequences (i.e., contigs).
Alejandra Gonzalez-Beltran
NIAID GSCID-BRC
PERSON: Jie Zheng, Chris Stoeckert
PRS/AGB:
changed to restrictions by adding 2 possible specified outputs (N50 and genome coverage) for sequence assembly.
Philippe Rocca-Serra
sequence assembly process
number of errors
Alejandra Gonzalez-Beltran
Philippe Rocca-Serra
Gigascience. 2012 Dec 27;1(1):18. doi: 10.1186/2047-217X-1-18.
PMID: 23587118. see table2
PRS, AGB
a data item that is the number of times that a given process failed, as an integer
number of errors
random access memory size
Alejandra Gonzalez-Beltran
Gigascience. 2012 Dec 27;1(1):18. doi: 10.1186/2047-217X-1-18.
PMID: 23587118.
"However, the error correction module in SOAPdenovo was designed for short Illumina reads (35-50 bp), which consumes an excessive amount of computational time and memory on longer reads, for example, over 150 GB memory running for two days using 40-fold 100 bp paired-end Illumina HiSeq 2000 reads"
PRS, AGB
Philippe Rocca-Serra
random access memory size
random access memory size is a scalar measurement datum which denotes the amount of physical memory, known as random access memory, present on a computer or required by a computational process or data transformation
random access memory
Alejandra Gonzalez-Beltran
Philippe Rocca-Serra
RAM
Random-access memory (RAM) is a form of computer data storage. A random-access device allows stored data to be accessed directly in any random order. In contrast, other data storage media such as hard disks, CDs, DVDs and magnetic tape, as well as early primary memory types such as drum memory, read and write data only in a predetermined order, consecutively, because of mechanical design limitations. Therefore, the time to access a given data location varies significantly depending on its physical location
http://en.wikipedia.org/wiki/RAM
last accessed: 2013-12-02
random access memory
testable hypothesis
Group:2013 Philly Workshop group
An information content entity that expresses an assertion that is intended to be tested.
In the Philly 2013 workshop, we recognized the limitations of "hypothesis textual entity", and we introduced this as more general. The need for the 'textual entity' term going forward is up for future debate.
Group:2013 Philly Workshop group
hypothesis
that fucoidan has a small statistically significant effect on AT3 level but no useful clinical effect as in-vivo anticoagulant, a paraphrase of part of the last paragraph of the discussion section of the paper 'Pilot clinical study to evaluate the anticoagulant activity of fucoidan', by Lowenthal et al. PMID:19696660
testable hypothesis
conclusion based on data
Group:2013 Philly Workshop group
An information content entity that is inferred from data.
Group:2013 Philly Workshop group
conclusion based on data
In the Philly 2013 workshop, we recognized the limitations of "conclusion textual entity", and we introduced this as more general. The need for the 'textual entity' term going forward is up for future debate.
The conclusion that a gene is upregulated in a tissue sample based on the band intensity in a western blot. The conclusion that a patient has an infection based on measurement of an elevated body temperature and reported headache. The conclusion that there were problems in an investigation because data from PCR and microarray are conflicting.
The following are NOT conclusions based on data: data themselves; results from pure mathematics, e.g. "13 is prime".
computation run time
Alejandra Gonzalez-Beltran
Gigascience. 2012 Dec 27;1(1):18. doi: 10.1186/2047-217X-1-18.
PMID: 23587118.
See Table 4
PRS,AGB
Philippe Rocca-Serra
computation run time
computation run time datum
computation run time is a time measurement datum which corresponds to the time, expressed in seconds, minutes or hours, necessary for a computer program to complete a process execution, for example genome assembly. It is an important metric as it indicates the resource occupancy and computer program efficiency.
categorical value specification
PERSON:Bjoern Peters
categorical value specification
A value specification that specifies one category out of a fixed number of nominal categories
scalar value specification
1
1
scalar value specification
A value specification that consists of two parts: a numeral and a unit label
PERSON:Bjoern Peters
value specification
This term is currently a descendant of 'information content entity', which requires that it 'is about' something. A value specification of '20g' for a measurement data item of the mass of a particular mouse 'is about' the mass of that mouse. However there are cases where a value specification is not clearly about any particular. In the future we may change 'value specification' to remove the 'is about' requirement.
The value of 'positive' in a classification scheme of "positive or negative"; the value of '20g' on the quantitative scale of mass.
value specification
PERSON:Bjoern Peters
An information content entity that specifies a value within a classification scheme or on a quantitative scale.
genome coverage
A beginner's guide to eukaryotic genome annotation. Yandell M, Ence D.
Nat Rev Genet. 2012 Apr 18;13(5):329-42. doi: 10.1038/nrg3174.
PMID: 22510764
A data item that is the total number of bases in reads divided by the genome size, which is assumed to be the reference size (for instance 3.10 Gb for human and 2.73 Gb for mouse). It refers to the percentage of the genome that is contained in the assembly based on size estimates, which are usually based on cytological techniques. Genome coverage of 90–95% is generally considered to be good, as most genomes contain a considerable fraction of repetitive regions that are difficult to sequence, so it is not a cause for concern if the genome coverage of an assembly is a bit less than 100%.
Alejandra Gonzalez-Beltran
Gigascience. 2012 Dec 27;1(1):18. doi: 10.1186/2047-217X-1-18.
PMID: 23587118.
"The genome coverage increased from 81.16% to 93.91%"
genome coverage
Philippe Rocca-Serra
N50
the weighted median item size or N50 is a weighted median of the lengths of items, equal to the length of the longest item i such that the sum of the lengths of items greater than or equal in length to i is greater than or equal to half the length of all of the items. With regard to assemblies the items are typically contigs or scaffolds. It therefore denotes the ability of the software to create contigs and provides information about the resulting sequence assembly
Alejandra Gonzalez-Beltran
Gigascience. 2012 Dec 27;1(1):18. doi: 10.1186/2047-217X-1-18.
PMID: 23587118.
"Here, the contig and scaffold N50 of the YH genome were ~20.9 kbp and ~22 Mbp"
N50
Philippe Rocca-Serra
adapted from:
"http://genome.cshlp.org/content/21/12/2224.full?sid=74019122-f944-4ccc-bffe-d16fdd0e7d6c"
(from table 7)
and from "http://www.nature.com/nrg/journal/v14/n3/full/nrg3367.html"
weighted median item size
contig N50
N50 statistic computed for the contigs produced by the assembly process. A contig N50 is calculated by first ordering every contig by length from longest to shortest. Next, starting from the longest contig, the lengths of the contigs are summed until this running sum reaches or exceeds one-half of the total length of all contigs in the assembly. The contig N50 of the assembly is the length of the shortest contig in this list.
Alejandra Gonzalez-Beltran
Gigascience. 2012 Dec 27;1(1):18. doi: 10.1186/2047-217X-1-18.
PMID: 23587118.
"Here, the contig and scaffold N50 of the YH genome were ~20.9 kbp and ~22 Mbp"
Philippe Rocca-Serra
adapted from: nature:http://www.nature.com/nrg/journal/v13/n5/box/nrg3174_BX1.html
contig N50
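The contig N50 procedure described above (sort by length, accumulate from the longest, stop once half of the total length is reached) can be sketched as follows. The function name is hypothetical and the contig lengths in the usage line are invented for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of item lengths (contigs or scaffolds)."""
    if not lengths:
        raise ValueError("at least one length is required")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2.0
    running = 0
    for length in ordered:
        running += length
        # The first item that brings the running sum to at least half of
        # the total is the shortest item in the accumulated list.
        if running >= half_total:
            return length


# Total length is 290, so half is 145; 80 + 70 = 150 reaches it.
print(n50([80, 70, 50, 40, 30, 20]))  # 70
```

The same function gives a scaffold N50 when it is passed scaffold lengths instead of contig lengths.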
scaffold N50
Alejandra Gonzalez-Beltran
Gigascience. 2012 Dec 27;1(1):18. doi: 10.1186/2047-217X-1-18.
PMID: 23587118.
"Here, the contig and scaffold N50 of the YH genome were ~20.9 kbp and ~22 Mbp"
N50 statistic computed for the scaffolds produced by the assembly process. The method for computing the value is similar to that of contig N50 but uses scaffold information instead of contig information
Philippe Rocca-Serra
adapted from: nature:http://www.nature.com/nrg/journal/v13/n5/box/nrg3174_BX1.html
scaffold N50
sequencing library multiplexing
OBI
PERSON:Philippe Rocca-Serra
a planned process which consists in running a set of samples as a pool in one single instrument run of data acquisition process while retaining the ability to associate individual results to each of the individual input samples thanks to the use of a multiplex identifier, introduced during the ligation step of the individual library preparation and specific to a given sample.
http://www.illumina.com/technology/multiplexing_sequencing_assay.ilmn
sequencing library multiplexing
sequence library deconvolution
PERSON: Philippe Rocca-Serra
PRS for OBI
A data transformation which uses sequence alignment and 'multiplex identifier sequence' information to pull together all reads belonging to a given single sample following the sequencing of a multiplexed library which combines several samples in one sequencing event
sequence library deconvolution
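A much simplified sketch of deconvolution is given below. It assumes, purely for illustration, that every multiplex identifier has the same length, sits at the start of the read, and is matched exactly; real pipelines typically tolerate mismatches and use alignment, which is not shown. The function name, sample names and sequences are all hypothetical.

```python
from collections import defaultdict


def deconvolve_reads(reads, sample_by_identifier):
    """Group reads by sample using an exact-match identifier prefix.

    Returns (reads_by_sample, unassigned); the identifier is trimmed
    from every assigned read.
    """
    tag_lengths = {len(tag) for tag in sample_by_identifier}
    if len(tag_lengths) != 1:
        raise ValueError("all multiplex identifiers must have the same length")
    tag_length = tag_lengths.pop()

    reads_by_sample = defaultdict(list)
    unassigned = []
    for read in reads:
        sample = sample_by_identifier.get(read[:tag_length])
        if sample is None:
            unassigned.append(read)  # identifier not recognised
        else:
            reads_by_sample[sample].append(read[tag_length:])
    return dict(reads_by_sample), unassigned


tags = {"ACGT": "sample_1", "TGCA": "sample_2"}
pooled = ["ACGTTTGA", "TGCAGGCC", "ACGTCCAA", "NNNNAAAA"]
print(deconvolve_reads(pooled, tags))
```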
multiplexing sequence identifier
A multiplexing sequence identifier is a nucleic acid sequence which is used in a ligation step of the library preparation process to allow pooling of samples, and creation of a multiplexed library, while maintaining the ability to identify individual source material
OBI
PERSON:Philippe Rocca-Serra
We designed primers specifically to amplify protease and reverse transcriptase from Brazilian HIV subtypes and developed a multiplexing scheme using multiplex identifier tags to minimize cost while providing more robust data than traditional genotyping techniques. in http://www.ncbi.nlm.nih.gov/pubmed/22574170
multiplexing sequence identifier
operational taxonomic unit matrix
Operational Taxonomic Unit matrix is a data item, organized as a table, where organismal taxonomic units, computed by sequence analysis and genetic distance calculation, are counted in a set of biological or environmental samples. The table is used to appraise the biodiversity of a population or community of living organisms.
PERSON:Philippe Rocca-Serra
PRS for OBI
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1610290/
operational taxonomic unit matrix
multiplexed sequencing library
DNA-barcoded sequencing library
PERSON:Philippe Rocca-Serra
PRS for OBI
a multiplexed library is a material entity which is the output of a library preparation process that uses a ligation step to attach a unique multiplexing sequence identifier to a specific sample, then mixes several such tagged samples prior to the library amplification process proper. A multiplexed library allows the sequencing of several samples in one sequencing run.
http://www.ncbi.nlm.nih.gov/pubmed/24997861
Nat Methods. 2014 Aug;11(8):834-40. doi: 10.1038/nmeth.3022. Epub 2014 Jul 6.
Accelerated chromatin biochemistry using DNA-barcoded nucleosome libraries.
multiplexed sequencing library
organism
10/21/09: This is a placeholder term, that should ideally be imported from the NCBI taxonomy, but the high level hierarchy there does not suit our needs (includes plasmids and 'other organisms')
13-02-2009:
OBI doesn't take position as to when an organism starts or ends being an organism - e.g. sperm, foetus.
This issue is outside the scope of OBI.
GROUP: OBI Biomaterial Branch
A material entity that is an individual living system, such as animal, plant, bacteria or virus, that is capable of replicating or reproducing, growth and maintenance in the right environment. An organism may be unicellular or made up, like humans, of many billions of cells divided into specialized tissues and organs.
WEB: http://en.wikipedia.org/wiki/Organism
animal
fungus
organism
plant
virus
specimen
Biobanking of blood taken and stored in a freezer for potential future investigations stores specimens.
Note: definition is in specimen creation objective which is defined as an objective to obtain and store a material entity for potential use as an input during an investigation.
PERSON: James Malone
PERSON: Philippe Rocca-Serra
A material entity that has the specimen role.
GROUP: OBI Biomaterial Branch
specimen
data processing
Philippe Rocca-Serra
The application of a clustering protocol to microarray data or the application of a statistical testing method on a primary data set to determine a p-value.
A planned process that produces output data from input data.
Branch editors
Elisabetta Manduchi
Helen Parkinson
James Malone
Melanie Courtot
Richard Scheuermann
Ryan Brinkman
Tina Hernandez-Boussard
data analysis
data transformation
data transformation
logistic-log curve fitting
A logistic-log curve fitting is a curve fitting where a curve of the form y=d+((a-d)/(1+(x/c)^b)) is obtained, where a, b, c, and d are determined so as to optimize its fit to the input data points (x_1, y_1), (x_2, y_2), ..., (x_n, y_n).
ARTICLE: Plikaytis B.D. et al. (1991), J. Clin. Microbiol. 29(7): 1439-1446
Elisabetta Manduchi
James Malone
Melanie Courtot
Ryan Brinkman
Typically used in an enzyme-linked immunosorbent assay (ELISA) to model the relationship between optical density (OD) and dilution. In this case a and d correspond to the theoretical OD of the assay at zero and infinite concentrations, respectively; c is the dilution associated with the point of symmetry of the sigmoid and is located at the midpoint of the assay found at the inflection point of the curve; b is a curvature parameter and is related to the slope of the curve.
logistic-log curve fitting
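The curve y=d+((a-d)/(1+(x/c)^b)) and the quantity a fit would minimize can be sketched as below. The function names are hypothetical, sum of squared residuals is assumed as the fitting criterion (the definition does not name one), and the search for a, b, c, d, which needs a nonlinear optimizer, is not shown.

```python
def logistic_log(x, a, b, c, d):
    """Evaluate y = d + (a - d) / (1 + (x / c)**b) for x >= 0 and c > 0."""
    return d + (a - d) / (1.0 + (x / c) ** b)


def sum_squared_residuals(points, a, b, c, d):
    """Objective a least-squares fit would minimize over a, b, c and d."""
    return sum((y - logistic_log(x, a, b, c, d)) ** 2 for x, y in points)


# With b > 0 the curve starts at a for x = 0 and passes through the
# midpoint (a + d) / 2 at x = c. Parameter values are illustrative.
print(logistic_log(0.0, a=2.0, b=1.5, c=100.0, d=0.1))
print(logistic_log(100.0, a=2.0, b=1.5, c=100.0, d=0.1))
```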
logit-log curve fitting
A logit-log curve fitting is a curve fitting where first the limits y_0 and y_infty of y when x->0 and x->infinity, respectively, are estimated from the input data points (x_1, y_1), (x_2,y_2), ..., (x_n, y_n). Then a curve with equation log((y-y_0)/(y_infty-y))=a+b log(x) is obtained, where a and b are determined to optimize its fit to the input data points.
ARTICLE: Plikaytis B.D. et al. (1991), J. Clin. Microbiol. 29(7): 1439-1446
Elisabetta Manduchi
James Malone
Melanie Courtot
Ryan Brinkman
The above definition refers to the 'fully specified' logit-log model. The reduced form of this, when it is assumed that y_0=0, is named 'partially specified' logit-log model.
Typically used in an enzyme-linked immunosorbent assay (ELISA) to model the relationship between optical density (OD) and dilution. In this case OD_0 (also referred to as OD_min) and OD_infty (also referred to as OD_max) correspond to the theoretical OD of the assay at zero and infinite concentrations, respectively.
logit-log curve fitting
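Once y_0 and y_infty are available, the second step of the logit-log fit is a straight-line fit of log((y-y_0)/(y_infty-y)) against log(x). The sketch below assumes ordinary least squares as the optimization criterion and takes y_0 and y_infty as given rather than estimating them; both choices, and the function names, are assumptions made for illustration.

```python
import math


def fit_line(us, vs):
    """Ordinary least-squares fit of v = a + b*u; returns (a, b)."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    s_uu = sum((u - mean_u) ** 2 for u in us)
    s_uv = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    slope = s_uv / s_uu
    return mean_v - slope * mean_u, slope


def fit_logit_log(points, y_0, y_infty):
    """Fit log((y - y_0) / (y_infty - y)) = a + b*log(x); returns (a, b).

    Every x must be positive and every y strictly between y_0 and y_infty.
    """
    us = [math.log(x) for x, _ in points]
    vs = [math.log((y - y_0) / (y_infty - y)) for _, y in points]
    return fit_line(us, vs)
```

Setting y_0 to 0 in the call gives the 'partially specified' form mentioned in the editor note.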
log-log curve fitting
A log-log curve fitting is a curve fitting where first a logarithmic transformation is applied both to the x and the y coordinates of the input data points (x_1, y_1), (x_2, y_2), ..., (x_n, y_n), and then coefficients a and b are determined to optimize the fit of log(y)=a+b*log(x) to these input data points.
ARTICLE: Plikaytis B.D. et al. (1991), J. Clin. Microbiol. 29(7): 1439-1446
Elisabetta Manduchi
James Malone
Melanie Courtot
Ryan Brinkman
Typically used in an enzyme-linked immunosorbent assay (ELISA) to model the relationship between optical density (OD) and dilution.
log-log curve fitting
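The log-log fit described above amounts to a straight-line fit on log-transformed coordinates. A minimal sketch, assuming natural logarithms and ordinary least squares as the fitting criterion (neither is fixed by the definition); the function name is hypothetical.

```python
import math


def fit_log_log(points):
    """Fit log(y) = a + b*log(x) by least squares; returns (a, b).

    All x and y values must be positive.
    """
    us = [math.log(x) for x, _ in points]
    vs = [math.log(y) for _, y in points]
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    s_uu = sum((u - mean_u) ** 2 for u in us)
    s_uv = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    b = s_uv / s_uu
    a = mean_v - b * mean_u
    return a, b


# Data lying exactly on y = 3 * x**2 should give a = log(3) and b = 2.
a, b = fit_log_log([(1.0, 3.0), (2.0, 12.0), (4.0, 48.0), (8.0, 192.0)])
print(a, b)
```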
feature extraction objective
Elisabetta Manduchi
A feature extraction objective is a data transformation objective where the aim of the data transformation is to generate quantified values from a scanned image.
James Malone
TERM: http://mged.sourceforge.net/ontologies/MGEDOntology.owl#feature_extraction
feature extraction objective
biexponential transformation
A biexponential transformation is a data transformation that, for each (one dimensional) real number input x, outputs an approximation (found, e.g. with Newton's method) to a solution y of the equation B(y)-x=0, where B denotes a b transformation.
Elisabetta Manduchi
Josef Spidlen
Ryan Brinkman
This type of transformation is typically used in flow cytometry.
WEB: http://flowcyt.sourceforge.net/gating/latest.pdf
biexponential transformation
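The Newton step for solving B(y)-x=0 can be sketched as below. The form B(y) = a*exp(b*y) - c*exp(-d*y) + f is an assumption borrowed from common flow cytometry usage, since the b transformation is not defined in this entry, and all names are hypothetical. This is plain Newton iteration without safeguards: large |x| can overflow the exponentials and would need a better starting value.

```python
import math


def b_transform(y, a, b, c, d, f):
    """Assumed form of B: a*exp(b*y) - c*exp(-d*y) + f."""
    return a * math.exp(b * y) - c * math.exp(-d * y) + f


def b_derivative(y, a, b, c, d):
    """Derivative of the assumed B with respect to y."""
    return a * b * math.exp(b * y) + c * d * math.exp(-d * y)


def biexponential(x, a, b, c, d, f, start=0.0, tol=1e-12, max_iter=100):
    """Approximate y such that B(y) - x = 0 using Newton's method."""
    y = start
    for _ in range(max_iter):
        step = (b_transform(y, a, b, c, d, f) - x) / b_derivative(y, a, b, c, d)
        y -= step
        if abs(step) < tol:
            return y
    raise RuntimeError("Newton iteration did not converge")


# With a = c = 0.5, b = d = 1 and f = 0, B is sinh, so the result
# should agree with math.asinh.
print(biexponential(10.0, 0.5, 1.0, 0.5, 1.0, 0.0), math.asinh(10.0))
```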
box-cox transformation
A box-cox transformation is a data transformation according to the methods of Box and Cox as described in the article Box, G. E. P. and Cox, D.R. (1964) An analysis of transformations. Journal of Royal Statistical Society, Series B, vol. 26, pp. 211-246.
ARTICLE: Box, G. E. P. and Cox, D.R. (1964), "An analysis of transformations", Journal of Royal Statistical Society, Series B, vol. 26, pp. 211-246.
Ryan Brinkman
box-cox transformation
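The commonly quoted one-parameter form of the Box and Cox power family is (x^lambda - 1)/lambda for lambda not equal to 0 and log(x) for lambda equal to 0, defined for positive x. The sketch below covers only that form; the cited article also discusses a shifted two-parameter variant and how to choose lambda, neither of which is shown. The function name is hypothetical.

```python
import math


def box_cox(x, lam):
    """One-parameter Box-Cox transform of a positive value x."""
    if x <= 0:
        raise ValueError("x must be positive")
    if lam == 0:
        return math.log(x)  # limiting case as lam -> 0
    return (x ** lam - 1.0) / lam


print(box_cox(4.0, 0.5))  # 2.0
print(box_cox(3.0, 1.0))  # 2.0
```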
hyperlog transformation
A hyperlog transformation is a data transformation that, for each (one dimensional) real number input x, outputs an approximation (found, e.g. with Newton's method) to a solution y of the equation EH(y)-x=0, where EH denotes an eh transformation.
ARTICLE: Bagwell C.B. (2006), "Hyperlog - a flexible log-like transform for negative, zero, and positive valued data", Cytometry A 64, 34-42.
Elisabetta Manduchi
Josef Spidlen
Ryan Brinkman
This type of transformation is typically used in flow cytometry
http://flowcyt.sourceforge.net/gating/latest.pdf
hyperlog transformation
loess scale group transformation one-channel
A loess scale group transformation one-channel is a loess scale group transformation consisting in the application of a scale adjustment following a loess group transformation one-channel, to render the M group variances similar.
Elisabetta Manduchi
Loess scale group normalization applied to data from two one-channel expression microarray assays.
OTHER: Editor's adjustment based on MGED Ontology term
loess scale group transformation one-channel
logical transformation
A logical transformation is a data transformation that, for each (one dimensional) real number input x, outputs an approximation (found, e.g. with Newton's method) to a solution y of the equation S(y)-x=0, where S denotes an s transformation.
Elisabetta Manduchi
Josef Spidlen
Ryan Brinkman
This type of transformation is typically used in flow cytometry.
WEB: http://flowcyt.sourceforge.net/gating/latest.pdf
logical transformation
loess scale group transformation two-channel
A loess scale group transformation two-channel is a loess scale group transformation consisting in the application of a scale adjustment following a loess group transformation two-channel, to render the M group variances similar.
Elisabetta Manduchi
Loess scale group normalization applied to data from a two-channel expression microarray assay.
OTHER: Adjusted from MGED Ontology
loess scale group transformation two-channel
loess global transformation one-channel
A loess global transformation one-channel is a loess global transformation in the special case where the input is the result of an MA transformation applied to intensities from two related one-channel assays.
Elisabetta Manduchi
Loess global normalization applied to data from two one-channel expression microarray assays, where the curve is obtained using all reporters. The goal is to remove intensity-dependent biases.
OTHER: Editor's generalization based on MGED Ontology term
loess global transformation one-channel
split-scale transformation
A split-scale transformation is a data transformation which is an application of a function f described as follows to a (one dimensional) real number input. f(x)=a*x+b if x<=t, f(x)=log(c*x)*r/d for x>t; where log denotes a logarithmic transformation and a, b, c, d, r, t are real constants, with a, c, d, r, t positive, chosen so that f is continuous with a continuous derivative at the transition point t.
Elisabetta Manduchi
Josef Spidlen
Ryan Brinkman
This type of transformation is typically used in flow cytometry
WEB: http://flowcyt.sourceforge.net/gating/latest.pdf
split-scale transformation
loess global transformation two-channel
A loess global transformation two-channel is a loess global transformation in the special case where the input is the result of an MA transformation applied to intensities from the two channels of a two-channel assay.
Elisabetta Manduchi
Loess global normalization applied to data from a two-channel expression microarray assay, where the curve is obtained using all reporters. The goal is to remove intensity-dependent biases.
OTHER: Editor's generalization based on MGED Ontology term
loess global transformation two-channel
sine transformation
A sine transformation is a data transformation which consists in applying the sine function to a (one dimensional) real number input. The sine function is one of the basic trigonometric functions and a definition is provided, e.g., at http://mathworld.wolfram.com/Sine.html.
Elisabetta Manduchi
WEB: http://mathworld.wolfram.com/Sine.html
sine transformation
sine(0)=0, sine(pi/2)=1, sine(pi)=0, sine(3*pi/2)=-1, sine(pi/6)=1/2, sine(x+2*k*pi)=sine(x) where k is any integer, etc.
cosine transformation
Philippe Rocca-Serra
A cosine transformation is a data transformation which consists in applying the cosine function to a (one dimensional) real number input. The cosine function is one of the basic trigonometric functions and a definition is provided, e.g., at http://mathworld.wolfram.com/Cosine.html.
Elisabetta Manduchi
WEB: http://mathworld.wolfram.com/Cosine.html
cosine transformation
cosine(0)=1, cosine(pi/2)=0, cosine(pi)=-1, cosine(3*pi/2)=0, cosine(pi/3)=1/2, cosine(x+2*k*pi)=cosine(x) where k is any integer, etc.
loess group transformation one-channel
A loess group transformation one-channel is a loess group transformation in the special case where the input is the result of an MA transformation applied to intensities from two related one-channel assays.
Elisabetta Manduchi
OTHER: Editor's generalization based on MGED Ontology term
loess group transformation one-channel
loess group transformation two-channel
A loess group transformation two-channel is a loess group transformation in the special case where the input is the result of an MA transformation applied to intensities from the two channels of a two-channel assay.
Elisabetta Manduchi
OTHER: Editor's generalization based on MGED Ontology term
loess group transformation two-channel
homogeneous polynomial transformation
A homogeneous polynomial transformation is a polynomial transformation where all the terms of the polynomial have the same degree.
Elisabetta Manduchi
WEB: http://mathworld.wolfram.com/HomogeneousPolynomial.html
a*x, with a non-zero, is a homogeneous polynomial of degree 1 in 1 variable, a*x^2, with a non-zero, is a homogeneous polynomial of degree 2 in 1 variable; a_1*x_1+...+a_n*x_n, with at least one of the a_i's non-zero, is a homogeneous polynomial of degree one in n variables; a*x_n^3+b*x_1*x_2*x_3, with at least one of a and b non-zero, is a homogeneous polynomial of degree 3 in n variables.
homogeneous polynomial transformation
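A polynomial p that is homogeneous of degree k satisfies p(t*x_1, ..., t*x_n) = t^k * p(x_1, ..., x_n) for every scale factor t. The sketch below checks this identity numerically at one point, using the cubic example above restricted to three variables. Passing the check at a single point is a necessary condition only, not a proof of homogeneity; function names and coefficients are hypothetical.

```python
def cubic_form(x1, x2, x3, a=2.0, b=-1.0):
    """Example a*x3^3 + b*x1*x2*x3, homogeneous of degree 3."""
    return a * x3 ** 3 + b * x1 * x2 * x3


def scales_like_degree(poly, point, degree, t=3.0, tol=1e-9):
    """Check poly(t*point) == t**degree * poly(point) at one point."""
    scaled = poly(*[t * value for value in point])
    expected = t ** degree * poly(*point)
    return abs(scaled - expected) <= tol * max(1.0, abs(scaled))


print(scales_like_degree(cubic_form, (1.0, 2.0, 3.0), 3))            # True
print(scales_like_degree(lambda x: x ** 2 + x, (2.0,), 2))           # False
```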
linlog transformation
Philippe Rocca-Serra
A linlog transformation is a data transformation, described in PMID 16646782, whose input is a matrix [y_ik] and whose output is a matrix obtained by applying formula (9) of this paper, where values below an appropriately determined threshold (dependent on the row i) are transformed via a polynomial of degree 1, and values above this threshold are transformed via a logarithm.
Elisabetta Manduchi
PMID: 16646782
This can be used for microarray normalization, e.g. to normalize the data from a two-channel expression microarray assay, as described in PMID 16646782.
linlog transformation
variance stabilizing transformation
A variance stabilizing transformation is a data transformation, described in PMID 12169536, whose input is a matrix [y_ik] and whose output is a matrix obtained by applying formula (6) in this paper. One of the goals is to obtain an output matrix whose rows have equal variances. The method relies on various assumptions described in the paper.
Elisabetta Manduchi
James Malone
Melanie Courtot
PMID: 12169536
This can be used for expression microarray assay normalization and it is referred to as "variance stabilizing normalization", according to the procedure described e.g. in PMID 12169536.
variance stabilising transformation
variance stabilizing transformation
loess global transformation
Philippe Rocca-Serra
A loess global transformation is a loess transformation where only one loess fitting is performed, utilizing one subset of (or possibly all of) the data points in the input so that there is only one resulting loess curve y=f(x) which is used for the transformation.
Elisabetta Manduchi
James Malone
Melanie Courtot
OTHER: Editor's generalization based on MGED Ontology term
loess global transformation
loess group transformation
Philippe Rocca-Serra
A loess group transformation is a loess transformation where the input is partitioned into groups and for each group a loess fitting is performed, utilizing a subset of (or possibly all of) the data points in that group. Thus, a collection of loess curves y=f_i(x) is generated, one per group. Each (x, y) in the input is transformed into (x, y-f_i(x)), where f_i(x) is the curve corresponding to the group to which that data point belongs.
Elisabetta Manduchi
James Malone
Melanie Courtot
OTHER: Editor's generalization based on MGED Ontology term
loess group transformation
loess scale group transformation
A loess scale group transformation is a data transformation consisting in the application of a scale adjustment following a loess group transformation, to render the group variances for the second variable (y) similar. Has objective scaling.
Elisabetta Manduchi
James Malone
Melanie Courtot
OTHER: Editor's generalization based on MGED Ontology term
loess scale group transformation
total intensity transformation single
Philippe Rocca-Serra
A total intensity transformation single is a data transformation that takes as input an n-dimensional (real) vector and multiplies each component of this vector by a coefficient, where the coefficient is obtained by taking the sum of the input components or of a subset of these, multiplied by a constant of choice.
Elisabetta Manduchi
Helen Parkinson
James Malone
Melanie Courtot
Note that if the word "sum" is replaced by the word "mean" in the definition, the resulting definition is equivalent.
OTHER: Adjusted from MGED Ontology
This can be used as a simple normalization method for expression microarray assays. For example, each intensity from a one-channel microarray assay is multiplied by a constant so that the output mean intensity over the microarray equals a desired target T (the multiplicative constant in this case is T/(mean intensity)).
total intensity transformation single
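The one-channel example above (scaling so that the mean intensity equals a target T) can be sketched as follows. This is a minimal illustration, not part of the ontology; the function name `scale_to_target_mean` and the sample intensities are hypothetical.

```python
from statistics import mean


def scale_to_target_mean(intensities, target):
    """Multiply every intensity by target / mean(intensities)."""
    # The multiplicative constant from the example: T / (mean intensity).
    factor = target / mean(intensities)
    return [x * factor for x in intensities]


# Hypothetical one-channel intensities, rescaled to a target mean of 50.
scaled = scale_to_target_mean([100.0, 200.0, 300.0], target=50.0)
```

After the transformation the mean of `scaled` equals the target, while the ratios between intensities are unchanged.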
total intensity transformation paired
Philippe Rocca-Serra
A total intensity transformation paired is a data transformation that takes as input two n-dimensional (real) vectors and multiplies each component of the first vector by a coefficient, where the coefficient is obtained by taking the ratio of the sum of the second input components or of a subset of these by the sum of the first input components or of a subset of these (the same subset is used for the two vectors).
Elisabetta Manduchi
Note that if the word "sum" is replaced by the word "mean" in the definition, the resulting definition is equivalent.
OTHER: Adjusted from MGED Ontology
This can be used as a simple normalization method for the two channels from a two-channel expression microarray assay or from two related one-channel expression microarray assays.
total intensity transformation paired
quantile transformation
A quantile transformation is a data transformation that takes as input a collection of data sets, where each can be thought of as an n-dimensional (real) vector, and which transforms each data set so that the resulting output data sets have equal quantiles.
Elisabetta Manduchi
PERSON: Elisabetta Manduchi
This can be used for expression microarray assay normalization and it is referred to as "quantile normalization", according to the procedure described e.g. in PMID 12538238.
quantile transformation
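One common way to obtain equal quantiles is to replace the k-th smallest value of every data set with the mean of the k-th smallest values across all data sets. The sketch below assumes that approach, equal-length data sets, and ties broken by original position; the function name `quantile_normalize` is illustrative and is not taken from the cited procedure.

```python
def quantile_normalize(datasets):
    """Give every data set the same sorted values (mean of the k-th smallest values)."""
    n = len(datasets[0])
    if any(len(d) != n for d in datasets):
        raise ValueError("all data sets must have the same length")
    sorted_sets = [sorted(d) for d in datasets]
    # Reference distribution: mean of the k-th smallest values across data sets.
    reference = [sum(s[k] for s in sorted_sets) / len(sorted_sets) for k in range(n)]
    result = []
    for d in datasets:
        # Positions of d ordered from smallest to largest value.
        order = sorted(range(n), key=lambda i: d[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        result.append(out)
    return result


normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
```

Each output data set keeps the rank order of its input but shares the same set of values, so all quantiles coincide.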
mean centering
Philippe Rocca-Serra
A mean centering is a data transformation that takes as input an n-dimensional (real) vector, performs a mean calculation on its components, and subtracts the resulting mean from each component of the input.
Elisabetta Manduchi
PERSON: Elisabetta Manduchi
This can be used as a normalization method in expression microarray assays. For example, given a two-channel microarray assay, the log ratios of the two channels (M values) can be mean-centered.
mean centering
mean centring
median centering
Philippe Rocca-Serra
A median centering is a data transformation that takes as input an n-dimensional (real) vector, performs a median calculation on its components, and subtracts the resulting median from each component of the input.
Elisabetta Manduchi
PERSON: Elisabetta Manduchi
This can be used as a normalization method in expression microarray assays. For example, given a two-channel microarray assay, the log ratios of the two channels (M values) can be median-centered.
median centering
median centring
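Mean centering and median centering differ only in the center calculation that is subtracted. A minimal sketch, assuming a plain list of M values; the function name `center` and the sample values are hypothetical.

```python
from statistics import mean, median


def center(values, center_fn=mean):
    """Subtract the chosen center (mean or median) from every component."""
    c = center_fn(values)
    return [v - c for v in values]


# Hypothetical log ratios (M values) from a two-channel assay.
m_values = [0.5, 1.0, 1.5, 5.0]
mean_centered = center(m_values, mean)
median_centered = center(m_values, median)
```

The median-centered output is less affected by the outlying value 5.0 than the mean-centered one.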
differential expression analysis objective
A differential expression analysis objective is a data transformation objective whose input consists of expression levels of entities (such as transcripts or proteins), or of sets of such expression levels, under two or more conditions and whose output reflects which of these are likely to have different expression across such conditions.
Analyses implemented by the SAM (http://www-stat.stanford.edu/~tibs/SAM), PaGE (www.cbil.upenn.edu/PaGE) or GSEA (www.broad.mit.edu/gsea/) algorithms and software
Elisabetta Manduchi
PERSON: Elisabetta Manduchi
differential expression analysis objective
K-fold cross validation method
K-fold cross-validation randomly partitions the original sample into K subsamples. Of the K subsamples, a single subsample is retained as the validation data for testing the model, and the remaining K - 1 subsamples are used as training data. The cross-validation process is then repeated K times (the folds), with each of the K subsamples used exactly once as the validation data. The K results from the folds then can be averaged (or otherwise combined) to produce a single estimation. The advantage of this method over repeated random sub-sampling is that all observations are used for both training and validation, and each observation is used for validation exactly once. 10-fold cross-validation is commonly used.
Person:Helen Parkinson
Tina Boussard
K-fold cross validation method
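The partitioning step described above can be sketched with index lists. This is an illustrative generator, not a reference implementation; the name `k_fold_splits` and the fixed seed are assumptions made for reproducibility.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (training_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random partition of the sample
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        # Training data: every fold except the one held out for validation.
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


splits = list(k_fold_splits(n_samples=10, k=5))
```

Setting `k` equal to `n_samples` holds out a single observation per fold, which corresponds to leave-one-out cross-validation.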
leave one out cross validation method
2009-11-10. Tracker: https://sourceforge.net/tracker/?func=detail&aid=2893049&group_id=177891&atid=886178
Person:Helen Parkinson
The authors conducted leave-one-out cross validation to estimate the strength and accuracy of the differentially expressed filtered genes. http://bioinformatics.oxfordjournals.org/cgi/content/abstract/19/3/368
Leave-one-out cross-validation (LOOCV) is a data transformation that uses a single observation from the original sample as the validation data and the remaining observations as the training data. This is repeated such that each observation in the sample is used once as the validation data.
leave one out cross validation method
jackknifing method
Helen Parkinson
Jackknifing is a resampling data transformation process used to estimate the precision of sampling statistics.
http://en.wikipedia.org/wiki/Resampling_%28statistics%29
simple weighting procedure is suggested for combining information over alleles and loci, and sample variances may be estimated by a jackknife procedure
jackknifing
jackknifing method
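As an illustration of how a jackknife estimates precision, the sketch below recomputes a statistic with each observation left out in turn and derives a standard error from the spread of those estimates. The function name `jackknife_standard_error` and the sample data are hypothetical.

```python
from math import sqrt
from statistics import mean


def jackknife_standard_error(data, statistic=mean):
    """Jackknife standard error of `statistic` from leave-one-out estimates."""
    n = len(data)
    # Recompute the statistic n times, each time omitting one observation.
    leave_one_out = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    center = mean(leave_one_out)
    variance = (n - 1) / n * sum((est - center) ** 2 for est in leave_one_out)
    return sqrt(variance)


se = jackknife_standard_error([2.0, 4.0, 6.0, 8.0])
```

For the mean, this value coincides with the usual standard error (sample standard deviation divided by the square root of n).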
bootstrapping
Although widely accepted that high throughput biological data are typically highly noisy, the effects that this uncertainty has upon the conclusions we draw from these data are often overlooked. However, in order to assign any degree of confidence to our conclusions, we must quantify these effects. Bootstrap resampling is one method by which this may be achieved.
Bootstrapping is a data transformation process which estimates the precision of sampling statistics by drawing randomly with replacement from a set of data points
Helen Parkinson
Bootstrapping is a statistical method for estimating the sampling distribution of a statistic by sampling with replacement from the original data, most often with the purpose of deriving robust estimates of standard errors and confidence intervals of a population parameter like a mean, median, proportion, odds ratio, correlation coefficient or regression coefficient
bootstrapping
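Drawing randomly with replacement, as described above, can be sketched as follows. The function name `bootstrap_standard_error`, the number of resamples, and the seed are assumptions chosen for illustration.

```python
import random
from statistics import mean, stdev


def bootstrap_standard_error(data, statistic=mean, n_resamples=2000, seed=0):
    """Estimate the standard error of `statistic` by resampling with replacement."""
    rng = random.Random(seed)
    n = len(data)
    # Each resample has the same size as the original data set.
    estimates = [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]
    return stdev(estimates)


se = bootstrap_standard_error([2.0, 4.0, 6.0, 8.0, 10.0])
```

The spread of the resampled estimates approximates the sampling distribution of the statistic; percentiles of `estimates` could be used in the same way to form a confidence interval.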
Benjamini and Hochberg false discovery rate correction method
A data transformation process in which the Benjamini and Hochberg sequential p-value procedure is applied with the aim of correcting the false discovery rate.
Helen Parkinson
Helen Parkinson
Statistical significance of the 8 most represented biological processes (GO level 4) among E7 6 month upregulated genes following analysis with DAVID software; Benjamini-Hochberg FDR (false discovery rate)
2011-03-31: [PRS].
specified input and output of dt which were missing
Benjamini and Hochberg false discovery rate correction method
Philippe Rocca-Serra
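A minimal sketch of the Benjamini and Hochberg step-up adjustment, assuming a plain list of p-values as input and FDR-adjusted p-values as output. The function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

Hypotheses whose adjusted p-value falls at or below the chosen false discovery rate are rejected.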
pareto scaling
A pareto scaling is a data transformation that divides all measurements of a variable by the square root of the standard deviation of that variable.
Elisabetta Manduchi
PMID:16762068
Philippe Rocca-Serra
pareto scaling
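The definition above can be sketched directly. The function name `pareto_scale` is illustrative, and the sample standard deviation is an assumption, since the definition does not say which estimator is meant. Some descriptions of pareto scaling also mean-center the variable first; that step is omitted here because the definition above does not include it.

```python
from math import sqrt
from statistics import stdev


def pareto_scale(values):
    """Divide each measurement by the square root of the standard deviation."""
    divisor = sqrt(stdev(values))  # sample standard deviation (assumption)
    return [v / divisor for v in values]


scaled = pareto_scale([0.0, 4.0, 8.0])
```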
modular decomposition
Modular decomposition is the partition of a network into distinct subgraphs for the purpose of identifying functional clusters. The network data are run through any of several existing algorithms designed to partition a network into distinct subgraphs for the purpose of isolating groups of functionally linked biological elements such as proteins.
Tina Hernandez-Boussard
editor
modular decomposition
k-means clustering
Elisabetta Manduchi
Philippe Rocca-Serra
A k-means clustering is a data transformation which achieves a class discovery or partitioning objective, which takes as input a collection of objects (represented as points in multidimensional space) and which partitions them into a specified number k of clusters. The algorithm attempts to find the centers of natural clusters in the data. The most common form of the algorithm starts by partitioning the input points into k initial sets, either at random or using some heuristic data. It then calculates the mean point, or centroid, of each set. It constructs a new partition by associating each point with the closest centroid. Then the centroids are recalculated for the new clusters, and the algorithm is repeated by alternating applications of these two steps until convergence, which is obtained when the points no longer switch clusters (or alternatively centroids are no longer changed).
James Malone
WEB: http://en.wikipedia.org/wiki/K-means
k-means clustering
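The two alternating steps described above (assign each point to the closest centroid, then recompute centroids) can be sketched as follows. This variant starts from k randomly chosen input points as initial centroids rather than from an initial partition; that choice, the function name `k_means`, and the sample points are assumptions made for illustration.

```python
import random


def k_means(points, k, seed=0, max_iter=100):
    """Return (cluster index per point, centroids) using the alternating scheme."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k random input points
    assignment = None
    for _ in range(max_iter):
        # Step 1: associate each point with the closest centroid.
        new_assignment = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: points no longer switch clusters
        assignment = new_assignment
        # Step 2: recompute each centroid as the mean point of its cluster.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment, centroids


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
assignment, centroids = k_means(points, k=2)
```

The result can depend on the initial centroids, so in practice the procedure is often repeated with several random starts.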
hierarchical clustering
A hierarchical clustering is a data transformation which achieves a class discovery objective, which takes as input data items and builds a hierarchy of clusters. The traditional representation of this hierarchy is a tree (visualized by a dendrogram), with the individual input objects at one end (leaves) and a single cluster containing every object at the other (root).
James Malone
WEB: http://en.wikipedia.org/wiki/Data_clustering#Hierarchical_clustering
hierarchical clustering
average linkage hierarchical clustering
An average linkage hierarchical clustering is an agglomerative hierarchical clustering which generates successive clusters based on a distance measure, where the distance between two clusters is calculated as the average distance between objects from the first cluster and objects from the second cluster.
Elisabetta Manduchi
PERSON: Elisabetta Manduchi
average linkage hierarchical clustering
complete linkage hierarchical clustering
Elisabetta Manduchi
PERSON: Elisabetta Manduchi
A complete linkage hierarchical clustering is an agglomerative hierarchical clustering which generates successive clusters based on a distance measure, where the distance between two clusters is calculated as the maximum distance between objects from the first cluster and objects from the second cluster.
complete linkage hierarchical clustering
single linkage hierarchical clustering
Elisabetta Manduchi
A single linkage hierarchical clustering is an agglomerative hierarchical clustering which generates successive clusters based on a distance measure, where the distance between two clusters is calculated as the minimum distance between objects from the first cluster and objects from the second cluster.
PERSON: Elisabetta Manduchi
single linkage hierarchical clustering
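The average, complete, and single linkage definitions above differ only in how the pairwise distances between two clusters are summarized. A minimal sketch, assuming Euclidean distance between points; the function name `cluster_distance` and the sample clusters are hypothetical.

```python
from math import dist


def cluster_distance(cluster_a, cluster_b, linkage="average"):
    """Distance between two clusters under single, complete, or average linkage."""
    pairwise = [dist(p, q) for p in cluster_a for q in cluster_b]
    if linkage == "single":
        return min(pairwise)  # minimum distance between members
    if linkage == "complete":
        return max(pairwise)  # maximum distance between members
    if linkage == "average":
        return sum(pairwise) / len(pairwise)  # mean of all pairwise distances
    raise ValueError(f"unknown linkage: {linkage}")


cluster_a = [(0.0,), (1.0,)]
cluster_b = [(3.0,), (5.0,)]
single = cluster_distance(cluster_a, cluster_b, "single")
complete = cluster_distance(cluster_a, cluster_b, "complete")
average = cluster_distance(cluster_a, cluster_b, "average")
```

An agglomerative procedure would repeatedly merge the two clusters with the smallest such distance until a single cluster remains.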
Benjamini and Yekutieli false discovery rate correction method
A data transformation in which the Benjamini and Yekutieli method is applied with the aim of correcting false discovery rate
Helen Parkinson
Helen Parkinson
The expression set was compared univariately between the stroke patients and controls, gene list was generated using False Discovery Rate correction (Benjamini and Yekutieli)
2011-03-31: [PRS].
specified input and output of dt which were missing
Benjamini and Yekutieli false discovery rate correction method
Philippe Rocca-Serra
dimensionality reduction
Philippe Rocca-Serra
data projection
A dimensionality reduction is data partitioning which transforms each input m-dimensional vector (x_1, x_2, ..., x_m) into an output n-dimensional vector (y_1, y_2, ..., y_n), where n is smaller than m.
Elisabetta Manduchi
James Malone
Melanie Courtot
PERSON: Elisabetta Manduchi
PERSON: James Malone
PERSON: Melanie Courtot
dimensionality reduction
principal components analysis dimensionality reduction
Philippe Rocca-Serra
A principal components analysis dimensionality reduction is a dimensionality reduction achieved by applying principal components analysis and by keeping low-order principal components and excluding higher-order ones.
Elisabetta Manduchi
James Malone
Melanie Courtot
PERSON: Elisabetta Manduchi
PERSON: James Malone
PERSON: Melanie Courtot
pca data reduction
principal components analysis dimensionality reduction
probabilistic algorithm
A probabilistic algorithm is one which involves an element of probability or randomness in the transformation of the data.
James Malone
PERSON: James Malone
probabilistic algorithm
expectation maximization
Expectation maximization (EM) is a probabilistic algorithm used to find maximum likelihood estimates of parameters from existing data where the model involves unobserved latent variables. The input to this method is the data model over which the estimation is performed, and the output is an approximated probability function.
James Malone
PERSON: James Malone
expectation maximization
global modularity calculation
A network graph quality calculation in which an input data set of subgraph modules and their in-degree and out-degree qualities is used to calculate the average modularity of subgraphs within the network.
PERSON: Richard Scheuermann
Richard Scheuermann
global modularity calculation
dye swap merge
A dye swap merge is a replicate analysis which takes as input data from paired two-channel microarray assays where the sample labeled with one dye in the first assay is labeled with the other dye in the second assay and vice versa. The output for each reporter is obtained by combining its (raw or possibly pre-processed) M values in the two assays, where the M value in an assay is defined as the difference of the log intensities in the two channels. This can be used as a normalization step, when appropriate assumptions are met.
Elisabetta Manduchi
James Malone
PERSON: Elisabetta Manduchi
PERSON: James Malone
dye swap merge
moving average
Philippe Rocca-Serra
A moving average is a data transformation in which center calculations, usually mean calculations, are performed on values within a sliding window across the input data set.
Elisabetta Manduchi
Helen Parkinson
PERSON: Elisabetta Manduchi
PERSON: Helen Parkinson
The moving average is often used to handle data from tiling arrays.
moving average
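A sliding-window center calculation can be sketched as follows. The function name `moving_average` is illustrative, and only full windows are computed (no padding at the ends), which is one of several possible conventions.

```python
from statistics import mean


def moving_average(values, window, center_fn=mean):
    """Apply a center calculation to each full sliding window over the input."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [center_fn(values[i:i + window]) for i in range(len(values) - window + 1)]


smoothed = moving_average([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
```

Passing `statistics.median` as `center_fn` gives a moving median instead of a moving mean.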
replicate analysis
A replicate analysis is a data transformation in which data from replicates are combined, e.g. through descriptive statistics calculations, and the results might be utilized for a variety of purposes, like assessing reproducibility, identifying outliers, normalizing, etc.
Elisabetta Manduchi
Helen Parkinson
PERSON: Helen Parkinson
PERSON:Elisabetta Manduchi
Replicate analysis can be used in microarray analysis to identify and potentially exclude low quality data.
replicate analysis
b cell epitope prediction
A B cell epitope prediction takes as input an antigen sequence, and through an analysis of this sequence, produces as output a prediction of the likelihood the biomaterial is a B Cell Epitope.
Helen Parkinson
PERSON: Helen Parkinson
b cell epitope prediction
mhc binding prediction
An MHC binding prediction takes an input of a biomaterial sequence and through an analysis of this sequence, produces as output a prediction of the likelihood that the biomaterial will bind to an MHC molecule.
Helen Parkinson
PERSON: Helen Parkinson
mhc binding prediction
t cell epitope prediction
A T cell epitope prediction takes as input an antigen sequence, and through an analysis of this sequence, produces as output a prediction of the likelihood the biomaterial is a T cell epitope.
Helen Parkinson
PERSON: Helen Parkinson
t cell epitope prediction
data imputation
ARTICLE: Little, RJA and Rubin, DB (2002). Statistical Analysis with Missing Data, Second Edition. John Wiley: Hoboken New Jersey, pp. 59-60.
Imputation is a means of filling in missing data values from a predictive distribution of the missing values. The predictive distribution can be created either based on a formal statistical model (e.g., a multivariate normal distribution) or an algorithm.
Monnie McGee
data imputation
continuum mass spectrum
PERSON: James Malone
A continuum mass spectrum is a data transformation that contains the full profile of the detected signals for a given ion.
PERSON: Tina Boussard
PERSON: Tina Hernandez-Boussard
continuum mass spectrum
characteristic path length calculation
PERSON: Tina Hernandez-Boussard
PERSON: Tina Hernandez-Boussard
Quantifying subgraph navigability based on shortest-path length averaged over all pairs of subgraph vertices
characteristic path length calculation
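The averaging of shortest-path lengths over all vertex pairs can be sketched with a breadth-first search from every vertex. The sketch assumes a connected, unweighted, undirected subgraph given as an adjacency dictionary; the function name `characteristic_path_length` and the example graph are hypothetical.

```python
from collections import deque


def characteristic_path_length(adjacency):
    """Mean shortest-path length over all ordered pairs of distinct vertices."""
    total, pairs = 0, 0
    for source in adjacency:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        distances = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in distances:
                    distances[neighbor] = distances[node] + 1
                    queue.append(neighbor)
        total += sum(d for n, d in distances.items() if n != source)
        pairs += len(distances) - 1
    return total / pairs


# Path graph A - B - C.
cpl = characteristic_path_length({"A": ["B"], "B": ["A", "C"], "C": ["B"]})
```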
centroid mass spectrum
centroid mass spectrum calculation
centroiding
A centroid mass spectrum is a data transformation in which a profile spectrum, where many points are used to delineate each mass spectral peak, is converted into mass-centroided data by a data compression algorithm. The centroided mass peak is located at the weighted center of mass of the profile peak. The normalized area of the peak provides the mass intensity data.
Person:Tina Hernandez-Boussard
centroid mass spectrum
centroid mass spectrum
Holm-Bonferroni family-wise error rate correction method
Person:Helen Parkinson
WEB: http://en.wikipedia.org/wiki/Holm%E2%80%93Bonferroni_method
a data transformation that performs more than one hypothesis test simultaneously, a closed-test procedure, that controls the familywise error rate for all the k hypotheses at level α in the strong sense. Objective: multiple testing correction
t-tests were used with the type I error adjusted for multiple comparisons, Holm's correction (HOLM 1979), and false discovery rate, http://www.genetics.org/cgi/content/full/172/2/1179
2011-03-14: [PRS]. Class Label has been changed to address the conflict with the definition
Also added restriction to specify the output to be a FWER adjusted p-value
The 'editor preferred term' should be removed
Holm-Bonferroni family-wise error rate correction method
Philippe Rocca-Serra
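A minimal sketch of the Holm step-down adjustment, assuming a plain list of p-values as input and FWER-adjusted p-values as output. The function name `holm_bonferroni` and the example p-values are hypothetical.

```python
def holm_bonferroni(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    # Walk from the smallest p-value up; the multiplier shrinks from m to 1.
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * p_values[i]))
        adjusted[i] = running_max
    return adjusted


adjusted = holm_bonferroni([0.01, 0.04, 0.03, 0.20])
```

A hypothesis is rejected when its adjusted p-value is at or below the chosen family-wise significance level.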
edge weighting
Edge weighting is the substitution or transformation of edge length using numerical data. Data input include a symmetric adjacency matrix for a network and a second data set, for example a list of interactor pairs and a confidence score associated with the experimental detection of each pair's interaction. Each element in the adjacency matrix is transformed or replaced with the corresponding number in the second data set. Output data are a modified adjacency matrix reflecting the transformed state of the network.
Tina Hernandez-Boussard
edge weighting
editor
was classified under algorithm class which is not acceptable super-class
TO BE DEALT WITH STILL BY RICHARD. JAMES
loess transformation
Philippe Rocca-Serra
A loess transformation is a data transformation that takes as input a collection of real number pairs (x, y) and, after performing (one or more) loess fittings, utilizes the resulting curves to transform each (x, y) in the input into (x, y-f(x)) where f(x) is one of the fitted curves.
Elisabetta Manduchi
James Malone
Melanie Courtot
OTHER: Editor's generalization based on MGED Ontology term
loess transformation
curve fitting data transformation
A curve fitting is a data transformation that has objective curve fitting and that consists of finding a curve which matches a series of data points and possibly other constraints.
Elisabetta Manduchi
James Malone
Melanie Courtot
WEB: http://en.wikipedia.org/wiki/Curve_fitting
curve fitting data transformation
family wise error rate correction method
A family wise error rate correction method is a multiple testing procedure that controls the probability of at least one false positive.
Dudoit, Sandrine and van der Laan, Mark J. (2008) Multiple Testing Procedures with Applications to Genomics. New York: Springer , p. 19
Monnie McGee
2011-03-31: [PRS].
creating a defined class by specifying the necessary output of dt
allows correct classification of FWER dt
FWER correction
Philippe Rocca-Serra
family wise error rate correction method
submatrix extraction
A submatrix extraction is a projection whose input is a matrix and whose output is a matrix obtained by selecting certain rows and columns from the input. (Note that, if one represents the input matrix as a vector obtained by concatenating its rows, then extracting a submatrix is equivalent to projecting this vector into that composed by the entries belonging to the rows and columns of interest from the input matrix.)
Elisabetta Manduchi
James Malone
Melanie Courtot
Note that this can be considered as a special case of projection if one represents the input matrix as a vector obtained by concatenating its rows. Then extracting a submatrix is equivalent to projecting this vector into the entries belonging to the rows and columns of interest from the input matrix.
WEB: http://en.wikipedia.org/wiki/Submatrix
When presented with the data from an expression microarray experiment in the form of a matrix, whose rows correspond to genes and whose columns correspond to samples, if one filters some of the genes and/or some of the samples out, the resulting data set corresponds to a submatrix of the original set.
submatrix extraction
row submatrix extraction
A row submatrix extraction is a submatrix extraction where all the columns of the input matrix are retained and selection only occurs on the rows.
Elisabetta Manduchi
James Malone
Melanie Courtot
PERSON: Elisabetta Manduchi
PERSON: James Malone
PERSON: Melanie Courtot
When presented with the data from an expression microarray experiment in the form of a matrix, whose rows correspond to genes and whose columns correspond to samples, if one filters some of the genes out, the resulting data set corresponds to a row submatrix of the original set.
row submatrix extraction
column submatrix extraction
A column submatrix extraction is a submatrix extraction where all the rows of the input matrix are retained and selection only occurs on the columns.
Elisabetta Manduchi
James Malone
Melanie Courtot
PERSON: Elisabetta Manduchi
PERSON: James Malone
PERSON: Melanie Courtot
When presented with the data from an expression microarray experiment in the form of a matrix, whose rows correspond to genes and whose columns correspond to samples, if one filters some of the samples out, the resulting data set corresponds to a column submatrix of the original set.
column submatrix extraction
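Submatrix extraction, with row and column submatrix extraction as special cases, can be sketched on a list-of-lists matrix. The function name `extract_submatrix` and the toy expression matrix (rows as genes, columns as samples) are hypothetical.

```python
def extract_submatrix(matrix, rows=None, cols=None):
    """Select the given row and column indices; None keeps all rows or columns."""
    rows = range(len(matrix)) if rows is None else rows
    cols = range(len(matrix[0])) if cols is None else cols
    return [[matrix[r][c] for c in cols] for r in rows]


# Toy expression matrix: 3 genes (rows) by 3 samples (columns).
expression = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
sub = extract_submatrix(expression, rows=[0, 2], cols=[1, 2])
row_sub = extract_submatrix(expression, rows=[0, 2])   # filter genes only
col_sub = extract_submatrix(expression, cols=[1, 2])   # filter samples only
```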
gating
Gating is a property-based vector selection with the objective of partitioning a data vector set into vector subsets based on dimension values of individual vectors (events), in which vectors represent individual physical particles (often cells) of a sample and dimension values represent light intensity qualities as measured by flow cytometry.
James Malone
Josef Spidlen
Melanie Courtot
PERSON: James Malone
PERSON: Josef Spidlen
PERSON: Richard Scheuermann
PERSON: Ryan Brinkman
PERSON:Melanie Courtot
Richard Scheuermann
Ryan Brinkman
gating
descriptive statistical calculation objective
A descriptive statistical calculation objective is a data transformation objective which concerns any calculation intended to describe a feature of a data set, for example, its center or its variability.
Elisabetta Manduchi
James Malone
Melanie Courtot
Monnie McGee
PERSON: Elisabetta Manduchi
PERSON: James Malone
PERSON: Melanie Courtot
PERSON: Monnie McGee
descriptive statistical calculation objective
mean calculation
Philippe Rocca-Serra
A mean calculation is a descriptive statistics calculation in which the mean is calculated by taking the sum of all of the observations in a data set divided by the total number of observations. It gives a measure of the 'center of gravity' for the data set. It is also known as the first moment.
From Monnie's file comments - need to add moment_calculation and center_calculation roles but they don't exist yet - (editor note added by James Jan 2008)
James Malone
Monnie McGee
PERSON: James Malone
PERSON: Monnie McGee
mean calculation
network analysis
network topology analysis
A data transformation that takes as input data that describes biological networks in terms of the node (a.k.a. vertex) and edge graph elements and their characteristics and generates as output properties of the constituent nodes and edges, the sub-graphs, and the entire network.
PERSON: Richard Scheuermann
Richard Scheuermann
network analysis
sequence analysis objective
James Malone
PERSON: James Malone
A sequence analysis objective is a data transformation objective which aims to analyse some ordered biological data for sequential patterns.
sequence analysis objective
longitudinal data analysis
PERSON: James Malone
PERSON: Tina Boussard
correlation analysis
Longitudinal analysis is a data transformation used to analyze repeated observations of the same items over long periods of time.
longitudinal data analysis
longitudinal data analysis
survival analysis objective
A data transformation objective in which the data transformation aims to model time-to-event data (where events are, e.g., death and/or disease recurrence); the purpose of survival analysis is to model the underlying distribution of event times and to assess the dependence of the event time on other explanatory variables.
Kaplan meier data transformation
PERSON: James Malone
PERSON: Tina Boussard
http://en.wikipedia.org/wiki/Survival_analysis
survival analysis
survival analysis objective
mass spectrometry analysis
A data transformation which has the objective of spectrum analysis.
mass spectrometry analysis
spread calculation data transformation
EDITOR
A spread calculation is a data transformation that has objective spread calculation.
James Malone
spread calculation data transformation
Kaplan Meier
PERSON: James Malone
PERSON: Tina Boussard
a nonparametric (actuarial) data transformation technique for estimating time-related events. It is a univariate analysis that estimates the probability of the proportion of subjects in remission at a particular time, starting from the initiation of active date (time zero), and takes into account those lost to follow-up or not yet in remission at end of study (censored)
http://en.wikipedia.org/wiki/Kaplan%E2%80%93Meier_estimator
Kaplan Meier
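The product-limit calculation behind the Kaplan-Meier estimate can be sketched as follows: at each distinct event time, the running estimate is multiplied by the fraction of at-risk subjects that did not experience the event, with censored subjects leaving the risk set without counting as events. The function name `kaplan_meier` and the example data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, estimate)] at each distinct event time.

    `events[i]` is True if the event was observed at `times[i]`,
    False if the subject was censored at that time.
    """
    survival = 1.0
    curve = []
    for t in sorted({u for u, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        observed = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1 - observed / at_risk
        curve.append((t, survival))
    return curve


curve = kaplan_meier(times=[1, 2, 2, 3, 4], events=[True, True, False, True, False])
```

In this example the estimate drops to 0.8, 0.6, and 0.3 at times 1, 2, and 3; the subjects censored at times 2 and 4 reduce the risk set without lowering the estimate themselves.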
multiple testing correction method
A multiple testing correction method is a hypothesis test performed simultaneously on M > 1 hypotheses. Multiple testing procedures produce a set of rejected hypotheses that is an estimate for the set of false null hypotheses while controlling a suitably defined Type I error rate.
Monnie McGee
PAPER: Dudoit, Sandrine and van der Laan, Mark J. (2008) Multiple Testing Procedures with Applications to Genomics. New York: Springer , p. 9-10.
multiple testing correction method
multiple testing procedure
inter-rater reliability objective
A study was conducted to determine the inter-rater reliability of common clinical examination procedures proposed to identify patients with lumbar segmental instability.
Examples include joint-probability of agreement, Cohen's kappa and the related Fleiss' kappa, inter-rater correlation, concordance correlation coefficient and intra-class correlation.
Person:Alan Ruttenberg
Person:Helen Parkinson
a data transformation objective of determining the concordance or agreement between human judges.
http://en.wikipedia.org/wiki/Inter-rater_reliability
inter-rater agreement
inter-rater reliability objective
Westfall and Young family wise error rate correction
Helen Parkinson
A data transformation process in which the Westfall and Young method is applied with the aim of controlling for multiple testing.
2011-03-31: [PRS].
specified input and output of dt which were missing
PRS: 2011-03-31: set specified input and specified output to the data transformation
Westfall and Young FWER correction
Westfall and Young family wise error rate correction
polynomial transformation
A polynomial transformation is a data transformation that is obtained through a polynomial, where a polynomial is a mathematical expression involving a sum of powers in one or more variables multiplied by coefficients (e.g. see http://mathworld.wolfram.com/Polynomial.html). The number of variables and the degree are properties of a polynomial. The degree of a polynomial is the highest power of its terms, where the terms of a polynomial are the individual summands with the coefficients omitted.
Elisabetta Manduchi
WEB: http://mathworld.wolfram.com/Polynomial.html
a*x+b, with a non-zero, is a polynomial of degree one in one variable; a*x^2+b*x+c, with a non-zero, is a polynomial of degree 2 in 1 variable; a*x*y+b*y+c, with a non-zero, is a polynomial of degree 2 in 2 variables (x and y); a_1*x_1+...+a_n*x_n+b, with at least one of the a_i's non-zero, is a polynomial of degree one in n variables
polynomial transformation
logarithmic transformation
A logarithmic transformation is a data transformation consisting in the application of the logarithm function with a given base a (where a>0 and a is not equal to 1) to a (one dimensional) positive real number input. The logarithm function with base a can be defined as the inverse of the exponential function with the same base. See e.g. http://en.wikipedia.org/wiki/Logarithm.
Elisabetta Manduchi
WEB: http://en.wikipedia.org/wiki/Logarithm
logarithmic transformation
exponential transformation
An exponential transformation is a data transformation consisting in the application of the exponential function with a given base a (where a>0 and a is typically not equal to 1) to a (one dimensional) real number input. For alternative definitions and properties of this function see, e.g., http://en.wikipedia.org/wiki/Exponential_function and http://en.wikipedia.org/wiki/Characterizations_of_the_exponential_function.
Elisabetta Manduchi
WEB: http://en.wikipedia.org/wiki/Characterizations_of_the_exponential_function
WEB: http://en.wikipedia.org/wiki/Exponential_function
exponential transformation
non-negative matrix factorization
Non-negative matrix factorization is a data transformation that factorises a matrix under the constraint that all elements must be equal to or greater than zero.
Non-negative matrix factorization is used in text mining, where a document-term matrix is constructed with the weights of various terms (typically weighted word frequency information) from a set of documents. This matrix is factored into a term-feature and a feature-document matrix.
http://en.wikipedia.org/wiki/Non-negative_matrix_factorization
non-negative matrix factorization
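The definition above does not name an algorithm. As one possible illustration, the sketch below uses the multiplicative update rule of Lee and Seung, which keeps every entry of both factors non-negative. The function names, the random initialisation, the iteration count and the small constant eps are all assumptions chosen for this example.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Frobenius norm of v - w*h."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix v into non-negative w (n x rank) and h (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# A made-up rank-1 matrix, so a rank-1 factorization can reproduce it closely.
v = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0]]
w, h = nmf(v, rank=1)
error = frobenius_error(v, w, h)
```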
soft independent modeling of class analogy analysis
SIMCA
Soft independent modeling by class analogy (SIMCA) is a descriptive statistics method for supervised classification of data. The method requires a training data set consisting of samples (or objects) with a set of attributes and their class membership. The term soft refers to the fact that the classifier can identify samples as belonging to multiple classes and does not necessarily produce a classification of samples into non-overlapping classes.
Tina Hernandez-Boussard
WEB: http://en.wikipedia.org/wiki/Soft_independent_modelling_of_class_analogies
soft independent modeling of class analogy analysis
discriminant function analysis
Discriminant function analysis is a form of discriminant analysis used to determine which variables discriminate between two or more naturally occurring groups. Analysis is used to determine which variable(s) are the best predictors of a particular outcome.
Tina Hernandez-Boussard
WEB: http://www.statsoft.com/textbook/stdiscan.html
discriminant function analysis
canonical variate analysis
CVA
Tina Hernandez-Boussard
WEB: http://en.wikipedia.org/wiki/Canonical_analysis
canonical variate analysis
Canonical variate analysis is a form of discriminant analysis that takes several continuous predictor variables and uses the entire set to predict several criterion variables, each of which is also continuous. CVA simultaneously calculates a linear composite of all x variables and a linear composite of all y variables. Unlike other multivariate techniques, these weighted composites are derived in pairs. Each linear combination is called a canonical variate and takes the general linear form.
linear discriminant functional analysis
Linear discriminant functional analysis (LDFA) is a multivariate technique used in special applications where there are several intact groups (random assignment may be impossible) and they have been measured on several independent measures. Thus, you will want to describe how these groups differ on the basis of these measures. In this case, classification and prediction is the main objective.
PERSON: Tina Hernandez-Boussard
Tina Hernandez-Boussard
linear discriminant functional analysis
regression analysis method
BOOK: Richard A. Berk, Regression Analysis: A Constructive Critique, Sage Publications (2004) 978-0761929048
Regression analysis is a descriptive statistics technique that examines the relation of a dependent variable (response variable) to specified independent variables (explanatory variables). Regression analysis can be used as a descriptive method of data analysis (such as curve fitting) without relying on any assumptions about underlying processes generating the data.
Tina Hernandez-Boussard
regression analysis method
multiple linear regression analysis
Tina Hernandez-Boussard
WEB:http://en.wikipedia.org/wiki/Linear_regression
multiple linear regression analysis
Multiple linear regression is a regression method that models the relationship between a dependent variable Y, independent variables X_i, i = 1, ..., p, and a random term \varepsilon. The model can be written as
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon
where \beta_0 is the intercept ("constant" term), the \beta_i are the respective parameters of the independent variables, and p is the number of independent variables, so that p + 1 parameters (including the intercept) are to be estimated in the linear regression.
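A minimal sketch of how the coefficients of the model above could be estimated by ordinary least squares, solving the normal equations with Gaussian elimination. Least squares is an assumption here (the definition does not prescribe an estimation method), and the function names and data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_multiple_linear_regression(rows, y):
    """Return [beta_0, beta_1, ..., beta_p] minimising the squared residuals."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 carries the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Noise-free, made-up data generated from Y = 1 + 2*X_1 + 3*X_2.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
betas = fit_multiple_linear_regression(rows, y)
```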
principal component regression
The Principal Component Regression method is a regression analysis method that combines the Principal Component Analysis (PCA) spectral decomposition with an Inverse Least Squares (ILS) regression method to create a quantitative model for complex samples. Unlike quantitation methods based directly on Beer's Law, which attempt to calculate the absorptivity coefficients for the constituents of interest from a direct regression of the constituent concentrations onto the spectroscopic responses, the PCR method regresses the concentrations on the PCA scores.
Tina Hernandez-Boussard
WEB: http://www.thermo.com/com/cda/resources/resources_detail/1,2166,13414,00.html
principal component regression
partial least square regression analysis
ARTICLE: de Jong, S. (1993). SIMPLS: An alternative approach to partial least squares regression. Chemometrics and Intelligent Laboratory Systems, 18: 251-263.
PLS-RA
Partial least squares regression is an extension of the multiple linear regression model (see, e.g., Multiple Regression or General Stepwise Regression). In its simplest form, a linear model specifies the (linear) relationship between a dependent (response) variable Y, and a set of predictor variables, the X's, so that
Y = b0 + b1X1 + b2X2 + ... + bpXp
In this equation b0 is the regression coefficient for the intercept and the bi values are the regression coefficients (for variables 1 through p) computed from the data.
Tina Hernandez-Boussard
partial least square regression analysis
discriminant analysis
Discriminant function analysis is used to determine which variables discriminate between two or more naturally occurring groups. Analysis is used to determine which variable(s) are the best predictors of a particular outcome.
Tina Hernandez-Boussard
WEB: http://www.statsoft.com/textbook/stdiscan.html
discriminant analysis
partial least square discriminant analysis
PLS Discriminant Analysis (PLS-DA) is a discriminant analysis performed in order to sharpen the separation between groups of observations by rotating PCA (Principal Components Analysis) components such that a maximum separation among classes is obtained, and to understand which variables carry the class-separating information.
WEB: http://www.camo.com/rt/Resources/pls-da.html
James Malone
PLS-DA
partial least square discriminant analysis
eh transformation
An eh transformation is a data transformation obtained by applying the function EH described in what follows to a (one dimensional) real number input. EH(x)=exp(x*d/r)+b*(d/r)*x-1, if x>=0, and EH(x)=-exp(-x*d/r)+b*(d/r)*x+1, otherwise. Here exp denotes an exponential transformation and b, d, r are positive real constants with the objective of normalization.
Elisabetta Manduchi
Joseph Spliden
Ryan Brinkman
This type of transformation is typically used in flow cytometry.
WEB: http://flowcyt.sourceforge.net/gating/latest.pdf
eh transformation
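The piecewise function EH given in the definition above can be sketched as follows. The function name is an assumption, and the constants used in the example (b = d = r = 1) are placeholders rather than values recommended for flow cytometry data.

```python
import math


def eh_transform(x, b, d, r):
    """EH(x) as defined above, for positive real constants b, d and r."""
    scaled = x * d / r
    if x >= 0:
        return math.exp(scaled) + b * scaled - 1
    return -math.exp(-scaled) + b * scaled + 1


# With these placeholder constants the function passes through the origin
# and is odd, i.e. EH(-x) = -EH(x).
at_zero = eh_transform(0.0, b=1.0, d=1.0, r=1.0)
symmetry_gap = eh_transform(2.0, 1.0, 1.0, 1.0) + eh_transform(-2.0, 1.0, 1.0, 1.0)
```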
b transformation
A b transformation is a data transformation obtained by applying the function B described in what follows to a (one dimensional) real number input. B(x)= a*exp(b*x)-c*exp(-d*x)+f, where exp denotes an exponential transformation and a, b, c, d, f are real constants with a, b, c, d positive with the objective of normalization.
Elisabetta Manduchi
Joseph Spliden
Ryan Brinkman
This type of transformation is typically used in flow cytometry.
WEB: http://flowcyt.sourceforge.net/gating/latest.pdf
b transformation
s transformation
An s transformation is a data transformation obtained by applying the function S described in what follows to a (one dimensional) real number input. S(x)=T*exp(w-m)*(exp(x-w)-(p^2)*exp((w-x)/p)+p^2-1) if x>=w, S(x)=-S(w-x) otherwise; where exp denotes an exponential transformation, 'p^' denotes the exponential transformation with base p; T, w, m, p are real constants with T, m, and p positive and w non-negative, and where w and p are related by w=2p*ln(p)/(p+1), with the objective of normalization.
Elisabetta Manduchi
Joseph Spliden
Ryan Brinkman
This type of transformation is typically used in flow cytometry.
WEB: http://flowcyt.sourceforge.net/gating/latest.pdf
s transformation
data visualization
Generation of a heatmap from a microarray dataset
Possible future hierarchy might include this:
information_encoding
>data_encoding
>>image_encoding
data encoding as image
A planned process that creates images, diagrams or animations from the input data.
Elisabetta Manduchi
James Malone
Melanie Courtot
PERSON: Elisabetta Manduchi
PERSON: James Malone
PERSON: Melanie Courtot
PERSON: Tina Boussard
Tina Boussard
data visualization
visualization
similarity calculation
A similarity calculation is a data transformation that attaches to each pair of objects in the input a number that is meant to reflect how 'close' or 'similar' those objects are.
Elisabetta Manduchi
PERSON: Elisabetta Manduchi
similarity calculation
euclidean distance calculation
A Euclidean distance calculation is a similarity calculation that attaches to each pair of real number vectors of the same dimension n the square root of the sum of the squared differences between corresponding components. The smaller this number, the more similar the two vectors are considered.
Elisabetta Manduchi
PERSON: Elisabetta Manduchi
euclidean distance calculation
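A short sketch of the distance described above, for two vectors of the same dimension given as Python sequences. The function name is an assumption.

```python
import math


def euclidean_distance(u, v):
    """Square root of the sum of squared component-wise differences."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


distance = euclidean_distance([0.0, 0.0], [3.0, 4.0])
```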
pearson correlation coefficient calculation
A Pearson correlation coefficient calculation is a similarity calculation which attaches to each pair of random variables X and Y the ratio of their covariance to the product of their standard deviations. Given a series of n measurements of X and Y written as x_i and y_i where i = 1, 2, ..., n, their Pearson correlation coefficient refers to the "sample correlation coefficient" and is written as the sum over i of the ratios (x_i-xbar)*(y_i-ybar)/((n-1)*s_x*s_y), where xbar and ybar are the sample means of X and Y, and s_x and s_y are the sample standard deviations of X and Y. The closer the Pearson correlation coefficient is to 1, the more similar the inputs are considered.
Elisabetta Manduchi
PERSON: Elisabetta Manduchi
WEB: http://en.wikipedia.org/wiki/Correlation
pearson correlation coefficient calculation
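The sample correlation coefficient formula in the definition above can be written out directly. The sketch below follows that formula term by term; the function name and the example data are assumptions.

```python
import math


def pearson_correlation(xs, ys):
    """Sum over i of (x_i - xbar)*(y_i - ybar) / ((n - 1) * s_x * s_y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with n >= 2")
    xbar, ybar = sum(xs) / n, sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)


r_same_direction = pearson_correlation([1, 2, 3, 4], [2, 4, 6, 8])
r_opposite_direction = pearson_correlation([1, 2, 3, 4], [8, 6, 4, 2])
```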
loess fitting
Philippe Rocca-Serra
A loess fitting is a curve fitting obtained by localized regression. The latter refers to fitting a polynomial (straight line, quadratic, cubic, etc) to data values within a window covering a fraction of the total number of observations. As the window slides along the axis, a new polynomial is fit to the observations falling within the window. This continues until all points are fit with a local polynomial. The results are then smoothed together to form a curve. The smoothness of loess fits is controlled by a smoothing parameter (often denoted as alpha, usually between 1/4 and 1) and the degree of the polynomial that is fitted by the method (usually denoted by lambda).
ARTICLE: Mathematical details of loess fits are given in Cleveland, William (1993) Visualizing Data. Hobart Press, Summit, New Jersey, pp. 94-101.
Monnie McGee
loess fitting
mode calculation
A mode calculation is a descriptive statistics calculation in which the mode, the most common value in a data set, is calculated. It is most often used as a measure of center for discrete data.
From Monnie's file comments - need to add center_calculation role but it doesn't exist yet - (editor note added by James Jan 2008)
James Malone
Monnie McGee
PERSON: James Malone
PERSON: Monnie McGee
mode calculation
quantile calculation
A quantile calculation is a descriptive statistics calculation in which the kth quantile is the data value for which approximately a fraction k of the data is less than or equal to that value. See http://www.stat.wvu.edu/SRS/Modules/Quantiles/quantiles.html for details.
Monnie McGee
WEB: http://www.stat.wvu.edu/SRS/Modules/Quantiles/quantiles.html
quantile calculation
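Several conventions exist for computing quantiles from a finite sample. The sketch below uses one simple convention, assumed here for illustration: the kth quantile is the smallest data value such that at least a fraction k of the data is less than or equal to it. The function name is hypothetical.

```python
import math


def quantile(data, k):
    """Smallest data value with at least a fraction k of the data <= it."""
    if not data or not 0 < k <= 1:
        raise ValueError("need non-empty data and 0 < k <= 1")
    ordered = sorted(data)
    index = math.ceil(k * len(ordered)) - 1
    return ordered[index]


values = [7, 1, 3, 5]
lower_half_cut = quantile(values, 0.5)
```

Other conventions interpolate between neighbouring data values, so results for small samples can differ from this sketch.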
median calculation
A median calculation is a descriptive statistics calculation in which the midpoint of the data set (the 0.5 quantile) is calculated. First, the observations are sorted in increasing order. For an odd number of observations, the median is the middle value of the sorted data. For an even number of observations, the median is the average of the two middle values.
From Monnie's file comments - need to add center_calculation role but it doesn't exist yet - (editor note added by James Jan 2008)
James Malone
Monnie McGee
PERSON: James Malone
PERSON: Monnie McGee
median calculation
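The steps in the definition above (sort, then take the middle value or the average of the two middle values) can be sketched as follows. The function name is an assumption.

```python
def median(data):
    """Middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2
```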
variance calculation
A variance calculation is a descriptive statistics calculation in which the variance is defined as the average squared distance of each observation in the data set to the mean of the data set. It is also known as the second central moment.
From Monnie's file comments - need to add spread_calculation and moment_calculation roles but they don't exist yet - (editor note added by James Jan 2008)
Monnie McGee
PERSON: Monnie McGee
variance calculation
standard deviation calculation
A standard deviation calculation is a descriptive statistics calculation defined as the square root of the variance. It is also thought of as the average distance of each value to the mean.
From Monnie's file comments - need to add spread calculation role but it doesn't exist yet - (editor note added by James Jan 2008)
Monnie McGee
PERSON: Monnie McGee
standard deviation calculation
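A minimal sketch of the variance and standard deviation calculations as worded above, taking the variance as the average squared distance to the mean (division by n). This is an assumption about the intended convention; a sample variance would divide by n - 1 instead. Function names and data are hypothetical.

```python
import math


def variance(data):
    """Average squared distance of each observation to the mean (divides by n)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / len(data)


def standard_deviation(data):
    """Square root of the variance."""
    return math.sqrt(variance(data))


observations = [2, 4, 4, 4, 5, 5, 7, 9]
var = variance(observations)
sd = standard_deviation(observations)
```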
interquartile-range calculation
From Monnie's file comments - need to add spread calculation role but it doesn't exist yet - (editor note added by James Jan 2008)
Monnie McGee
PERSON: Monnie McGee
The interquartile range is a descriptive statistics calculation defined as the difference between the 0.75 quantile and the 0.25 quantile for a set of data.
interquartile-range calculation
skewness calculation
A skewness calculation is a descriptive statistics calculation defined as a parameter that describes how much a distribution (or a data set) varies from a bell-shaped curve. See http://www.riskglossary.com/link/skewness.htm for details. It is also known as the third central moment.
From Monnie's file comments - need to add moment calculation role but it doesn't exist yet - (editor note added by James Jan 2008)
Monnie McGee
WEB: http://www.riskglossary.com/link/skewness.htm
skewness calculation
kurtosis calculation
A kurtosis calculation is a descriptive statistics calculation defined as a parameter that measures how large or small the tails of a distribution are relative to the mean. For details, see http://davidmlane.com/hyperstat/A53638.html
From Monnie's file comments - need to add moment calculation role but it doesn't exist yet - (editor note added by James Jan 2008)
Monnie McGee
WEB: http://davidmlane.com/hyperstat/A53638.html
kurtosis calculation
data combination
data pooling
A data transformation in which individual input data elements and values are merged together into an output set of data elements and values.
Richard Scheuermann
data combination
editor
network graph construction
A network analysis in which an input data set describing objects and relationships between objects is transformed into an output representation of these objects as nodes and the relationships as edges of a network graph.
PERSON: Richard Scheuermann
Richard Scheuermann
network graph construction
weighted network graph construction
A network graph construction in which an input data set describing objects and quantitative relationships between objects is transformed into an output representation of these objects as nodes and the quantitative relationships as weighted edges of a network graph.
PERSON: Richard Scheuermann
Richard Scheuermann
weighted network graph construction
directed network graph construction
A network graph construction in which an input data set describing objects and directional relationships between objects is transformed into an output representation of these objects as nodes and the directional relationships as directed edges of a network graph.
PERSON: Richard Scheuermann
Richard Scheuermann
directed network graph construction
node quality calculation
A network analysis in which an input data set describing node objects and edge relationships between node objects is used to determine the output quality of one of the node objects in the network.
PERSON: Richard Scheuermann
Richard Scheuermann
node quality calculation
node degree calculation
A node quality calculation in which an input data set describing object nodes and relationship edges between object nodes is used to enumerate the number of unique relationships of an individual object node.
PERSON: Richard Scheuermann
Richard Scheuermann
node degree calculation
quantitative node degree calculation
A node quality calculation in which an input data set describing object nodes and quantitative relationship edges between object nodes is used to sum all of the quantitative relationships of an individual object node.
PERSON: Richard Scheuermann
Richard Scheuermann
quantitative node degree calculation
node in-degree calculation
A node quality calculation in which an input data set describing object nodes and directional relationship edges between object nodes is used to enumerate the number of unique relationships pointing into an individual object node.
PERSON: Richard Scheuermann
Richard Scheuermann
node in-degree calculation
node out-degree calculation
A node quality calculation in which an input data set describing object nodes and directional relationship edges between object nodes is used to enumerate the number of unique relationships pointing out of an individual object node.
PERSON: Richard Scheuermann
Richard Scheuermann
node out-degree calculation
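The four node degree calculations defined above (node degree, quantitative node degree, in-degree and out-degree) can be sketched on a small edge list. The representation of edges as (source, target, weight) tuples, the function names and the example network are assumptions made for illustration.

```python
def node_degree(edges, node):
    """Number of unique relationships that involve the node."""
    return len({(s, t) for s, t, _ in edges if node in (s, t)})


def quantitative_node_degree(edges, node):
    """Sum of the quantitative relationships (weights) that involve the node."""
    return sum(w for s, t, w in edges if node in (s, t))


def node_in_degree(edges, node):
    """Number of unique relationships pointing into the node."""
    return len({(s, t) for s, t, _ in edges if t == node})


def node_out_degree(edges, node):
    """Number of unique relationships pointing out of the node."""
    return len({(s, t) for s, t, _ in edges if s == node})


# Made-up directed, weighted network.
edges = [("a", "b", 0.5), ("a", "c", 2.0), ("c", "a", 1.0), ("b", "c", 1.5)]
```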
node shortest path identification
A node quality calculation in which the shortest path needed to traverse connected nodes and edges to arrive at a specific target node in the network is identified.
PERSON: Richard Scheuermann
Richard Scheuermann
node shortest path identification
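For an unweighted network, one common way to identify a shortest path is breadth-first search. The sketch below assumes that setting; the adjacency-dictionary representation, the function name and the example network are hypothetical, and weighted networks would need a different algorithm (for example Dijkstra's).

```python
from collections import deque


def shortest_path(adjacency, source, target):
    """Return one shortest path from source to target as a list of nodes, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adjacency.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


adjacency = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
path = shortest_path(adjacency, "a", "e")
```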
edge quality calculation
A network analysis in which an input data set describing node objects and edge relationships between node objects is used to determine the output quality of one of the edge relationships in the network.
PERSON: Richard Scheuermann
Richard Scheuermann
edge quality calculation
edge betweenness calculation
An edge quality calculation in which the input is a data set of shortest paths between all pairs of nodes in the network and the output is the sum of all shortest paths that traverse the specific edge.
PERSON: Richard Scheuermann
Richard Scheuermann
edge betweenness calculation
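Reading the output in the definition above as a count of shortest paths, the calculation can be sketched as follows. The input is assumed to be a precomputed list of shortest paths, each given as a list of nodes, and edges are treated as undirected; both choices, and the function name, are assumptions.

```python
def edge_betweenness(shortest_paths, edge):
    """Count the shortest paths that traverse the given (undirected) edge."""
    target = frozenset(edge)
    count = 0
    for path in shortest_paths:
        steps = (frozenset(pair) for pair in zip(path, path[1:]))
        if target in steps:
            count += 1
    return count


# All-pairs shortest paths of the made-up path graph a - b - c.
paths = [["a", "b"], ["b", "c"], ["a", "b", "c"]]
```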
network subgraph quality calculation
A network analysis in which an input data set describing node objects and edge relationships between node objects is used to determine the output quality of a subgraph partition of the network.
PERSON: Richard Scheuermann
Richard Scheuermann
network subgraph quality calculation
subgraph degree calculation
A network subgraph quality calculation in which an input data set describing subgraphs and relationship edges between subgraphs and other network objects is used to enumerate the number of unique relationships of an individual subgraph.
PERSON: Richard Scheuermann
Richard Scheuermann
subgraph degree calculation
quantitative subgraph degree calculation
A network subgraph quality calculation in which an input data set describing subgraphs and quantitative relationship edges between subgraphs and other network objects is used to sum the quantitative relationships of an individual subgraph.
PERSON: Richard Scheuermann
Richard Scheuermann
quantitative subgraph degree calculation
mathematical feature
PERSON: James Malone
James Malone
This class is temporary and will be placed outside of data transformation ultimately (if it still remains at all after review)
feature is a (parent_class) that describes a characteristic, trait or quality of a data transformation
mathematical feature
log base
Elisabetta Manduchi
The log base is a feature of a logarithmic function which is defined in http://en.wikipedia.org/wiki/Logarithm. Its value can be any positive real number different from 1.
WEB: http://en.wikipedia.org/wiki/Logarithm
logarithm base
logarithmic base
log base
subgraph in-degree calculation
A network subgraph quality calculation in which an input data set describing subgraphs and directional relationship edges between subgraphs and other network objects is used to enumerate the number of unique relationships pointing into an individual subgraph.
PERSON: Richard Scheuermann
Richard Scheuermann
subgraph in-degree calculation
subgraph out-degree calculation
A network subgraph quality calculation in which an input data set describing subgraphs and directional relationship edges between subgraphs and other network objects is used to enumerate the number of unique relationships pointing out of an individual subgraph.
PERSON: Richard Scheuermann
Richard Scheuermann
subgraph out-degree calculation
intra subgraph connectivity calculation
A network subgraph quality calculation in which an input data set describing internal nodes, edges and node degrees is used to determine the average node degree within the subgraph.
PERSON: Richard Scheuermann
Richard Scheuermann
intra subgraph connectivity calculation
subgraph modularity calculation
A network subgraph quality calculation in which an input data set of subgraph in-degree and out-degree qualities is used to calculate the ratio of in-degree to out-degree as a measure of modularity.
PERSON: Richard Scheuermann
Richard Scheuermann
subgraph modularity calculation
network graph quality calculation
A network analysis in which an input data set describing node objects and edge relationships between node objects is used to determine the output quality of the network as a whole.
PERSON: Richard Scheuermann
Richard Scheuermann
network graph quality calculation
unit-variance scaling
A unit-variance scaling is a data transformation that divides all measurements of a variable by the standard deviation of that variable.
Elisabetta Manduchi
PMID:16762068
Philippe Rocca-Serra
autoscaling
unit-variance scaling
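A minimal sketch of unit-variance scaling for the measurements of a single variable. The sample standard deviation (division by n - 1, as computed by statistics.stdev) is an assumption here, since the definition does not say which estimator is meant; the function name is hypothetical.

```python
import statistics


def unit_variance_scale(values):
    """Divide every measurement of a variable by that variable's standard deviation."""
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("cannot scale a constant variable")
    return [v / sd for v in values]


scaled = unit_variance_scale([2.0, 4.0, 6.0, 8.0])
scaled_sd = statistics.stdev(scaled)  # expected to be 1 up to rounding
```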
MA transformation
An MA transformation is a data transformation which takes as input a collection of data points (g_1, r_1), (g_2, r_2), ..., (g_n, r_n) with the r_i and g_i positive real numbers, and whose output is the collection of data points (A_1, M_1), (A_2, M_2), ..., (A_n, M_n) where, for each i, A_i=(log(g_i)+log(r_i))/2 and M_i=log(r_i)-log(g_i). Here log denotes a logarithmic transformation.
Elisabetta Manduchi
MA transformation
MA transformations are typically used in microarray data analyses. In this context, the g_i and r_i represent the reporter intensities in the two channels of a 2-channel assay or the reporter intensities in two related one-channel assays. Typically the base used for the logarithm is 2.
PERSON: Elisabetta Manduchi
Philippe Rocca-Serra
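The MA transformation defined above maps each pair (g_i, r_i) to (A_i, M_i). The sketch below follows those two formulas with base 2 as the default, the base the note above describes as typical; the function name and example intensities are assumptions.

```python
import math


def ma_transform(points, base=2.0):
    """Map each (g, r) pair of positive intensities to (A, M)."""
    result = []
    for g, r in points:
        log_g, log_r = math.log(g, base), math.log(r, base)
        a_value = (log_g + log_r) / 2   # A_i: average log intensity
        m_value = log_r - log_g         # M_i: log ratio of r to g
        result.append((a_value, m_value))
    return result


ma_points = ma_transform([(4.0, 16.0), (8.0, 8.0)])
```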
exponential base
Elisabetta Manduchi
The exponential base is a feature of an exponential function which is defined in http://en.wikipedia.org/wiki/Exponential_function. Its value can be any positive real number (typically different from 1).
WEB: http://en.wikipedia.org/wiki/Exponential_function
exponential base
polynomial degree
Elisabetta Manduchi
PERSON: Elisabetta Manduchi
The polynomial degree is a feature of a polynomial function defined as the highest power of the polynomial's terms, where the terms of a polynomial are the individual summands with the coefficients omitted.
polynomial degree
number of variables
Elisabetta Manduchi
PERSON: Elisabetta Manduchi
The number of variables is a feature of any function (including polynomial functions) with domain contained in an n-dimensional vector space and is defined as n, the dimension of such space.
number of variables
agglomerative hierarchical clustering
Elisabetta Manduchi
bottom-up hierarchical clustering
An agglomerative hierarchical clustering is a hierarchical clustering which starts with separate clusters and then successively combines these clusters until there is only one cluster remaining.
James Malone
PERSON: Elisabetta Manduchi
agglomerative hierarchical clustering
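The bottom-up procedure described above can be sketched for one-dimensional points. Single linkage (the distance between two clusters is the smallest distance between their members) is an assumption chosen for brevity; the definition does not fix a linkage rule, and the function name and data are hypothetical.

```python
def agglomerative_clustering(points):
    """Merge the two closest clusters until one remains; return the merge history."""
    clusters = [[p] for p in points]  # start with one cluster per object
    history = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        history.append(sorted(merged))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return history


points = [1.0, 1.2, 5.0, 5.1]
merges = agglomerative_clustering(points)
```

Each entry of the merge history is the cluster created at that step, so n objects produce n - 1 merges and the last entry contains every object.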
divisive hierarchical clustering
Elisabetta Manduchi
top-down hierarchical clustering
A divisive hierarchical clustering is a hierarchical clustering which starts with a single cluster and then successively splits resulting clusters until only clusters of individual objects remain.
James Malone
PERSON: Elisabetta Manduchi
divisive hierarchical clustering
data partitioning
Data partitioning is a data transformation with the objective of partitioning or separating input data into output subsets.
James Malone
PERSON: Melanie Courtot
PERSON: Richard Scheuermann
PERSON: Ryan Brinkman
data partitioning
data vector reduction objective
James Malone
Data vector reduction is a data transformation objective in which k m-dimensional input vectors are reduced to j m-dimensional output vectors, where j is smaller than k.
PERSON: Richard H. Scheuermann
Richard H. Scheuermann
data vector reduction objective
generalized family wise error rate correction method
A generalized FWER correction method is a multiple testing procedure that controls the probability of at least k+1 false positives, where k is a user-supplied integer.
Dudoit, Sandrine and van der Laan, Mark J. (2008) Multiple Testing Procedures with Applications to Genomics. New York: Springer , p. 19
Monnie McGee
gFWER correction
generalized family wise error rate correction method
quantile number of false positives correction method
A quantile number of false positives correction method is a multiple testing procedure (MTP) that controls the pth quantile of the distribution of the number of false positives out of the total number of tests performed.
Dudoit, Sandrine and van der Laan, Mark J. (2008) Multiple Testing Procedures with Applications to Genomics. New York: Springer , p. 19
Monnie McGee
QNFP
quantile number of false positives correction method
tail probability for the proportion of false positives correction method
A TPPFP correction method is a multiple testing procedure (MTP) that controls the probability that the proportion of false positives among all rejected hypotheses exceeds a constant q, where q is between 0 and 1.
Dudoit, Sandrine and van der Laan, Mark J. (2008) Multiple Testing Procedures with Applications to Genomics. New York: Springer , p. 20
Monnie McGee
TPPFP correction method
tail probability for the proportion of false positives correction method
false discovery rate correction method
Dudoit, Sandrine and van der Laan, Mark J. (2008) Multiple Testing Procedures with Applications to Genomics. New York: Springer , p. 21 and http://www.wikidoc.org/index.php/False_discovery_rate
Monnie McGee
A false discovery rate correction method is a data transformation used in multiple hypothesis testing to correct for multiple comparisons. It controls the expected proportion of incorrectly rejected null hypotheses (type I errors) in a list of rejected hypotheses. It is a less conservative comparison procedure with greater power than familywise error rate (FWER) control, at the cost of an increased likelihood of obtaining type I errors.
2011-03-31: [PRS].
creating a defined class by specifying the necessary output of dt
allows correct classification of FDR dt
FDR correction method
Philippe Rocca-Serra
false discovery rate correction method
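The definition above does not name a specific procedure. As one widely used example of an FDR-controlling method, the sketch below computes Benjamini-Hochberg adjusted p-values; the choice of this procedure, the function name and the example p-values are assumptions.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
rejected = [q <= 0.05 for q in adjusted]  # hypotheses rejected at an FDR level of 0.05
```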
proportion of expected false positives correction method
A proportion of expected false positives correction method is a multiple testing procedure that controls the ratio of the expected value of the numbers of false positives to the expected value of the numbers of rejected hypotheses.
Dudoit, Sandrine and van der Laan, Mark J. (2008) Multiple Testing Procedures with Applications to Genomics. New York: Springer , p. 21
Monnie McGee
PEFP correction method
proportion of expected false positives correction method
quantile proportion of false positives correction method
A quantile proportion of false positives correction method is a multiple testing procedure that controls the pth quantile of the distribution of the proportion of false positives among the rejected hypotheses (false discovery rate).
Dudoit, Sandrine and van der Laan, Mark J. (2008) Multiple Testing Procedures with Applications to Genomics. New York: Springer , p. 21
Monnie McGee
QPFP correction method
quantile proportion of false positives correction method
data transformation objective
Modified definition in 2013 Philly OBI workshop
An objective specification to transform input data into output data.
James Malone
PERSON: James Malone
data transformation objective
normalize objective
data normalization objective
Elisabetta Manduchi
Helen Parkinson
PERSON: Elisabetta Manduchi
PERSON: Helen Parkinson
A normalization objective is a data transformation objective where the aim is to remove
systematic sources of variation to put the data on equal footing in order
to create a common base for comparisons.
James Malone
PERSON: James Malone
Quantile transformation which has normalization objective can be used for expression microarray assay normalization and it is referred to as "quantile normalization", according to the procedure described e.g. in PMID 12538238.
data normalization objective
correction objective
PERSON: James Malone
PERSON: Melanie Courtot
A correction objective is a data transformation objective where the aim is to correct for error, noise or other impairments to the input of the data transformation or derived from the data transformation itself
James Malone
Type I error correction
correction objective
normalization data transformation
James Malone
A normalization data transformation is a data transformation that has objective normalization.
PERSON: James Malone
normalization data transformation
averaging data transformation
James Malone
An averaging data transformation is a data transformation that has objective averaging.
PERSON: James Malone
averaging data transformation
partitioning data transformation
James Malone
A partitioning data transformation is a data transformation that has objective partitioning.
PERSON: James Malone
partitioning data transformation
partitioning objective
Elisabetta Manduchi
PERSON: Elisabetta Manduchi
A k-means clustering which has partitioning objective is a data transformation in which the input data is partitioned into k output sets.
A partitioning objective is a data transformation objective where the aim is to generate a collection of disjoint non-empty subsets whose union equals a non-empty input set.
James Malone
partitioning objective
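The condition stated in the partitioning objective above (disjoint, non-empty subsets whose union equals a non-empty input set) can be checked mechanically. The helper below is a hypothetical illustration, not part of the ontology.

```python
def is_partition(subsets, input_set):
    """True if the subsets are non-empty, pairwise disjoint and cover input_set."""
    subsets = [set(s) for s in subsets]
    if not input_set or any(not s for s in subsets):
        return False
    union = set().union(*subsets)
    # Disjointness: no element is counted in more than one subset.
    total = sum(len(s) for s in subsets)
    return union == set(input_set) and total == len(union)


data = {1, 2, 3, 4, 5}
valid = is_partition([{1, 2}, {3}, {4, 5}], data)
overlapping = is_partition([{1, 2}, {2, 3}, {4, 5}], data)
```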
background correction objective
Elisabetta Manduchi
PERSON: Elisabetta Manduchi
A background correction objective is a data transformation objective where the aim is to remove irrelevant contributions from the measured signal, e.g. those due to instrument noise or sample preparation.
James Malone
background correction objective
curve fitting objective
Elisabetta Manduchi
PERSON: Elisabetta Manduchi
A curve fitting objective is a data transformation objective in which the aim is to find a curve which matches a series of data points and possibly other constraints.
James Malone
curve fitting objective
class discovery data transformation
James Malone
clustering data transformation
unsupervised classification data transformation
A class discovery data transformation (sometimes called unsupervised classification) is a data transformation that has objective class discovery.
PERSON: James Malone
class discovery data transformation
Fisher's exact test
Fisher's exact test
Fisher's exact test is a data transformation used to determine if there are nonrandom associations between two categorical variables. It is a statistical significance test used in the analysis of contingency tables where sample sizes are small, and where the significance of the deviation from a null hypothesis can be calculated exactly, rather than relying on an approximation that becomes exact in the limit as the sample size grows to infinity, as with many statistical tests.
James Malone
WEB:http://mathworld.wolfram.com/FishersExactTest.html
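To illustrate what "calculated exactly" means for a 2x2 contingency table, here is a minimal sketch using only the Python standard library. The function name and the two-sided rule (summing the probabilities of all tables that are no more likely than the observed one) are assumptions chosen for illustration; the ontology does not prescribe an implementation.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided exact p-value for the 2x2 table [[a, b], [c, d]] (illustrative sketch)."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability of k in the top-left cell with fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables at least as extreme as the observed one,
    # with a small relative tolerance for floating-point ties.
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))
```

For the table `[[3, 1], [1, 3]]` the sketch returns 34/70, roughly 0.486, so a sample this small gives no evidence of a nonrandom association.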
center calculation objective
PERSON: James Malone
A center calculation objective is a data transformation objective where the aim is to calculate the center of an input data set.
A mean calculation which has center calculation objective is a data transformation in which the center of the input data is discovered through the calculation of a mean average.
James Malone
center calculation objective
class discovery objective
PERSON: Elisabetta Manduchi
PERSON: James Malone
clustering objective
A class discovery objective (sometimes called unsupervised classification) is a data transformation objective where the aim is to organize input data (typically vectors of attributes) into classes, where the number of classes and their specifications are not known a priori. Depending on usage, the class assignment can be definite or probabilistic.
James Malone
class discovery objective
discriminant analysis objective
unsupervised classification objective
class prediction objective
PERSON: Elisabetta Manduchi
PERSON: James Malone
classification objective
A class prediction objective (sometimes called supervised classification) is a data transformation objective where the aim is to create a predictor from training data through a machine learning technique. The training data consist of pairs of objects (typically vectors of attributes) and class labels for these objects. The resulting predictor can be used to attach class labels to any valid novel input object. Depending on usage, the prediction can be definite or probabilistic. A classification is learned from the training data and can then be tested on test data.
James Malone
class prediction objective
supervised classification objective
spread calculation objective
Person:Helen Parkinson
Spread calculation can be achieved by use of a standard deviation, which measures distance from the mean
A spread calculation objective is a data transformation objective whereby the aim is to calculate the spread of a data set; spread is a descriptive statistic which describes the variability of values in a data set.
Awaiting English definition from Monnie McGee
James Malone
spread calculation objective
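As a small illustration of the center and spread calculation objectives, the sketch below computes a mean and a population standard deviation with Python's standard `statistics` module. The function name `center_and_spread` and the sample values are made up for the example; other measures (median, variance, range) would serve the same objectives.

```python
from statistics import mean, pstdev

def center_and_spread(values):
    """Return (mean, population standard deviation) of a data set (illustrative sketch)."""
    # The mean is one possible center; the standard deviation is one possible spread.
    return mean(values), pstdev(values)

center, spread = center_and_spread([2, 4, 4, 4, 5, 5, 7, 9])
```

Here the center is 5 and the spread is 2.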
center calculation data transformation
James Malone
A center calculation data transformation is a data transformation that has objective of center calculation.
PERSON: James Malone
center calculation data transformation
data vector reduction data transformation
A data vector reduction is a data transformation that has objective data vector reduction and that consists of reducing the input vectors k to a smaller number of output vectors j, where j<k.
James Malone
PERSON: James Malone
data vector reduction data transformation
scaling objective
Person:Helen Parkinson
Scaling gene expression data for cross platform analysis http://www.springerprotocols.com/Abstract/doi/10.1007/978-1-59745-454-4_13
A scaling objective is a data transformation objective where all or some of a data set is adjusted by some data transformation according to some scale, for example a user-defined minimum or maximum.
Awaiting English definition from Monnie McGee
James Malone
scaling objective
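One common way to realize a scaling objective with a user-defined minimum and maximum is linear min-max rescaling. The sketch below is an assumption about one such transformation, with a hypothetical function name; it is not the only scaling the term covers.

```python
def rescale(values, new_min=0.0, new_max=1.0):
    """Linearly map values onto the interval [new_min, new_max] (illustrative sketch)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so a linear rescaling is undefined.
        raise ValueError("cannot rescale a constant data set")
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]
```

For instance, `rescale([10, 20, 30])` yields `[0.0, 0.5, 1.0]`.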
descriptive statistical data analysis
descriptive statistical calculation data transformation
A descriptive statistical calculation data transformation is a data transformation that has objective descriptive statistical calculation and which concerns any calculation intended to describe a feature of a data set, for example, its center or its variability.
James Malone
PERSON: James Malone
descriptive statistical calculation data transformation
scaling data transformation
A scaling data transformation is a data transformation that has objective scaling.
James Malone
PERSON: James Malone
scaling data transformation
error correction objective
Application of a multiple testing correction method
PERSON: James Malone
An error correction objective is a data transformation objective where the aim is to remove (correct for) erroneous contributions arising from the input data, or the transformation itself.
James Malone, Helen Parkinson
error correction objective
sequence analysis data transformation
EDITOR
A sequence analysis data transformation is a data transformation that has objective sequence analysis and has the aim of analysing ordered biological data for sequential patterns.
James Malone
sequence analysis data transformation
cross validation objective
WEB: http://en.wikipedia.org/wiki/Cross_validation
A cross validation objective is a data transformation objective in which the aim is to partition a sample of data into subsets such that the analysis is initially performed on a single subset, while the other subset(s) are retained for subsequent use in confirming and validating the initial analysis.
James Malone
cross validation objective
rotation estimation objective
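The partitioning step behind a cross validation objective can be sketched as a k-fold split. This is a minimal illustration with a hypothetical function name; it assumes the common convention in which each subset is held out once while the remaining subsets are used for the analysis, and the opposite assignment described in the definition is obtained by swapping the two returned lists.

```python
def k_fold_splits(items, k):
    """Yield (remaining, held_out) pairs for k-fold cross validation (illustrative sketch)."""
    items = list(items)
    # Deal the items into k folds in round-robin fashion.
    folds = [items[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        remaining = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield remaining, held_out
```

Each item appears in exactly one held-out subset across the k splits.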
merging objective
PERSON: Data Transformation Branch
A merging objective is a data transformation objective in which the data transformation has the aim of performing a union of two or more sets.
James Malone
combining objective
merging objective
merging of columns from two different data sets
clustered data visualization
A data visualization which has input of a clustered data set and produces an output of a report graph which is capable of rendering data of this type.
James Malone
clustered data visualization
gene list visualization
A data visualization which has input of a gene list and produces an output of a report graph which is capable of rendering data of this type.
James Malone
gene list visualization
classified data visualization
A data visualization which has input of a classified data set and produces an output of a report graph which is capable of rendering data of this type.
James Malone
classified data visualization
background corrected data visualization
A data visualization which has input of a background corrected data set and produces an output of a report graph which is capable of rendering data of this type.
James Malone
Monnie McGee
background corrected data visualization
survival analysis data transformation
A data transformation which has the objective of performing survival analysis.
James Malone
PERSON: James Malone
survival analysis data transformation
proportional hazards model estimation
Cox model
Cox proportional hazards model
PERSON: James Malone
PERSON: Tina Boussard
Proportional hazards model is a data transformation model to estimate the effects of different covariates influencing the times-to-failure of a system.
WEB: http://en.wikipedia.org/wiki/Cox_proportional_hazards_model
proportional hazards model estimation
correlation study objective
A data transformation objective in which correlation is obtained (often measured as a correlation coefficient, ρ) which indicates the strength and direction of a relationship between two random variables.
PERSON: Tina Boussard
correlation study objective
spectrum analysis objective
Calculation of characteristic path length in mass spectrometry
PERSON: Tina Boussard
Person:Helen Parkinson
A spectrum analysis objective is a data transformation objective where the aim is to analyse some aspect of spectral data by some data transformation process.
spectrum analysis objective
tandem mass spectrometry
A precursor ion is selected in the first stage, allowed to fragment and then all resultant masses are scanned in the second mass analyzer and detected in the detector that is positioned after the second mass analyzer. This experiment is commonly performed to identify transitions used for quantification by tandem MS.
PERSON: James Malone
PERSON: Tina Boussard
PERSON: Tina Boussard
Tandem mass spectrometry is a data transformation that uses two or more analyzers separated by a region in which ions can be induced to fragment by transfer of energy (frequently by collision with other molecules).
tandem mass spectrometry
gas chromatography mass spectrometry
Gas chromatography mass spectrometry is a data transformation combining mass spectrometry and gas chromatography for the qualitative as well as quantitative determination of compounds.
PERSON: James Malone
PERSON: Tina Boussard
PERSON: Tina Boussard
gas chromatography mass spectrometry
chi square test
PERSON: James Malone
PERSON: Tina Boussard
The chi-square test is a data transformation with the objective of statistical hypothesis testing, in which the sampling distribution of the test statistic is a chi-square distribution when the null hypothesis is true, or any in which this is asymptotically true, meaning that the sampling distribution (if the null hypothesis is true) can be made to approximate a chi-square distribution as closely as desired by making the sample size large enough.
chi square test
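One familiar member of this family is Pearson's chi-square test on a contingency table, whose statistic is approximately chi-square distributed under the null hypothesis of independence. The sketch below computes only the statistic and its degrees of freedom; the function name is hypothetical, and converting the statistic to a p-value is left to a statistics library.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table (sketch)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof
```

For the table `[[10, 20], [20, 10]]` every expected count is 15, giving a statistic of 20/3 with 1 degree of freedom.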
ANOVA
ANOVA
ANOVA or analysis of variance is a data transformation in which a statistical test is performed of whether the means of several groups are all equal.
James Malone
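The test statistic for the simplest case, one-way ANOVA, is the ratio of between-group to within-group mean squares. The following is a minimal sketch with a hypothetical function name; it returns only the F statistic and assumes at least two groups with some within-group variation.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over a list of groups (illustrative sketch)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

For the groups `[1, 2, 3]`, `[2, 3, 4]` and `[3, 4, 5]` the between-group mean square is 3 and the within-group mean square is 1, so F is 3.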
sequential design
Any design in which the decision as to whether to enroll the next patient, pair of patients, or block of patients is determined by whether the cumulative treatment difference for all previous patients is within specified limits. Enrollment is continued if the difference does not exceed the limits. It is terminated if it does.
MUSC
PMID: 17710740. Pharm Stat. 2007 Aug 20. Sequential design approaches for bioequivalence studies with crossover designs.
Philippe Rocca-Serra
Provenance: OCI
sequential design
observation design
OBI branch derived
PMID: 12387964. Lancet. 2002 Oct 12;360(9340):1144-9. Deficiency of antibacterial peptides in patients with morbus Kostmann: an observation study.
Philippe Rocca-Serra
observation design
Observation design is a study design in which subjects are monitored in the absence of any active intervention by experimentalists.
genetically modified organism
PERSON: Philippe Rocca-Serra
A protocol for removal of antibiotic resistance cassettes from human embryonic stem cells genetically modified by homologous recombination or transgenesis.
Nat Protoc. 2008;3(10):1550-8. PMID: 18802436
OBI Biomaterial
an organism that is the output of a genetic transformation process
genetically modified organism
predicted data item
BP 12/21: Edited the incomplete definition from Philippe. It is still unclear to me if this should be a data item at all, or an information content entity. This will be important, because if we exclude predictions from data items, we will run into the issue that we will have to duplicate things like 'weight datum' etc., all of which can be predicted.
Philippe Rocca-Serra; Bjoern Peters
A data item that was generated on the basis of a calculation or logical reasoning
predicted data item
mean-centered data
Person:Helen Parkinson
Person:Philippe Rocca-Serra
a data item which has been processed by a mean centering data transformation where each output value is produced by subtracting the mean from the input value
mean-centered data
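The mean centering data transformation described above can be sketched in a few lines. The function name is hypothetical and the sample values are made up.

```python
from statistics import mean

def mean_center(values):
    """Subtract the mean of the data set from each input value (illustrative sketch)."""
    m = mean(values)
    return [v - m for v in values]
```

For example, `mean_center([1, 2, 3])` gives `[-1, 0, 1]`; the mean of mean-centered data is zero.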
group randomization
Philippe Rocca-Serra
adapted from wikipedia [http://en.wikipedia.org/wiki/Randomization]
A group assignment which relies on chance to assign materials to a group of materials in order to avoid bias in experimental set up.
PMID: 18349405. Randomization reveals unexpected acute leukemias in Southwest Oncology Group prostate cancer trial. J Clin Oncol. 2008 Mar 20;26(9):1532-6.
group randomization
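A simple way to rely on chance when assigning materials to groups is to shuffle the materials and deal them out in turn. The sketch below assumes equal-sized groups as far as possible; the function name and the `seed` parameter are hypothetical conveniences, the latter added only to make an assignment reproducible.

```python
import random

def randomize_into_groups(materials, n_groups, seed=None):
    """Randomly assign materials to n_groups groups (illustrative sketch)."""
    rng = random.Random(seed)
    shuffled = list(materials)
    # Shuffling removes any systematic ordering before assignment.
    rng.shuffle(shuffled)
    # Deal the shuffled materials into the groups in round-robin fashion.
    return [shuffled[i::n_groups] for i in range(n_groups)]
```

Every material ends up in exactly one group, and group sizes differ by at most one.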
prediction
OBI
Philippe Rocca-Serra
Prediction of TF target sites based on atomistic models of protein-DNA complexes. BMC Bioinformatics. 2008 Oct 16;9(1):436. PMID: 18922190
a process by which an event or an entity is described before it actually happens or is being discovered and identified.
prediction
DNA sequencer
A DNA sequencer is an instrument that determines the order of deoxynucleotides in deoxyribonucleic acid sequences.
ABI 377 DNA Sequencer, ABI 310 DNA Sequencer
DNA sequencer
MO
Trish Whetzel
computer
A computer is an instrument which manipulates (stores, retrieves, and processes) data according to a list of instructions.
Apple PowerBook, Dell OptiPlex
Melanie Courtot
Trish Whetzel
computer
http://en.wikipedia.org/wiki/Computer
study design
A plan specification comprised of protocols (which may specify how and what kinds of data will be gathered) that are executed as part of an investigation and is realized during a study design execution.
Editor note: there is at least an implicit restriction on the kind of data transformations that can be done based on the measured data available.
PERSON: Chris Stoeckert
a matched pairs study design describes criteria by which subjects are identified as pairs which then undergo the same protocols, and the data generated is analyzed by comparing the differences between the paired subjects, which constitute the results of the executed study design.
experimental design
Rediscussed at length (MC/JF/BP, 12/9/08). The definition was clarified to differentiate it from protocol.
study design
repeated measure design
PMID: 10959922. J Biopharm Stat. 2000 Aug;10(3):433-45. Equivalence in test assay method comparisons for the repeated-measure, matched-pair design in medical device studies: statistical considerations.
PlanAndPlannedProcess Branch
a study design which uses the same individuals and exposes them to a set of conditions. The effects of order and practice can be confounding factors in such designs.
http://www.holah.karoo.net/experimentaldesigns.htm
repeated measure design
cross over design
(source: http://www.sbu.se/Filer/Content0/publikationer/1/literaturesearching_1993/glossary.html)
PMID: 17601993-Objective: HIV-infected patients with lipodystrophy (HIV-lipodystrophy) are insulin resistant and have elevated plasma free fatty acid (FFA) concentrations. We aimed to explore the mechanisms underlying FFA-induced insulin resistance in patients with HIV-lipodystrophy. Research Design and Methods: Using a randomized placebo-controlled cross-over design, we studied the effects of an overnight acipimox-induced suppression of FFA on glucose and FFA metabolism by using stable isotope labelled tracer techniques during basal conditions and a two-stage euglycemic, hyperinsulinemic clamp (20 mU insulin/m(2)/min; 50 mU insulin/m(2)/min) in nine patients with nondiabetic HIV-lipodystrophy. All patients received antiretroviral therapy. Biopsies from the vastus lateralis muscle were obtained during each stage of the clamp. Results: Acipimox treatment reduced basal FFA rate of appearance by 68.9% (52.6%-79.5%) and decreased plasma FFA concentration by 51.6 % (42.0%-58.9%), (both, P < 0.0001). Endogenous glucose production was not influenced by acipimox. During the clamp the increase in glucose-uptake was significantly greater after acipimox treatment compared to placebo (acipimox: 26.85 (18.09-39.86) vs placebo: 20.30 (13.67-30.13) mumol/kg/min; P < 0.01). Insulin increased phosphorylation of Akt (Thr(308)) and GSK-3beta (Ser(9)), decreased phosphorylation of glycogen synthase (GS) site 3a+b and increased GS-activity (I-form) in skeletal muscle (P < 0.01). Acipimox decreased phosphorylation of GS (site 3a+b) (P < 0.02) and increased GS-activity (P < 0.01) in muscle. Conclusion: The present study provides direct evidence that suppression of lipolysis in patients with HIV-lipodystrophy improves insulin-stimulated peripheral glucose-uptake. The increased glucose-uptake may in part be explained by increased dephosphorylation of GS (site 3a+b) resulting in increased GS activity.
Philippe Rocca-Serra
a repeated measure design which ensures that experimental units receive, in sequence, the treatment (or the control), and then, after a specified time interval (aka *wash-out period*), switch to the control (or treatment). In this design, subjects (patients in human context) serve as their own controls, and randomization may be used to determine the order in which a subject receives the treatment and control.
cross over design
n-to-1 design
Adapted from http://www.childrens-mercy.org/stats/definitions/crossover.htm and source: http://symptomresearch.nih.gov/chapter_6/sec1/csss1pg1.htm
N-of-1 design is a cross-over design in which the same patient is repeatedly randomised to receive either the experimental treatment or its control (Senn, 1993).
Philippe Rocca-Serra
n-to-1 design
randomized complete block design
A randomized complete block design is a study design which randomly assigns treatments to blocks. The number of units per block equals the number of treatments, so each block receives each treatment exactly once (hence the qualifier 'complete'). The design was originally devised for field trials used in agronomy and agriculture. The analysis assumes that there is no interaction between block and treatment. The method was then used in other settings, so the randomised complete block design is a design in which the subjects are matched according to a variable which the experimenter wishes to control. The subjects are put into groups (blocks) of the same size as the number of treatments. The members of each block are then randomly assigned to different treatment groups.
Philippe Rocca-Serra
http://www.stats.gla.ac.uk/steps/glossary/anova.html,(A researcher is carrying out a study of the effectiveness of four different skin creams for the treatment of a certain skin disease. He has eighty subjects and plans to divide them into 4 treatment groups of twenty subjects each. Using a randomised blocks design, the subjects are assessed and put in blocks of four according to how severe their skin condition is; the four most severe cases are the first block, the next four most severe cases are the second block, and so on to the twentieth block. The four members of each block are then randomly assigned, one to each of the four treatment groups. http://www.stat