]>
Larisa Soldatova
Ontology of General Purpose Datatypes
Pance Panov
This ontology contains entities such as datatype, datatype generator, and datatype quality, which make it possible to represent arbitrarily complex datatypes. This matters for a general data mining ontology that needs to represent and query over modelling algorithms for mining structured data.
The ontology was first developed as part of OntoDM (Ontology of Data Mining, available at http://kt.ijs.si/panovp/OntoDM), but for generality and reuse it was exported as a separate ontology. Additionally, the OntoDT ontology is based on the ISO/IEC 11404 standard (http://www.iso.org/iso/catalogue_detail.htm?csnumber=39479) and can be reused independently by any domain ontology that requires representation of and reasoning about general purpose datatypes.
1.0
Saso Dzeroski
31.05.2012
Pance Panov
is-about
denotes
has_quality
bearer_of
has_part
part_of
information content entity quality
table datatype
whose values are collections of values in
the product space of one or more field datatypes, such that each value in the product space represents
an association among the values of its fields. Although the field datatypes may be infinite, any given value
of a table datatype contains a finite number of associations.
attribute identifier
aggregate imposed ordering
An aggregate datatype has the ordering property, if and only if there is a canonical first element of each nonempty
value in its value-space. This ordering is (externally) imposed by the aggregate value, as distinct from
the value-space of the element datatype itself being (internally) ordered (see 6.3.2). It is also distinct from the
value-space of the aggregate datatype being ordered.
primitive datatype
A datatype whose value space is defined either axiomatically or by enumeration is said to be a primitive
datatype.
identifiable datatype that cannot be decomposed into other identifiable datatypes without loss of all semantics
associated with the datatype
defined datatype parameter specification
real field-list specification
parameter identifier
extending subtype specification
date-time factor
graph aggregate component
field component specification
datatype which is a parametric datatype to a datatype generator
parametric datatype
datatype on which a datatype generator operates to produce a generated datatype
explicit subtype specification
dyadic operation
Dyadic operation maps a pair of values of the given datatype into a value of the given datatype or into a value of datatype Boolean
attribute-list specification
state-value-list specification
value space specification
a value space is the collection of values for a given datatype
ISO/IEC 11404:2007(E)
tree datatype generator
array datatype
whose values are associations between
the product space of one or more finite datatypes, designated the index datatypes, and the value space of the element datatype, such that every value in the product space of the index datatypes associates to
exactly one value of the element datatype.
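The array definition above can be sketched in Python as a total mapping from index tuples to element values. This is only an illustration under stated assumptions: the index datatypes are modelled as finite ranges, and `is_array_value` and `grid` are hypothetical names that do not come from the ontology or from ISO/IEC 11404.

```python
from itertools import product


def is_array_value(mapping, index_spaces, is_element):
    """Check that every index tuple maps to exactly one element value.

    mapping      -- dict from index tuples to element values
    index_spaces -- list of finite iterables, one per index datatype
    is_element   -- predicate testing membership in the element value space
    """
    expected_keys = set(product(*index_spaces))
    has_all_indices = set(mapping) == expected_keys
    return has_all_indices and all(is_element(v) for v in mapping.values())


rows = range(2)
cols = range(3)
# A 2 x 3 array of floats: one element for every (row, col) pair.
grid = {(r, c): float(r * c) for r in rows for c in cols}
```

A dict keeps at most one value per key, so the "exactly one" part of the definition reduces to checking that no index tuple is missing.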
scaled datatype
Scaled is a family of datatypes whose value spaces are subsets of the rational value space, each
individual datatype having a fixed denominator, but the scaled datatypes possess the concept of
approximate value.
real datatype
real is a family of datatypes which are computational approximations to the mathematical
datatype comprising the “real numbers”. Specifically, each real datatype designates a collection of
mathematical real values which are expressed to some finite precision and must be distinguishable to at
least that precision.
scaled factor
aggregate-imposed identifier uniqueness
An aggregate-value has the identifier uniqueness property if and only if no identifier (e.g., label, index) of the
element datatype occurs more than once in the aggregate-value. The aggregate datatype has the identifier
uniqueness property, if and only if all values in its value space do.
non-aggregate generator
identifier
exactness
The computational model of a datatype may limit the degree to which values of the datatype can be
distinguished. If every value in the value space of the conceptual datatype is distinguishable in the
computational model from every other value in the value space, then the datatype is said to be exact.
Certain mathematical datatypes having values which do not have finite representations are said to be
approximate
generated datatype
A generated datatype is a datatype resulting from an application of a datatype generator.
numeric quality
A datatype is said to be numeric if its values are conceptually quantities (in some mathematical number
system). A datatype whose values do not have this property is said to be non-numeric.
homogeneity
An aggregate datatype is homogeneous, if and only if all components must belong to a single datatype. If
different components may belong to different datatypes, the aggregate datatype is said to be heterogeneous.
The component datatype of a homogeneous aggregate is also called the element datatype.
vector datatype
directed labeled graph datatype generator
maximum-size specification
state field component
pointer datatype
each of whose values constitutes a
means of reference to values of another datatype, designated the element datatype. The values of a
pointer datatype are atomic.
tree datatype
aggregate size
The size of an aggregate-value is the number of component values it contains. The size of the aggregate
datatype is fixed, if and only if all values in its value space contain the same number of component values.
The size is variable, if different values of the aggregate datatype may have different numbers of component
values. Variability is the more general case; fixed-size is a constraint.
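As a rough Python illustration of the fixed-size versus variable-size distinction, the check below inspects a finite sample of aggregate values. The function name `has_fixed_size` is hypothetical, and a finite sample can only suggest, not prove, that a whole datatype is fixed-size.

```python
def has_fixed_size(sample_values):
    """Return True if all aggregate values in the sample have the same size."""
    sizes = {len(value) for value in sample_values}
    return len(sizes) <= 1


# Every value has three components: consistent with a fixed-size datatype.
rgb_values = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]

# Values with different numbers of components: variable size.
word_values = ["a", "ab", "abc"]
```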
uniqueness
An aggregate-value has the uniqueness property if and only if no value of the element datatype occurs more
than once in the aggregate-value. The aggregate datatype has the uniqueness property, if and only if all
values in its value space do.
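The two levels of this definition (aggregate-value and aggregate datatype) can be sketched in Python as follows. The function names are hypothetical, the element values are assumed to be hashable, and the value space is represented by a finite sample.

```python
def value_is_unique(aggregate_value):
    """True if no element value occurs more than once in the aggregate value."""
    items = list(aggregate_value)
    return len(items) == len(set(items))


def datatype_is_unique(sample_value_space):
    """True if every aggregate value in the sample has the uniqueness property."""
    return all(value_is_unique(value) for value in sample_value_space)
```

For example, `value_is_unique((1, 2, 3))` holds while `value_is_unique((1, 1))` does not, which is the difference between set-like and bag-like aggregates.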
real factor
aggregate generator quality
subtype
index type specification
boolean/state/real field-list specification
defined generator parameter specification
upper bound specification
defined datatype
selection subtype specification
cardinality
A value space has the mathematical concept of cardinality: it may be finite, denumerably infinite (countable),
or non-denumerably infinite (uncountable). A datatype is said to have the cardinality of its value space. In the
computational model, there are three significant cases:
⎯ datatypes whose value spaces are finite,
⎯ datatypes whose value spaces are exact and denumerably infinite,
⎯ datatypes whose value spaces are approximate and therefore have a finite or denumerably
infinite computational model, although the conceptual value space may be non-denumerably infinite.
Every conceptually finite datatype is necessarily exact. No computational datatype is non-denumerably
infinite.
character datatype
character is a family of datatypes whose value spaces are character-sets.
defined generator specification
enumerated-value identifier
defined generator parameter-list specification
non-directed labeled graph generator
state base type
select-item specification
record of boolean
ordering
A datatype is said to be ordered if an order relation is defined on its value space.
date-time unit
defined datatype parameter-list specification
size subtype specification
boolean field-list specification
set of discrete
aggregate datatype
synonym: structured datatype
An aggregate datatype is a generated datatype, each of whose values is, in principle, made up of values of
the parametric datatypes. The parametric datatypes of an aggregate datatype or its generator are also called
component datatypes.
enumerated datatype
enumerated is a family of datatypes, each of which comprises a finite number of distinguished values having an intrinsic order.
synonym: discrete datatype
choice datatype
each of whose values is a single value
from any of a set of alternative datatypes. The alternative datatypes of a choice datatype are logically
distinguished by their correspondence to values of another datatype, called the tag datatype.
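A choice datatype corresponds loosely to a tagged union. The Python sketch below is one possible reading of the definition: the tag datatype is modelled as a small set of strings, and `ALTERNATIVES` and `Choice` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical tag datatype: each tag value selects one alternative datatype.
ALTERNATIVES = {"count": int, "ratio": float, "name": str}


@dataclass(frozen=True)
class Choice:
    tag: str    # value of the tag datatype
    value: Any  # single value drawn from the alternative selected by the tag

    def __post_init__(self):
        expected = ALTERNATIVES.get(self.tag)
        if expected is None:
            raise ValueError(f"unknown tag: {self.tag!r}")
        if not isinstance(self.value, expected):
            raise TypeError(f"tag {self.tag!r} requires a {expected.__name__} value")
```

`Choice("ratio", 0.5)` is accepted, while `Choice("count", "three")` raises `TypeError` because the value does not belong to the alternative that the tag selects.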
datatype quality
real field component
lower bound specification
date-time radix
boolean field component
boundary
A datatype is said to be bounded above if it is ordered and there is a value U in the value space such that, for
all values s in the value space, s ≤ U. The value U is then said to be an upper bound of the value space.
Similarly, a datatype is said to be bounded below if it is ordered and there is a value L in the value space such that,
for all values s in the value space, L ≤ s. The value L is then said to be a lower bound of the value space. A
datatype is said to be bounded if its value space has both an upper bound and a lower bound.
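The bounded above, bounded below and bounded qualities can be illustrated with a small Python model of integer value spaces. This is a sketch under assumptions: `IntegerValueSpace` is a hypothetical class, and a missing bound is encoded as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IntegerValueSpace:
    lower: Optional[int] = None  # None means there is no lower bound L
    upper: Optional[int] = None  # None means there is no upper bound U

    def bounded_below(self) -> bool:
        return self.lower is not None

    def bounded_above(self) -> bool:
        return self.upper is not None

    def bounded(self) -> bool:
        return self.bounded_below() and self.bounded_above()

    def contains(self, s: int) -> bool:
        above_lower = self.lower is None or self.lower <= s
        below_upper = self.upper is None or s <= self.upper
        return above_lower and below_upper


integers = IntegerValueSpace()                # unbounded
naturals = IntegerValueSpace(lower=0)         # bounded below only
byte = IntegerValueSpace(lower=0, upper=255)  # bounded
```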
datatype generator specification
A datatype generator is a conceptual operation on one or more datatypes which yields a datatype. A datatype
generator operates on datatypes to generate a datatype, rather than on values to generate a value.
Specifically, a datatype generator is the combination of:
⎯ a collection of criteria for the number and characteristics of the datatypes to be operated upon,
⎯ a construction procedure which, given a collection of datatypes meeting those criteria, creates a new
value space from the value spaces of those datatypes, and
⎯ a collection of characterizing operations which attach to the resulting value space to complete the
definition of a new datatype.
The application of a datatype generator to a specific collection of datatypes meeting the criteria for the
datatype generator forms a generated datatype. The generated datatype is sometimes called the resulting
datatype, and the collection of datatypes to which the datatype generator was applied are called its parametric
datatypes.
synonym: datatype constructor
ISO/IEC 11404:2007(E)
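The three parts of a datatype generator (criteria, construction procedure, characterizing operations) can be sketched in Python. Everything named here is an assumption made for illustration: `Datatype` models a datatype as a finite value space plus named operations, and `pair_generator` is a hypothetical generator that is not defined in the standard.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Datatype:
    name: str
    value_space: frozenset
    operations: dict = field(default_factory=dict)


def pair_generator(left: Datatype, right: Datatype) -> Datatype:
    # Criteria: exactly two parametric datatypes with non-empty value spaces.
    if not left.value_space or not right.value_space:
        raise ValueError("parametric datatypes must have non-empty value spaces")
    # Construction procedure: new value space from the parametric value spaces.
    values = frozenset(product(left.value_space, right.value_space))
    # Characterizing operations attached to the resulting value space.
    operations = {"First": lambda pair: pair[0], "Second": lambda pair: pair[1]}
    return Datatype(f"pair({left.name},{right.name})", values, operations)


boolean = Datatype("boolean", frozenset({True, False}))
state = Datatype("state", frozenset({"on", "off", "idle"}))
pair = pair_generator(boolean, state)  # the generated (resulting) datatype
```

The generator takes datatypes and returns a datatype, whereas the operations it attaches (`First`, `Second`) take values and return values.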
factor
size specification
date and time datatype
time is a family of datatypes whose values are points in time to various common resolutions:
year, month, day, hour, minute, second, and fractions thereof.
complex factor
state-value identifier
component mandatoriness
The components of an aggregate datatype may not all be required to have a valid value of the datatype, i.e.,
the actual value space of the datatype may include values for which some of the component values are
unspecified.
When a component of the datatype is required to have a valid value in order for the aggregate value to be a
valid value of the datatype, the component is said to be a mandatory component.
When a component of the datatype is not required to have a valid value in order for the aggregate value to be
a valid value of the datatype, the component is said to be an optional component.
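Mandatory and optional components map naturally onto required and optional fields. The Python sketch below assumes a hypothetical `Measurement` aggregate; an unspecified component value is encoded as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Measurement:
    sensor_id: Optional[str]       # mandatory component
    reading: Optional[float]       # mandatory component
    comment: Optional[str] = None  # optional component, may stay unspecified


def is_valid(measurement: Measurement) -> bool:
    """An aggregate value is valid only if all mandatory components are specified."""
    return measurement.sensor_id is not None and measurement.reading is not None
```

`Measurement("s1", 2.5)` is valid without a comment, while `Measurement("s1", None)` is not, because a mandatory component has no value.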
record of real
extended-value identifier
datatype
Since this collection is unbounded, there are four formal methods used in the definition of the datatypes:
⎯ explicit specification of primitive datatypes, which have universal well-defined abstract notions, each
independent of any other datatype.
⎯ implicit specification of generated datatypes, which are syntactically and in some ways semantically
dependent on other datatypes used in their specification. Generated datatypes are specified implicitly by
means of explicit specification of datatype generators, which themselves embody independent abstract
notions.
⎯ specification of the means of datatype declaration, which permits the association of additional identifiers
and refinements to primitive and generated datatypes and to datatype generators.
⎯ specification of the means of defining subtypes of the datatypes defined by any of the foregoing methods.
ISO/IEC 11404:2007(E)
set of distinct values, characterized by properties of those values, and by operations on those values
real base type
index upperbound specification
record of primitives
value expression
non-directed labeled graph datatype
complex radix
complex datatype
complex is a family of datatypes, each of which is a computational approximation to the
mathematical datatype comprising the “complex numbers”. Specifically, each complex datatype
designates a collection of mathematical complex values which are known to certain applications to some
finite precision and must be distinguishable to at least that precision in those applications.
select-list specification
labeled graph datatype
record of state
sequence of discrete
class datatype
state datatype
state is a family of datatypes, each of which comprises a finite number of distinguished but unordered values.
structuredness
Aggregate datatypes are:
⎯ conceptually structured, having both the component datatypes and the access method specified, or
⎯ conceptually semi-structured, having either the component datatypes or the access method specified, but
not both, or
⎯ conceptually unstructured, having neither the component datatype nor the access method specified.
extended-value-list specification
DAG datatype
sequence datatype
whose values are ordered
sequences of values from the element datatype. The ordering is imposed on the values and not intrinsic
in the underlying datatype; the same value may occur more than once in a given sequence.
niladic operation
Niladic operations yield values of the given datatype.
base type specification
index lowerbound specification
bag datatype
radix
index-type list specification
range subtype specification
range specification
procedure datatype
each of whose values is an
operation on values of other datatypes, designated the parameter datatypes. That is, a procedure
datatype comprises the set of all operations on values of a particular collection of datatypes. All values of
a procedure datatype are conceptually atomic.
vector generator
aggregate generator
An aggregate datatype generator generates a datatype by
⎯ applying an algorithmic procedure to the value spaces of its component datatypes to yield the value space
of the aggregate datatype, and
⎯ providing a set of characterizing operations specific to the generator.
synonym: aggregate datatype constructor
recursiveness
A datatype is said to be recursive if a value of the datatype can contain (or refer to) another value of the
datatype.
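A tree is a common example of a recursive datatype. In the Python sketch below, a value of the hypothetical `Tree` class contains other `Tree` values as children; the class and the `depth` helper are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tree:
    label: str
    # A Tree value contains further Tree values, which makes the datatype recursive.
    children: List["Tree"] = field(default_factory=list)


def depth(tree: Tree) -> int:
    """Number of nodes on the longest path from this node down to a leaf."""
    return 1 + max((depth(child) for child in tree.children), default=0)


example = Tree("root", [Tree("a", [Tree("b")]), Tree("c")])
```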
node component
monadic operation
Monadic operations map a value of the given datatype into a value of the given datatype or into a value of datatype Boolean.
character-set identifier
state field-list specification
set datatype
whose value-space is the set of all subsets of
the value space of the element datatype, with operations appropriate to the mathematical set.
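For a finite element datatype, the value space of the corresponding set datatype can be enumerated directly. The Python sketch below assumes a small, hashable element value space; `set_value_space` is a hypothetical helper name.

```python
from itertools import combinations


def set_value_space(element_space):
    """Return all subsets of a finite element value space as frozensets."""
    elements = list(element_space)
    return {
        frozenset(subset)
        for size in range(len(elements) + 1)
        for subset in combinations(elements, size)
    }


values = set_value_space({"a", "b", "c"})
```

An element value space with n values yields 2^n set values, so this enumeration is only practical for very small element datatypes.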
minimum-size specification
real radix
characterising operation specification
The set of characterising operations for a datatype comprises those operations on, or yielding values of, the datatype that distinguish this datatype from other datatypes having value spaces which are identical except possibly for substitution of symbols.
subtype generator specification
A subtype is a datatype derived from an existing datatype, designated the base datatype, by restricting the
value space to a subset of that of the base datatype whilst maintaining all characterizing operations. Subtypes
are created by a kind of datatype generator which is unusual in that its only function is to define the
relationship between the value spaces of the base datatype and the subtype.
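The ontology lists range, selection and excluding subtype generators, among others. The Python sketch below shows how each could restrict a finite base value space; the function names are hypothetical, and the characterizing operations are assumed to carry over unchanged from the base datatype.

```python
def range_subtype(base, lower, upper):
    """Keep only base values between lower and upper, inclusive."""
    return frozenset(v for v in base if lower <= v <= upper)


def selection_subtype(base, selected):
    """Keep only the explicitly selected values, which must belong to the base."""
    selected = frozenset(selected)
    if not selected <= base:
        raise ValueError("selected values must come from the base value space")
    return selected


def excluding_subtype(base, excluded):
    """Remove the listed values from the base value space."""
    return base - frozenset(excluded)


base = frozenset(range(-3, 4))  # base value space: -3 .. 3
```

In each case the result is a subset of the base value space, which is the only relationship a subtype generator defines.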
non-aggregate datatype
set of real
rational datatype
Rational is the mathematical datatype comprising the “rational numbers”.
n-adic operation
N-adic operations map ordered n-tuples of values, each of which is of a specified datatype, which may be the given datatype or a parametric datatype, into values of the given datatype or parametric datatype.
access type
The access method for an aggregate datatype is the property which determines how component values can
be extracted from a given aggregate-value.
edge component
equality
ISO/IEC 11404:2007
In every value space there is a notion of equality, for which the following rules hold:
⎯ for any two instances (a, b) of values from the value space, either a is equal to b, denoted a = b, or a is
not equal to b, denoted a ≠ b;
⎯ there is no pair of instances (a, b) of values from the value space such that both a = b and a ≠ b;
⎯ for every value a from the value space, a = a;
⎯ for any two instances (a, b) of values from the value space, a = b if and only if b = a;
⎯ for any three instances (a, b, c) of values from the value space, if a = b and b = c, then a = c.
On every datatype, the operation Equal is defined in terms of the equality property of the value space, by:
⎯ for any values a, b drawn from the value space, Equal(a,b) is true if a = b , and false otherwise.
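The reflexivity, symmetry and transitivity rules above can be checked mechanically on a finite sample of values. The Python sketch below is an illustration only: `satisfies_equality_rules` is a hypothetical helper, and passing the check on a sample does not prove the rules for a whole value space.

```python
from itertools import product


def satisfies_equality_rules(values, equal):
    """Check reflexivity, symmetry and transitivity of `equal` on sample values."""
    vals = list(values)
    reflexive = all(equal(a, a) for a in vals)
    symmetric = all(equal(a, b) == equal(b, a) for a, b in product(vals, repeat=2))
    transitive = all(
        equal(a, c)
        for a, b, c in product(vals, repeat=3)
        if equal(a, b) and equal(b, c)
    )
    return reflexive and symmetric and transitive


def exact_equal(a, b):
    return a == b


def close_enough(a, b):
    # Tolerance-based comparison: not a valid Equal, because it is not transitive.
    return abs(a - b) < 1
```

`exact_equal` passes on `[1, 2, 3]`, whereas `close_enough` fails on `[0.0, 0.6, 1.2]`: 0.0 and 0.6 compare equal, 0.6 and 1.2 compare equal, but 0.0 and 1.2 do not.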
scaled radix
attribute component specification
record (tuple) datatype
whose values are heterogeneous
aggregations of values of component datatypes, each aggregation having one value for each component
datatype, keyed by a fixed field-identifier.
synonym: tuple datatype
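A record value keyed by fixed field identifiers resembles a named tuple. The Python sketch below uses a hypothetical `Sample` record; the comments relate attribute access and `_replace` to the `FieldSelect:record` and `FieldReplace:record` operations listed in this ontology, a correspondence assumed here for illustration.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    # Heterogeneous components, each keyed by a fixed field identifier.
    length: float
    width: float
    species: str


original = Sample(length=5.1, width=3.5, species="class1")

# FieldSelect: extract one component by its field identifier.
selected = original.species

# FieldReplace: build a new record value with one component replaced.
updated = original._replace(width=3.0)
```

Named tuples are immutable, so replacing a field yields a new record value and leaves `original` unchanged.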
enumerated-value-list specification
DAG datatype generator
sequence of real
excluding subtype specification
record of boolean/state/real
primitive field-list specification
aggregate field component
field identifier
primitive field component
field-list specification
label
IAO
bounded below
table generator
Negate:scaled
IsIn:set
non equal
Round:time&date
unbounded
record generator
character value space
class generator
variable size
unbounded below
Append:sequence
non-unique values
InOrder:ordinal
Discriminant:choice
ordered
AttributeSelect:class
semi-structured
Negate:complex
Successor:enumerated
Serialize:bag
Negate:rational
non-ordered
Promote:complex
Multiply:real
set generator
AttributeReplace:class
Equal:pointer
Difference:time&date
InOrder:Enumerated
Multiply:rational
Empty:set
Equal:integer
equal
integer
integer is the mathematical datatype comprising the exact integral values.
And:boolean
bag generator
Reciprocal:rational
Equal:rational
Equal:character
InOrder:integer
finite
approximate
Equal:state
SetOf:set
unstructured
AttributeFunctionInvoke:class
countable
component mandatory
Equal:bag
NonNegative:integer
size subtype generator
Equal:choice
key access
Delete:bag
scaled value space
Or:boolean
component non-mandatory
Equal:scaled
Equal:array
Equal:set
uncountable
InOrder:rational
homogeneous
structured
Invoke:procedure
Equal:class
Multiply:complex
index access
Add:integer
Select:array
MapToBag:table
Empty:table
Not:boolean
Equal:record
real value space
Divide:scaled
boolean value space
pointer generator
FieldSelect:record
access by value
Extend:time&date
Equal:table
procedure generator
Select:bag
non-numeric
recursive
unbounded above
Equal:complex
Equal:procedure
Replace:array
rational value space
bounded above
InOrder:scaled
IsEmpty:table
date-and-time value space
Insert:bag
complex value space
Fetch:table
array generator
Subset:set
choice generator
non-recursive
Equal:void
exact
Tail:sequence
Select:table
Delete:table
Multiply:integer
Union:set
unordered aggregate
Equal:boolean
Intersection:set
NonNegative:rational
Tag:choice
Serialize:table
InOrder:time&date
Empty:sequence
Add:real
unique values
FieldReplace:record
void value space
identifier not unique
Round:scaled
numeric
explicit subtype generator
Add:scaled
sequence generator
Reciprocal:complex
state value space
Dereference:pointer
IsEmpty:bag
ordinal value space
heterogeneous
Multiply:scaled
Head:sequence
excluding subtype generator
position access
range subtype generator
InOrder:real
Add:rational
Equal:sequence
ordered aggregate
Negate:integer
bounded
Equal:Enumerated
IsEmpty:sequence
enumerated value space
extending subtype generator
Promote:real
MapToTable:table
rational
Rational is the mathematical datatype comprising the “rational numbers”.
void
void is the datatype representing an object whose presence is syntactically or semantically
required, but carries no information in a given instance.
fixed size
Negate:real
selection subtype generator
real
integer value space
SquareRoot:complex
Select:set
ordinal
ordinal is the datatype of the ordinal numbers, as distinct from the quantifying numbers
(datatype integer). ordinal is the infinite enumerated datatype.
Promote:rational
Insert:table
homogeneous set generator
indirect access
Difference:set
Add:complex
Reciprocal:real
Cast:choice
Successor:ordinal
Equal:time&date
identifier unique
Equal:ordinal
Empty:bag
implementation dependent access
Equal:real
boolean
boolean is the mathematical datatype associated with two-valued logic.
AttributeFunctionOverride:class
ident2:real
enumerated{class1, class2, class3}
ident4
tuple(ident1:real,ident2:real,ident3:real,ident4:real,ident5:enumerated)
ident5:enumerated
ident3:real
ident4:real
ident2
ident5
ident1:real
ident1
list(ident1:real,ident2:real,ident3:real,ident4:real,ident5:enumerated)
ident3