You can organize entities in three ways: containment, subsumption, and composition.
**Containment: **When one concept can describe others, the representation is called a containment. For instance, the entity “people” can contain “managers” and “shareholder” entities.
**Subsumption: **Subsumption occurs when you describe concepts using hierarchical relationships to infer concept properties. For instance, you can have an “employee” entity instead of a “manager”, since all managers are employees. The technique incorporates a specific entity into a more general category.
**Composition: **Composition describes entities based on their parts. For instance, “wheels” are part of a “vehicle”. Composition helps machines to infer properties of entities through logic. In a knowledge graph, classifying “car” as a “vehicle” will allow a machine to automatically infer that a “car” must have “wheels.”
Food contains ingredients, but the classifications (cuisine, prep method, etc) can help influence other composition graphs
An ontology defines a common vocabulary for researchers who need to share information in a domain. It includes machine-interpretable definitions of basic concepts in the domain and relations among them
Often an ontology of the domain is not a goal in itself. Developing an ontology is akin to defining a set of data and their structure for other programs to use. Problem-solving methods, domain-independent applications, and software agents use ontologies and knowledge bases built from ontologies as data
2: What is in an ontology?
For the purposes of this guide:
Ontology is a formal explicit description of concepts in a domain of discourse
Classes (aka concepts)
Describe concepts in the domain
E.g., a class of wines represents all wines. Specific wines are instances of this class
Can be divided into more specific subclasses
Properties (aka slots or roles) of each concept describing various features and attributes of the concept
E.g. a Château Lafite Rothschild Pauillac wine has a full body has two slots
Body: full
Maker: Château Lafite Rothschild
Restrictions on slots (aka facets or role restrictions).
An ontology together with a set of individual instances of classes constitutes a knowledge base. In reality, there is a fine line where the ontology ends andthe knowledge base begins.
In practical terms, developing an ontology includes:
defining classes in the ontology,
arranging the classes in a taxonomic (subclass–superclass) hierarchy,
defining slots and describing allowed values for these slots,
filling in the values for slots for instances.
3: A Simple Knowledge-Engineering Methodology
Step 1. Determine the domain and scope of the ontology
We suggest starting the development of an ontology by defining its domain and scope. That is, answer several basic questions:
What is the domain that the ontology will cover?
For what we are going to use the ontology?
For what types of questions the information in the ontology should provide answers?
Who will use and maintain the ontology?
The answers to these questions may change during the ontology-design process, but at any given time they help limit the scope of the model.
If the ontology we are designing will be used to assist in natural language processing of articles in wine magazines, it may be important to include synonyms and part-of-speech information for concepts in the ontology.
Competency questions:
One of the ways to determine the scope of the ontology is to sketch a list of questions that a knowledge base based on the ontology should be able to answer, competency questions (Gruninger and Fox 1995).
These questions will serve as the litmus test later: Does the ontology contain enough information to answer these types of questions? Do the answers require a particular level of detail or representation of a particular area?
These competency questions are just a sketch and do not need to be exhaustive.
Step 2. Consider reusing existing ontologies
There are libraries of reusable ontologies on the Web and in the literature. For example, we can use the Ontolingua ontology library (http://www.ksl.stanford.edu/software/ontolingua/) or the DAML ontology library (http://www.daml.org/ontologies/). There are also a number of publicly available commercial ontologies (e.g., UNSPSC (www.unspsc.org), RosettaNet (www.rosettanet.org), DMOZ (www.dmoz.org)).
Check FooDB
Step 3. Enumerate important terms in the ontology
Initially, it is important to get a comprehensive list of terms without worrying about overlap between concepts they represent, relations among the terms, or any properties that the concepts may have, or whether the concepts are classes or slots.
Step 4. Define the classes and the class hierarchy
3 approaches:
Top-down
Bottom-up
Combinations
Step 5. Define the properties of classes– slots
Remember subclasses inherit their parents’ classes
When defining a domain or a range for a slot, find the most general classes or class that can be respectively the domain or the range for the slots
If a list of classes defining a range or a domain of a slot includes a class
and its subclass, remove the subclass
If a list of classes defining a range or a domain of a slot contains all
subclasses of a class A, but not the class A itself, the range should contain only the class A and not the subclasses
If a list of classes defining a range or a domain of a slot contains all but a few subclasses of a class A, consider if the class A would make a more appropriate range definition
Step 7. Create Instances
4: Defining Classes and a Class Hierarchy
Looks more in depth at Step 4 from Section 3
Ensure the Class Hierarchy is Correct
An “is-a” relation
A class A is a subclass of B if every instance of A is also an instance of B
A Single [Class] is Not a Subclass of All [Classes]
A common modeling mistake is to include both a singular and a plural version of the same concept in the hierarchy making the former a subclass of the latter.
For example, it is wrong to define a class Wines and a class Wine as a subclass of Wines. Once you think of the hierarchy as representing the “kind-of” relationship, the modeling error becomes clear: a single Wine is not a kind of Wines.
The best way to avoid such an error is always to use either singular or plural in naming classes (more in Section 6)
Transitivity of the Hierarchical Relations
If B is a subclass of A and C is a subclass of B, then C is a subclass of A
A direct subclass is the “closest” subclass of the class: there are no classes between a class and its direct subclass in a hierarchy.
Evolution of a class hiearchy
“Graph Similarity Scoring and Matching” paper
Trying to calculate if two graphs are similar to each other
Leads to iterative methods for computing similarity scores for the elements of these graphs, in whichscores for similarity between elements propagate along to neighboring elements at each time step. Here, we consider a graph GA(VA, EA) to consist of a set of vertices or nodes VA, and a set of edges EA VA ×VA, which can be directed or undirected. The aim of this work is to highlight graph similarity methods from different fields that utilize this iterative framework, and present a simple application of one particular extension to the task of graph matching
HITS search algorithm
Was intended to improve Web search
Hubs: point to many good sources on the query
Authorities: point to high quality hubs
Generalized form of similarity between two nodes:
Where i is a node in Graph B, j is a node in Graph A at stage k
Each iteration is normalized by to give the actual similarity measure
Blondel et al initialized x0 to all-ones, final score to be the limit of even iterations
Blondel wanted to do automatic synonym extraction from a dictionary graph
This paper uses a related similarity measure that used a linear update and does not depend on
initial values of node scores
Similarity Flooding algorithm
SimRank algorithm
Scoring Method for Phylogenetic Tree Construction
Vertex Similarity Method
They suggest: an edge in GB is like an edge in GA if their source and terminal nodes are similar
Application to Graph Matching
Usually, people are trying to find an assignment matrix that matches elements between sets
Computationally expensive. Usage of Hungarian algorithm results in O(n^3) time to solve
The formula from this paper can be used for finding subgraphs within graphs
This can be used for predicting functions of proteins
Including node identity information (type labels) can be used to penalize/reward matches to improve performance