Schema Integration Techniques: A Comparative Analysis | Exercises Distributed Database Management Systems

1- Schema Translation

Suitability of Canonical Data Model

Canonical data models used for schema translation include:

the entity-relationship model [Palopoli et al., 1998, 2003b; He and Ling, 2006],

the object-oriented model [Castano and Antonellis, 1999; Bergamaschi et al., 2001], and

a graph [Palopoli et al., 1999; Milo and Zohar, 1998; Melnik et al., 2002; Do and Rahm, 2002], which may be simplified to a tree [Madhavan et al., 2001].
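As an illustration of translating a relational schema into a graph canonical model, the following sketch encodes relations and attributes as nodes with "contains" edges and foreign keys as "references" edges. It then simplifies the graph to a tree by keeping only containment edges. The edge labels, the "schema" root node, and the EMP/DEPT example are hypothetical. Dropping referential edges is only one simple way to obtain a tree and is not necessarily how CUPID performs the conversion.

```python
from collections import defaultdict

def relational_to_graph(relations, foreign_keys):
    """Translate a relational schema into labeled (source, label, target) edges.

    relations: {relation_name: [attribute, ...]}
    foreign_keys: [(relation, attribute, referenced_relation, referenced_attribute), ...]
    The encoding (labels, "schema" root) is a hypothetical choice for illustration.
    """
    edges = []
    for rel, attrs in relations.items():
        edges.append(("schema", "contains", rel))
        for attr in attrs:
            edges.append((rel, "contains", f"{rel}.{attr}"))
    for rel, attr, ref_rel, ref_attr in foreign_keys:
        edges.append((f"{rel}.{attr}", "references", f"{ref_rel}.{ref_attr}"))
    return edges

def graph_to_tree(edges):
    # Keep only containment edges: each node then has exactly one parent,
    # so the result is a tree rooted at "schema" (children list per inner node).
    children = defaultdict(list)
    for src, label, dst in edges:
        if label == "contains":
            children[src].append(dst)
    return dict(children)

relations = {"EMP": ["ENO", "ENAME", "DNO"], "DEPT": ["DNO", "DNAME"]}
fks = [("EMP", "DNO", "DEPT", "DNO")]
graph = relational_to_graph(relations, fks)
tree = graph_to_tree(graph)
print(tree["schema"])  # ['EMP', 'DEPT']
print(tree["EMP"])     # ['EMP.ENO', 'EMP.ENAME', 'EMP.DNO']
```

The referential edge EMP.DNO to DEPT.DNO survives in the graph but is lost in the tree, which is the price paid for the simpler structure.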

2- Schema Mapping

The schema mapping step may also be divided into two phases [Bernstein and Melnik, 2007]: mapping constraint generation and transformation generation.
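A minimal sketch of the two phases, assuming the mapping constraints are simple one-to-one attribute correspondences and that the transformation is a single SQL INSERT ... SELECT statement. The `Correspondence` class, both function names, and the EMP/Employee example are hypothetical; real mapping tools handle joins, nesting, and value conversions that this sketch ignores.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Correspondence:
    source: str  # "relation.attribute" in the source schema
    target: str  # "relation.attribute" in the target schema

def generate_constraints(matches):
    # Phase 1 (mapping constraint generation): turn matcher output into
    # explicit constraints, here plain one-to-one attribute correspondences.
    return [Correspondence(s, t) for s, t in matches]

def generate_transformation(constraints):
    # Phase 2 (transformation generation): derive a query that moves data as
    # the constraints demand. Assumes one source and one target relation.
    src_rels = {c.source.split(".")[0] for c in constraints}
    tgt_rels = {c.target.split(".")[0] for c in constraints}
    if len(src_rels) != 1 or len(tgt_rels) != 1:
        raise ValueError("sketch handles exactly one source and one target relation")
    (src,), (tgt,) = src_rels, tgt_rels
    tgt_cols = ", ".join(c.target.split(".")[1] for c in constraints)
    src_cols = ", ".join(c.source.split(".")[1] for c in constraints)
    return f"INSERT INTO {tgt} ({tgt_cols}) SELECT {src_cols} FROM {src}"

matches = [("EMP.ENAME", "Employee.name"), ("EMP.SAL", "Employee.salary")]
sql = generate_transformation(generate_constraints(matches))
print(sql)  # INSERT INTO Employee (name, salary) SELECT ENAME, SAL FROM EMP
```

Keeping the phases separate means the constraints can be inspected or corrected by a designer before any executable transformation is produced.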

3- Schema Matching

Linguistic-based matching

A number of issues affect the design of a particular matching algorithm [Rahm and Bernstein, 2001].

In some systems, hand-crafted rules are specified by the designer for each schema individually (intraschema rules), and interschema rules are then "discovered" by the matching algorithm [Palopoli et al., 1999].
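The following is a minimal sketch of a linguistic matcher over element names. A small synonym table stands in for designer-supplied rules, and `difflib.SequenceMatcher` supplies a generic string similarity for names the rules do not cover. The synonym entries, the 0.6 threshold, and the function names are assumptions for illustration; this is not the rule-based method of Palopoli et al.

```python
from difflib import SequenceMatcher

# Hypothetical hand-crafted rules: pairs of names the designer declares equivalent.
SYNONYMS = {frozenset({"salary", "pay"}), frozenset({"emp", "employee"})}

def name_similarity(a, b):
    a, b = a.lower(), b.lower()
    if a == b or frozenset({a, b}) in SYNONYMS:
        return 1.0  # a rule fires: treat the names as equivalent
    return SequenceMatcher(None, a, b).ratio()  # fallback: string similarity in [0, 1]

def linguistic_match(attrs1, attrs2, threshold=0.6):
    # Compare every pair of element names and keep pairs above the threshold,
    # best candidates first.
    pairs = []
    for a in attrs1:
        for b in attrs2:
            score = name_similarity(a, b)
            if score >= threshold:
                pairs.append((a, b, round(score, 2)))
    return sorted(pairs, key=lambda p: -p[2])

result = linguistic_match(["Salary", "EmpName"], ["Pay", "Name"])
print(result)  # [('Salary', 'Pay', 1.0), ('EmpName', 'Name', 0.73)]
```

Salary and Pay share almost no characters, so only the rule matches them; EmpName and Name are matched by string similarity alone.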

COMA [Do and Rahm, 2002]

The graph can be traversed in a number of ways. For example, CUPID [Madhavan et al., 2001] converts the graphs to trees and then compares the subtrees rooted at the two nodes under consideration, while COMA [Do and Rahm, 2002] considers the paths from the root to these element nodes. The key idea behind these algorithms is that similar subgraphs increase the similarity of their root nodes. The similarity of the subgraphs

Constraint-based matching

Another approach that takes the neighborhood of nodes in a directed graph into account when computing node similarity is similarity flooding [Melnik et al., 2002].
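A simplified sketch of the similarity flooding idea, under stated assumptions: schemas are given as labeled edge lists, initial similarities are supplied by some other matcher, and the fixed-point step uses the basic form sigma' = normalize(sigma + propagated sigma) for a fixed number of iterations instead of a convergence test. Melnik et al. describe further variants and filters that are omitted here, and the example schemas and initial scores are hypothetical.

```python
from collections import defaultdict

def similarity_flooding(edges1, edges2, initial, iterations=10):
    # 1. Pairwise connectivity graph: (a, a2) -l-> (b, b2) whenever
    #    a -l-> b in schema 1 and a2 -l-> b2 in schema 2 (same label l).
    pcg = [((a, a2), l, (b, b2))
           for a, l, b in edges1
           for a2, l2, b2 in edges2 if l == l2]
    # 2. Propagation coefficients: a node splits its similarity evenly
    #    over its edges with the same label, in each direction.
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for src, l, dst in pcg:
        out_deg[(src, l)] += 1
        in_deg[(dst, l)] += 1
    incoming = defaultdict(list)  # node -> [(neighbor, weight)] it receives from
    for src, l, dst in pcg:
        incoming[dst].append((src, 1.0 / out_deg[(src, l)]))
        incoming[src].append((dst, 1.0 / in_deg[(dst, l)]))
    # 3. Fixed-point iteration: add propagated similarity, then normalize by the maximum.
    nodes = set(initial) | {n for src, _, dst in pcg for n in (src, dst)}
    if not nodes:
        return {}
    sigma = {n: initial.get(n, 0.0) for n in nodes}
    for _ in range(iterations):
        new = {n: sigma[n] + sum(sigma[m] * w for m, w in incoming[n]) for n in nodes}
        top = max(new.values()) or 1.0
        sigma = {n: v / top for n, v in new.items()}
    return sigma

g1 = [("Emp", "attr", "EmpName"), ("Emp", "attr", "Salary")]
g2 = [("Employee", "attr", "Name"), ("Employee", "attr", "Pay")]
init = {("Emp", "Employee"): 1.0, ("EmpName", "Name"): 0.5, ("Salary", "Pay"): 0.1}
sim = similarity_flooding(g1, g2, init)
for pair, score in sorted(sim.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

The similar relation pair keeps the highest score and feeds similarity to all attribute pairs beneath it, while the initial preferences among attribute pairs are preserved in the ranking.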

Edges in a schema graph may also carry semantics: containment edges from a relation or entity node to its attributes may be distinguished from referential edges from a foreign key attribute node to the corresponding primary key attribute node. Some systems exploit these edge semantics (e.g., DIKE [Palopoli et al., 1998, 2003a]).
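One simple way to exploit edge semantics is to compare how two nodes participate in typed edges: a foreign key attribute has an incoming containment edge and an outgoing referential edge, while an ordinary attribute has only the containment edge. The sketch below scores this with a weighted Jaccard measure over edge-type profiles. The measure, the labels "contains" and "references", and the example schemas are assumptions for illustration and do not reproduce DIKE.

```python
from collections import Counter

def edge_profile(edges, node):
    # Count the node's incident edges by (direction, edge label).
    profile = Counter()
    for src, label, dst in edges:
        if src == node:
            profile[("out", label)] += 1
        if dst == node:
            profile[("in", label)] += 1
    return profile

def edge_semantics_similarity(edges1, node1, edges2, node2):
    # Weighted Jaccard over edge-type profiles: nodes playing the same
    # structural role (e.g. both foreign keys) score higher.
    p1, p2 = edge_profile(edges1, node1), edge_profile(edges2, node2)
    keys = set(p1) | set(p2)
    if not keys:
        return 1.0
    return sum(min(p1[k], p2[k]) for k in keys) / sum(max(p1[k], p2[k]) for k in keys)

s1 = [("EMP", "contains", "EMP.DNO"), ("EMP.DNO", "references", "DEPT.DNO")]
s2 = [("Employee", "contains", "Employee.dept"),
      ("Employee", "contains", "Employee.name"),
      ("Employee.dept", "references", "Department.id")]
print(edge_semantics_similarity(s1, "EMP.DNO", s2, "Employee.dept"))  # 1.0
print(edge_semantics_similarity(s1, "EMP.DNO", s2, "Employee.name"))  # 0.5
```

Such a structural score would normally be combined with a linguistic score, since EMP.DNO and Employee.dept share little in their names.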

Learning-based matching

The training set for learning-based matching can be generated either by manually identifying the schema correspondences between two databases and then extracting example training data instances [Doan et al., 2003a], or by specifying a query expression that converts data from one database to another [Berlin and Motro, 2001].
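Once such a training set exists, a classifier learns which target attribute a data value most likely belongs to. The sketch below uses a toy multinomial naïve Bayes classifier with Laplace smoothing over whitespace-separated tokens, and assigns a whole source column to the attribute chosen by majority vote over its values. The class name, tokenization, voting scheme, and city/phone training data are hypothetical; this is not the implementation of Autoplex, LSD, or any other cited system.

```python
import math
from collections import Counter, defaultdict

def tokens(value):
    return str(value).lower().split()

class NaiveBayesMatcher:
    def __init__(self):
        self.token_counts = defaultdict(Counter)  # label -> token frequencies
        self.label_counts = Counter()             # label -> number of examples
        self.vocab = set()

    def train(self, examples):
        # examples: (data value, target attribute) pairs extracted from known correspondences
        for value, label in examples:
            self.label_counts[label] += 1
            for t in tokens(value):
                self.token_counts[label][t] += 1
                self.vocab.add(t)

    def classify(self, value):
        # Pick the label maximizing log P(label) + sum of log P(token | label).
        total = sum(self.label_counts.values())
        best, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)
            for t in tokens(value):
                score += math.log((self.token_counts[label][t] + 1) / denom)  # Laplace smoothing
            if score > best_score:
                best, best_score = label, score
        return best

    def match_column(self, values):
        # Assign a whole source column to the target attribute most of its values vote for.
        votes = Counter(self.classify(v) for v in values)
        return votes.most_common(1)[0][0]

matcher = NaiveBayesMatcher()
matcher.train([("Paris", "city"), ("Lyon", "city"), ("Nice", "city"),
               ("555 1234", "phone"), ("555 9876", "phone")])
print(matcher.match_column(["Paris", "Lyon"]))  # city
print(matcher.match_column(["555 0000"]))       # phone
```

Because the matcher looks at data values rather than names, it can match columns whose names are uninformative, at the cost of needing representative instances.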

Some systems have used neural networks (e.g., SEMINT [Li and Clifton, 2000; Li et al., 2000]), while others have used a naïve Bayesian learner/classifier (Autoplex [Berlin and Motro, 2001], LSD [Doan

