

An overview of schema integration techniques, including schema translation, schema mapping, schema matching, and mapping creation. It discusses models such as entity-relationship, object-oriented, graph, and tree; algorithms such as CUPID, COMA, similarity flooding, and learning-based matching; and systems such as DIKE, SEMINT, Autoplex, LSD, iMAP, Clio, and OntoBuilder.
Typology: Exercises
entity-relationship model [Palopoli et al., 1998, 2003b; He and Ling, 2006], object-oriented model [Castano and Antonellis, 1999; Bergamaschi et al., 2001], a graph [Palopoli et al., 1999; Milo and Zohar, 1998; Melnik et al., 2002; Do and Rahm, 2002] that may be simplified to a tree [Madhavan et al., 2001]
The schema mapping step may also be divided into two phases [Bernstein and Melnik, 2007]: mapping constraint generation and transformation generation.
Linguistic based matching
A number of issues affect the particular matching algorithm [Rahm and Bernstein, 2001].
In some systems, the hand-crafted rules are specified for each schema individually (intraschema rules) by the designer, and interschema rules are then “discovered” by the matching algorithm [Palopoli et al., 1999].
COMA [Do and Rahm, 2002]
The graph can be traversed in a number of ways; for example, CUPID [Madhavan et al., 2001] converts the graphs to trees and then looks at similarities of subtrees rooted at the two nodes under consideration, while COMA [Do and Rahm, 2002] considers the paths from the root to these element nodes. The fundamental point of these algorithms is that if the subgraphs are similar, this increases the similarity of the roots of these subtrees. The similarity of the subgraphs
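The idea that similar subtrees raise the similarity of their roots can be illustrated with a minimal sketch. It is not the CUPID or COMA algorithm: the tree encoding as `(name, [children])`, the functions `name_sim`, `subtree_sim`, and `node_sim`, the 0.7 threshold, the equal weighting, and the sample schemas are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def name_sim(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def leaves(tree):
    """Return the leaf names of a tree encoded as (name, [children])."""
    name, children = tree
    if not children:
        return [name]
    out = []
    for child in children:
        out.extend(leaves(child))
    return out


def subtree_sim(t1, t2, threshold=0.7):
    """Fraction of leaves in either subtree with a strong match in the other."""
    l1, l2 = leaves(t1), leaves(t2)
    matched1 = sum(1 for a in l1 if any(name_sim(a, b) >= threshold for b in l2))
    matched2 = sum(1 for b in l2 if any(name_sim(a, b) >= threshold for a in l1))
    return (matched1 + matched2) / (len(l1) + len(l2))


def node_sim(t1, t2, w_struct=0.5):
    """Combine subtree (structural) similarity with the roots' name similarity."""
    return w_struct * subtree_sim(t1, t2) + (1 - w_struct) * name_sim(t1[0], t2[0])


# Hypothetical schemas: WORKER shares EMP's attributes, PROJ_LIKE does not.
emp = ("EMP", [("ENO", []), ("ENAME", [])])
worker = ("WORKER", [("ENO", []), ("ENAME", [])])
proj_like = ("WORKER", [("PNO", []), ("BUDGET", [])])
```

In this sketch, `node_sim(emp, worker)` exceeds `node_sim(emp, proj_like)` even though the root names are identical in both comparisons, because only the first pair has matching subtrees.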
Constraint based matching
Another interesting approach to considering neighborhood in directed graphs while computing similarity of nodes is similarity flooding [Melnik et al., 2002].
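A simplified sketch of the fixed-point propagation behind this idea follows. It is an assumption-laden illustration, not the published algorithm: the function name `similarity_flooding`, the uniform propagation weights (each pair spreads its score evenly over its neighbors), the fixed iteration count, and the sample schemas are all hypothetical, and edge labels are ignored.

```python
from itertools import product


def similarity_flooding(edges_a, edges_b, initial, iterations=20):
    """edges_*: lists of (source, target) edges; initial: {(a, b): score}."""
    nodes_a = {n for edge in edges_a for n in edge}
    nodes_b = {n for edge in edges_b for n in edge}
    pairs = list(product(nodes_a, nodes_b))

    # Pairwise connectivity: (a, b) is linked to (a2, b2) when a -> a2 in
    # schema A and b -> b2 in schema B. Links are followed in both directions.
    neighbors = {p: set() for p in pairs}
    for (a, a2), (b, b2) in product(edges_a, edges_b):
        neighbors[(a, b)].add((a2, b2))
        neighbors[(a2, b2)].add((a, b))

    sigma = {p: initial.get(p, 0.0) for p in pairs}
    for _ in range(iterations):
        nxt = {}
        for p in pairs:
            # Each neighbor q spreads its score evenly over its own neighbors.
            incoming = sum(sigma[q] / len(neighbors[q]) for q in neighbors[p])
            nxt[p] = sigma[p] + incoming
        top = max(nxt.values()) or 1.0  # normalize so the best pair scores 1
        sigma = {p: v / top for p, v in nxt.items()}
    return sigma


# Hypothetical schemas with one seed correspondence between attributes.
edges_a = [("EMP", "ENO"), ("EMP", "ENAME")]
edges_b = [("WORKER", "NUMBER"), ("WORKER", "NAME")]
sigma = similarity_flooding(edges_a, edges_b, {("ENAME", "NAME"): 1.0})
```

Starting from a single attribute-level seed, the score "floods" to the pair `("EMP", "WORKER")`, whose nodes are adjacent to the seeded pair, while a structurally unrelated pair such as `("EMP", "NAME")` receives nothing.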
containment edges from a relation or entity node to its attributes may be distinguished from referential edges from a foreign key attribute node to the corresponding primary key attribute node. Some systems exploit these edge semantics (e.g., DIKE [Palopoli et al., 1998, 2003a]).
Learning based matching
This training set can be generated after manual identification of the schema correspondences between two databases followed by extraction of example training data instances [Doan et al., 2003a], or by the specification of a query expression that converts data from one database to another [Berlin and Motro, 2001].
Some have used neural networks (e.g., SEMINT [Li and Clifton, 2000; Li et al., 2000]), others have used Naïve Bayesian learner/classifier (Autoplex [Berlin and Motro, 2001], LSD [Doan et al., 2001, 2003a] and [Naumann et al., 2002]), and decision trees [Embley et al., 2001, 2002].
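To make the naïve Bayesian option concrete, here is a minimal sketch of a classifier trained on data instances labeled with target attributes and then used to predict the attribute for a column of another database. It does not reproduce the learners in Autoplex or LSD: the class `NaiveBayesMatcher`, the `features` function, the add-one smoothing, and the training data are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def features(value):
    """Hypothetical features: lowercased words plus a digit-presence flag."""
    feats = value.lower().split()
    feats.append("digits" if any(c.isdigit() for c in value) else "no_digits")
    return feats


class NaiveBayesMatcher:
    def __init__(self):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()

    def train(self, examples):
        """examples: iterable of (data value, target attribute) pairs."""
        for value, label in examples:
            self.label_counts[label] += 1
            for f in features(value):
                self.feature_counts[label][f] += 1
                self.vocabulary.add(f)

    def log_score(self, value, label):
        """Log of prior times feature likelihoods, with add-one smoothing."""
        total = sum(self.label_counts.values())
        score = math.log(self.label_counts[label] / total)
        denom = sum(self.feature_counts[label].values()) + len(self.vocabulary)
        for f in features(value):
            score += math.log((self.feature_counts[label][f] + 1) / denom)
        return score

    def match_column(self, values):
        """Pick the target attribute that best explains all values of a column."""
        totals = {
            label: sum(self.log_score(v, label) for v in values)
            for label in self.label_counts
        }
        return max(totals, key=totals.get)


# Hypothetical training instances extracted after manual matching.
matcher = NaiveBayesMatcher()
matcher.train([
    ("J. Doe", "ENAME"), ("M. Smith", "ENAME"), ("A. Lee", "ENAME"),
    ("40000", "SALARY"), ("34000", "SALARY"), ("27000", "SALARY"),
])
```

Calling `matcher.match_column(...)` on the values of an unlabeled source column returns the target attribute whose training instances they most resemble under these features.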
A composite approach has been proposed in the LSD [Doan et al., 2001, 2003a] and iMAP [Dhamankar et al., 2004] systems for handling 1:1 and N:M matches, respectively.
An algorithm due to Miller et al. [2000] accomplishes this iteratively by considering each Tk in turn.
An algorithm to accomplish this [Popa et al., 2002] also starts, as above, with a source schema, a target schema, and M, and “discovers” mappings that satisfy both the source and the target schema semantics. The algorithm is also more powerful than the one we discussed in this section in that it can handle nested structures that are common in XML, object databases, and nested relational systems.
If there are multiple such paths, then the database designer needs to be involved in selecting one (tools such as Clio [Miller et al., 2001], OntoBuilder [Roitman and Gal, 2006] and others facilitate this process and provide mechanisms for designers to view and specify correspondences [Yan et al., 2001]). The result of this step is a set Mk _ Mk of candidate sets.