Data Mining - Data Mining is an area in computer science, Study notes of Data Mining

Mahatma Jyoti Rao Phoole University Data Mining

Query processing, trees, grid files, spatial data

Typology: Study notes

2017/2018

Uploaded on 10/22/2018

joonageorge 🇮🇳

Distributed Databases

Module 2

Introduction to Distributed Databases

- Data in a distributed database system is stored across several sites.
- Each site is typically managed by a DBMS that can run independently of the other sites.

Distributed data independence…

- Users should be able to ask queries without specifying where the referenced relations, or copies or fragments of the relations, are located.
- This principle is a natural extension of physical and logical data independence.
- Queries that span multiple sites should be optimized systematically in a cost-based manner, taking into account communication costs and differences in local computation costs.
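
The cost-based comparison described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not a real optimizer: the `Plan` class, the two candidate plans, and the per-tuple cost constants are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """A candidate plan with hypothetical cost components."""
    name: str
    tuples_shipped: int      # tuples sent over the network
    tuples_processed: int    # tuples processed locally at some site


def plan_cost(plan, ship_cost, local_cost):
    """Total cost = communication cost + local computation cost."""
    return plan.tuples_shipped * ship_cost + plan.tuples_processed * local_cost


def cheapest(plans, ship_cost, local_cost):
    """Pick the plan with the lowest estimated total cost."""
    return min(plans, key=lambda p: plan_cost(p, ship_cost, local_cost))


# Two hypothetical ways to answer the same query.
plans = [
    Plan("ship whole relation, filter locally",
         tuples_shipped=10_000, tuples_processed=10_000),
    Plan("filter remotely, ship matches",
         tuples_shipped=200, tuples_processed=10_000),
]

# Assumed constants: shipping a tuple costs 100x more than processing it.
best = cheapest(plans, ship_cost=1.0, local_cost=0.01)
print(best.name)
```

With these assumed constants, communication dominates, so the plan that ships fewer tuples is chosen even though both plans do the same amount of local work.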

Distributed transaction atomicity

- Users should be able to write transactions that access and update data at several sites just as they would write transactions over purely local data.
- The effects of a transaction across sites should continue to be atomic.

Contd…

- The key to building heterogeneous systems is to have well-accepted standards for gateway protocols.
- A gateway protocol is an API that exposes DBMS functionality to external applications.
- Examples include ODBC and JDBC.
- Accessing database servers through gateway protocols masks their differences (in capabilities, data formats, etc.), so the differences between the servers in a distributed system are bridged to a large degree.
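
The idea of a common API can be sketched in Python, whose DB-API (PEP 249) plays a role comparable to ODBC and JDBC. This is an analogy, not ODBC or JDBC itself. Both "servers" below are in-memory SQLite databases, so no real heterogeneity is shown; the table and function names are hypothetical.

```python
import sqlite3


def count_rows(conn, table):
    """Count rows using only the generic calls cursor(), execute(), fetchone().

    `table` is interpolated into the SQL text, so it must be a trusted name.
    """
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {table}")
    return cur.fetchone()[0]


# Two independent in-memory databases stand in for two different servers.
server_a = sqlite3.connect(":memory:")
server_b = sqlite3.connect(":memory:")

server_a.execute("CREATE TABLE employees (id INTEGER, name TEXT)")
server_a.executemany("INSERT INTO employees VALUES (?, ?)",
                     [(1, "Ann"), (2, "Bob")])

server_b.execute("CREATE TABLE employees (id INTEGER, name TEXT)")
server_b.execute("INSERT INTO employees VALUES (?, ?)", (3, "Cy"))

# The same function works against either connection.
totals = [count_rows(c, "employees") for c in (server_a, server_b)]
print(totals)
```

The calling code never touches anything specific to the underlying engine, which is the point of a gateway protocol.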

Distributed DBMS Architectures

- Client-Server
- Collaborating Server
- Middleware

Contd…

- The client-server architecture has become very popular for several reasons.
- First, it is relatively simple to implement due to its clean separation of functionality and because the server is centralized.
- Second, expensive server machines are not underutilized by dealing with mundane user interactions, which are now relegated to inexpensive client machines.
- Third, users can run a graphical user interface that they are familiar with, rather than the (possibly unfamiliar and unfriendly) user interface on the server.

Collaborating Server Systems

- We can have a collection of database servers, each capable of running transactions against local data, which cooperatively execute transactions spanning multiple servers.
- When a server receives a query that requires access to data at other servers, it generates appropriate subqueries to be executed by other servers and puts the results together to compute answers to the original query.
- Ideally, the decomposition of the query should be done using cost-based optimization, taking into account the costs of network communication as well as local processing costs.
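
The subquery pattern described above can be sketched as follows. This is a minimal illustration that skips cost-based optimization entirely: the `Server` class, its methods, and the sample data are hypothetical.

```python
class Server:
    """A hypothetical database server holding some employee tuples locally."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows      # local data: list of (emp_id, city, salary)
        self.peers = []       # the other collaborating servers

    def run_local(self, min_salary):
        """Subquery: evaluate the selection against local data only."""
        return [r for r in self.rows if r[2] >= min_salary]

    def run_query(self, min_salary):
        """Answer a query that spans all servers.

        The receiving server sends the selection to each peer as a
        subquery, then combines the partial results with its own.
        """
        result = self.run_local(min_salary)
        for peer in self.peers:
            result.extend(peer.run_local(min_salary))
        return sorted(result)


chicago = Server("chicago", [(1, "Chicago", 50), (2, "Chicago", 90)])
toronto = Server("toronto", [(3, "Toronto", 70)])
chicago.peers.append(toronto)
toronto.peers.append(chicago)

# Any server can receive the query; the answer is the same.
print(toronto.run_query(min_salary=60))
```

Every server here plays both roles: it answers subqueries from peers and coordinates queries it receives directly.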

Contd…

- In the middleware architecture, the special coordinating server can be thought of as a layer of software that coordinates the execution of queries and transactions across one or more independent database servers; such software is often called middleware.
- The middleware layer is capable of executing joins and other relational operations on data obtained from the other servers, but typically does not itself maintain any data.
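
A join performed in the middleware layer might look like the sketch below. It assumes two independent in-memory SQLite databases as the underlying servers; the schemas, the data, and the `middleware_join` function are hypothetical.

```python
import sqlite3

# Two independent servers; neither knows about the other.
emp_db = sqlite3.connect(":memory:")
emp_db.execute("CREATE TABLE emp (eid INTEGER, name TEXT, did INTEGER)")
emp_db.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                   [(1, "Ann", 10), (2, "Bob", 20), (3, "Cy", 10)])

dept_db = sqlite3.connect(":memory:")
dept_db.execute("CREATE TABLE dept (did INTEGER, dname TEXT)")
dept_db.executemany("INSERT INTO dept VALUES (?, ?)",
                    [(10, "Sales"), (20, "Ops")])


def middleware_join(emp_conn, dept_conn):
    """Join emp and dept inside the middleware layer.

    Each server answers only a local query. The join itself is a simple
    hash join done here, and nothing is kept once the call returns.
    """
    dept_by_id = dict(dept_conn.execute("SELECT did, dname FROM dept"))
    rows = emp_conn.execute("SELECT name, did FROM emp ORDER BY eid")
    return [(name, dept_by_id[did]) for name, did in rows if did in dept_by_id]


print(middleware_join(emp_db, dept_db))
```

The underlying servers only need to handle local queries, which is what lets them stay independent.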

Storing Data in a Distributed DBMS

- In a distributed DBMS, relations are stored across several sites.
- Accessing a relation that is stored at a remote site incurs message-passing costs.
- To reduce this overhead, a single relation may be partitioned or fragmented across several sites.
- The fragments are stored at the sites where they are most often accessed, or replicated at each site where the relation is in high demand.

Contd…

- The motivation for replication is twofold:
- Increased availability of data:
  - If a site that contains a replica goes down, we can find the same data at other sites.
  - Similarly, if local copies of remote relations are available, we are less vulnerable to failure of communication links.
- Faster query evaluation:
  - Queries can execute faster by using a local copy of a relation instead of going to a remote site.
- There are two kinds of replication, called synchronous and asynchronous replication, which differ primarily in how replicas are kept current when the relation is modified.
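
The difference between the two kinds of replication can be sketched as below. The notes only say that they differ in how replicas are kept current; the primary-copy layout, the update queue, and all names here are assumptions made for illustration.

```python
from collections import deque


class ReplicatedRelation:
    """A hypothetical relation with one primary copy and several replicas."""

    def __init__(self, n_replicas, synchronous):
        self.primary = {}
        self.replicas = [{} for _ in range(n_replicas)]
        self.synchronous = synchronous
        self.pending = deque()    # updates not yet propagated (async only)

    def write(self, key, value):
        self.primary[key] = value
        if self.synchronous:
            # Synchronous: every replica is updated before write() returns.
            for replica in self.replicas:
                replica[key] = value
        else:
            # Asynchronous: remember the update and propagate it later.
            self.pending.append((key, value))

    def propagate(self):
        """Apply all queued updates to every replica (async catch-up)."""
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica[key] = value


sync_rel = ReplicatedRelation(n_replicas=2, synchronous=True)
sync_rel.write("e1", "Chicago")

async_rel = ReplicatedRelation(n_replicas=2, synchronous=False)
async_rel.write("e1", "Chicago")
stale = dict(async_rel.replicas[0])   # snapshot taken before propagation
async_rel.propagate()
```

In the asynchronous case a replica can briefly lag behind the primary copy (the `stale` snapshot is empty), which is the trade-off made for cheaper writes.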

Fragmentation

- Typically, the tuples that belong to a given horizontal fragment are identified by a selection query.
  - For example, employee tuples might be organized into fragments by city, with all employees in a given city assigned to the same fragment.
  - The horizontal fragment shown corresponds to Chicago. By storing fragments in the database site at the corresponding city, we achieve locality of reference.
  - Chicago data is most likely to be updated and queried from Chicago, and storing this data in Chicago makes it local (and reduces communication costs) for most queries.
- Similarly, the tuples in a given vertical fragment are identified by a projection query.
- The vertical fragment in the figure results from projection on the first two columns of the employees relation.
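
Selection and projection as fragmenting operations can be sketched on a small in-memory relation. The sample tuples, the column layout, and the `tid` column used to rejoin vertical fragments are assumptions for illustration and do not come from the figure in the notes.

```python
# Hypothetical employees relation: (tid, eid, name, city, salary).
# tid is a tuple identifier added so vertical fragments can be rejoined.
employees = [
    (1, 101, "Ann", "Chicago", 50),
    (2, 102, "Bob", "Chicago", 90),
    (3, 103, "Cy", "Toronto", 70),
]


def horizontal_fragment(rows, city):
    """Selection: keep whole tuples whose city matches."""
    return [r for r in rows if r[3] == city]


def vertical_fragment(rows, columns):
    """Projection: keep only the chosen columns, plus tid (column 0)."""
    return [(r[0],) + tuple(r[c] for c in columns) for r in rows]


chicago = horizontal_fragment(employees, "Chicago")
first_two = vertical_fragment(employees, columns=[1, 2])   # eid, name
rest = vertical_fragment(employees, columns=[3, 4])        # city, salary

# Rejoin the vertical fragments on tid to recover the original relation.
rest_by_tid = {r[0]: r[1:] for r in rest}
rebuilt = [r + rest_by_tid[r[0]] for r in first_two]
print(rebuilt == employees)
```

Carrying a shared identifier in every vertical fragment is one way to make the decomposition lossless; horizontal fragments need no such key because each one holds complete tuples.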

Replication

- Replication means that we store several copies of a relation or relation fragment.
- An entire relation can be replicated at one or more sites.
- Similarly, one or more fragments of a relation can be replicated at other sites.
- For example, if a relation R is fragmented into R1, R2, and R3, there might be just one copy of R1, whereas R2 is replicated at two other sites and R3 is replicated at all sites.

Distributed Data Independence

- Distributed data independence means that users should be able to write queries without regard to how a relation is fragmented or replicated.
- It is the responsibility of the DBMS to compute the relation as needed.
- This property implies that users should not have to specify the full name for the data objects accessed while evaluating a query.
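
One way to picture this is a catalog that maps a relation name to its fragments and their copies, so the user names only the relation. The sketch below reuses the R1, R2, R3 example; the site names, the catalog layout, and the `locate` function are hypothetical.

```python
# Hypothetical catalog: relation -> fragment -> sites holding a copy.
CATALOG = {
    "R": {
        "R1": ["site_a"],                                  # a single copy
        "R2": ["site_a", "site_b", "site_c"],              # two extra copies
        "R3": ["site_a", "site_b", "site_c", "site_d"],    # copy at every site
    }
}


def locate(relation, local_site):
    """Resolve a relation name to one site per fragment.

    Prefers a copy at local_site and otherwise falls back to the first
    listed site. The caller never names a fragment or a site directly.
    """
    plan = {}
    for fragment, sites in CATALOG[relation].items():
        plan[fragment] = local_site if local_site in sites else sites[0]
    return plan


# A user at site_c asks for R by name only.
print(locate("R", local_site="site_c"))
```

Changing how R is fragmented or replicated means editing the catalog only; queries that refer to R stay the same.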
