Docsity
Docsity

Prepare for your exams
Prepare for your exams

Study with the several resources on Docsity


Earn points to download
Earn points to download

Earn points by helping other students or get them with a premium plan


Guidelines and tips
Guidelines and tips

Database Management Systems: Concepts, Data Models, and Concurrency Control, Study notes of Database Management Systems (DBMS)

An overview of database management systems (dbms), covering key concepts such as data models, database components, and database access languages. It delves into the different types of databases, including centralized, distributed, relational, and nosql databases, highlighting their characteristics and use cases. The document also explores the critical topic of concurrency control, which ensures the consistent and reliable execution of concurrent transactions in a dbms. It discusses the acid properties, lock-based concurrency control, and timestamp-based concurrency control protocols. This comprehensive coverage of dbms fundamentals and advanced topics makes this document a valuable resource for students and professionals seeking to understand the principles and mechanisms underlying modern database systems.

Typology: Study notes

2023/2024

Available from 08/23/2024

anubhav-mishra-9
anubhav-mishra-9 🇮🇳

3 documents

1 / 16

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Unit -1
Database
The database is a collection of inter-related data which is used to retrieve, insert and delete the
data efficiently. It is also used to organize the data in the form of a table, schema, views, and
reports, etc.
For example: The college Database organizes the data about the admin, staff, students and faculty
etc.
Using the database, you can easily retrieve, insert, and delete the information.
Database Management System
o Database management system is a software which is used to manage the database. For
example: MySQL, Oracle, etc. are a very popular commercial database which is used in
different applications.
o DBMS provides an interface to perform various operations like database creation, storing
data in it, updating data, creating a table in the database and a lot more.
o It provides protection and security to the database. In the case of multiple users, it also
maintains data consistency.
DBMS allows users the following tasks:
o Data Definition: It is used for creation, modification, and removal of definition that defines
the organization of data in the database.
o Data Updation: It is used for the insertion, modification, and deletion of the actual data in
the database.
o Data Retrieval: It is used to retrieve the data from the database which can be used by
applications for various purposes.
o User Administration: It is used for registering and monitoring users, maintain data integrity,
enforcing data security, dealing with concurrency control, monitoring performance and
recovering information corrupted by unexpected failure.
Characteristics of DBMS
o It uses a digital repository established on a server to store and manage the information.
o It can provide a clear and logical view of the process that manipulates data.
o DBMS contains automatic backup and recovery procedures.
o It contains ACID properties which maintain data in a healthy state in case of failure.
o It can reduce the complex relationship between data.
o It is used to support manipulation and processing of data.
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff

Partial preview of the text

Download Database Management Systems: Concepts, Data Models, and Concurrency Control and more Study notes Database Management Systems (DBMS) in PDF only on Docsity!

Unit - 1 Database The database is a collection of inter-related data which is used to retrieve, insert and delete the data efficiently. It is also used to organize the data in the form of a table, schema, views, and reports, etc. For example: The college Database organizes the data about the admin, staff, students and faculty etc. Using the database, you can easily retrieve, insert, and delete the information. Database Management System o Database management system is a software which is used to manage the database. For example: MySQL, Oracle, etc. are a very popular commercial database which is used in different applications. o DBMS provides an interface to perform various operations like database creation, storing data in it, updating data, creating a table in the database and a lot more. o It provides protection and security to the database. In the case of multiple users, it also maintains data consistency. DBMS allows users the following tasks: o Data Definition: It is used for creation, modification, and removal of definition that defines the organization of data in the database. o Data Updation: It is used for the insertion, modification, and deletion of the actual data in the database. o Data Retrieval: It is used to retrieve the data from the database which can be used by applications for various purposes. o User Administration: It is used for registering and monitoring users, maintain data integrity, enforcing data security, dealing with concurrency control, monitoring performance and recovering information corrupted by unexpected failure. Characteristics of DBMS o It uses a digital repository established on a server to store and manage the information. o It can provide a clear and logical view of the process that manipulates data. o DBMS contains automatic backup and recovery procedures. o It contains ACID properties which maintain data in a healthy state in case of failure. o It can reduce the complex relationship between data. o It is used to support manipulation and processing of data.

o It is used to provide security of data. o It can view the database from different viewpoints according to the requirements of the user. Advantages of DBMS o Controls database redundancy: It can control data redundancy because it stores all the data in one single database file and that recorded data is placed in the database. o Data sharing: In DBMS, the authorized users of an organization can share the data among multiple users. o Easily Maintenance: It can be easily maintainable due to the centralized nature of the database system. o Reduce time: It reduces development time and maintenance need. o Backup: It provides backup and recovery subsystems which create automatic backup of data from hardware and software failures and restores the data if required. o multiple user interface: It provides different types of user interfaces like graphical user interfaces, application program interfaces Disadvantages of DBMS o Cost of Hardware and Software: It requires a high speed of data processor and large memory size to run DBMS software. o Size: It occupies a large space of disks and large memory to run them efficiently. o Complexity: Database system creates additional complexity and requirements. o Higher impact of failure: Failure is highly impacted the database because in most of the organization, all the data stored in a single database and if the database is damaged due to electric failure or database corruption then the data may be lost forever. Data Models in DBMS A Data Model in Database Management System (DBMS) is the concept of tools that are developed to summarize the description of the database. Data Models provide us with a transparent picture of data which helps us in creating an actual database. It shows us from the design of the data to its proper implementation of data. Types of Relational Models

  1. Conceptual Data Model
  2. Representational Data Model
  3. Physical Data Model It is basically classified into 3 types:-

The advantage of using a Representational data model is to provide a foundation to form the base for the Physical model

  1. Physical Data Model The physical Data Model is used to practically implement Relational Data Model. Ultimately, all data in a database is stored physically on a secondary storage device such as discs and tapes. This is stored in the form of files, records, and certain other data structures. It has all the information on the format in which the files are present and the structure of the databases, the presence of external data structures, and their relation to each other. Here, we basically save tables in memory so they can be accessed efficiently. In order to come up with a good physical model, we have to work on the relational model in a better way. Structured Query Language (SQL) is used to practically implement Relational Algebra. This Data Model describes HOW the system will be implemented using a specific DBMS system. This model is typically created by DBA and developers. The purpose is actual implementation of the database. Characteristics of a physical data model:  The physical data model describes data need for a single project or application though it maybe integrated with other physical data models based on project scope.  Data Model contains relationships between tables that which addresses cardinality and nullability of the relationships.  Developed for a specific version of a DBMS, location, data storage or technology to be used in the project.  Columns should have exact datatypes, lengths assigned and default values.  Primary and Foreign keys, views, indexes, access profiles, and authorizations, etc. are defined Components of DBMS There are many components available in the DBMS. Each component has a significant task in the DBMS. A database environment is a collection of components that regulates the use of data, management, and a group of data. These components consist of people, the technique of Handel the database, data, hardware, software, etc. there are several components available for the DBMS. We are going to explain five main topics of the database below.
  1. Hardware o Here the hardware means the physical part of the DBMS. Here the hardware includes output devices like a printer, monitor, etc., and storage devices like a hard disk. o In DBMS, information hardware is the most important visible part. The equipment which is used for the visibility of the data is the printer, computer, scanner, etc. This equipment is used to capture the data and present the output to the user. o With the help of hardware, the DBMS can access and update the database. o The server can store a large amount of data, which can be shared with the help of the user's own system. o The database can be run in any system that ranges from microcomputers to mainframe computers. And this database also provides an interface between the real worlds to the database. o When we try to run any database software like MySQL, we can type any commands with the help of our keyboards, and RAM, ROM, and processor are part of our computer system.
  2. Software o Software is the main component of the DBMS. o Software is defined as the collection of programs that are used to instruct the computer about its work. The software consists of a set of procedures, programs, and routines associated with the computer system's operation and performance. Also, we can say that computer software is a set of instructions that is used to instruct the computer hardware for the operation of the computers. o The software includes so many software like network software and operating software. The database software is used to access the database, and the database application performs the task.
  1. Data Definition Language(DDL):It is used to construct a database. DDL implements database schema at the physical, logical, and external levels. The following commands serve as the base for all DDL commands: o ALTER o COMMENT o CREATE o DESCRIBE o DROP o SHOW o USE
  2. Data Manipulation Language(DML): It is used to access a database. The DML provides the statements to retrieve, modify, insert and delete the data from the database. The following commands serve as the base for all DML commands: o INSERT o UPDATE o DELETE o LOCK o CALL o EXPLAIN PLAN
  3. People o The people who control and manage the databases and perform different types of operations on the database in the DBMS. o The people include database administrator, software developer, and End-user. o Database administrator-database administrator is the one who manages the complete database management system. DBA takes care of the security of the DBMS, its availability, managing the license keys, managing user accounts and access, etc. o Software developer- theThis user group is involved in developing and designing the parts of DBMS. They can handle massive quantities of data, modify and edit databases, design and develop new databases, and troubleshoot database issues. o End user - These days, all modern web or mobile applications store user data. How do you think they do it? Yes, applications are programmed in such a way that they collect user data and store the data on a DBMS system running on their server. End users are the ones who store, retrieve, update and delete data. o The users of the database can be classified into different groups. i. Native Users
  4. ii. Online Users iii. Sophisticated Users iv. Specialized Users v. Application Users vi. DBA - Database Administrator There are various types of databases used for storing different varieties of data:

    1. Centralized Database It is the type of database that stores data at a centralized database system. It comforts the users to access the stored data from different locations through several applications. These applications contain the authentication process to let users access data securely. An example of a Centralized database can be Central Library that carries a central database of each library in a college/university. Advantages of Centralized Database o It has decreased the risk of data management, i.e., manipulation of data will not affect the core data. o Data consistency is maintained as it manages data in a central repository. o It provides better data quality, which enables organizations to establish data standards. o It is less costly because fewer vendors are required to handle the data sets. Disadvantages of Centralized Database o The size of the centralized database is large, which increases the response time for fetching the data. o It is not easy to update such an extensive database system.

    Each table in the database carries a key that makes the data unique from others. Examples of Relational databases are MySQL, Microsoft SQL Server, Oracle, etc. Properties of Relational Database There are following four commonly known properties of a relational model known as ACID properties, where: A means Atomicity: This ensures the data operation will complete either with success or with failure. It follows the 'all or nothing' strategy. For example, a transaction will either be committed or will abort. C means Consistency: If we perform any operation over the data, its value before and after the operation should be preserved. For example, the account balance before and after the transaction should be correct, i.e., it should remain conserved. I means Isolation: There can be concurrent users for accessing data at the same time from the database. Thus, isolation between the data should remain isolated. For example, when multiple transactions occur at the same time, one transaction effects should not be visible to the other transactions in the database. D means Durability: It ensures that once it completes the operation and commits the data, data changes should remain permanent.

    1. NoSQL Database Non-SQL/Not Only SQL is a type of database that is used for storing a wide range of data sets. It is not a relational database as it stores data not only in tabular form but in several different ways. It came into existence when the demand for building modern applications increased. Thus, NoSQL presented a wide variety of database technologies in response to the demands. We can further divide a NoSQL database into the following four types:

    a. Key-value storage: It is the simplest type of database storage where it stores every single item as a key (or attribute name) holding its value, together. b. Document-oriented Database: A type of database used to store data as JSON-like document. It helps developers in storing data by using the same document-model format as used in the application code. c. Graph Databases: It is used for storing vast amounts of data in a graph-like structure. Most commonly, social networking websites use the graph database. d. Wide-column stores: It is similar to the data represented in relational databases. Here, data is stored in large columns together, instead of storing in rows. Advantages of NoSQL Database o It enables good productivity in the application development as it is not required to store data in a structured format. o It is a better option for managing and handling large data sets. o It provides high scalability. o Users can quickly access data from the database through key-value. 5 ) Object-oriented Databases The type of database that uses the object-based data model approach for storing data in the database system. The data is represented and stored as objects which are similar to the objects used in the object-oriented programming language. 6 ) Hierarchical Databases It is the type of database that stores data in the form of parent-children relationship nodes. Here, it organizes data in a tree-like structure.

    In the protocol, transactions gain locks on data items to control their access and prevent conflicts between concurrent transactions. This article will look deep into the Lock Based Protocol in detail. What is a Lock? A Lock is a variable assigned to any data item to keep track of the status of that data item so that isolation and non-interference are ensured during concurrent transactions. Lock Based Protocols A lock is a variable associated with a data item that describes the status of the data item to possible operations that can be applied to it. They synchronize the access by concurrent transactions to the database items. It is required in this protocol that all the data items must be accessed in a mutually exclusive manner. Let me introduce you to two common locks that are used and some terminology followed in this protocol. Types of Lock

    1. Shared Lock (S): Shared Lock is also known as Read-only lock. As the name suggests it can be shared between transactions because while holding this lock the transaction does not have the permission to update data on the data item. S-lock is requested using lock-S instruction.
    2. Exclusive Lock (X): Data item can be both read as well as written.This is Exclusive and cannot be held simultaneously on the same data item. X-lock is requested using lock-X instruction. Lock Compatibility Matrix A transaction may be granted a lock on an item if the requested lock is compatible with locks already held on the item by other transactions. Any number of transactions can hold shared locks on an item, but if any transaction holds an exclusive(X) on the item no other transaction may hold any lock on the item. If a lock cannot be granted, the requesting transaction is made to wait till all incompatible locks held by other transactions have been released. Then the lock is granted. Lock Compatibility Matrix Concurrency Control Protocols Concurrency Control Protocol allow concurrent schedules, but ensure that the schedules are conflict/view serializable, and are recoverable and maybe even cascadeless. These protocols do not examine the precedence graph as it is being created, instead a protocol imposes a discipline that avoids non-serializable schedules. Different concurrency control protocols provide different advantages between the amount of concurrency they allow and the amount of overhead that they impose. Types of Lock-Based Protocols
    1. Simplistic Lock Protocol It is the simplest method for locking data during a transaction. Simple lock-based protocols enable all transactions to obtain a lock on the data before inserting, deleting, or updating it. It will unlock the data item once the transaction is completed.
    2. Pre-Claiming Lock Protocol Pre-claiming Lock Protocols assess transactions to determine which data elements require locks. Before executing the transaction, it asks the DBMS for a lock on all of the data elements. If all locks are given, this protocol will allow the transaction to start. When the transaction is finished, it releases all locks. If all of the locks are not provided, this protocol allows the transaction to be reversed and waits until all of the locks are granted.
    3. Two-phase locking (2PL) A transaction is said to follow the Two-Phase Locking protocol if Locking and Unlocking can be done in two phases  Growing Phase: New locks on data items may be acquired but none can be released.  Shrinking Phase: Existing locks may be released but no new locks can be acquired.
    4. Strict Two-Phase Locking Protocol Strict Two-Phase Locking requires that in addition to the 2 - PL all Exclusive(X) locks held by the transaction be released until after the Transaction Commits. Timestamp based Concurrency Control Timestamp-based concurrency control is a method used in database systems to ensure that transactions are executed safely and consistently without conflicts, even when multiple transactions are being processed simultaneously. This approach relies on timestamps to manage and coordinate the execution order of transactions. Refer to the timestamp of a transaction T as TS(T). What is Timestamp Ordering Protocol? The main idea for this protocol is to order the transactions based on their Timestamps. A schedule in which the transactions participate is then serializable and the only equivalent serial schedule permitted has the transactions in the order of their Timestamp Values. Stating simply, the schedule is equivalent to the particular Serial Order corresponding to the order of the Transaction timestamps. An algorithm must ensure that, for each item accessed by Conflicting Operations in the schedule, the order in which the item is accessed does not violate the ordering. To ensure this, use two Timestamp Values relating to each database item X.  W_TS(X) is the largest timestamp of any transaction that executed write(X) successfully.  R_TS(X) is the largest timestamp of any transaction that executed read(X) successfully. Basic Timestamp Ordering Every transaction is issued a timestamp based on when it enters the system. Suppose, if an old transaction Ti has timestamp TS(Ti), a new transaction Tj is assigned timestamp TS(Tj) such that TS(Ti) < TS(Tj). The protocol manages concurrent execution such that the timestamps determine the serializability order. The timestamp ordering protocol ensures that any conflicting read and write operations are executed in timestamp order. Whenever some Transaction T tries to issue a R_item(X) or a W_item(X), the Basic TO algorithm compares the timestamp

     Efficient: The technique is efficient and scalable, as it does not require locking and can handle a large number of transactions.  No Deadlocks: Since there are no locks involved, there is no possibility of deadlocks occurring.  Improved Performance: By allowing transactions to execute concurrently, the overall performance of the database system can be improved. Disadvantages of Timestamp Ordering Protocol  Limited Granularity: The granularity of timestamp-based concurrency control is limited to the precision of the timestamp. This can lead to situations where transactions are unnecessarily blocked, even if they do not conflict with each other.  Timestamp Ordering: In order to ensure that transactions are executed in the correct order, the timestamps need to be carefully managed. If not managed properly, it can lead to inconsistencies in the database.  Timestamp Synchronization: Timestamp-based concurrency control requires that all transactions have synchronized clocks. If the clocks are not synchronized, it can lead to incorrect ordering of transactions.  Timestamp Allocation: Allocating unique timestamps for each transaction can be challenging, especially in distributed systems where transactions may be initiated at different locations.