Docsity
Docsity

Prepare for your exams
Prepare for your exams

Study with the several resources on Docsity


Earn points to download
Earn points to download

Earn points by helping other students or get them with a premium plan


Guidelines and tips
Guidelines and tips

Database Management Systems: Concepts, Algorithms, and Applications, Schemes and Mind Maps of Data Structures and Algorithms

Data Structure NOtes for b.tech students. They can refer for the exam.

Typology: Schemes and Mind Maps

2022/2023

Uploaded on 01/04/2023

AshwaniKumarcse
AshwaniKumarcse 🇮🇳

5

(1)

3 documents

1 / 98

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18
pf19
pf1a
pf1b
pf1c
pf1d
pf1e
pf1f
pf20
pf21
pf22
pf23
pf24
pf25
pf26
pf27
pf28
pf29
pf2a
pf2b
pf2c
pf2d
pf2e
pf2f
pf30
pf31
pf32
pf33
pf34
pf35
pf36
pf37
pf38
pf39
pf3a
pf3b
pf3c
pf3d
pf3e
pf3f
pf40
pf41
pf42
pf43
pf44
pf45
pf46
pf47
pf48
pf49
pf4a
pf4b
pf4c
pf4d
pf4e
pf4f
pf50
pf51
pf52
pf53
pf54
pf55
pf56
pf57
pf58
pf59
pf5a
pf5b
pf5c
pf5d
pf5e
pf5f
pf60
pf61
pf62

Partial preview of the text

Download Database Management Systems: Concepts, Algorithms, and Applications and more Schemes and Mind Maps Data Structures and Algorithms in PDF only on Docsity!

DIGITAL NOTES

ON

DATABASE MANAGEMENT SYSTEMS

(R18A0510 )

B.TECH II YEAR - II SEM

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

MALLA REDDY COLLEGE OF ENGINEERING & TECHNOLOGY

(Autonomous Institution – UGC, Govt. of India)

(Affiliated to JNTUH, Hyderabad, Approved by AICTE - Accredited by NBA & NAAC – ‘A’ Grade - ISO 9001:2015 Certified) Maisammaguda, Dhulapally (Post Via. Hakimpet), Secunderabad – 500100, Telangana State, INDIA.

MALLA REDDY COLLEGE OF ENGINEERING & TECHNOLOGY

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

SYLLABUS

II Year B. Tech. IT – II Sem L T/P/D C 4 1/ - / - 3 (R18A05 10 ) DATABASE MANAGEMENT SYSTEMS Objectives:  To Understand the basic concepts and the applications of database systems  To Master the basics of SQL and construct queries using SQL  To understand the relational database design principles  To become familiar with the basic issues of transaction processing and concurrency control  To become familiar with database storage structures and access techniques UNIT I: Database System Applications, Purpose of Database Systems, View of Data – Data Abstraction

  • Instances and Schemas – Database Languages – database Access for applications Programs – Database Users and Administrator – Transaction Management – Database Architecture – Storage Manager – the Query Processor. Data Models: Introduction to the Relational Model – Structure – Database Schema, Keys – Schema Diagrams. Database design– Other Models, ER diagrams – ER Model - Entities, Attributes and Entity sets – Relationships and Relationship sets – ER Design Issues – Concept Design – Conceptual Design with relevant Examples. Relational Query Languages, Relational Operations. UNIT II: Relational Algebra – Selection and projection set operations – renaming – Joins – Division – Examples of Algebra overviews – Relational calculus – Tuple Relational Calculus (TRC) – Domain relational calculus (DRC). Overview of the SQL Query Language – Basic Structure of SQL Queries, Set Operations, Aggregate Functions – GROUPBY – HAVING, Nested Sub queries, Views, Triggers, Procedures. UNIT III: Normalization – Introduction, Non loss decomposition and functional dependencies, First, Second, and third normal forms – dependency preservation, Boyce/Codd normal form. Higher Normal Forms - Introduction, Multi-valued dependencies and Fourth normal form, Join dependencies and Fifth normal form UNIT IV: Transaction Concept- Transaction State- Implementation of Atomicity and Durability – Concurrent Executions – Serializability- Recoverability – Implementation of Isolation – Testing for serializability- Lock – Based Protocols – Timestamp Based Protocols- Validation- Based Protocols – Multiple Granularity.

MALLA REDDY COLLEGE OF ENGINEERING & TECHNOLOGY

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

INDEX

S. No Unit Topic Page no 1 I

INTRODUCTION TO DATABASE

MANAGEMENT SYSTEM 1

I VIEW OF DATA

I INSTANCES AND SCHEMAS

I

ENTITY-RELATIONSHIP MODEL 9

I DATABASE SCHEMA

S. No Unit Topic Page no 1

II

PRELIMINARIES 23

II

RELATIONAL ALGEBRA 23

II

RELATIONAL CALCULUS 28

II

THE FORM OF A BASIC SQL QUERY 31

II INTRODUCTION TO VIEWS

II

TRIGGERS 40

MALLA REDDY COLLEGE OF ENGINEERING & TECHNOLOGY

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

S. No Unit Topic Page no 1

III SCHEMA REFINEMENT

III FUNCTIONAL

DEPENDENCIES

III

NORMAL FORMS 46

III

DECOMPOSITIONS 49

III DEPENDENCY-PRESERVING

DECOMPOSITION INTO 3NF

III OTHER KINDS OF

DEPENDENCIES 56

S. No Unit Topic Page no 1

IV

TRANSACTION CONCEPT 63

IV

CONCURRENT EXECUTION 67

IV TRANSACTION

CHARACTERISTICS

IV RECOVERABLE

SCHEDULES 76

IV

RECOVERY SYSTEM 79

IV TIMESTAMP-BASED

PROTOCOLS

IV

MULTIPLE GRANULARITY. 87

UNIT- 1

Introduction to Database Management System

As the name suggests, the database management system consists of two parts. They are:

  1. Database and
  2. Management System What is a Database? To find out what database is, we have to start from data, which is the basic building block of any DBMS. Data : Facts, figures, statistics etc. having no particular meaning (e.g. 1, ABC, 19 etc). Record : Collection of related data items, e.g. in the above example the three data items had no meaning. But if we organize them in the following way, then they collectively represent meaningful information. Roll Name Age 1 ABC 19 Table or Relation : Collection of related records. Roll Name Age 1 ABC 19 2 DEF 22 3 XYZ 28 The columns of this relation are called Fields , Attributes or Domains. The rows are called Tuples or Records. Database : Collection of related relations. Consider the following collection of tables: T 1 T T3 T Roll Name Age 1 ABC 19 2 DEF 22 3 XYZ 28 Roll Address 1 KOL 2 DEL 3 MUM

Age and Hostel attributes are in different tables. A database in a DBMS could be viewed by lots of different people with different responsibilities. Figure 1.1: Empolyees are accessing Data through DBMS For example, within a company there are different departments, as well as customers, who each need to see different kinds of data. Each employee in the company will have different levels of access to the database with their own customized front-end application. In a database, data is organized strictly in row and column format. The rows are called Tuple or Record. The data items within one row may belong to different data types. On the other hand, the columns are often called Domain or Attribute. All the data items within a single attribute are of the same data type. What is Management System? A database-management system (DBMS) is a collection of interrelated data and a set of programs to access those data. This is a collection of related data with an implicit meaning and hence is a database. The collection of data, usually referred to as the database , contains information relevant to an Year Hostel I H II H Roll Year 1 I 2 II 3 I

Database systems arose in response to early methods of computerized management of commercial data. As an example of such methods, typical of the 1960s, consider part of a university organization that, among other data, keeps information about all instructors, students, departments, and course offerings. One way to keep the information on a computer is to store it in operating system files. To allow users to manipulate the information, the system has a number of application programs that manipulate the files, including programs to: Add new students, instructors, and courses Register students for courses and generate class rosters Assign grades to students, compute grade point averages (GPA), and generate transcripts This typical file-processing system is supported by a conventional operating system. The system stores permanent records in various files, and it needs different application programs to extract records from, and add records to, the appropriate files. Before database management systems (DBMSs) were introduced, organizations usually stored information in such systems. Keeping organizational information in a file-processing system has a number of major disadvantages: Data redundancy and inconsistency. Since different programmers create the files and application programs over a long period, the various files are likely to have different structures and the programs may be written in several programming languages. Moreover, the same information may be duplicated in several places (files). For example, if a student has a double major (say, music and mathematics) the address and telephone number of that student may appear in a file that consists of student records of students in the Music department and in a file that consists of student records of students in the Mathematics department. This redundancy leads to higher storage and access cost. In addition, it may lead to data inconsistency ; that is, the various copies of the same data may no longer agree. For example, a changed student address may be reflected in the Music department records but not elsewhere in the system. Difficulty in accessing data. Suppose that one of the university clerks needs to find out the names of all students who live within a particular postal-code area. The clerk asks the data-processing department to generate such a list. Because the designers of the original system did not anticipate this request, there is no application program on hand to meet it. There is, however, an application program to generate the list of all students. Data isolation. Because data are scattered in various files, and files may be in different formats, writing new application programs to retrieve the appropriate data is difficult. Integrity problems. The data values stored in the database must satisfy certain types of consistency constraints. Suppose the university maintains an account for each department, and records the balance amount in each account. Suppose also that the university requires that the account balance of a department may never fall below zero. Developers enforce these constraints in the system by adding appropriate code in the various application programs. However, when new constraints are added, it is difficult to change the programs to enforce them. The problem is compounded when constraints involve several data items from different files. Atomicity problems. A computer system, like any other device, is subject to failure. In many applications, it is crucial that, if a failure occurs, the data be restored to the consistent state that existed prior to the failure. Consider a program to transfer $500 from the account balance of department A to the account balance of department B. If a system failure occurs during the execution of the program, it is possible that the

$500 was removed from the balance of department A but was not credited to the balance of department B , resulting in an inconsistent database state. Clearly, it is essential to database consistency that either both the credit and debit occur, or that neither occur. That is, the funds transfer must be atomic —it must happen in its entirety or not at all. It is difficult to ensure atomicity in a conventional file-processing system. Concurrent-access anomalies. For the sake of overall performance of the system and faster response, many systems allow multiple users to update the data simultaneously. Indeed, today, the largest Internet retailers may have millions of accesses per day to their data by shoppers. In such an environment, interaction of concurrent updates is possible and may result in inconsistent data. Consider department A , with an account balance of $10,000. If two department clerks debit the account balance (by say $500 and $100, respectively) of department A at almost exactly the same time, the result of the concurrent executions may leave the budget in an incorrect (or inconsistent) state. Suppose that the programs executing on behalf of each withdrawal read the old balance, reduce that value by the amount being withdrawn, and write the result back. If the two programs run concurrently, they may both read the value $10,000, and write back $9500 and $9900, respectively. Depending on which one writes the value last, the account balance of department A may contain either $9500 or $9900, rather than the correct value of $9400. To guard against this possibility, the system must maintain some form of supervision. But supervision is difficult to provide because data may be accessed by many different application programs that have not been coordinated previously. Security problems. Not every user of the database system should be able to access all the data. For example, in a university, payroll personnel need to see only that part of the database that has financial information. They do not need access to information about academic records. But, since application programs are added to the file-processing system in an ad hoc manner, enforcing such security constraints is difficult. These difficulties, among others, prompted the development of database systems. In what follows, we shall see the concepts and algorithms that enable database systems to solve the problems with file- processing systems. Advantages of DBMS: Controlling of Redundancy: Data redundancy refers to the duplication of data (i.e storing same data multiple times). In a database system, by having a centralized database and centralized control of data by the DBA the unnecessary duplication of data is avoided. It also eliminates the extra time for processing the large volume of data. It results in saving the storage space. Improved Data Sharing : DBMS allows a user to share the data in any number of application programs. Data Integrity : Integrity means that the data in the database is accurate. Centralized control of the data helps in permitting the administrator to define integrity constraints to the data in the database. For example: in customer database we can can enforce an integrity that it must accept the customer only from Noida and Meerut city. Security : Having complete authority over the operational data, enables the DBA in ensuring that the only mean of access to the database is through proper channels. The DBA can define authorization checks to be carried out whenever access to sensitive data is attempted.

Database DISK Figure 1.2 : Levels of Abstraction in a DBMS

  • Physical level (or Internal View / Schema) : The lowest level of abstraction describes how the data are actually stored. The physical level describes complex low-level data structures in detail.
  • Logical level (or Conceptual View / Schema) : The next-higher level of abstraction describes what data are stored in the database, and what relationships exist among those data. The logical level thus describes the entire database in terms of a small number of relatively simple structures. Although implementation of the simple structures at the logical level may involve complex physical-level structures, the user of the logical level does not need to be aware of this complexity. This is referred to as physical data independence.
  • View level (or External View / Schema): The highest level of abstraction describes only part of the entire database. Even though the logical level uses simpler structures, complexity remains because of the variety of information stored in a large database. Many users of the database system do not need all this information; instead, they need to access only a part of the database. The view level of abstraction exists to simplify their interaction with the system. The system may provide many views for the same database. For example, we may describe a record as follows: type instructor = record ID : char (5); name : char (20); dept name : char (20); salary : numeric (8,2); end ; This code defines a new record type called instructor with four fields. Each field has a name and a type associated with it. A university organization may have several such record types, including
    • department , with fields dept_name , building , and budget
    • course , with fields course_id , title , dept_name , and credits
  • student , with fields ID , name , dept_name , and tot_cred At the physical level, an instructor , department , or student record can be described as a block of consecutive storage locations. At the logical level, each such record is described by a type definition, as in the previous code segment, and the interrelationship of these record types is defined as well. Finally, at the view level, computer users see a set of application programs that hide details of the data types. At the view level, several views of the database are defined, and a database user sees some or all of these views. Instances and Schemas Databases change over time as information is inserted and deleted. The collection of information stored in the database at a particular moment is called an instance of the database. The overall design of the database is called the database schema. Schemas are changed infrequently, if at all. The concept of database schemas and instances can be understood by analogy to a program written in a programming language. Each variable has a particular value at a given instant. The values of the variables in a program at a point in time correspond to an instance of a database schema. Database systems have several schemas, partitioned according to the levels of abstraction. The physical schema describes the database design at the physical level, while the logical schema describes the database design at the logical level. A database may also have several schemas at the view level, sometimes called subschemas , which describe different views of the database. Of these, the logical schema is by far the most important, in terms of its effect on application programs, since programmers construct applications by using the logical schema. Application programs are said to exhibit physical data independence if they do not depend on the physical schema, and thus need not be rewritten if the physical schema changes. Data Models Underlying the structure of a database is the data model : a collection of conceptual tools for describing data, data relationships, data semantics, and consistency constraints. The data models can be classified into four different categories:
  • Relational Model. The relational model uses a collection of tables to represent both data and the relationships among those data. Each table has multiple columns, and each column has a unique name. Tables are also known as relations. The relational model is an example of a record-based model. Entity-Relationship Model. The entity-relationship (E-R) data model uses a collection of basic objects, called entities , and relationships among these objects. Suppose that each department has offices in several locations and we want to record the locations at which each employee works. The ER diagram for this variant of Works In, which we call Works In

E R Model - (Railway Booking System)

E R Model - (Banking Transaction System) Object-Based Data Model. Object-oriented programming (especially in Java, C++, or C#) has become the dominant software-development methodology. This led to the development of an object- oriented data model that can be seen as extending the E-R model with notions of encapsulation, methods (functions), and object identity. Semi-structured Data Model. The semi-structured data model permits the specification of data where individual data items of the same type may have different sets of attributes. This is in contrast to the data models mentioned earlier, where every data item of a particular type must have the same set of attributes. The Extensible Markup Language (XML) is widely used to represent semi- structured data. Historically, the network data model and the hierarchical data model preceded the relational data model. These models were tied closely to the underlying implementation, and complicated the task of modeling data. As a result they are used little now, except in old database code that is still in service in some places. Database Languages A database system provides a data-definition language to specify the database schema and a data-manipulation language to express database queries and updates. In practice, the data-definition and data-manipulation languages are not two separate languages; instead they simply form parts of a single database language, such as the widely used SQL language. Data-Manipulation Language A data-manipulation language (DML) is a language that enables users to access or manipulate data as organized by the appropriate data model. The types of access are:

  • Retrieval of information stored in the database
  • Insertion of new information into the database
  • Deletion of information from the database
  • Modification of information stored in the database There are basically two types:
  • Procedural DMLs require a user to specify what data are needed and how to get those data.
  • Declarative DMLs (also referred to as nonprocedural DMLs ) require a user to specify what data are needed without specifying how to get those data. A query is a statement requesting the retrieval of information. The portion of a DML that involves information retrieval is called a query language. Although technically incorrect, it is common practice to use the terms query language and data-manipulation language synonymously. Data-Definition Language (DDL) We specify a database schema by a set of definitions expressed by a special language called a data- definition language ( DDL ). The DDL is also used to specify additional properties of the data.
  • Domain Constraints. A domain of possible values must be associated with every attribute (for example, integer types, character types, date/time types). Declaring an attribute to be of a particular domain acts as a constraint on the values that it can take. Domain constraints are the most elementary form of integrity constraint. They are tested easily by the system whenever a new data item is entered into the database.

There are four different types of database-system users, differentiated by the way they expect to interact with the system. Different types of user interfaces have been designed for the different types of users. Naive users are unsophisticated users who interact with the system by invoking one of the application programs that have been written previously. For example, a bank teller who needs to transfer $50 from account A to account B invokes a program called transfer. Application programmers are computer professionals who write application programs. Application programmers can choose from many tools to develop user interfaces. Rapid application development (RAD) tools are tools that enable an application programmer to construct forms and reports without writing a program. Sophisticated users interact with the system without writing programs. Instead, they form their requests in a database query language. They submit each such query to a query processor , whose function is to break down DML statements into instructions that the storage manager understands. Analysts who submit queries to explore data in the database fall in this category. Online analytical processing (OLAP) tools simplify analysts’ tasks by letting them view summaries of data in different ways. For instance, an analyst can see total sales by region (for example, North, South, East, and West), or by product, or by a combination of region and product (that is, total sales of each product in each region). Database Architecture: The architecture of a database system is greatly influenced by the underlying computer system on which the database system runs. Database systems can be centralized, or client-server, where one server machine executes work on behalf of multiple client machines. Database systems can also be designed to exploit parallel computer architectures. Distributed databases span multiple geographically separated machines. Figure 1.3: Database System Architecture A database system is partitioned into modules that deal with each of the responsibilities of the overall system. The functional components of a database system can be broadly divided into the storage manager and the query processor components. The storage manager is important because databases

typically require a large amount of storage space. The query processor is important because it helps the database system simplify and facilitate access to data. Figure 1.4: Two-tier and three-tier architectures. Query Processor: The query processor components include · DDL interpreter, which interprets DDL statements and records the definitions in the data dictionary. · DML compiler, which translates DML statements in a query language into an evaluation plan consisting of low-level instructions that the query evaluation engine understands. A query can usually be translated into any of a number of alternative evaluation plans that all give the same result. The DML compiler also performs query optimization , that is, it picks the lowest cost evaluation plan from among the alternatives. Query evaluation engine, which executes low-level instructions generated by the DML compiler. Storage Manager: A storage manager is a program module that provides the interface between the lowlevel data stored in the database and the application programs and queries submitted to the system. The storage manager is responsible for the interaction with the file manager. The storage manager components include: