Introduction to Database Systems: Lecture Notes on Database Management Systems (DBMS)

UNIT – I

Introduction: Database System Applications, Purpose of Database Systems, View of Data, Database Languages, Relational Databases, Database Design, Object-based and Semi-structured Databases, Data Storage and Querying, Transaction Management, Mining and Analysis, Database Architecture, Database Users and Administrators.
Database Design and the E-R Model: Overview of the Design Process, The Entity-Relationship Model, Constraints, Entity-Relationship Diagrams, Entity-Relationship Design Issues, Weak Entity Sets, Extended E-R Features, Database Design for a Banking Enterprise, Reduction to Relational Schemas, Other Aspects of Database Design.

INTRODUCTION

A database management system (DBMS) is a collection of interrelated data and a set of application programs used to access, update, and manage that data. A DBMS is also simply called a database system (DBS), and the interrelated data is usually referred to as the database (DB).
The goal of a DBMS is to provide an environment that is both convenient and efficient to use for retrieving information from the database and storing information into the database.
Databases are usually designed to manage large bodies of information. This involves defining structures for information storage (data modeling), providing mechanisms for the manipulation of information (system structure, query processing), providing for the safety of the information in the database (crash recovery and security), and providing concurrency control if the system is shared by several users.

DATABASE SYSTEM APPLICATIONS

Banking: All Transactions
Universities: Registration, Grades
Sales: Customers, Products, Purchases
Online Retailers: Order Tracking, Customized Recommendations
Manufacturing: Production, Inventory, Orders, Supply Chain
Airlines: Reservations, Schedules
Human Resources: Employee Records, Salaries, Tax Deductions
Databases touch all aspects of our lives.

PURPOSE OF DATABASE SYSTEMS

The typical file processing system is supported by a conventional operating system. The system stores permanent records in various files, and it needs different application programs to extract records from, and add records to, the appropriate files. A file processing system has a number of major disadvantages.

1. Data Redundancy and Inconsistency: In file processing, every user group maintains its own files for handling its data processing applications. For example, consider the UNIVERSITY database. Here, two groups of users might be the course registration personnel and the accounting office. The accounting office also keeps data on registration and related billing information, whereas the registration office keeps track of student courses and grades. Storing the same data multiple times is called data redundancy. This redundancy leads to several problems:
- A single logical update must be performed multiple times.
- Storage space is wasted.
- Files that represent the same data may become inconsistent.
Data inconsistency means that the various copies of the same data no longer agree. Example: one user group may enter a student's birth date erroneously as JAN-19-1984, whereas another user group may enter the correct value of JAN-29-1984.

2. Difficulty in Accessing Data: File processing environments do not allow needed data to be retrieved in a convenient and efficient manner. Suppose that one of the bank officers needs to find out the names of all customers who live within a particular area. The officer has two choices: either obtain the list of all customers and extract the needed information manually, or ask a system programmer to write the necessary application program. Both alternatives are obviously unsatisfactory. Suppose that such a program is written, and that, several days later, the same officer needs to trim that list to include only those customers who have an account balance of $10,000 or more. A program to generate such a list does not exist. Again, the officer has the preceding two options, neither of which is satisfactory.
3. Data Isolation: Because data are scattered in various files, and files may be in different formats, writing new application programs to retrieve the appropriate data is difficult.
4. Integrity Problems: The data values stored in the database must satisfy certain types of consistency constraints. Example: the balance of certain types of bank accounts may never fall below a prescribed amount. Developers enforce these constraints in the system by adding appropriate code in the various application programs.
5. Atomicity Problems: Atomic means the transaction must happen in its entirety or not at all. It is difficult to ensure atomicity in a conventional file processing system. Example: consider a program to transfer $50 from account A to account B. If a system failure occurs during the execution of the program, it is possible that the $50 was removed from account A but was not credited to account B, resulting in an inconsistent database state.
6. Concurrent Access Anomalies: For the sake of overall system performance and faster response, many systems allow multiple users to update the data simultaneously. In such an environment, interaction of concurrent updates is possible and may result in inconsistent data. To guard against this possibility, the system must maintain some form of supervision. But supervision is difficult to provide because data may be accessed by many different application programs that have not been coordinated previously. Example: when several reservation clerks try to assign a seat on an airline flight, the system should ensure that each seat can be accessed by only one clerk at a time for assignment to a passenger.
7. Security Problems: Enforcing security constraints in a file processing system is difficult.

DATA MODELS
Underlying the structure of a database is the data model: "a collection of conceptual tools for describing data, data relationships, data semantics, and consistency constraints". A data model provides a way to describe the design of a database at the physical, logical, and view levels. Data models can be classified into four different categories:

1. Relational Model: The relational model uses a collection of tables to represent both data and the relationships among those data. Each table has multiple columns, and each column has a unique name. Tables are also known as relations. The relational model is an example of a record-based model. Record-based models are so named because the database is structured in fixed-format records of several types. Each table contains records of a particular type, and each record type defines a fixed number of fields, or attributes. The collection of attributes and records makes up the table. The relational data model is the most widely used data model, and a vast majority of current database systems are based on it.
2. Entity-Relationship Model: The entity-relationship (E-R) data model uses a collection of basic objects, called entities, and relationships among these objects. An entity is a "thing" or "object" in the real world that is distinguishable from other objects. The entity-relationship model is widely used in database design.
3. Object-Based Data Model: Object-oriented programming (especially in Java, C++, or C#) has become the dominant software-development methodology. This led to the development of an object-oriented data model that can be seen as extending the E-R model. The object-relational data model combines features of the object-oriented data model and the relational data model.
4. Semi-structured Data Model: The semi-structured data model permits the specification of data where individual data items of the same type may have different sets of attributes. The Extensible Markup Language (XML) is widely used to represent semi-structured data.
Historically, the network data model and the hierarchical data model preceded the relational data model. These models were tied closely to the underlying implementation and complicated the task of modeling data. As a result they are used little now, except in old database code that is still in service in some places.
DATABASE LANGUAGES
A database system provides a data-definition language to specify the database schema and a data-manipulation language to express database queries and updates. In practice, the data-definition and data-manipulation languages are not two separate languages; instead they simply form parts of a single database language, such as the widely used SQL language.

Data-Definition Language: "We specify a database schema by a set of definitions expressed by a special language called a data-definition language (DDL)." The DDL is also used to specify additional properties of the data.

The DDL, just like any other programming language, takes some instructions (statements) as input and generates some output. The output of the DDL is placed in the data dictionary, which contains metadata, that is, data about data. The data dictionary is considered to be a special type of table that can be accessed and updated only by the database system itself (not by a regular user). The database system consults the data dictionary before reading or modifying actual data.
Example: Create Table account (account_number char(10), balance integer)
The data values stored in the database must satisfy certain consistency constraints. For example, suppose the university requires that the account balance of a department must never be negative. The DDL provides facilities to specify such constraints. Database systems implement integrity constraints that can be tested:
- Domain Constraints: A domain of possible values must be associated with every attribute (for example, integer types, character types, date/time types). Declaring an attribute to be of a particular domain acts as a constraint on the values that it can take. These are the most elementary form of integrity constraint.
- Referential Integrity: There are cases where we wish to ensure that a value that appears in one relation for a given set of attributes also appears in a certain set of attributes in another relation (referential integrity). For example, the dept_name value in a course record must appear in the dept_name attribute of some record of the department relation. Database modifications can cause violations of referential integrity. When a referential integrity constraint is violated, the normal procedure is to reject the action that caused the violation.
- Assertions: An assertion is any condition that the database must always satisfy. Domain constraints and referential integrity constraints are special forms of assertions. However, there are many constraints that we cannot express by using only these special forms. For example, "every department must have at least five courses offered every semester" must be expressed as an assertion.
- Authorization: We may want to differentiate among the users as far as the type of access they are permitted on various data values in the database. These differentiations are expressed in terms of authorization. The most common forms are: read authorization, which allows reading, but not modification, of data; insert authorization, which allows insertion of new data, but not modification of existing data; update authorization, which allows modification, but not deletion, of data; and delete authorization, which allows deletion of data. We may assign a user all, none, or a combination of these types of authorization.
We specify the storage structure and access methods used by the database system by a set of statements in a special type of DDL called a data storage and definition language. These statements define the implementation details of the database schemas, which are usually hidden from the users.
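As a sketch of how such constraints look in SQL DDL (the column types and constraint names here are illustrative assumptions, not taken from these notes), a department/course pair with a check constraint and a referential-integrity constraint might be declared as follows:

-- Illustrative DDL with integrity constraints.
create table department (
    dept_name varchar(20),
    building  varchar(15),
    budget    numeric(12,2) check (budget >= 0),  -- consistency constraint on budget
    primary key (dept_name)
);

create table course (
    course_id varchar(8),
    title     varchar(50),
    dept_name varchar(20),
    credits   numeric(2,0) check (credits > 0),   -- domain/check constraint
    primary key (course_id),
    foreign key (dept_name) references department (dept_name)  -- referential integrity
);

With these declarations, an attempt to insert a course whose dept_name does not appear in department is rejected by the system, which is exactly the behavior described above.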

Data-Manipulation Language: A data-manipulation language (DML) is a language that enables users to access or manipulate data as organized by the appropriate data model. The types of access are:
- Retrieval of information stored in the database (Select)
- Insertion of new information into the database (Insert)
- Deletion of information from the database (Delete)
- Modification of information stored in the database (Update)
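In SQL, a query of the kind described below (retrieving the names of instructors whose department is History) might be written as:

select name
from instructor
where dept_name = 'History';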

The query specifies that those rows of the table instructor where dept_name is History must be retrieved, and that the name attribute of these rows must be displayed. If the query is run on the table in the previous figure, the result will consist of two rows, one with the name El Said and the other with the name Califieri.
Queries may involve information from more than one table. For instance, the following query finds the ID and department name of all instructors associated with a department whose budget is greater than $95,000.
Select instructor.ID, department.dept_name
From instructor, department
Where instructor.dept_name = department.dept_name and department.budget > 95000;
If the above query were run on the tables in the figure, the system would find that there are two departments with a budget greater than $95,000, Computer Science and Finance, and that there are five instructors in these departments. Thus, the result will consist of a table with two columns (ID, dept_name) and five rows: (12121, Finance), (45565, Computer Science), (10101, Computer Science), (83821, Computer Science), and (76543, Finance).
Database Access from Application Programs: SQL does not support actions such as input from users, output to displays, or communication over the network. Such computations and actions must be written in a host language, such as C, C++, or Java, with embedded SQL queries that access the data in the database. There are two ways to do this:
- By providing an application program interface (a set of procedures) that can be used to send DML and DDL statements to the database and retrieve the results. The Open Database Connectivity (ODBC) standard for use with the C language is a commonly used application program interface standard. The Java Database Connectivity (JDBC) standard provides corresponding features to the Java language.
- By extending the host language syntax to embed DML calls within the host language program.
DATABASE DESIGN
Database systems are designed to manage large bodies of information. Database design mainly involves the design of the database schema. The design of a complete database application environment that meets the needs of the enterprise being modeled requires attention to a broader set of issues.

Design Process: A high-level data model provides the database designer with a conceptual framework in which to specify the data requirements of the database users, and how the database will be structured to fulfill these requirements.

- The initial phase of database design is to characterize fully the data needs of the database users. The outcome of this phase is a specification of user requirements.
- Next, the designer chooses a data model and, by applying the concepts of the chosen data model, translates these requirements into a conceptual schema of the database.
- The schema developed at this conceptual-design phase provides a detailed overview of the enterprise. The designer reviews the schema to confirm that all data requirements are indeed satisfied and are not in conflict with one another. The designer can also examine the design to remove any redundant features. The focus at this point is on describing the data and their relationships, rather than on specifying physical storage details.
- In terms of the relational model, the conceptual-design process involves decisions on what attributes we want to capture in the database and how to group these attributes to form the various tables. The "what" part is basically a business decision, and the "how" part is mainly a computer-science problem. There are principally two ways to tackle the problem.
- The first is to use the entity-relationship model; the other is to employ a set of algorithms collectively known as normalization, which takes as input the set of all attributes and generates a set of tables.
- The process of moving from an abstract data model to the implementation of the database proceeds in two final design phases. In the logical-design phase, the designer maps the high-level conceptual schema onto the implementation data model of the database system that will be used. The designer uses the resulting system-specific database schema in the subsequent physical-design phase, in which the physical features of the database are specified.

Database Design for a University Organization: The description that arises from this design phase serves as the basis for specifying the conceptual structure of the database. Here are the major characteristics of the university:
- The university is organized into departments. Each department is identified by a unique name (dept_name).
- Each department has a list of courses it offers. Each course has an associated course_id, title, dept_name, and credits, and may also have associated prerequisites.
- Instructors are identified by their unique ID. Each instructor has a name, an associated department (dept_name), and a salary.
- Students are identified by their unique ID. Each student has a name, an associated major department (dept_name), and tot_cred (the total credit hours the student has earned thus far).
- The university maintains a list of classrooms, specifying the name of the building, the room number, and the room capacity.
- The university maintains a list of all classes (sections) taught. Each section is identified by a course_id, sec_id, year, and semester, and has associated with it a building, room number, and time_slot_id.
- The university has a list of all student course registrations, specifying, for each student, the courses and the associated sections that the student has taken (registered for).
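As a sketch (the column types below are assumptions; the notes only list the attributes), part of this description reduces to relational schemas such as:

-- Illustrative DDL for some of the entities described above.
create table instructor (
    ID        varchar(5),
    name      varchar(20),
    dept_name varchar(20),
    salary    numeric(8,2),
    primary key (ID)
);

create table student (
    ID        varchar(5),
    name      varchar(20),
    dept_name varchar(20),
    tot_cred  numeric(3,0),
    primary key (ID)
);

create table classroom (
    building    varchar(15),
    room_number varchar(7),
    capacity    numeric(4,0),
    primary key (building, room_number)
);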

The overall logical structure (schema) of a database can be expressed graphically by an entity-relationship (E-R) diagram. There are several ways in which to draw these diagrams. One of the most popular is to use the Unified Modeling Language (UML). In the notation we use, which is based on UML, an E-R diagram is represented as follows:
- Entity sets are represented by a partitioned rectangular box with the entity set name in the header and the attributes listed below it.
- Relationship sets are represented by a diamond connecting a pair of related entity sets. The name of the relationship is placed inside the diamond.
Fig: E-R Diagram
In addition to entities and relationships, the E-R model represents certain constraints to which the contents of a database must conform. One important constraint is mapping cardinalities, which express the number of entities to which another entity can be associated via a relationship set. For example, if each instructor must be associated with only a single department, the E-R model can express that constraint. The entity-relationship model is widely used in database design.

Normalization: Another method for designing a relational database is to use normalization. To understand the need for normalization, let us look at what can go wrong in a bad database design. Among the undesirable properties that a bad design may have are:
- Repetition of information
- Inability to represent certain information
The goal of normalization is to generate a set of relation schemas that allows us to store information without unnecessary redundancy, yet also allows us to retrieve information easily. The approach is to design schemas that are in an appropriate normal form. To determine whether a relation schema is in one of the desirable normal forms, we need additional information about the real-world enterprise that we are modeling with the database. The most common approach is to use functional dependencies.
DATABASE ARCHITECTURE
The architecture of a database system is greatly influenced by the underlying computer system on which the database system runs. Database systems can be centralized, or client-server, where one server machine executes work on behalf of multiple client machines. Database systems can also be designed to exploit parallel computer architectures. Distributed databases span multiple geographically separated machines. Most users of a database system today are not present at the site of the database system, but connect to it through a network. We can therefore differentiate between client machines, on which remote database users work, and server machines, on which the database system runs. The figure given provides a single picture of the various components of a database system and the connections among them.

Database applications are usually partitioned into two or three parts. In a two-tier architecture, the application resides at the client machine, where it invokes database system functionality at the server machine through query language statements. Application program interface standards such as ODBC and JDBC are used for interaction between the client and the server. In contrast, in a three-tier architecture, the client end communicates with an application server, usually through a forms interface. The application server in turn communicates with a database system to access data. The business logic of the application, which says what actions to carry out under what conditions, is embedded in the application server. Three-tier applications are more appropriate for large applications, and for applications that run on the WWW.

4. Granting of Authorization for Data Access: By granting different types of authorization, the database administrator can regulate which parts of the database various users can access. The authorization information is kept in a special system structure that the database system consults whenever someone attempts to access the data in the system.
5. Routine Maintenance: Examples of the database administrator's routine maintenance activities are:
- Periodically backing up the database, either onto tapes or onto remote servers, to prevent loss of data in case of disasters such as flooding.
- Ensuring that enough free disk space is available for normal operations, and upgrading disk space as required.
- Monitoring jobs running on the database and ensuring that performance is not degraded by very expensive tasks submitted by some users.
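In SQL, such authorization is typically granted and withdrawn with statements of the following form (the user names here are illustrative):

-- Allow a clerk to read instructor data but not change it.
grant select on instructor to clerk_1;
-- Allow an HR manager to both read and update salaries.
grant select, update on instructor to hr_manager;
-- Withdraw the update privilege later if needed.
revoke update on instructor from hr_manager;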

DATA STORAGE AND QUERYING

A database system is partitioned into modules that deal with each of the responsibilities of the overall system. The functional components of a database system can be broadly divided into the storage manager and the query processor components.

Storage Manager: The storage manager is the component of a database system that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system. The storage manager is important because databases typically require a large amount of storage space. Corporate databases range in size from hundreds of gigabytes to, for the largest databases, terabytes of data. Since the main memory of computers cannot store this much information, the information is stored on disks. Data are moved between disk storage and main memory as needed.
The storage manager is responsible for the interaction with the file manager. The raw data are stored on the disk using the file system provided by the operating system. The storage manager is responsible for storing, retrieving, and updating data in the database. The storage manager components include:

1. Authorization and Integrity Manager, which tests for the satisfaction of integrity constraints and checks the authority of users to access data.
2. Transaction Manager, which ensures that the database remains in a consistent (correct) state despite system failures, and that concurrent transaction executions proceed without conflicting.
3. File Manager, which manages the allocation of space on disk storage and the data structures used to represent information stored on disk.
4. Buffer Manager, which is responsible for fetching data from disk storage into main memory, and for deciding what data to cache in main memory. The buffer manager is a critical part of the database system, since it enables the database to handle data sizes that are much larger than the size of main memory.
The storage manager implements several data structures as part of the physical system implementation:
- Data files, which store the database itself.
- Data dictionary, which stores metadata about the structure of the database, in particular the schema of the database.
- Indices, which can provide fast access to data items. Like the index in a textbook, a database index provides pointers to those data items that hold a particular value.
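As a sketch, an index on the dept_name attribute of the instructor table could be created with a statement along these lines (the index name is illustrative, and exact syntax varies slightly between database systems):

-- Speeds up queries that look up instructors by department.
create index instructor_dept_idx on instructor (dept_name);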

The Query Processor: The query processor is important because it helps the database system to simplify and facilitate access to data. It allows database users to obtain good performance while being able to work at the view level. It is the job of the database system to translate updates and queries written in a nonprocedural language, at the logical level, into an efficient sequence of operations at the physical level. The query processor components include:

1. DDL interpreter, which interprets DDL statements and records the definitions in the data dictionary.
2. DML compiler, which translates DML statements in a query language into an evaluation plan consisting of low-level instructions that the query evaluation engine understands. The DML compiler also performs query optimization; that is, it picks the lowest-cost evaluation plan among the alternatives.
3. Query evaluation engine, which executes the low-level instructions generated by the DML compiler.

SPECIALITY (OBJECT-BASED AND SEMI-STRUCTURED) DATABASES

Object-Based Data Models: Object-oriented programming has become the dominant software-development methodology. This led to the development of an object-oriented data model that can be seen as extending the E-R model with notions of encapsulation, methods (functions), and object identity. Inheritance and encapsulation (information hiding), with methods to provide an interface to objects, are among the key concepts of object-oriented programming that have found applications in data modeling. The object-oriented data model also supports a rich type system, including structured and collection types. In the 1980s, several database systems based on the object-oriented data model were developed. The major database vendors presently support the object-relational data model, a data model that combines features of the object-oriented data model and the relational data model.

Semi-structured Data Models: Semi-structured data models permit the specification of data where individual data items of the same type may have different sets of attributes. This is in contrast with the data models mentioned earlier, where every data item of a particular type must have the same set of attributes. The XML language was initially designed as a way of adding markup information to text documents, but has become important because of its applications in data exchange. XML provides a way to represent data that have nested structure, and furthermore allows a great deal of flexibility in structuring of data, which is important for certain kinds of nontraditional data.

TRANSACTION MANAGEMENT

Often, several operations on the database form a single logical unit of work. An example is a funds transfer, in which one department account (say A) is debited and another department account (say B) is credited. Clearly, it is essential that either both the credit and debit occur, or that neither occur. That is, the funds transfer must happen in its entirety or not at all. This all-or-none requirement is called atomicity. In addition, it is essential that the execution of the funds transfer preserve the consistency of the database. That is, the value of the sum of the balances of A and B must be preserved. This correctness requirement is called consistency. Finally, after the successful execution of a funds transfer, the new values of the balances of accounts A and B must persist, despite the possibility of system failure. This persistence requirement is called durability.
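As a sketch of how such a funds transfer is expressed as a transaction in SQL (using the account table from the DDL example earlier; the account numbers and amount are illustrative):

-- Either both updates take effect, or neither does (atomicity).
begin transaction;  -- some systems spell this "start transaction"
update account set balance = balance - 50 where account_number = 'A';
update account set balance = balance + 50 where account_number = 'B';
commit;

If a failure occurs before the commit, the system rolls the transaction back, leaving the sum of the two balances unchanged (consistency); once committed, the new balances survive system failures (durability).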

DATABASE DESIGN AND THE E-R MODEL

Overview of the Design Process: The task of creating a database application is a complex one, involving design of the database schema, design of the programs that access and update the data, and design of a security scheme to control access to data. The needs of users play a central role in the design process.
Design Phases: For small applications, it may be feasible for a database designer who understands the application requirements to decide directly on the relations to be created, their attributes, and constraints on the relations. A high-level data model provides the database designer with a conceptual framework in which to specify the data requirements of the database users, and how the database will be structured to fulfill these requirements.
- The initial phase of database design is to characterize fully the data needs of the database users. The outcome of this phase is a specification of user requirements.
- Next, the designer chooses the E-R data model and, by applying the concepts of the chosen data model, translates these requirements into a conceptual schema of the database.
- The schema developed at this conceptual-design phase provides a detailed overview of the enterprise. The entity-relationship model is typically used to represent the conceptual design, by specifying the entities, the attributes of the entities, the relationships among the entities, and constraints on the entities and relationships. Typically, the conceptual-design phase results in the creation of an entity-relationship diagram that provides a graphic representation of the schema.
- The designer reviews the schema to confirm that all data requirements are indeed satisfied and are not in conflict with one another. The designer can also examine the design to remove any redundant features. The focus at this point is on describing the data and their relationships, rather than on specifying physical storage details. The process of moving from an abstract data model to the implementation of the database proceeds in two final design phases.
- In the logical-design phase, the designer maps the high-level conceptual schema onto the implementation data model of the database system that will be used.
- Finally, the designer uses the resulting system-specific database schema in the subsequent physical-design phase, in which the physical features of the database are specified.
Note: The physical schema of a database can be changed relatively easily after an application has been built. However, changes to the logical schema are usually harder to carry out, since they may affect a number of queries and updates scattered across application code. It is therefore important to carry out the database design phase with care, before building the rest of the database application.
Design Alternatives: A major part of the database design process is deciding how to represent in the design the various types of "things" such as people, places, products, and the like. We use the term entity to refer to any such distinctly identifiable item. In a university database, examples of entities would include instructors, students, and departments. Various entities are related to each other in a variety of ways, all of which need to be captured in the database design.

In designing a database schema, we must avoid two major pitfalls:

1. Redundancy: A bad design may repeat information. Redundancy can also occur in a relational schema. The biggest problem with redundant information is that the copies of a piece of information can become inconsistent if the information is updated without taking precautions to update all copies. Ideally, information should appear in exactly one place.
2. Incompleteness: A bad design may make certain aspects of the enterprise difficult or impossible to model. For example, suppose that we only had entities corresponding to course offerings, without having an entity corresponding to courses. Equivalently, in terms of relations, suppose we had a single relation where we repeat all of the course information once for each section that the course is offered. It would then be impossible to represent information about a new course, unless that course is offered.
THE ENTITY-RELATIONSHIP MODEL
The entity-relationship (E-R) data model was developed to facilitate database design by allowing specification of an enterprise schema that represents the overall logical structure of a database. The E-R model is very useful in mapping the meanings and interactions of real-world enterprises onto a conceptual schema. Because of this usefulness, many database-design tools draw on concepts from the E-R model. The E-R data model employs three basic concepts: entity sets, relationship sets, and attributes.

Entity Sets: An entity is a "thing" or "object" in the real world that is distinguishable from all other objects. For example, each person in a university is an entity. An entity has a set of properties, and the values for some set of properties may uniquely identify an entity. For instance, a person may have a person_id property whose value uniquely identifies that person. An entity may be concrete, such as a person or a book, or it may be abstract, such as a course, a course offering, or a flight reservation.
An entity set is a set of entities of the same type that share the same properties, or attributes. The set of all people who are instructors at a given university, for example, can be defined as the entity set instructor. We use the term extension of the entity set to refer to the actual collection of entities belonging to the entity set. Thus, the set of actual instructors in the university forms the extension of the entity set instructor. This distinction is similar to the difference between a relation and a relation instance.
An entity is represented by a set of attributes. Attributes are descriptive properties possessed by each member of an entity set. Possible attributes of the instructor entity set are ID, name, dept_name, and salary. Each entity has a value for each of its attributes. For instance, a particular instructor entity may have the value 12121 for ID, the value Wu for name, the value Finance for dept_name, and the value 90000 for salary.
Fig: Instructor Entity Set

Relationship sets that involve more than two entity sets are called n-ary relationship sets. The number of entity sets that participate in a relationship set is the degree of the relationship set. A binary relationship set is of degree 2; a ternary relationship set is of degree 3.

Attributes: For each attribute, there is a set of permitted values, called the domain, or value set, of that attribute. The domain of the attribute semester might be the set of strings {Fall, Winter, Spring, Summer}. Since an entity set may have several attributes, each entity can be described by a set of (attribute, data value) pairs. Example: a particular instructor entity may be described by the set {(ID, 76766), (name, Crick), (dept_name, Biology), (salary, 72000)}, meaning that the entity describes a person named Crick whose instructor ID is 76766, and who is a member of the Biology department with a salary of $72,000. An attribute, as used in the E-R model, can be characterized by the following attribute types.

1. Simple and Composite Attributes: In our examples thus far, the attributes have been simple; that is, they have not been divided into subparts. Composite attributes, on the other hand, can be divided into subparts (that is, other attributes). For example, an attribute name could be structured as a composite attribute consisting of first name, middle name, and last name. Composite attributes help us to group together related attributes, making the modeling cleaner. Note: Using composite attributes in a design schema is a good choice if a user will wish to refer to an entire attribute on some occasions and to only a component of the attribute on other occasions.
2. Single-valued and Multivalued Attributes: There may be instances where an attribute has a set of values for a specific entity. Suppose an instructor may have one or several phone numbers, and different instructors may have different numbers of phones. This type of attribute is said to be multivalued. To denote that an attribute is multivalued, we enclose it in braces, for example {phone_number} or {dependent name}. Note: Upper and lower bounds may be placed on the number of values in a multivalued attribute, if needed.
3. Derived Attributes: The value of a derived attribute can be derived from the values of other related attributes or entities. For instance, suppose that the instructor entity set has an attribute age. If the instructor entity set also has an attribute date_of_birth, we can calculate age from date_of_birth and the current date. Thus, age is a derived attribute. In this case, date_of_birth may be referred to as a base attribute, or a stored attribute. The value of a derived attribute is not stored but is computed when required.
An attribute takes a null value when an entity does not have a value for it. The null value may indicate "not applicable" (that is, that the value does not exist for the entity) or "unknown". An unknown value may be either missing (the value does exist, but we do not have that information) or not known (we do not know whether or not the value actually exists). For example, a null value for an apartment number attribute could mean that the address does not include an apartment number (not applicable), that an apartment number exists but we do not know what it is (missing), or that we do not know whether or not an apartment number is part of the instructor's address (unknown).
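As a sketch of how these attribute types commonly end up in a relational design (this mapping is standard practice rather than something spelled out in these notes), a multivalued attribute such as {phone_number} becomes a separate table, while a derived attribute such as age is computed in queries rather than stored:

-- Multivalued attribute: one row per phone number an instructor has.
create table instructor_phone (
    ID           varchar(5),
    phone_number varchar(15),
    primary key (ID, phone_number)
);

-- Derived attribute: age computed from the stored date_of_birth at query time.
-- (Date arithmetic syntax differs between systems; this form is PostgreSQL-style,
-- and it assumes instructor has a date_of_birth column, as in the example above.)
select ID, name, extract(year from age(current_date, date_of_birth)) as age
from instructor;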

CONSTRAINTS

An E-R schema may define certain constraints to which the contents of a database must conform.
Mapping Cardinalities: Mapping cardinalities, or cardinality ratios, express the number of entities to which another entity can be associated via a relationship set. Mapping cardinalities are most useful in describing binary relationship sets. For a binary relationship set R between entity sets A and B, the mapping cardinality must be one of the following:

1. One-to-One: An entity in A is associated with at most one entity in B, and an entity in B is associated with at most one entity in A.
2. One-to-Many: An entity in A is associated with any number (zero or more) of entities in B. An entity in B, however, can be associated with at most one entity in A.
3. Many-to-One: An entity in A is associated with at most one entity in B. An entity in B, however, can be associated with any number (zero or more) of entities in A.
4. Many-to-Many: An entity in A is associated with any number (zero or more) of entities in B, and an entity in B is associated with any number (zero or more) of entities in A.
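As a sketch of how a many-to-one cardinality carries over to the relational schema (the constraint name is illustrative): the requirement that each instructor is associated with at most one department, while a department may have many instructors, can be enforced by storing dept_name in instructor as a foreign key.

-- Many-to-one from instructor to department.
alter table instructor
    add constraint instructor_dept_fk
    foreign key (dept_name) references department (dept_name);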