






Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Community
Ask the community for help and clear up your study doubts
Discover the best universities in your country according to Docsity users
Free resources
Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors
The cs 186 midterm ii exam from the university of california, berkeley's computer science division - electrical engineering and computer sciences (eecs) department, focusing on introduction to database systems. The exam covers various sql queries, join cost calculations, and query optimization techniques.
Typology: Exams
1 / 12
This page cannot be seen from the preview
Don't miss anything!
University of California, Berkeley College of Engineering Computer Science Division โ EECS
Fall 2004 Prof. Michael J. Franklin
MIDTERM II CS 186 Introduction to Database Systems
NAME: ___D.B. Guru_____________________ STUDENT ID:__________________
cs186 a b c d e f g h i j k l m n o p q r s t u v w x y z a b c d e f g h i j k l m n o p q r s t u v w x y z
This is a closed book examination โ but you are allowed one 8.5โ x 11โ sheet of notes (double sided). You should answer as many questions as possible. Partial credit will be given where appropriate. There are 100 points in all. You should read all of the questions before starting the exam, as some of the questions are substantially more time-consuming than others.
Write all of your answers directly on this paper. Be sure to clearly indicate your final answer for each question. Also, be sure to state any assumptions that you are making in your answers.
Problem Possible Score
TOTAL 100
Question 1 โSQL [6 parts, 27 points total]
For parts a-d, consider the following schema (primary keys are underlined):
Student (sname, sid, gpa, level, deptno) Course (cno, cname, deptno, units) Dept (dname, deptno) Takes (sid, cno)
a) [7 points] Write a SQL query that returns the names (i.e., snames) of students who have taken more courses outside their department than inside their department. For this question, you can assume that all students in the database have taken at least one course inside their department. (note: you should do scratch work elsewhere and just put your final answer here!)
b) [3 points] Which of the following queries returns the department numbers of those departments for which there are no courses being offered? More than one choice may be correct.
A) SELECT D.deptno FROM Dept D, Course C WHERE D.deptno NOT EQUAL C.deptno;
B) SELECT C.deptno, COUNT(C.deptno) FROM Course C GROUP BY C.deptno HAVING COUNT (C.Deptno) = NULL;
C) SELECT C.deptno FROM Course C WHERE C.deptno NOT IN (SELECT * FROM Dept);
D) SELECT D.deptno FROM Dept D WHERE NOT EXISTS (SELECT * FROM Course C WHERE C.deptno = D.deptno); E) None of the above
c) [3 points] Which of the following queries returns the id of the student with the highest GPA? More than one choice may be correct.
A) SELECT S.sid FROM Students S
e) [3 points] For the following schema: Athletes(name, country, sport, age, height, weight)
Which of the following SQL queries reflects the English query statement: "For each country, find the average height of weightlifters, qualifying only those countries that have weightlifters with minimum weight of 160 pounds." More than one choice may be correct.
A) SELECT country, avg(height) FROM Atheletes WHERE sport = "weightlifting" GROUP BY country, height, weight HAVING min(weight) >= 160;
B) SELECT country, avg(height) FROM Atheletes GROUP BY country, sport HAVING min(weight) >= 160 AND sport = "weightlifting";
C) SELECT country, avg(height) FROM Atheletes WHERE sport ="weightlifting" GROUP BY country HAVING min(weight) >= 160;
D) SELECT country, avg(height) FROM Atheletes WHERE sport = "weightlifting" AND min(weight) >= 160 GROUP BY country, weight;
E) SELECT country, avg(height) FROM Atheletes WHERE sport ="weightlifting" GROUP BY country, height HAVING min(weight) >= 160;
f) [8 points] For the schema in part (e), write a SQL query that returns for each sport, the name of the sport, the country that has the most athletes who play that sport, and the number of athletes of that country that play that sport. (note: you should do scratch work elsewhere and just put your final answer here!)
Question 2 โ Join Costs [3parts, 9 points total]
For this question, you will consider the I/O cost of operations on two tables of a database. The database has the following schema (note, this schema is slightly different than the schema used in question 1).
Students(sid, name, address, GPA) EnrolledIn(sid, classid, semester, year)
Assume that tuples are of fixed size. There are 10 Students tuples per page and 200 EnrolledIn tuples per page. Also assume that there are 1000 pages in the Students relation, and 500 pages in the EnrolledIn relation. The data is unsorted, and tuples are distributed evenly throughout the database.
For the following join strategies, give the I/O cost. Assume 52 pages in the buffer, and that no pages are currently in the buffer when the join begins. If multiple variants of an algorithm have been discussed in class, section, or in the book, use the most efficient one unless otherwise noted. Be sure to state any assumptions you are making and be sure to clearly indicate your final answer.
a) [3 points] Hash Join (not hybrid):
b) [3 points] Sort Merge (note, both relations can be sorted in two passes):
c) [3 points] Block Nested Loops:
Question 3 โ Query Optimization (continued)
b) [4 points] What is the expected cardinality of the Accident relation after initial selections are applied:
Next, estimate the I/O cost for the following access plans in Pass 1. Be sure to list any assumptions you are making (for example if you are sorting RIDs before accessing data).
c) [3points] Index scan on Car(company) for the relation Car:
d) [3 points] Index scan on Accident(accident_date) for the relation Accident:
e) [3 points] Index scan on Accident(damage_amount) for the relation Accident:
f) [3 points] List all join orders that will be considered in Pass 3 by the System R query optimizer. (ignore the specific join algorithm in this step).
g) [6 points] Suppose page nested loop join algorithm is the only available join algorithm, what is the best join order and what is the total estimated cost?
Question 4 โ ER models (continued)
a) [10 points] Create a relational schema (with SQL CREATE TABLE statements) for this diagram. Be sure to label all primary and foreign key constraints. The types of the attributes are as follows:
INTEGER: GuitarType.numFrets, Guitar.serial#, Player.ssn CHAR(20): all others
Note: Your schema for this part should have no more than four tables (solutions with more than four tables will not receive full credit). NOTE: the solution for question 4 is a sample solution; some minor variations on this could still give you full credit.
Question 4 โ ER models (continued)
b) [2 points] Say that we want to impose an additional constraint that a guitar can be played by at most one player. Indicate this new constraint on the original diagram for this question (Be sure that the change is clearly indicated, there will be no regarding on this part).
See arrow in dashed ellipse in diagram.
c) [4 points] For the change in part b, indicate (below) how your CREATE TABLE statements would have to be changed to reflect this.
d) [4 points] Now, say that we want to add the fact that some players are teachers that teach other players. Teachers can teach multiple players and players can have multiple teachers. Indicate this new information on the original diagram for this question (Be sure that the change is clearly indicated, there will be no regarding on this part).
e) [5 points] Write the CREATE TABLE statement(s) that capture this information. Note, this may cause you to change one or more of your original CREATE TABLE statements.