Database management, fall 2005, Exercises 5

1 Let's consider the join R JOIN _R.a=S.b S. Costs are measured as the number of disk accesses. Disk operations needed in writing the result are not counted. Page size in 4KB. Table R has 10,000 rows. Each page contains 10 rows. Table S has 2000 rows, 10 rows in each page. Column b is the primary key of S. Both tables are stored as piles without indexes. Buffer space is available for 52 pages.
(a) What is the cost of this join when it is done using the block nested loop join technique? How does reduction of buffer space affect the costs?
(b) What is the cost of this join when it is done using the merge join technique? How much can we reduce the needed buffer space without increasing the costs?
(c) What is the cost of this join when it is done using the hash join technique? How does the reduction of buffer space affect the costs with this technique?

2 Consider the join of task 1.
(a) When you may increase the buffer space, what would be the lowest cost join and how much buffer space would be needed?
(b) How many rows are in the result of this join at least, and at most, and how many pages must be written in the result file?
(c) How would your answer to task (b) change, if you knew that column a of R is a foreign key that refers to S and null values are not allowde in it?

3. The database contains information about the employees, departments and their financial affairs. It has the following schema.

Emp(eid: integer,did: integer, sal: integer, hobby: char(2));
Dept(did: integer, dname: char(20), floor: integer, phone: char(10));
Finance(did: integer, budget: real, sales: real, expenses: real);

Consider the query:

SELECT D.dname, F.budget
FROM Emp E, Dept D, Finance F
WHERE E.did = D.did AND D.did = F.did AND D.floor = 1
AND E.sal >= 59000 AND E.hobby = 'yodeling';

(a) Draw a query tree for it.
(b) Optimize the query (use the equivalence rules) and draw the optimized query tree.
(c) Explain briefly what indexes would improve the effiency of the execution.

4. The database contains data about parts and their suppliers. It has the following schema:

Suppliers(sid: integer, sname: char(20), city: char(20)); [300 rows]
Supply(sid: integer ->Supplier, pid: integer->Parts); [20000 rows]
Parts(pid: integer, pname: char(20), price: real); [2000 rows]

Consider the query:

SELECT S.sname, P.pname
FROM Suppliers S, Parts P, Supply Y
WHERE S.sid = Y.sid AND Y.pid = P.pid AND
S.city = 'Madison' AND P.price > 2000;

All tables are stored as heaps. Page size is 4KB. Table Parts has a secondary hash index on column pid and table Supply has secondary hash idexes on columns pid and sid. Supliers are located in 15 cities. Only 2% of parts costs more than 2000. Construct an execution plan for the query.

5. Consider the database schema

Course(courseid: integer, name: char(40), taught_by:integer -> teacher, etc: char(200)); Material(courseid: integer->Course, bookid: integer ->Books); Books(bookid: integer, name: char(40), etc: char(400)); Teacher(tid:integer, name:char(40),etc: char(200));

and the query:

SELECT b.bookid,b.name, t.name FROM Books b, Material m, Course c, teacher t WHERE c.courseid = m.courseid AND b.name = 'Java Programming' AND m.bookid = b.bookid and t.tid=c.taught_by;

Let's assume that table Course has 6000 rows. Table Material has 12000 rows, and table Books 24000 rows. Only 6000 of the books listed in table Books is used as course material. Table Teacher has 6000 rows. Page size is 4KB.
(a) Estimate the number of rows in the result.
<(b) Construct an execution plan for the query, assuming that all tables are stored as heaps, there are secondary B? -structured indexes for columns Course.name, Course.courseid, Material.courseid, Books.bookid and Teacher.tid. Estimate the number of disc accesses. You may assume that the average size of an index entry is about 20 bytes.