Secondary composite-key pages access

database-theoryindex

we have this relation
Store: [SID, City, EID]
Employee: [EID, Name, Salary, SID]
Article: [AID, Name, Producer, Price]
Inventory: [AID, SID, Count]
Invoice: [IID, SID, Customer]
Item: [IID, SID, AID, Count]

Store.EID is a foreign key referencing Employee.EID and designates the store manager. Employee.SID
is a foreign key referencing Store.SID and describes in which store this employee is employed. The
inventory table stores (with foreign keys) how many articles (AID) are in stock at which store (SID). An
invoice is generated at a store (SID) and consists of multiple items, where each item is composed of an
article (AID) and a count.

The Employee table contains 20 000 employees, of which exactly two tuples t into one page. Each
employee has a (uniformly distributed) salary between 20 000 and 100 000. There are 10 stores, each
with the same amount of employees. The Employee table has two secondary composite-key indices on
(Salary,EID) and (EID,Salary). Both indices are B+ trees with a height of h and each leaf can store
5 references to pages.

The question is
Which index requires less page accesses for the query select * from Employee where EID = 5
and Salary > 60000?

So I know that the (EID, Salary) is better to use here, since the EID is unique and it will go to the EID = 5 and then find his salary, however, I find it a little bit confusing because actually, we have one access to the EID 5 then the condition will be checked and we will either get this employee or not.
So how can I calculate the costs of each in order to compare between them?

Best Answer

Since the index (EID,Salary) has a height of h, the PK on EID must have height <=h. If this is a clustered index, then for <= h LIOs (page reads) the PK can satisfy the query.

Then assuming the PK is not clustered, using the PK costs h or h + 1 LIOs.

condition will be checked and we will either get this employee or not. So how can I calculate the costs of each in order to compare between them?

The probability that the salary >60000 is about 0.5, so you can calculate an expected cost. Using the index on (EID,Salary) has an expected cost of
0.5 * h + 0.5 * (h+1) = h + 0.5 LIOs.

Using the (Salary,EID) index costs h LIOs to seek to the first row with Salary >60000, and then you can scan across the leaf pages to find EID=5, which will require reading (0 -.05] of the total leaf pages. If Employee has 2 rows per page the (Salary, EID) index would have, perhaps, 8 rows per page. So you scan, on average, .25 of the rows or (20,000/8)/4=625. So that would cost, on average 625+h LIOs.