Deriving formulas for input/output

database-design database-theory performance

I'm currently enrolled in a DBS class and am having a problem with an assignment. I've searched around and have been unable to understand what I'm meant to be doing to derive this formula.

A plant file with TREE-GENUS as the key field includes records with the following TREE-GENUS values: Tsuga, Ficus, Arbutus, Quercus, Melaleuca, Tristaniopsis, Cornus, Sequoiadendron, Lithocarpus, Liriodendron, Pittosporum. Suppose that records with these search field values are inserted into a random (heap) file with a maximum of 3 records per block. Derive a formula for the expected number of disk I/Os to scan these records and to search for a particular record.

I've been using some software that was supplied with the assignment, and it also asks for the maximum number of blocks allowed, which is not given in the brief above. I'm not really sure how to derive a formula for this. I've assumed that, because there are 3 records per block, 4 blocks are required, and that a random heap file costs 1 disk I/O per block read or write.
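For what it's worth, here is the standard textbook model for an unindexed heap file that I've been assuming (records never span blocks, one disk I/O per block touched). The Python below is just the arithmetic; the (b + 1)/2 average assumes the key value is equally likely to sit in any block, and textbooks often round this to b/2.

```python
import math

def heap_io_costs(n_records, records_per_block):
    """Textbook cost model for an unindexed heap file.

    Assumes records do not span blocks and each block access
    costs exactly one disk I/O.
    """
    b = math.ceil(n_records / records_per_block)  # blocks needed
    full_scan = b                                 # a scan reads every block
    # Equality search on the key field: on average about half the blocks
    # are read before the (unique) record is found; a miss reads all b.
    avg_key_search = (b + 1) / 2
    worst_case_search = b
    return b, full_scan, avg_key_search, worst_case_search

# The 11 TREE-GENUS records from the assignment, 3 per block:
b, scan, avg, worst = heap_io_costs(11, 3)
print(b, scan, avg, worst)  # 4 blocks, 4 reads to scan, ~2.5 average, 4 worst case
```

Under that model the 11 genera give b = 4 blocks, 4 reads for a full scan, about 2.5 reads on average to find a particular record, and 4 reads in the worst case (or when the record is absent). I don't know whether this is what the assignment software expects, which is partly why I'm asking.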

If this is a larger topic than is worth explaining here, a link to a few reliable pages would also be helpful.

Best Answer

This is obviously a homework question, but it seems to me a bit like a trick question from a database perspective. I don't think there is a simple answer, and here the answer may raise quite a number of additional questions. My recommendation for tackling it is to sketch out how the pages would be laid out and accessed, and the like. In essence, so much depends on the implementation (compare disk I/O in MySQL's InnoDB vs PostgreSQL and you will see massive differences, due partly to the supported access methods).

The key problem comes down to indexing. The heap table could in fact be implicitly indexed, the way MySQL's InnoDB does it, or it could be unindexed (the way PostgreSQL does it). If it is indexed, then you have additional issues. Do you support a physical scan of the first block (to determine where to start)? If so, how do you determine which starting point to use? (Collecting statistics on the data in the tables would help, but that is additional I/O too.)
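As a rough illustration only (the fan-out figure is made up, and real engines differ in many other ways), here is how the two extremes compare under a toy model: a linear scan of an unindexed heap versus an equality lookup through a B-tree-style index.

```python
import math

def heap_equality_search(n_records, records_per_block):
    """Unindexed heap (PostgreSQL-style): linear scan of the blocks."""
    b = math.ceil(n_records / records_per_block)
    return {"average_reads": (b + 1) / 2, "worst_case_reads": b}

def btree_equality_search(n_records, fanout):
    """Implicitly indexed heap (InnoDB-style clustered index, roughly):
    descend the internal levels, then read the leaf/data block.
    The fanout is an illustrative figure, not a real engine's value."""
    height = max(1, math.ceil(math.log(n_records, fanout)))
    return {"reads": height + 1}

print(heap_equality_search(11, 3))     # {'average_reads': 2.5, 'worst_case_reads': 4}
print(btree_equality_search(11, 100))  # {'reads': 2} for such a tiny file
```

Even in this toy model the cost of finding one record changes completely depending on whether an index exists, which is the point about implementation dependence.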

I could imagine cases where the number of disk block reads ranges from n/3 (where n is the number of records) down to n^(1/10)/3, and the number of writes ranges from 1 to n/3.

In the end, questions like this, particularly where you have software assigned for use in class, should probably be directed to your instructor. It isn't clear what the point of the exercise is, and so much depends on the implementation.