Beginner design of biological sampling database

database-design

As a complete newcomer, I would like to practise designing a database based on the biological data my lab has been accumulating over the years. Here is an explanation (I am simplifying/abstracting what we are doing to make this first database simpler):

Each fieldwork season, we go and measure trees (e.g. height, trunk diameter, etc.) at different geographical locations. Over the years, each tree and location has been visited and measured multiple times. However, due to practical constraints, we may not get to visit and measure all trees at all locations every season.

The purpose of the database is twofold:

(1) Track how each individual tree changed with time, asking questions like: "Did tree A grow taller over the past two years?"

(2) Track how populations of trees at each location changed with time, asking questions like: "Did the group of 12 trees at Acme Acres grow taller in general over the past three years?"

Here are my initial thoughts on what tables to make:

(a) Trees: Fields: TreeID (e.g. "tree A"), Location (linked to table c below).

(b) Visits: This keeps metadata for each visit, such as the date of the visit (which is the same for all trees at a given location) and the locations we covered during the visit.

(c) Locations: Basic information on each location, such as latitude, longitude, name of location (e.g. "Acme Acres"), etc.

(d) TreeData: This contains the actual measurements for individual trees. Fields may include: tree (linked to table a), height, diameter, visit (linked to table b). In the end there will be multiple entries for any one tree here, with data from each visit.

Does any of this make sense as a first stab at database design? (I've only been thinking about databases for three days.)

I would appreciate and humbly accept any pointers you can give me.

Thank you very much!

BTW, one reason I was considering a Visits table is that sometimes I might want to compare the data between visits 3 and 4, or between visits 1 and 5, etc.

Best Answer

I would rename TreeData to measurements.

Also, unless you really need the Visits table, I wouldn't use it. If designed properly your database would look something like this if you included the visits table:

[Image: tree db design]

Using the Visits table would complicate some of the queries you want answered. For example, to answer question (1) you might use something like this in SQL Server:

select
     t.treeid,
     max(m.height)-min(m.height) as Growth
from trees t
     inner join measurements m
          on t.treeid=m.treeid
     inner join visits v
          on m.visitid=v.visitid
where v.date>=dateadd(year,-2,getdate())
group by t.treeid

Without the Visits table, and with the date stored directly on each measurement, you could rewrite that like this:

select
     m.treeid,
     max(m.height)-min(m.height) as Growth
from measurements m
where m.date>=dateadd(year,-2,getdate())
group by m.treeid
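As a rough, runnable sketch of the design without a Visits table, the following uses Python's built-in sqlite3 module as a stand-in for SQL Server. The locations table, the column types, the fixed cut-off date and all sample rows are assumptions made for illustration only. It covers both questions from the post: growth per tree, and the average height of all trees at one location over time.

```python
import sqlite3

# Hypothetical schema and sample data; SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE locations (locationid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE trees (
    treeid     TEXT PRIMARY KEY,
    locationid INTEGER REFERENCES locations(locationid)
);
CREATE TABLE measurements (
    treeid TEXT REFERENCES trees(treeid),
    date   TEXT,   -- ISO 8601 date, so string comparison orders correctly
    height REAL,
    PRIMARY KEY (treeid, date)
);
INSERT INTO locations VALUES (1, 'Acme Acres');
INSERT INTO trees VALUES ('tree A', 1), ('tree B', 1);
INSERT INTO measurements VALUES
    ('tree A', '2020-06-01', 4.0), ('tree A', '2021-06-01', 4.5),
    ('tree B', '2020-06-01', 2.0), ('tree B', '2021-06-01', 3.0);
""")

def growth_per_tree(conn, since):
    """Question (1): max height minus min height per tree since a date."""
    rows = conn.execute(
        "SELECT treeid, MAX(height) - MIN(height) FROM measurements "
        "WHERE date >= ? GROUP BY treeid ORDER BY treeid", (since,))
    return dict(rows.fetchall())

def mean_height_by_location(conn, location_name):
    """Question (2): average height per measurement date at one location."""
    rows = conn.execute(
        "SELECT m.date, AVG(m.height) FROM measurements m "
        "JOIN trees t ON t.treeid = m.treeid "
        "JOIN locations l ON l.locationid = t.locationid "
        "WHERE l.name = ? GROUP BY m.date ORDER BY m.date", (location_name,))
    return rows.fetchall()

print(growth_per_tree(conn, "2020-01-01"))
print(mean_height_by_location(conn, "Acme Acres"))
```

Note that max minus min reports the size of the change but not its direction, so a tree that lost height would still show a positive number.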

Related Solutions

Ms-access – Database Design Regarding a Relationship for Patient Visit

The first thing that catches my eye is what appear to be separate tables for each exam. This could get rather difficult to maintain in the future: if the structure of exams changes you need to update n tables. Also, any aggregate queries that cover all exams for 1 patient will need to join n tables.

I'd suggest a structure like this:

patient_exams
-------------
  patient_exam_id (PK)
  visit_id (FK)
  patient_id (FK)
  study_id (FK)
  exam_seq_num
  (other fields)

Use exam_seq_num to track which exam number the record is for a patient. You can use a composite key containing patient_id, visit_id, study_id, exam_seq_num to ensure that you don't get exams for a patient with duplicate sequence numbers. You'll still need a bit of code to create the correct sequence number, maybe an on-insert trigger.
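A minimal sketch of that idea, using Python's sqlite3 module in place of Access: a UNIQUE constraint over the four columns rejects duplicate sequence numbers, and a small helper stands in for the on-insert trigger. The helper name add_exam and the choice to number exams within each (patient, visit, study) group are assumptions, not something the original schema specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE patient_exams (
    patient_exam_id INTEGER PRIMARY KEY,
    visit_id     INTEGER NOT NULL,
    patient_id   INTEGER NOT NULL,
    study_id     INTEGER NOT NULL,
    exam_seq_num INTEGER NOT NULL,
    -- composite key that forbids duplicate sequence numbers
    UNIQUE (patient_id, visit_id, study_id, exam_seq_num)
)""")

def add_exam(conn, patient_id, visit_id, study_id):
    """Hypothetical helper: insert an exam with the next free sequence number."""
    (next_seq,) = conn.execute(
        "SELECT COALESCE(MAX(exam_seq_num), 0) + 1 FROM patient_exams "
        "WHERE patient_id = ? AND visit_id = ? AND study_id = ?",
        (patient_id, visit_id, study_id)).fetchone()
    conn.execute(
        "INSERT INTO patient_exams "
        "(visit_id, patient_id, study_id, exam_seq_num) VALUES (?, ?, ?, ?)",
        (visit_id, patient_id, study_id, next_seq))
    return next_seq

first = add_exam(conn, patient_id=1, visit_id=10, study_id=100)
second = add_exam(conn, patient_id=1, visit_id=10, study_id=100)

# Inserting an already-used sequence number by hand violates the constraint.
try:
    conn.execute(
        "INSERT INTO patient_exams "
        "(visit_id, patient_id, study_id, exam_seq_num) VALUES (10, 1, 100, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(first, second, duplicate_rejected)
```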

UPDATE:

OK, so now it's clear that the exam tables are actually for different types of exams. You could have something like:

base_exam
---------
  id (PK)
  patient_id (FK)
  exam_date
  (other stuff)

lumbar_exam_details
-------------------
  lumbar_exam_id (PK)
  base_exam_id (FK to base_exam.id)
  (other specific fields)

blood_exam_details
------------------
  blood_exam_id (PK)
  base_exam_id (FK to base_exam.id)
  (other specific fields)

All of your exam detail tables reference the base_exam table, which stores common fields for all exams (such as the date of the exam, the patient who was examined, etc...).

If you really want to have a "display name" for exam types, I would do that in a view that overlays the specific exam table. For example, the query for lumbar_exam_view might look like:

SELECT *, "Lumbar Exam" AS DISPLAY_NAME
FROM LUMBAR_EXAM_DETAILS

If you use this view in any queries/reports on lumbar_exam_details, you will have access to display_name anywhere that you want the user-friendly string.
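To make the base/detail split and the view concrete, here is a small sketch using Python's sqlite3 module rather than Access. The notes column and the sample rows are placeholders, and joining base_exam into the view is an addition of this sketch so the common fields appear alongside the display name; the view described above only selects from the detail table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by name
conn.executescript("""
CREATE TABLE base_exam (
    id         INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL,
    exam_date  TEXT
);
CREATE TABLE lumbar_exam_details (
    lumbar_exam_id INTEGER PRIMARY KEY,
    base_exam_id   INTEGER NOT NULL REFERENCES base_exam(id),
    notes          TEXT  -- hypothetical stand-in for the specific fields
);
-- The display name is a constant attached by the view, not stored data.
CREATE VIEW lumbar_exam_view AS
    SELECT d.*, b.patient_id, b.exam_date, 'Lumbar Exam' AS display_name
    FROM lumbar_exam_details d
    JOIN base_exam b ON b.id = d.base_exam_id;
INSERT INTO base_exam VALUES (1, 42, '2024-01-15');
INSERT INTO lumbar_exam_details VALUES (1, 1, 'no findings');
""")

row = conn.execute("SELECT * FROM lumbar_exam_view").fetchone()
print(row["patient_id"], row["exam_date"], row["display_name"])
```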

If you need the display name to be stored as actual data, you can add an exam_type_id field to base_exam and then have it point to an exam_type table:

exam_type
---------
  id
  display_name

Data:

exam_type
ID  | display_name
------------------
1   | Lumbar Exam
2   | Blood test

Now your base exam records have an ID that points them to the correct user-friendly string. Note that this does not ensure that the exam detail record is of the correct type (for example, a base_exam record that is referenced by lumbar_exam_details could erroneously reference the display name "Blood test"); exam_type_id only determines the display name.
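Because the schema itself cannot catch that mismatch, one option is a periodic check query. The sketch below uses Python's sqlite3 module instead of Access; the function name mislabelled_lumbar_exams and the sample rows are made up for illustration. It lists base_exam records that are referenced by lumbar_exam_details but carry a different display name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE exam_type (id INTEGER PRIMARY KEY, display_name TEXT NOT NULL);
CREATE TABLE base_exam (
    id           INTEGER PRIMARY KEY,
    patient_id   INTEGER NOT NULL,
    exam_type_id INTEGER NOT NULL REFERENCES exam_type(id)
);
CREATE TABLE lumbar_exam_details (
    lumbar_exam_id INTEGER PRIMARY KEY,
    base_exam_id   INTEGER NOT NULL REFERENCES base_exam(id)
);
INSERT INTO exam_type VALUES (1, 'Lumbar Exam'), (2, 'Blood test');
-- base_exam 2 is deliberately mislabelled for the demonstration
INSERT INTO base_exam VALUES (1, 42, 1), (2, 42, 2);
INSERT INTO lumbar_exam_details VALUES (1, 1), (2, 2);
""")

def mislabelled_lumbar_exams(conn):
    """Return (base_exam id, display_name) pairs whose type is not lumbar."""
    rows = conn.execute(
        "SELECT b.id, t.display_name FROM lumbar_exam_details d "
        "JOIN base_exam b ON b.id = d.base_exam_id "
        "JOIN exam_type t ON t.id = b.exam_type_id "
        "WHERE t.display_name <> 'Lumbar Exam' ORDER BY b.id")
    return rows.fetchall()

print(mislabelled_lumbar_exams(conn))
```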

Ms-access – Create primary keys based upon higher-level entity in access

I think the difficulty lies in using the parent table's key as part of the sub-table's primary key. So long as you have a reference to the parent table, you can always generate a reference that combines Location+Room or Room+Rack. Otherwise the primary key becomes hard to update, which makes it a bad primary key.

I can see two ways to tackle this in your interface, depending on how many locations, rooms and racks you have.

Firstly, a schema...

tblLocation(locationID[PK++], locationReference)
tblRoom(roomID[PK++], keyLocation[tblLocation.locationID], roomNumber)
tblRack(rackID[PK++], keyRoom[tblRoom.roomID], rackReference)
tblDevice(deviceID[PK++], keyRack[tblRack.rackID], positionU, deviceLabel)

...tables for custom devices can extend tblDevice

Foreign key targets are shown in square brackets; PK++ means an auto-increment primary key.

As you can see, a Location has many Rooms, a Room has many Racks, a Rack has many Devices.

I have made primary keys as auto IDs as I prefer them to be integers and there's less to think about should you change a location/room/rack reference.

Populate this with Locations, Rooms and Racks as appropriate.

Solution 1 :: Not a lot of racks

Have three drop downs on the form.
User chooses a location, you then select all the rooms of that location in the next combobox.
User then selects the room, you then fill the next combobox with the racks in the room.
User then fills in device details and U position (which you probably want to validate to make sure that you don't have anything allocated there.)
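The cascading drop-downs boil down to two filtered queries, one per combobox. The sketch below uses Python's sqlite3 module in place of Access, with the table and column names from the schema above; the function names and sample rows are hypothetical, and the form wiring itself is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblLocation (
    locationID INTEGER PRIMARY KEY AUTOINCREMENT,
    locationReference TEXT
);
CREATE TABLE tblRoom (
    roomID INTEGER PRIMARY KEY AUTOINCREMENT,
    keyLocation INTEGER REFERENCES tblLocation(locationID),
    roomNumber TEXT
);
CREATE TABLE tblRack (
    rackID INTEGER PRIMARY KEY AUTOINCREMENT,
    keyRoom INTEGER REFERENCES tblRoom(roomID),
    rackReference TEXT
);
INSERT INTO tblLocation (locationReference) VALUES ('ABC'), ('FAY');
INSERT INTO tblRoom (keyLocation, roomNumber)
    VALUES (1, '101'), (1, '105'), (2, '208');
INSERT INTO tblRack (keyRoom, rackReference)
    VALUES (1, 'A/1'), (1, 'A/2'), (2, 'B8');
""")

def rooms_for_location(conn, location_id):
    """Rows for the second combobox, once a location has been chosen."""
    return conn.execute(
        "SELECT roomID, roomNumber FROM tblRoom "
        "WHERE keyLocation = ? ORDER BY roomNumber", (location_id,)).fetchall()

def racks_for_room(conn, room_id):
    """Rows for the third combobox, once a room has been chosen."""
    return conn.execute(
        "SELECT rackID, rackReference FROM tblRack "
        "WHERE keyRoom = ? ORDER BY rackReference", (room_id,)).fetchall()

print(rooms_for_location(conn, 1))
print(racks_for_room(conn, 1))
```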

Solution 2 :: Lots of racks

Have a query similar to:

SELECT
    rackID,
    locationReference & '-' & roomNumber & '-' & rackReference AS [Full Location]
FROM (tblLocation
    INNER JOIN tblRoom
        ON tblLocation.locationID = tblRoom.keyLocation)
    INNER JOIN tblRack
        ON tblRoom.roomID = tblRack.keyRoom
ORDER BY
    locationReference ASC,
    roomNumber ASC,
    rackReference ASC

This generates a list of Location-Room-Rack references:

ABC-101-A/1
ABC-101-A/2
ABC-105-B8
FAY-208-X8
FAY-210-A9T
FAY-210-F4
NAT-410-78

The user selects one, then fills in the information. The list will have one entry per rack (which could get a bit excessive).

Advantages

If you move a rack (or racks) of equipment from one room to another, it's just a simple update query to update the database
If a location/room/rack reference changes, you only have one piece of data to update.
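As a small illustration of the first advantage, the sketch below (Python's sqlite3 module standing in for Access; function names and sample rows are hypothetical) moves a rack with a single UPDATE on tblRack. The devices follow automatically because they reference the rack, not the room.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblRoom (roomID INTEGER PRIMARY KEY, roomNumber TEXT);
CREATE TABLE tblRack (
    rackID INTEGER PRIMARY KEY,
    keyRoom INTEGER REFERENCES tblRoom(roomID),
    rackReference TEXT
);
CREATE TABLE tblDevice (
    deviceID INTEGER PRIMARY KEY,
    keyRack INTEGER REFERENCES tblRack(rackID),
    positionU INTEGER,
    deviceLabel TEXT
);
INSERT INTO tblRoom VALUES (1, '101'), (2, '105');
INSERT INTO tblRack VALUES (1, 1, 'A/1');
INSERT INTO tblDevice VALUES (1, 1, 10, 'switch-01'), (2, 1, 12, 'server-01');
""")

def move_rack(conn, rack_id, new_room_id):
    """Moving a rack touches exactly one row; no device rows change."""
    conn.execute("UPDATE tblRack SET keyRoom = ? WHERE rackID = ?",
                 (new_room_id, rack_id))

def devices_in_room(conn, room_id):
    """Labels of all devices whose rack currently sits in the given room."""
    rows = conn.execute(
        "SELECT d.deviceLabel FROM tblDevice d "
        "JOIN tblRack r ON r.rackID = d.keyRack "
        "WHERE r.keyRoom = ? ORDER BY d.deviceLabel", (room_id,))
    return [label for (label,) in rows]

before = devices_in_room(conn, 1)
move_rack(conn, rack_id=1, new_room_id=2)
after_old_room = devices_in_room(conn, 1)
after_new_room = devices_in_room(conn, 2)
print(before, after_old_room, after_new_room)
```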

Further schema additions

I've used racks in the past. I simplified the schema above, but these are some additions that spring to mind.

Add rackSize to tblRack: the size in U's
Add sizeU to tblDevice: the size of the device in U's

You can then do further validation: if you have a 39U rack and somebody tries to put a 5U device in at location 38, you can throw an error (as U's 40-42 don't exist!). Also, if somebody puts a 3U device in at location 10 and something is already in location 11, you know that either it won't fit or nobody has updated the details of a device that is no longer there!
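A minimal sketch of both checks in plain Python, assuming positionU is the lowest U a device occupies and that it extends upward by sizeU (which matches the 39U example above). The function name check_placement and its return convention are made up for illustration.

```python
def check_placement(rack_size, occupied, position, size):
    """Return None if the device fits, otherwise a short reason string.

    rack_size: height of the rack in U (rackSize)
    occupied:  list of (positionU, sizeU) for devices already in the rack
    position:  lowest U the new device would occupy (positionU)
    size:      height of the new device in U (sizeU)
    """
    top = position + size - 1
    # Check 1: the device must lie entirely within U1..rack_size.
    if position < 1 or top > rack_size:
        return f"device needs U{position}-U{top}, rack only has U1-U{rack_size}"
    # Check 2: the device must not overlap anything already installed.
    for other_pos, other_size in occupied:
        other_top = other_pos + other_size - 1
        if position <= other_top and other_pos <= top:
            return f"overlaps device at U{other_pos}-U{other_top}"
    return None

# The two cases from the text, plus one placement that is fine.
print(check_placement(39, [], position=38, size=5))
print(check_placement(42, [(11, 1)], position=10, size=3))
print(check_placement(42, [(11, 1)], position=20, size=3))
```

In a database-backed form, the occupied list would come from a query on tblDevice for the chosen rack.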

You could also make some clever queries that would tell you how much space there is in your racks (and where). But that's beyond the scope of this question!

Hope this helps.