Beginner design of biological sampling database

database-design

As a complete newcomer, I would like to practise designing a database based on the biological data my lab has been accumulating over the years. Here is an explanation (I am simplifying/abstracting what we are doing to make this first database simpler):

Each fieldwork season, we would go and measure trees (e.g. height, trunk diametre, etc.) at different geographical locations. Over the years, each tree and location would have been visited and measured multiple times. However, due to practical constraints we may not get to visit and measure all trees at all locations every season.

The purpose of the database are twofold:

(1) Track how each individual tree changed with time, asking questions like: "Did tree A grow taller over the past two years?"

(2) Track how populations of trees at each location changed with time, asking questions like: "Did the group of 12 trees at Acme Acres grow taller in general over the past three years?"

Here is my initial thoughts on what tables to make:

(a) Trees: Fields: TreeID (e.g. "tree A"), Location (linked to table c below).

(b) Visits: This keeps metadata for each visit, with information like the date of visit (for a location, which will be the same for all trees at that location), locations we covered during the visit.

(c) Locations: Basic information on each location, such as latitude, longitude, name of location (e.g. "Acme Acres"), etc.

(c) TreeData: This contains the actual measurements for individual trees. Fields may include: tree (linked to table a), height, diametre, visit (linked to table b). In the end there will be multiple entries any one tree here, with data from each visit.

Does any of this make sense as a first dab at database design? (I've only been thinking about databases for three days)

I would appreciate and humbly accept any pointers you can give me.

Thank you very much!

BTW, One reason I was considering a Visits table was because sometimes I might to compare the data between visits 3 and 4, or between visits 1 and 5, etc

Best Answer

I would rename TreeData to measurements.

Also, unless you really need the Visits table, I wouldn't use it. If designed properly your database would look something like this if you included the visits table:

tree db design

Using the visits table would complicate some of the queries that you're wanting answered. For example, to answer number one you might use something like this in sql server:

select
     t.treeid,
     max(m.height)-min(m.height) as Growth
from trees t
     inner join measurements m
          on t.treeid=m.treeid
     inner join visits v
          on m.visitid=t.visitid
where v.date>=dateadd(y,-2,getdate())
group by t.treeid

without the visits table you could rewrite that like this:

select
     m.treeid,
     max(m.height)-min(m.height) as Growth
from measurements m
where m.date>=dateadd(y,-2,getdate())
group by m.treeid