Sql-server – SQL query help: Calculating days between observations

group bysql serversql-server-2012where

I'm extremely new to SQL, in fact, I just really started playing around with it. So I'm not exactly sure how to search for appropriate terms related to the question I have. I'll do my best.

CONTEXT:

I have a table that I'd like to query using SQL Server. That table has behavioral data for hundreds of thousands of animals over multiple days (I know! Lots of animals! Rats, actually, and it is data that has been compiled over many many years by many many people). Here are the columns:

  1. AnimalID (an identifying ID unique to that animal)
  2. DateID (Date of observation– for example, 20151111)
  3. Behavior1 (# of times observed eating on that DateID– could be 0, max is 337!)
  4. Behavior2 (# of times observed grooming on that DateID– could be 0, max is 36)
  5. Behavior3 (# of times observed mating/attempted mating on that DateID– could be 0, max is 77!)

There are other columns like "gender," "AgeInDays," and several more behavior types. I don't think it is worth listing them all out.

I'd like to be able to use WHERE and GROUP BY in a query to summarize this dataset.

So far I have been able to write queries that give me the total number of days that each animal does a behavior or the total number of times that an animal has done a behavior across all records for each animal. For example, I've done this by making WHERE [Behavior1] > 1 and GROUP BY [AnimalID].

This is actually pretty cool! And I'm stunned at how quickly these queries run! But I'm stuck on a different type of query.

QUESTION:

I want to write a query that will tell me the number of days (DateID) between the first date an animal (each unique AnimalID) tried to mate (Behavior3) and the second date the same animal tried to mate. In the same output, I'd like to have the number of days between the SECOND date they tried and the THIRD date they tried. All while having WHERE [Behavior1] > 1 and GROUP BY [AnimalID].

Ideally the output would have columns like this:

  1. AnimalID

  2. NumDays1 (number of days between first and second attempts for that AnimalID to mate)

  3. NumDays2 (number of days between the second and third attempts for that AnimalID to mate)

Is this even possible to do in a single query? I know I need to use DATEDIFF… Any help would be greatly appreciated!

Apologies if I've messed up the way I was supposed to ask this question on this forum.

Thanks!

Best Answer

I think in this case, you best best since you are on SQL 2012 is to use Window Functions and specifically the LEAD function.

Here is an example to try and match your data.

Say I created a table with the columns you have here in the following way and then insert rows into that:

CREATE TABLE #AnimalBehavior
(
    ID int identity(1,1) primary key clustered
    ,AnimalID int
    ,DateID int
    ,Behavior1 int
    ,Behavior2 int
    ,Behavior3 int
);
GO


insert into #AnimalBehavior (AnimalID, DateID, Behavior1, Behavior2, Behavior3)
Values (1, 20150101, 0, 1, 1)
    ,(1, 20150201, 0, 1, 1)
    ,(1, 20150301, 0, 1, 1)
    ,(1, 20150401, 0, 1, 1)
    ,(2, 20150101, 0, 1, 1)
    ,(2, 20150301, 0, 1, 1)
    ,(2, 20150501, 0, 1, 1);
GO

From here I will want to only look at the times where your behavior3 is greater than 0 (so that we know it occurred during that period) and take the difference between the LEAD of the 2nd compared to the initial row, and the LEAD of the 3rd compared to the LEAD of the 2nd. You can do that in this way:

With ab
as
(
Select AnimalID
, DateID
, LEAD(DateID, 1, 0) over (Partition By AnimalID order by DateID) DateID_1
, Lead(DateID, 2, 0) over (Partition By AnimalID order by DateID) DateID_2
, row_number() over (Partition By AnimalID order by DateID) as rown
from #AnimalBehavior 
where Behavior3 > 0
)
Select AnimalID, Convert(date, Convert(varchar(8), DateID))
, DateDiff(dd,  Convert(date, Convert(varchar(8), DateID)),  Convert(date, Convert(varchar(8), DateID_1)))
, DateDiff(dd,  Convert(date, Convert(varchar(8), DateID_1)),  Convert(date, Convert(varchar(8), DateID_2)))
from AB
where rown = 1;

I believe this covers what you are looking for. But feel free to ask questions if not.