PostgreSQL – How to Select Records Between Two Dates

datejoin;postgresqltable

I have two tables:

1 with an Unique ID with the first time they visited (in date format)
and a historical addresses table with the same Unique ID, a start field (date field), an end field (date field) and the addresses fields.

What I like to do is create a table which get the addresses of the persons (Unique ID) based on the date where they lived at the time of the visit.

I'm able to select records with my own made up dates, but I would like to get them from the 'first time visit' field.

The SQL query with random dates:

SELECT * FROM historical_adresses
WHERE (startdate, enddate) OVERLAPS ('2013-05-15'::DATE, '2013-05-15'::DATE);

Who can help me do this in PostgresSQL?

Best Answer

If I understand your problem correctly, you have to join the other table to historical_adresses based on the unique ID and that the visit time falls between startdate and enddate.

If this is so, the query could look like:

SELECT [address_elements]
FROM historical_adresses AS ha
JOIN visits AS v 
    ON ha.user_id = v.user_id
    AND v.visit_time BETWEEN ha.startdate AND ha.enddate
WHERE [find the visit somehow that you are interested in];

Note: you usually don't need all columns of a table (and definitely not from a join), so use the actual column names rather than *.

Query:

SET @startDate := CAST('2016-02-01' as date);
SET @finalDate := CAST('2016-02-04' as date);

SELECT house_id
    , COUNT(*)
    , DATEDIFF(@startDate, @finalDate)
FROM @tests AS T1
WHERE `date` > @startDate AND `date` <= @finalDate
GROUP BY house_id;

Query 1 in SQL Fiddle.

This query GROUP BY house_id and output the COUNT of days between @startDate and @finalData:

house_id    COUNT(*)    DATEDIFF(@startDate, @finalDate)
1           3                       3
2           2                       3
3           3                       3

From this query, it is obvious that only rows where COUNT(*) = DATEDIFF(@startDate, @finalDate) should be returned.

If (house_id, date) is not unique, COUNT(*) must be replaced by COUNT(DISTINCT `date`).

Query with all days

If you only want house_id where COUNT(*) = DATEDIFF(day, @startDate, @finalDate), the query must use the HAVING clause:

SELECT house_id
    , COUNT(*)
FROM @tests AS T1
WHERE `date` > @startDate AND `date` <= @finalDate
GROUP BY house_id
HAVING COUNT(*) = DATEDIFF(@finalDate, @startDate)

Query 2 in SQL Fiddle.

Output:

house_id    COUNT(*)
1           3
3           3

Query where `status = X`

Status value can be checked in the WHERE clause:

SELECT house_id
    , COUNT(*)
FROM @tests AS T1
WHERE `date` > @startDate AND `date` <= @finalDate
    AND status = '1'
GROUP BY house_id
HAVING COUNT(*) = DATEDIFF(@finalDate, @startDate)

Query 3 in SQL Fiddle.

Output:

house_id    COUNT(*)
1           3

Sql-server – convert date format for this value ‘19920000’

The date type can't store partial dates, like only year, or only day and month.

In one project I had a need to store incomplete dates and I used three separate nullable columns for day (tinyint), month (tinyint) and year (smallint). I don't see in what cases you'd need to use varchar instead of integer types.

In my case it was information about customer birth dates and quite often salespeople didn't know the year, but knew month or day. With this approach it is very flexible and allows the user to enter any bit of information he has. It was also important to know that certain part of the date is missing, such as "we don't know the exact birth date, but we know that it is in June".

Obviously, there were a bunch of custom checks in the application (not at the database level) trying to prevent entering things like 31st of June. In that particular project it was OK to perform the checks in the application. Most likely it would have been possible to implement them in T-SQL as well. In general, it is better to have constraints implemented at the database level, because applications come and go, but data remains.

It was more difficult to use these incomplete dates in queries as well. Different queries/reports required different approaches to dealing with the missing bits.

30K rows is not much at all and performance should not be a problem. So, I'd focus on smart parsing and validation of your Excel data and storing all information that you can extract.

Best Answer

Related Solutions

MySQL – How to Search Between Dates Where All Dates Appear

Query:

Query with all days

Output:

Query where status = X

Output:

Sql-server – convert date format for this value ‘19920000’

Related Question

Query where `status = X`