Sql-server – How to write search query to retrevie the rows with biggest DateTime <= 2010102248 in SQL

cquerysql server

I am creating a table and now trying to create my first Search Query.

  1. I have created the table successfully like this:

     CREATE TABLE testTable
     (
       DateTime INT    NOT NULL, /*yyyyddhhmm, 2010102345*/
       FeatureNbr      SMALLINT  NOT NULL,
       Val FLOAT(53)   NOT NULL
     );
    
  2. I have successfully inserted values to the table like this:

     INSERT INTO testTable(DateTime, FeatureNbr, Val) VALUES(2010102245, 0, 12.56789);
     INSERT INTO testTable(DateTime, FeatureNbr, Val) VALUES(2010102245, 1, 13.56789);
     INSERT INTO testTable(DateTime, FeatureNbr, Val) VALUES(2010102245, 2, 14.56789);
    
     INSERT INTO testTable(DateTime, FeatureNbr, Val) VALUES(2010102246, 0, 15.56789);
     INSERT INTO testTable(DateTime, FeatureNbr, Val) VALUES(2010102246, 1, 16.56789);
     INSERT INTO testTable(DateTime, FeatureNbr, Val) VALUES(2010102246, 2, 17.56789);
    
     INSERT INTO testTable(DateTime, FeatureNbr, Val) VALUES(2010102247, 0, 18.56789);
     INSERT INTO testTable(DateTime, FeatureNbr, Val) VALUES(2010102247, 1, 19.56789);
    
  3. Now I have a little problem and questions. I am trying to create a search query to retreive ALL FeatureNbrm,Val,DateTime where the biggest datetime is found <= 2010102248 for each of the three features. The lines that should be retrevied would then be the below 3 lines, – for ALL the three features which are: 0,1,2 (We don't know how many features it is. We just want ALL of them/rows with the biggest datetime)

2010102246, 2, 17.56789  
2010102247, 0, 18.56789  
2010102247, 1, 19.56789  

But I only retrieve those lines in this format for the features: 0 and 1

2, 2010102246, 17.56789 <-- is missing
0, 2010102247, 18.56789
1, 2010102247, 19.56789

The code I use is the below but I don't understand how I should write this correctly?

SELECT FeatureNbr, DateTime, Val
FROM
  testTable
WHERE
     testTable.DateTime=
      (
        SELECT
          MAX(DateTime)
        FROM
          testTable 
        WHERE
          DateTime <= 2010102248
      )

Best Answer

You need to make sure your subquery is correlated with the main query, so you are getting the max timestamp for each feature:

SELECT
  test.FeatureNbr,
  test.DateTime,
  test.Val
FROM
  testTable test
WHERE
  DateTime =
    (
      SELECT
        MAX(DateTime)
      FROM
        testTable
      WHERE
        FeatureNbr = test.FeatureNbr  --This is what you are missing
          AND DateTime <= '2020-10-06 12:55:00' --Or whatever
    )

See this DB Fiddle: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=a57b5d632f53a973a6225718e2baea3a

A few other points:

  1. Always store date/time values as DATETIME2 in SQL Server. Otherwise you can end up with invalid date/time combinations and won't be able to use the native date functions without first converting.
  2. You need to make sure your primary key is defined - the right clustered index (in this example (FeatureNbr,Datetime)) is critical - it will allow the query to complete reading the table once and without sorts.
  3. This was addressed in my answer here, but I know you are starting out so it might not have been apparent due to the length.
  4. Should also be mentioned that with the proper clustered index, SQL Server will recognize the queries provided by nbk and Andy Mallon (which are identical since a CTE is just another way of writing the same thing) to be equivalent to the above query - they will produce the exact same query plan and return the exact same result. It's just more code.

To address your comment:

Yes I begin to understand it a bit better but its so new. I know C# for 10 years but this is alot of "nesting" like it feels. After WHERE there should be a condition. You put: WHERE DateTime = (...). Is this some kind of "temp table" that is happening here, where you later: FeatureNbr = test.FeatureNbr? I also wonder if it will be correct to put this condition against DateTime like this? WHERE FeatureNbr = test.FeatureNbr AND DayTime <= 2010102248 (I know the datetime format got wrong here but it will work for the example as an integer)

Making the transition from object oriented/traditional programming to database development is a hard one as you're used to specifying exactly HOW things are done. SQL is a declarative language, we specify what we want and allow the database engine to determine the optimal plan.

So the "nesting" here is just specifying that the Datetime must equal the max Datetime less than or equal to the specified value for each FeatureNbr. The query engine then looks at the table, the indexes, the column statistics, and decides the best way to fulfill that query. If you have a good data model, you rarely have to tune queries.

You will probably want to start to familiarize yourself with how to read query/execution plans and the specific operators. Then you can experiment with how the engine actually fulfills the requests, how your clustered index influences the plan, etc.

If you were to generate query plans for three of the answers (nbk's, Andy's, and my own) they would be identical with the right clustered index. The answer that uses ROW_NUMBER() forces a sort, which is instructive in its own way.

Additional comment:

If I understand, it doesn't matter in what order you write this(When you combine columns): CONSTRAINT PK_testTable PRIMARY KEY (FeatureNbr,DateTime) it could be written like this also: CONSTRAINT PK_testTable PRIMARY KEY (DateTime,FeatureNbr)

So this brings up a very important concept for the physical implementation of tables in SQL Server, which is the rows are stored (more or less) in the order according to the position of the rows specified during the creation of the clustered index.

So: (FeatureNbr,Datetime) sorts first by FeatureNbr (let's call this option 1) while (Datetime,FeatureNbr) sorts first by Datetime (option 2).

If we do a query that looks for one row and one row only (say, FeatureNbr = 2 AND Datetime = '2020-10-06 10:33:00'), both indexes will perform identically. But if we do a query that has to look across ranges of Datetime for a given FeatureNbr, option 1 would allow the clustered index to be utilized, option 2 would result in a table scan. This is because for option 1 the engine knows all values for a FeatureNbr will be on the same page(s) (more or less), but for option 2 the relevant rows could be stored anywhere in the table.

For this reason, a date or datetime column should be the last row in an index as a general rule (in databases there are always exceptions).