I am creating a table and now trying to create my first Search Query.
-
I have created the table successfully like this:
CREATE TABLE testTable ( DateTime INT NOT NULL, /*yyyyddhhmm, 2010102345*/ FeatureNbr SMALLINT NOT NULL, Val FLOAT(53) NOT NULL );
-
I have successfully inserted values to the table like this:
INSERT INTO testTable(DateTime, FeatureNbr, Val) VALUES(2010102245, 0, 12.56789); INSERT INTO testTable(DateTime, FeatureNbr, Val) VALUES(2010102245, 1, 13.56789); INSERT INTO testTable(DateTime, FeatureNbr, Val) VALUES(2010102245, 2, 14.56789); INSERT INTO testTable(DateTime, FeatureNbr, Val) VALUES(2010102246, 0, 15.56789); INSERT INTO testTable(DateTime, FeatureNbr, Val) VALUES(2010102246, 1, 16.56789); INSERT INTO testTable(DateTime, FeatureNbr, Val) VALUES(2010102246, 2, 17.56789); INSERT INTO testTable(DateTime, FeatureNbr, Val) VALUES(2010102247, 0, 18.56789); INSERT INTO testTable(DateTime, FeatureNbr, Val) VALUES(2010102247, 1, 19.56789);
-
Now I have a little problem and questions. I am trying to create a search query to retreive
ALL
FeatureNbrm,Val,DateTime
where thebiggest datetime
is found<= 2010102248
for each of the three features. The lines that should be retrevied would then be the below 3 lines, – for ALL the three features which are: 0,1,2 (We don't know how many features it is. We just want ALL of them/rows with the biggest datetime)
2010102246, 2, 17.56789
2010102247, 0, 18.56789
2010102247, 1, 19.56789
But I only retrieve those lines in this format for the features: 0 and 1
2, 2010102246, 17.56789 <-- is missing
0, 2010102247, 18.56789
1, 2010102247, 19.56789
The code I use is the below but I don't understand how I should write this correctly?
SELECT FeatureNbr, DateTime, Val
FROM
testTable
WHERE
testTable.DateTime=
(
SELECT
MAX(DateTime)
FROM
testTable
WHERE
DateTime <= 2010102248
)
Best Answer
You need to make sure your subquery is correlated with the main query, so you are getting the max timestamp for each feature:
See this DB Fiddle: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=a57b5d632f53a973a6225718e2baea3a
A few other points:
DATETIME2
in SQL Server. Otherwise you can end up with invalid date/time combinations and won't be able to use the native date functions without first converting.(FeatureNbr,Datetime)
) is critical - it will allow the query to complete reading the table once and without sorts.To address your comment:
Making the transition from object oriented/traditional programming to database development is a hard one as you're used to specifying exactly HOW things are done. SQL is a declarative language, we specify what we want and allow the database engine to determine the optimal plan.
So the "nesting" here is just specifying that the
Datetime
must equal the maxDatetime
less than or equal to the specified value for eachFeatureNbr
. The query engine then looks at the table, the indexes, the column statistics, and decides the best way to fulfill that query. If you have a good data model, you rarely have to tune queries.You will probably want to start to familiarize yourself with how to read query/execution plans and the specific operators. Then you can experiment with how the engine actually fulfills the requests, how your clustered index influences the plan, etc.
If you were to generate query plans for three of the answers (nbk's, Andy's, and my own) they would be identical with the right clustered index. The answer that uses
ROW_NUMBER()
forces a sort, which is instructive in its own way.Additional comment:
So this brings up a very important concept for the physical implementation of tables in SQL Server, which is the rows are stored (more or less) in the order according to the position of the rows specified during the creation of the clustered index.
So:
(FeatureNbr,Datetime)
sorts first byFeatureNbr
(let's call this option 1) while(Datetime,FeatureNbr)
sorts first byDatetime
(option 2).If we do a query that looks for one row and one row only (say,
FeatureNbr = 2 AND Datetime = '2020-10-06 10:33:00'
), both indexes will perform identically. But if we do a query that has to look across ranges ofDatetime
for a givenFeatureNbr
, option 1 would allow the clustered index to be utilized, option 2 would result in a table scan. This is because for option 1 the engine knows all values for aFeatureNbr
will be on the same page(s) (more or less), but for option 2 the relevant rows could be stored anywhere in the table.For this reason, a date or datetime column should be the last row in an index as a general rule (in databases there are always exceptions).