I'm trying to select distinct rows without any vendor specific syntax. My table layout is as follows:
+--------------------------------------+---------+--------------------------+----------------------------+------------+---------------+---------------+-------------+-----------+--------------------------------------+------------+---------+
| id | version | jobAsJson | jobSignature | state | createdAt | updatedAt | scheduledAt | serverTag | mutex | awaitingOn | isBatch |
+--------------------------------------+---------+--------------------------+----------------------------+------------+---------------+---------------+-------------+-----------+--------------------------------------+------------+---------+
| 6b56ef47-46e2-4905-9b4e-d8923306f98d | 0 | {some json value I want} | System.out.println(String) | PROCESSING | 1594385831872 | 1594385831872 | null | DEFAULT | resource-a | null | 0 |
| c0514d03-5bd1-494f-9978-dfbc24f06d67 | 0 | {some json value I want} | System.out.println(String) | ENQUEUED | 1594385832874 | 1594385832874 | null | DEFAULT | resource-a | null | 0 |
| 6b33ef73-8aa7-4d83-a171-b30da6a95c5f | 0 | {some json value I want} | System.out.println(String) | ENQUEUED | 1594385833874 | 1594385833874 | null | DEFAULT | resource-a | null | 0 |
| d4d75118-57cc-476c-a86d-a6b73fc30d37 | 0 | {some json value I want} | System.out.println(String) | ENQUEUED | 1594385834874 | 1594385834874 | null | DEFAULT | resource-a | null | 0 |
| 2cfbf072-5233-496d-9236-28e66cddf054 | 0 | {some json value I want} | System.out.println(String) | ENQUEUED | 1594385835874 | 1594385835874 | null | DEFAULT | resource-b | null | 0 |
| d8f2d419-f394-4d4b-b375-4f1611b28170 | 0 | {some json value I want} | System.out.println(String) | ENQUEUED | 1594385836875 | 1594385836875 | null | DEFAULT | resource-b | null | 0 |
| 3cdf7878-0716-4928-a059-2a74b4172c74 | 0 | {some json value I want} | System.out.println(String) | ENQUEUED | 1594385837875 | 1594385837875 | null | DEFAULT | resource-c | null | 0 |
| d0bfffac-9d5b-4c7e-a82b-ca0f93d1a1da | 0 | {some json value I want} | System.out.println(String) | ENQUEUED | 1594385838875 | 1594385838875 | null | DEFAULT | 79333f04-ab41-41dc-b004-8b2d74055d38 | null | 0 |
| c1d568f3-f1e1-4d4d-9ca5-21129bf6d066 | 0 | {some json value I want} | System.out.println(String) | ENQUEUED | 1594385839875 | 1594385839875 | null | DEFAULT | 1b8b941c-6bdc-45cf-93b5-1856f9de404a | null | 0 |
+--------------------------------------+---------+--------------------------+----------------------------+------------+---------------+---------------+-------------+-----------+--------------------------------------+------------+---------+
I'm trying to select all rows where state = 'ENQUEUED', servertag is in ('DEFAULT'), the rows should be ordered by createdAt AND the rows should be unique on the field mutex.
The goal is to only have 1 row with state = 'PROCESSING' for each mutex. The selected rows with STATE='ENQUEUED' will be updated to 'PROCESSING'. In the example here, the expected results are:
- 2cfbf072-5233-496d-9236-28e66cddf054 -> first job with mutex = resource-b
- 3cdf7878-0716-4928-a059-2a74b4172c74 -> first job with mutex = resource-c
- d0bfffac-9d5b-4c7e-a82b-ca0f93d1a1da -> no mutex thus unique value to make query easier
- c1d568f3-f1e1-4d4d-9ca5-21129bf6d066 -> no mutex thus unique value to make query easier
=> no job with mutex resource-a is returned as it is already being used by the first job.
The query I have now works for SQLite but not for other databases (mySQL, oracle, DB2) and is as follows:
select
j.jobAsJson
from
(select id
from jobrunr_jobs
where state = 'ENQUEUED'
AND servertag in ('DEFAULT')
and mutex NOT IN (select mutex from jobrunr_jobs where state = 'PROCESSING')
GROUP BY mutex order by createdat) r
INNER JOIN
jobrunr_jobs j ON r.id = j.id
I would like to have a query that is database vendor agnostic.
Postgres and SQL Server complain about the fact that a GROUP BY
is used and id
is not part of it.
Best Answer
Errors?
First off, I suspect a subtle error in your query:
You limit results to mutex with servertag 'DEFAULT', but the restriction is not repeated in the
NOT IN
subquery. So you exclude mutex that have a row with state 'PROCESSING' and any servertag. May be intentional, but it smells like an error. Even if nothing breaks (because of mutually exclusive sets of mutex values), it is probably still bad for performance.Also,
mutex NOT IN (<subselect>)
is a notoriously treacherous construct. Ifmutex
can be NULL, the query will break unexpectedly. In Postgres, this construct is generally discouraged. I assume the same is true for most RDBMS.Query
Basically, you want the latest row per mutex with
state = 'ENQUEUED'
where no row withstate = 'PROCESSING'
exists. I assume you want to restrict both conditions to rows withservertag = 'DEFAULT'
:EXISTS
is very basic standard SQL, and work in every halfway decent RDBMS - unlike CTEs and window functions, which are later additions and not supported everywhere.Also removes the need for the dubious
NOT IN
.And it should be fast, especially with index support.
Aside 1: to be "DB-agnostic" (which is never fully possible) don't use mixed-case identifiers, which are treated differently across various RDBMS.
Aside 2: Assuming you then want to mark identified rows as
'PROCESSING'
. Then you'll want to return more than justjobAsJson
. OrUPDATE
identified rows right away. See: