Assuming that you range-partition your table by the (auto-incrementing) sequence and limit your query by another value (day number), you might see that a lot of partitions are touched when you use:
EXPLAIN PARTITIONS SELECT * FROM table_name WHERE day_number=X ORDER BY...
The reason is that there is no linkage between the partitioning attribute and the filter column. Since MySQL maintains a separate index per partition, it needs to open every partition to find all rows matching day_number.
If you know the partition name (perhaps by querying INFORMATION_SCHEMA.PARTITIONS before the real select) and you use MySQL 5.6+, you can use explicit partition selection like:
SELECT * FROM table_name PARTITION(partition_name) WHERE day_number=X ORDER BY...
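For example, you could look up the partition names first with something like this (schema and table names are placeholders to adjust):

```sql
-- Placeholder names: substitute your own schema and table
SELECT PARTITION_NAME, TABLE_ROWS
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = 'your_db'
  AND TABLE_NAME   = 'table_name';
```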
But this is just chitchat based on a lot of assumptions. To give you a real answer, you need to post the query plan and more information about the table structure - like Phil said before.
Full-text isn't going to help without refactoring to use the full-text functions (CONTAINS, FREETEXT, or their table equivalents). It also doesn't really work with a leading wildcard. Hacks are available, but basically you're going to struggle to write a semantically equivalent query for full-text. For the future, consider redesigning for full-text search, which has stemming (run, runner, running) and thesaurus support (jogger) that could serve your searches much better than two wildcards.
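For illustration, a full-text version of a wildcard search might look like this. The column name jobTitle is a guess, and a full-text index on it would have to exist already; note that CONTAINS only supports trailing (prefix) wildcards:

```sql
-- Hypothetical: assumes a full-text index exists on dJobs(jobTitle)
SELECT TOP 20 *
FROM dJobs
WHERE CONTAINS(jobTitle, '"plumb*"');  -- prefix search only; no leading wildcard
```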
SSD is unlikely to help you unless you are memory-bound. Your tables (at only 500k records) are probably in memory most of the time. Can you confirm the size of the dJobs table, and the server RAM?
Enterprise Edition could help where the Standard Edition limitation of 64GB RAM and the lesser of 4 sockets or 16 cores is lifted, but you're going to need a really powerful box to notice a difference. For example, the 4-socket limit really means you could have something like 4 quad-core processors totalling 16 cores; with HT enabled, you're already at 32 logical processors. The generally recommended server MAXDOP for this type of OLTP machine would be 8 anyway. I think this is unlikely to benefit you because your query has more fundamental problems, but you never know.
Non-clustered indexes (particularly on dJobs) are unlikely to help because the query selects so many columns from this table and has many criteria in the WHERE clause. A non-clustered index would have to be so wide to cover the query that it would be practically a duplicate of the clustered index, and therefore overly expensive to maintain. As the query sorts by jobID DESC, I considered a descending index but haven't trialled this.
Partitioning (Enterprise only) is really a great feature, but again it's unlikely to help you. I did a quick investigation of partitioning on the dbo.dJobs.jobJobStatus column; I imagine you only have a small percentage of jobs 'active' at any one time - a few hundred, or even a few thousand, out of the 500,000 records. Partition elimination would probably be cancelled out by the OR OR OR approach. Parallel scans of multiple partitions are also an Enterprise-only feature.
This would probably work:
SELECT TOP 20 *
FROM dJobs
LEFT JOIN dClients on cltClientID = jobClientId
LEFT JOIN dUsers on regUserId = jobCoordinator
LEFT JOIN dJobStatus ON jbsID = jobJobStatus
WHERE
(
jobjobStatus IN ( SELECT jbsid FROM djobstatusgroupmapping WHERE jsgid = 0 )
)
ORDER BY jobID DESC
This probably won't work:
SELECT TOP 20 *
FROM dJobs
LEFT JOIN dClients on cltClientID = jobClientId
LEFT JOIN dUsers on regUserId = jobCoordinator
LEFT JOIN dJobStatus ON jbsID = jobJobStatus
WHERE
(
jobjobStatus IN ( SELECT jbsid FROM djobstatusgroupmapping WHERE jsgid = 0 )
OR ( 0=0 ) OR ( 0=0 ) --<< this 'OR always true' means 'get the whole table'
)
ORDER BY jobID DESC
This leads me to the query. The OR OR OR approach basically means 'always get the whole table'. The TOP 20 masks this design problem. The TOP also probably pushed the plan towards the Nested Loops that Jon suggested were suspect. What also stands out to me about this nightmarish "search all columns" constructed query is that you basically have two copies of the same query (and therefore of the table reads): one to count, one for the result set. This might be more efficient if the data went into an intermediate table and the count was done from there, for example.
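A sketch of that intermediate-table idea, with column names assumed from the query above and the temp table name invented for illustration:

```sql
-- Materialise the matching keys once, then count and page from them
SELECT jobID
INTO #matchingJobs
FROM dJobs
WHERE jobJobStatus IN ( SELECT jbsid FROM djobstatusgroupmapping WHERE jsgid = 0 );

SELECT COUNT(*) FROM #matchingJobs;   -- the count, without a second scan of dJobs

SELECT TOP 20 j.*
FROM #matchingJobs m
JOIN dJobs j ON j.jobID = m.jobID
ORDER BY j.jobID DESC;                -- the result page
```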
Finally, this brings me to the only thing that would actually help you (without a large-scale refactor of the code): data deletion or archiving. As mentioned, I imagine you only have a small percentage of jobs 'active' at any one time. Carve off the 'inactive' ones into a different table. Create a view over the top of the two tables for reporting. Set up a nightly job to copy out the old records.
Having only a few thousand active jobs in your main table will most likely transform your query performance.
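A minimal sketch of that archive setup; the archive table and view names are invented for illustration:

```sql
-- Hypothetical archive table with the same structure as dJobs
SELECT * INTO dJobsArchive FROM dJobs WHERE 1 = 0;
GO
-- Reporting view over both tables
CREATE VIEW vJobsAll AS
    SELECT * FROM dJobs
    UNION ALL
    SELECT * FROM dJobsArchive;
```

The nightly job would then move rows whose status is 'inactive' from dJobs into dJobsArchive.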
Some recommended reading:
Erland Sommarskog's article on these "search all columns" queries
Dynamic Search Conditions in T-SQL
http://www.sommarskog.se/dyn-search-2008.html
Querying Multiple Columns (Full-Text Search)
http://technet.microsoft.com/en-us/library/ms142488(v=sql.105).aspx
I hope that helps!
Simple: create proper indices (order(ip) and ip(ip_from, ip_to)).
Complex: create a field of LINESTRING type from ip_from to ip_to, create a spatial index (R-tree) on it, and use MBRContains for filtering.
From the other side - the ranges (ip_from, ip_to) must be NON-overlapping. If that is true, then you'll obtain the result with a simple query. The indices ip(ip_from) and order(ip) (or even the covering index ip(ip_from, country_code)) are enough for that query.
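The simple query for non-overlapping ranges is not shown above; a common sketch (table and column names assumed from the index definitions) looks like this:

```sql
-- Hypothetical sketch: find the range containing @ip, assuming the
-- (ip_from, ip_to) ranges never overlap. With the covering index
-- ip(ip_from, country_code), this reads a single index row.
SELECT country_code
FROM ip
WHERE ip_from <= @ip
ORDER BY ip_from DESC
LIMIT 1;
```

Because the ranges don't overlap, the row with the largest ip_from not exceeding @ip is the only candidate, so no ip_to comparison (and no range scan) is needed to locate it.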