MySQL – Starting Search From a Specific Index

MySQL

I am not a pro in the database sector, so pardon me if I am asking a basic question.

Scenario

I have a table in which all the records are stored chronologically.
The table has 1,445,248 rows (about 1.4 million).

Desire

I want to optimize my query, which looks like this:

Select * from SalesInvoice where S_Date Between 20190401 And 20190501

Note

Yes, I need all the columns in the table; instead of typing out all the columns, I just used the "all" symbol (asterisk).
S_Date is stored as a numeric value instead of DateTime (because it was developed a long time ago in this format).
The query takes only about 4 seconds to execute, but because of some calculations it is run in a loop ~80-90 times.

Question

I want to know: is there any way I can start the searching of rows from Nth position?

My Approach

Edit: Currently I am using SQL Server 2008 R2, but I will switch to MySQL. That is why the query is written in SQL Server 2008 R2 format.

Step 1: Get the Id of the first matching row with the following query:

Select Top 1 Id from SalesInvoice where S_Date Between 20190401 And
20190501 order by S_Date ASC

Step 2: Start searching from the returned Id, let's say 500,000.

My Assumption if this technique exists

I think this technique might save table-scanning time by starting the search from that Id, which avoids scanning the first 500,000 rows.

BUT

I don't know how to start the lookup from a particular ID.

If anyone can provide me with a code snippet, that will be very helpful.

Best Answer

As for "all the records are stored chronologically", the only control you have over this is the PRIMARY KEY in InnoDB.

A simple rearrangement will provide the optimal access:

PRIMARY KEY(S_Date, id),  -- to order the data in the order desired
INDEX(id)   -- to keep AUTO_INCREMENT happy

(TOP 1 is not a MySQL construct; see LIMIT. But it is not needed with the above change.)

(The technique above is even better than INDEX(S_date).)

Explanation

In MySQL (InnoDB), the data rows are ordered by the PRIMARY KEY. Said another way: the PRIMARY KEY is "clustered" with the data. This implies that if you fetch the rows in PK order, you will be fetching the data in the most efficient order.

With the change I proposed, the data will be ordered by S_Date (with dups ordered by id). This is exactly the order you want.

To change the current table definition, do

ALTER TABLE SalesInvoice
    DROP PRIMARY KEY,
    ADD PRIMARY KEY(S_date, id),
    ADD index(id);

The auto_increment column must occur first in some index. This allows the 'next' value to be found efficiently after a restart.

Stop after

start the searching of rows from Nth position.

This does that, but not efficiently:

ORDER BY ...
LIMIT 1000 OFFSET 20000

will skip (that is, read and discard) 20K rows, then deliver 1000.

It is more efficient to "remember where you left off". With PRIMARY KEY(id), it is straightforward and efficient:

WHERE id > $leftoff
ORDER BY id

With a composite index it gets messier, but still possible. More discussion: http://mysql.rjweb.org/doc.php/pagination
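As a rough illustration of "remember where you left off" with the composite PRIMARY KEY(S_Date, id), here is a minimal sketch. It is not taken from the linked article. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere; the amount column, the sample rows, and the fetch_page helper are made up for the example. The WHERE clause itself is plain SQL that MySQL also accepts.

```python
import sqlite3

def fetch_page(conn, last_date, last_id, page_size):
    """Return the next page of rows located after the (last_date, last_id) position."""
    return conn.execute(
        """
        SELECT S_Date, id, amount
          FROM SalesInvoice
         WHERE S_Date > ? OR (S_Date = ? AND id > ?)
         ORDER BY S_Date, id
         LIMIT ?
        """,
        (last_date, last_date, last_id, page_size),
    ).fetchall()

# In-memory stand-in table; "amount" is a hypothetical column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SalesInvoice ("
    " S_Date INTEGER NOT NULL, id INTEGER NOT NULL, amount INTEGER,"
    " PRIMARY KEY (S_Date, id))"
)
# Hypothetical sample data: three invoices on each of four days.
rows = [(20190401 + d, d * 3 + k, 10 * k) for d in range(4) for k in range(1, 4)]
conn.executemany("INSERT INTO SalesInvoice VALUES (?, ?, ?)", rows)

# Walk the table page by page, remembering the last (S_Date, id) seen.
position = (0, 0)  # sorts before every real row
pages = []
while True:
    page = fetch_page(conn, position[0], position[1], page_size=5)
    if not page:
        break
    pages.append(page)
    position = (page[-1][0], page[-1][1])
```

The second half of the WHERE clause is what makes the composite case messier: a page can end in the middle of a day, so the next page must pick up the remaining ids for that same S_Date before moving on to later dates.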

Related Solutions

MySQL – SELECT on MyISAM Table on index – Does table size (rowcount) matter

Broadly speaking, on a MyISAM Table with a range scan, the process is:

Find the first Index result using the BTREE (inside the .MYI file) and access the row result (on the .MYD file) - Handler_read_key
Get the next result, using the index (and in the same order), until the value retrieved is larger than the one defined (multiple instances of Handler_read_next)

You can confirm this plan by checking that EXPLAIN reports the range join type and by looking at the Handler_* counters in SHOW SESSION STATUS.

In theory, the first step is O(log n), where n is the number of records indexed (the table size), while the second is O(m), where m is the number of records returned. So, in theory, a larger table will take longer to return the records. Why do I say "in theory"? Because O() notation can be deceptive if you do not take the constants into account. Indexes usually end up in memory, while rows can be on disk (especially on MyISAM, which has no dedicated buffer for data), so the difference in cost between the two operations is large. Also, MyISAM has problems with large tables, so the number of BTREE levels tends not to be too large.
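To make the two costs concrete, here is a small sketch of the same seek-then-scan pattern over a sorted Python list. This is only an analogy and not how MyISAM is implemented: bisect_left plays the role of the BTREE descent, and the loop plays the role of the repeated "read next" calls. The range_scan name and the sample keys are made up for the example.

```python
from bisect import bisect_left

def range_scan(sorted_keys, low, high):
    """Seek to the first key >= low, then read forward while key <= high."""
    # O(log n) seek: comparable to the single Handler_read_key step.
    i = bisect_left(sorted_keys, low)
    matches = []
    # O(m) scan: comparable to the repeated Handler_read_next steps.
    while i < len(sorted_keys) and sorted_keys[i] <= high:
        matches.append(sorted_keys[i])
        i += 1
    return matches

# Hypothetical index contents: 0, 10, 20, ..., 90.
keys = list(range(0, 100, 10))
result = range_scan(keys, 25, 60)
```

Growing the list only lengthens the seek logarithmically, while the loop runs once per returned key, which is why the number of matching rows usually matters more than the table size.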

Let me show you an unrelated graph: [graph: Full table scan]

In the above graph, the full table scan (blue line) should be flat, because all rows are examined in every case. It is not, mainly because at that point reading and returning 16M rows is more costly than returning only 1.

So the answer is: both operations take time. Which one dominates depends on the actual values of m and n, plus the state of the database in terms of hardware speed (memory, disk) and the state of the buffers (filesystem, key buffer). In conventional usage, an index lookup of a single row is a very fast operation, but that depends on what you compare it to, and on whether you take into account extreme cases such as large tables where the BTREE index doesn't fit into memory.

MySQL – Faster query when joining data from multiple tables

Leading wildcard in LIKE is the main problem. Another problem is that the WHERE clause is spread across 3 tables.

Consider using a FULLTEXT index; this will make looking for "words" much faster. Alas, you may be stuck with LIKE in the 20-char part number.

Still the structure of the query will need to be changed...

SELECT  p.id, p.manufacturerId, m.code as manufacturerCode, p.materialNumber,
        p.partNumber, p.materialDescription, c.title as manufacturerTitle
    FROM  (
              -- Each search runs separately and returns only the matching ids
              ( SELECT  id
                    FROM  craft__parts
                    WHERE  partNumber like '%polymer%'
              )
            UNION  DISTINCT
              ( SELECT  id
                    FROM  craft__parts
                    WHERE  materialDescription like '%polymer%'
              )
            UNION  DISTINCT
              ( SELECT  p3.id
                    FROM  `craft__parts` AS p3
                    JOIN  `craft__manufacturers` AS m3 ON p3.manufacturerId = m3.id
                    JOIN  `craft_content` AS c3 ON m3.code = c3.field_manufacturerCode
                    WHERE  c3.title like '%polymer%'
              )
          ) AS ids
    -- The derived table holds only ids, so fetch the remaining columns here
    JOIN  `craft__parts` AS p ON p.id = ids.id
    JOIN  `craft__manufacturers` AS m ON p.manufacturerId = m.id
    JOIN  `craft_content` AS c ON m.code = c.field_manufacturerCode
    ORDER BY  c.title asc

The goal there is to

Do each of the awful searches separately.
Combine the results (UNION DISTINCT of the ids)
Then look up the rest of the fields

Step 1 still involves a full table scan (if using LIKE) or a quick FULLTEXT lookup (much better).

Steps 2 and 3 should be relatively fast, assuming there were not too many matching rows.

The problem with the original query is all the bulky stuff hauled around while doing the JOINs.

Other notes...

A PRIMARY KEY is a KEY; do not redundantly say KEY(id).

Does m.code really need to be bigger than 255 characters? "Prefix" indexes are virtually useless, and may make the JOIN in the third UNION not run very fast.