If you are not able to fine-tune the filesystem (for instance, by using a smaller block size), and you really must use a database, I'd suggest the following readings:
The first will explain the data structure most commonly used by indexes, the B-Tree. The second explains how MySQL uses the B-Tree. The third will tell you about the command EXPLAIN SELECT ..., which is how MySQL describes the query plan (it will tell you which index, if any, it is using, and whether it is doing table scans, which you must avoid at all costs).
To create an optimized index, you should first think about the structure of the query (or queries) you will need. For example, it might be something like: select content from files where firstParameter = XXX and secondParameter like 'xxx%'.
You should analyze the cardinality of each column (i.e., how many different values the column can have).
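A quick sketch of how you might measure that, using the hypothetical files table from the example query above (the column names are just the ones from that example):

    -- count how many distinct values each candidate column has
    SELECT COUNT(DISTINCT firstParameter)  AS firstParameter_cardinality,
           COUNT(DISTINCT secondParameter) AS secondParameter_cardinality
    FROM files;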
You choose the columns with the highest cardinality to be the first in the index, leaving those with lower cardinality to the end. Example: suppose you have 2M rows, firstParameter is a number between 1 and 1M, randomly distributed, and secondParameter is the full name of the owner of the file. In such a situation, you want the index (firstParameter, secondParameter), in this very order, since the clause firstParameter = XXX will leave you, on average, with only 2 rows. The cardinality of secondParameter, on the other hand, is much lower: there are far fewer than 1M possibilities for people's names. Therefore, if your index is (secondParameter, firstParameter), the query where firstParameter = 1 and secondParameter like 'bruno%' will first look for every row in which secondParameter begins with bruno (which will likely be tens or hundreds of thousands), and only then will it check the other condition.
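As a concrete sketch (the table definition below is hypothetical; only the column names come from the example above), the index could be declared like this, and EXPLAIN will confirm whether it is actually used:

    -- hypothetical table matching the example above
    CREATE TABLE files (
        id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        firstParameter INT UNSIGNED NOT NULL,
        secondParameter VARCHAR(255) NOT NULL,
        content MEDIUMTEXT,
        INDEX idx_first_second (firstParameter, secondParameter)
    ) ENGINE=InnoDB;

    -- check the query plan: it should show idx_first_second being used
    EXPLAIN SELECT content
    FROM files
    WHERE firstParameter = 42
      AND secondParameter LIKE 'bruno%';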
Also, note that the index is used from left to right. That is, if you have 3 columns, A, B and C, and you index (A, B, C), the index will be mostly useless in a query such as where A = 1 and C = 2. It will likely be used to find rows matching A = 1, but after that every row will be checked for C = 2. If most of your queries are like that one (and some might specify B as well), then your index should be (A, C, B).
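A minimal sketch of that situation, assuming a hypothetical table t with those three columns:

    -- hypothetical table illustrating left-to-right index usage
    CREATE TABLE t (
        A INT NOT NULL,
        B INT NOT NULL,
        C INT NOT NULL,
        INDEX idx_abc (A, B, C)
    ) ENGINE=InnoDB;

    -- only the A prefix of idx_abc can narrow this query;
    -- the rows matching A = 1 are then filtered one by one for C = 2
    EXPLAIN SELECT * FROM t WHERE A = 1 AND C = 2;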
Finally, note that like 'xxx%' can use an index, while like '%xxx' (or like '%xxx%') cannot. This is, again, because the index is read from left to right. To match xxx%, it knows where to start looking; to match %xxx it must check every single row.
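You can see the difference with EXPLAIN; a sketch using the hypothetical files table and index from above:

    -- prefix match: the (firstParameter, secondParameter) index can
    -- narrow on both conditions
    EXPLAIN SELECT content FROM files
    WHERE firstParameter = 42 AND secondParameter LIKE 'bruno%';

    -- suffix match: after firstParameter = 42 narrows the rows, every
    -- remaining entry must still be checked one by one
    EXPLAIN SELECT content FROM files
    WHERE firstParameter = 42 AND secondParameter LIKE '%bruno';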
All that said about indexes, I'd strongly suggest you rework your criteria so that you have something more structured. As you said, you could try to precompute something.
There are other considerations, such as the size of the content. If you can make it fit in under 8KB (which is roughly 3000 characters if you use UTF-8), then InnoDB will store the data in the same page as the primary key; otherwise, it will store the data elsewhere. If you query by primary key, in the first case you have a single read operation; if you query by another index, in the second case you have 3 read operations: one to find the primary key of the matching row, one to find the row by primary key (to read the address of the data), and one to read the data.
Oh, and check the amount of RAM on your server. Ideally your data (or at the very least your indexes) should fit in RAM.
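A rough way to check those sizes (the database name below is just a placeholder, and files is the hypothetical table from above):

    -- approximate size of the data and of the indexes, in MB
    SELECT table_name,
           ROUND(data_length  / 1024 / 1024, 1) AS data_mb,
           ROUND(index_length / 1024 / 1024, 1) AS index_mb
    FROM information_schema.tables
    WHERE table_schema = 'your_database'
      AND table_name = 'files';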
By considering all these points, you should have no problem at all. I don't know the hardware of your server, or its load (since you said it is shared), but 800k rows is close to nothing if you fine-tune your indexes. I'm very far from an expert, but by doing all the above things I work daily with (very optimized) tables with 10M rows, some with 100M, and the queries are ultra-fast.
I hope that helps. Once you have your table(s), you could ask another question showing the create table statement and describing a bit about your data (sizes, cardinalities, etc.) and the select queries you will use, so someone can help you create an optimized index.
Best Answer
Yes, the "switch or option" that would allow you to do this is the one your host doesn't want to change. If there were a way for a user to simply work around resource limits, there wouldn't be much point in resource limits.
It's not obvious from the documentation, but the source code makes it fairly clear that this is a limit on the number of queries you can run, per hour, that change almost anything at all about the server. The use of the word "update" is unfortunate, because this resource limit impacts any query that does something like...
... as well as any valid pairing of these:
If you take a look inside the dump file, it should become readily apparent that you're doing quite a lot of all those things.
But aside from the dump, for a minute, what sort of tiny little service will you be able to host on a server that only lets you execute 36,000/hour ... 600/minute ... 10/sec queries that INSERT / UPDATE / DELETE, max, averaged over an hour? That's tiny... so you have either purchased something really small and inexpensive... or your host has already oversubscribed the server and is trying to mitigate that problem... or your data has already caused more loading than the host anticipated, so you earned a special extra "feature."

The most plausible workaround, to me, seems like it will be to pace your restore by imposing a small delay between each query that gets issued to the server, to stay within your hourly limit.
You can see this in action without actually applying it to the server by just trying this:
You should see the dump file coming at you, one line at a time; use control-C to break out. The value 100000 is 100000 microseconds, or 1/10th of one second, which is approximately your limit. The server won't be executing the queries instantaneously, as it will require some number of milliseconds to process each one, so setting the value at 100000 will probably work. Also, there is some white space in mysqldump files, and statements like CREATE TABLE will be broken out across multiple lines, even though they'll only count as a single query, which will stretch out the time a bit, also.

Remember, though, that resource limits are per-user, so you can't be doing anything else on the server at the same time with the same account, or the two sessions, together, will combine to push you toward that limit.
If deliberately slowing the pace of your import seems counterintuitive, that's because it is. Most of us try to get our imports to load faster, not slower, so you'll need to consider also whether this host's limits are going to work for you or whether you need a different plan.
You can get an idea of how many SQL statements are in the dump file with this:
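The original command isn't reproduced here either, but counting the lines that end in a semicolon gives roughly one line per statement (the file name is again a placeholder):

    # rough statement count: lines ending in a semicolon
    grep -c ';$' dumpfile.sql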
This will return a count of lines, which will be close to the number of statements... slightly higher, again, because of how mysqldump files are formatted. This number divided by 36,000 is approximately the fastest possible loading time, in hours, that you could achieve on this file, with this host, with these restrictions in place. For reference, I ran this against a randomly-selected mysqldump file of 72GB, and the count is 475,309... so we could estimate a restore time of maybe 13 hours for that particular dump file under these conditions. This number will, of course, vary significantly, so check it against your data source and see what you find.