In my answer to your previous question, I mentioned that the maximum size of an individual table is 2TB when you use innodb_file_per_table. There may be a way to surpass that limit by doing just the opposite: leaving innodb_file_per_table disabled.
With innodb_file_per_table disabled, there are three ways you can go about this:
TECHNIQUE #1 : Use multiple system tablespaces in ext3
You could chain InnoDB tablespaces 2TB at a time. Start off with this
innodb_data_file_path=ibdata1:10M:autoextend:max:2048G
When ibdata1 is about to hit 2TB, add another ibdata file
innodb_data_file_path=ibdata1:2048G;ibdata2:10M:autoextend:max:2048G
When ibdata2 is about to hit 2TB, add another ibdata file
innodb_data_file_path=ibdata1:2048G;ibdata2:2048G;ibdata3:10M:autoextend:max:2048G
and so on...
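One rough way to see how close the shared tablespace is to needing another extension (assuming innodb_file_per_table is OFF, so every InnoDB table lives in the system tablespace) is to check DATA_FREE, which for a table stored in the system tablespace reports the free space of the whole shared tablespace; compare that, together with the on-disk file size, against the 2TB cap:

SELECT  DATA_FREE/1024/1024/1024 AS system_tablespace_free_gb
FROM    information_schema.TABLES
WHERE   ENGINE='InnoDB'
LIMIT   1;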
UPDATE 2013-06-01 20:45 EDT
I just discovered something with reference to setting the max filesize for a system tablespace. Here is the situation:
Client has the following InnoDB files
[root@l*****]# ls -l ibd*
-rw-rw---- 1 s-em7-mysql s-em7-mysql 362807296 Jun 2 00:15 ibdata1
-rw-rw---- 1 s-em7-mysql s-em7-mysql 2196875759616 Jun 2 00:15 ibdata2
and has this setting
innodb_data_file_path=ibdata1:346M;ibdata2:500M:autoextend:max:10240000M
MySQL was reporting table full errors.
I tried creating a new file, ibdata3, with
innodb_data_file_path=ibdata1:346M;ibdata2:2048G;ibdata3:10M:autoextend
That value did not work
I tried different sizes for the max value. After trying different combinations for 15 minutes, none of them worked. It looked like mysqld was never going to start up again. I decided to try the exact number of bytes for the file size of ibdata2:
innodb_data_file_path=ibdata1:346M;ibdata2:2196875759616;ibdata3:10M:autoextend
When I started mysql, LO AND BEHOLD, MySQL was up.
- ibdata3 was created at 10M
- ibdata3 was soon growing in 8M chunks
- Client was happy once again
Moral of the story : At times, you may need to use the exact filesize for a system tablespace in innodb_data_file_path
TECHNIQUE #2 : RAW DISK PARTITION
According to MySQL 5.0 Certification Study Guide, Page 428
Any raw partitions in the configuration must exist but must have the modifier newraw listed after the size of the file specification. newraw tells InnoDB to initialize the partition when the server starts up. New partitions are treated as read-only after initialization. After InnoDB initializes the tablespace, stop the server, change newraw to raw in the partition specification, and restart the server. For example, to use a 10GB Unix partition named /dev/hdc6, begin with a configuration like this:
[mysqld]
innodb_data_home_dir=
innodb_data_file_path=/dev/hdc6:10Gnewraw
Start the server and let InnoDB initialize the tablespace. Then stop the server and change the configuration from newraw to raw:
[mysqld]
innodb_data_home_dir=
innodb_data_file_path=/dev/hdc6:10Graw
After changing the configuration, restart the server.
Create a RAID10 disk of whatever size you need; just don't put any filesystem on it.
TECHNIQUE #3 : Use ext4 with TECHNIQUE #1
You could chain InnoDB tablespaces 16TB at a time. Start off with this
innodb_data_file_path=ibdata1:10M:autoextend:max:16384G
When ibdata1 is about to hit 16TB, add another ibdata file
innodb_data_file_path=ibdata1:16384G;ibdata2:10M:autoextend:max:16384G
When ibdata2 is about to hit 16TB, add another ibdata file
innodb_data_file_path=ibdata1:16384G;ibdata2:16384G;ibdata3:10M:autoextend:max:16384G
And so on...
Best Answer
Recommendation
"all the records belonging to one of the categories" --> Build a many:many mapping between records and categories. To have that table optimal, follow the tips here . Be sure to use a 2-byte
SMALLINT UNSIGNED
for the category_id. (You should not trust that you won't go beyond 255.)Then the query is something like
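(A minimal sketch of the mapping table and query meant here; the names -- Records, record_category, record_id, category_id -- and the composite keys are assumptions based on the surrounding prose and the many:many tips referenced above.)

CREATE TABLE record_category (
    record_id   INT UNSIGNED      NOT NULL,
    category_id SMALLINT UNSIGNED NOT NULL,
    PRIMARY KEY (category_id, record_id),  -- clusters all rows for one category together
    INDEX       (record_id, category_id)   -- covers lookups in the other direction
) ENGINE=InnoDB;

SELECT  r.*
FROM    record_category AS rc
JOIN    Records AS r USING (record_id)
WHERE   rc.category_id = 123;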
This approach will do an index scan of a portion of rc to find the record_ids, then randomly reach into r via the PK. Yes, there will be "hundreds of millions of rows" in rc, but it will scan only a small part of it, because all the entries for category_id=123 will be 'clustered' together (either in the data or the secondary index).

Critique of alternatives
If you have a column with a commalist of categories, then every row will need to be fetched. That is a full table scan ("tens of millions of rows" = a few gigabytes) of Records. (BTW, don't use LIKE '%123%', use FIND_IN_SET() -- not faster, but less messy to handle edge cases; see the sketch after this list.)

"a separate table for each category" -- This is a very common question on this (and other) forums. The answer is "NEVER!".
"separate column for each of the categories" -- Gag me with a spoon! The table will be fatter and, again, you will need a full table scan.
"200+ indexes" -- The hard limit is 64. The practical limit is more like 5.
Building on Akina's and Acidon's comments: With 8.0, you can use TINYBLOB or BINARY(..) and set bits. This is about as clumsy as having multiple SETs. Before 8.0, bitwise ORs (etc.) were limited to 64 bits; now BLOBs work. SET/BLOB/BINARY/BIGINT save the most space. But your entire dataset is only tens of GB, so I don't see 'size' as a problem. Even the many:many table I recommend will be only several extra GB.
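For concreteness, a minimal sketch of the bitmask idea, using a single BIGINT so it stays within the pre-8.0 limit of 64 categories (table and column names, and the bit numbering, are made up); 8.0's binary-string bit operators let you apply the same idea to BINARY/TINYBLOB columns wider than 64 bits:

CREATE TABLE records_bits (
    record_id     INT UNSIGNED    NOT NULL PRIMARY KEY,
    category_bits BIGINT UNSIGNED NOT NULL DEFAULT 0  -- one bit per category, up to 64
) ENGINE=InnoDB;

-- Rows having category #17 set; the predicate cannot use an index, so this is still a full scan
SELECT  record_id
FROM    records_bits
WHERE   category_bits & (1 << 17) <> 0;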