MySQL – Database architecture for interactions between users (8B rows, 1TB of data)

Tags: architecture, database-design, mysql, nosql

I'm storing interactions between users in a table. There are two interaction types, and each interaction can happen more than once. The easiest reference would be Facebook likes and comments on posts: User1 made two likes and one comment on User2's posts.

I would also like basic query abilities: all users that interacted with User1, and the users User1 has interacted with, sorted by number of interactions.
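For reference, those two access patterns might look like this against the schema below (a sketch; `1` stands in for User1's id, and this only counts the `counter` interaction type):

```sql
-- All users that interacted with User1
SELECT user_id_from
FROM interactions
WHERE user_id_to = 1;

-- Users User1 has interacted with, sorted by how many interactions
SELECT user_id_to, counter
FROM interactions
WHERE user_id_from = 1
ORDER BY counter DESC;
```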

I'm currently saving this in MySQL, and things have been going well, but as the table gets big (7.5B rows, 1TB total disk space split equally between data and indexes), INSERTs have become very slow: up to a minute for 1k rows.

My Current Table Structure:

 CREATE TABLE `interactions` (
  `user_id_to` bigint(11) unsigned NOT NULL,
  `user_id_from` bigint(11) unsigned NOT NULL,
  `type2counter` tinyint(11) unsigned NOT NULL DEFAULT '0',
  `counter` smallint(11) unsigned NOT NULL,
  `updated` date NOT NULL,
  PRIMARY KEY (`user_id_to`,`user_id_from`),
  KEY `uid_from` (`user_id_from`),
  KEY `uid_to_counter` (`user_id_to`,`counter`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Edit: Inserts look like this:

INSERT INTO interactions
     (user_id_to, user_id_from, counter, updated)
   VALUES 
     ('2266931634','3180832729','3',NOW() ), 
     ('2266931634','3133897242','1',NOW() ), 
     ('2266931634','3207247957','1',NOW() ),  
     ('2266931634','3520390476','1',NOW() ), 
     ('2266931634','3631954079','1',NOW() ), 
     ('2266931634','3687475949','1',NOW() ), 
     ('2266931634','3627629761','1',NOW() ), 
     ('2266931634','3779990751','1',NOW() ), 
     ('2266931634','4071753134','1',NOW() ), 
     ('2266931634','4290900946','3',NOW() ) 
    ON DUPLICATE KEY UPDATE
             counter=counter+VALUES(counter),
             updated=NOW()

MySQL version: 5.6.19

Best Answer

Something is wrong with that! There is no column `counter`??

But, assuming you meant one of the existing counters...

  • Having the counter in an index and updating it frequently doubles the work: once to update the table, once to update the index.

  • Since the first part of the PRIMARY KEY is `user_id_to`, you already have a very good way of locating the counter given the `user_id_to`. So, DROP the `uid_to_counter` index. This will speed things up considerably.
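With the schema above, dropping the redundant secondary index is a one-line DDL change (note that on a 7.5B-row table this can take a while and, on 5.6, may rebuild the table):

```sql
-- The PRIMARY KEY (user_id_to, user_id_from) already handles
-- lookups by user_id_to, so this index is redundant.
ALTER TABLE interactions DROP INDEX uid_to_counter;
```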

Something else to check: innodb_buffer_pool_size should be about 70% of RAM. What is its current value?
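You can check the current setting and how much of the pool is in use like this (the size is reported in bytes):

```sql
-- Configured buffer pool size, in bytes
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Buffer pool usage (free vs. data pages)
SHOW STATUS LIKE 'Innodb_buffer_pool_pages%';
```

On 5.6 this variable is not dynamic; to change it, set it in my.cnf and restart the server (online resizing arrived in 5.7).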

PARTITIONing is very unlikely to help. Given a good index, partitioning is no faster because it essentially turns a 1-step process (look up in the index) into a 2-step process: first find the partition ("pruning"), then use the index within that partition.

If possible, use either LOAD DATA or "batch" INSERTs.

What do the INSERTs look like? Perhaps INSERT ... ON DUPLICATE KEY UPDATE (IODKU)? For batching, that might be best done by loading into a tmp table, then using IODKU with a SELECT from the tmp table. More on high speed ingestion.
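A minimal sketch of that staging-table approach, assuming the schema above (the `interactions_staging` name is hypothetical):

```sql
-- 1. Stage the incoming batch in a table with no secondary indexes,
--    so these inserts are cheap. LOAD DATA INFILE works here too.
CREATE TABLE interactions_staging LIKE interactions;
ALTER TABLE interactions_staging
    DROP INDEX uid_from,
    DROP INDEX uid_to_counter;

INSERT INTO interactions_staging (user_id_to, user_id_from, counter, updated)
VALUES (2266931634, 3180832729, 3, NOW()),
       (2266931634, 3133897242, 1, NOW());

-- 2. Fold the whole batch into the main table in one statement.
--    Ordering by the PRIMARY KEY improves insertion locality.
INSERT INTO interactions (user_id_to, user_id_from, counter, updated)
SELECT user_id_to, user_id_from, counter, updated
FROM interactions_staging
ORDER BY user_id_to, user_id_from
ON DUPLICATE KEY UPDATE
    counter = counter + VALUES(counter),
    updated = NOW();

-- 3. Empty the staging table for the next batch.
TRUNCATE TABLE interactions_staging;
```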

Some things look fishy; please provide specifics on the inserts, updates, and selects.