MySQL – Optimal indexing for a lookup table. HASH index, BTREE index or composite PK

btreeclustered-primary-keyindexMySQL

Let's say I have a very large lookup table that looks like this:

   CREATE TABLE `MyLookup` (
      `FKToTableA` bigint(20) NOT NULL,
      `FKToTableB` bigint(20) NOT NULL,
      `Count` bigint(20) NOT NULL,
      `Bytes` bigint(20) NOT NULL,
      `Packets` bigint (20) NOT NULL,
   ) ENGINE=InnoDB;

The first 2 attributes are FKs to Table A and B respectively. But doesn't need to be. the FKs can be dropped if it's more optimal not to have it. FKToTableA and FKToTableB is the candidate key for this table. Thus FKToTableA and FKToTableB could be a composite primary key if needed.

My question is what's the most optimal way of indexing this table? Storage size and insert times are not a concern.
Most of the time my queries will be joining on to MyLookup on both FK columns and SUM the Count, Bytes and Packets columns.

select
    a.something, b.something, SUM(c.Count), SUM(c.Bytes), SUM(b.Packets)
from
    A a 
    inner join MyLookup c on a.Id = c.FKToTableA
    inner join B b on b.Id = c.FKToTableB
where
    a.something = 1 and
    a.Time >= 'blah' and
    a.Time <= 'blah'
group by 
    a.something, b.something

I see three options.

1) Put a HASH index on FKToTableA and FKToTableB.

2) Put a composite BTREE index on all columns.

3) Put a Primary Key on FKToTableA and FKToTableB.

I'm kind of leaning towards 1. From what I gather HASH indexes are great at equality comparisons and that's all joins are is a big equality comparison is it not? Still has to go hit the table again to sum the other columns though which may be a bottle neck. Not sure.

Or maybe it's possible to have a hash index with Count, Bytes, Packets? Not sure how composite hash indexes works though.

Has anyone dealt with this type of thing before and can shed some knowledge and suggestions?

EDIT:
Here's my schema for Table A and B to make it a bit more clear

CREATE TABLE `B` (
  `Id` bigint(20) NOT NULL AUTO_INCREMENT,
  `CustomerIdent` int(11) NOT NULL,
  `MetaData1` varchar(256) NULL,
  `MetaData2` varchar(256) NULL,
  `MetaData3` varchar(256) NULL,
  `MetaData4` varchar(256) NULL,
  `MetaData5` varchar(256) NULL,
  `Time` datetime NOT NULL
  CONSTRAINT `PK_A` PRIMARY KEY (`Id` ASC)
) ENGINE=InnoDB;

CREATE TABLE `B` (
  `Id` bigint(20) NOT NULL AUTO_INCREMENT,
  `SourceIp` bigint(20) NULL,
  `DestinationIp` bigint(20) NULL,
  `SourcePort` int(11) NULL,
  `DestinationPort` int(11) NULL,
  CONSTRAINT `PK_B` PRIMARY KEY (`Id` ASC)
) ENGINE=InnoDB;

Basically the lookup table is in charge of telling me how many times an row in B happened for one row of A.

Best Answer

  • a: Satisfy the WHERE with INDEX(something, time)
  • MyLookup -- If the pair (FKToTableA, FKToTableB) is unique, then make that the PRIMARY KEY, and put the columns in that order so that your SELECT can quickly get into MyLookup.
  • Don't use BIGINT (8 bytes) unless you expect to exceed 4 billion, the limit for INT UNSIGNED, which takes only 4 bytes.
  • IP addresses -- for the old IPv4, there are convenient routines for converting to INT UNSIGNED. For the new IPv6, it won't fit into BIGINT. See 5.6.3.
  • Do those Metadata columns need utf8? Can they be combined into a TEXT column? And other questions.
  • Build and maintain a "Summary table" rather than scanning large chunks of the "Fact" table to get this "report".
  • Do the SUMs before JOINing to b.
  • Do you really need a many-to-many mapping table? Seems like this is 1:many.
  • More tips on many-to-many.