MySQL – Indexes and Optimizations for Logging System

database-tuning · index · index-tuning · MySQL · performance

I'm designing tables for a new log system and am having trouble figuring out what indexes I'll need (I'm a developer, not a DBA), and am also open to any suggestions/criticisms/etc. for making the tables as "fast" as possible (from my app's point of view).

Background info: the DB itself is a MySQL instance hosted on AWS. It's a shared server and I don't have tuning privileges beyond basic table setup, indexes, key constraints, etc. This is a logging system, so very write-intensive. Although I will write scripts to dump certain log records (errors, etc.), there won't be much reading going on.

The logging system will consist of two tables: log_entries, which holds the log records themselves, and log_levels, a "static" lookup/reference table referenced by log_entries. Below, where I specify the Type as just varchar, I haven't yet decided what size to make the field (varchar(50), varchar(100), etc.). My proposed schema for the two tables is as follows:

log_levels:

Column          |    Type        |    Special Info                 |  Nullable?    |  Example
============================================================================================================
id                   int              Primary Key, Auto-Increment      No             3
name                 varchar(20)                                       No             "Info"
tag                  varchar(20)                                       No             "INFO"

log_entries:

Column          |    Type        |    Special Info                 |  Nullable?    |  Example
============================================================================================================
id                   int              Primary Key, Auto-Increment      No             19348
app_name             varchar                                           No             "My App"
timestamp            bigint                                            No             1373316599 (UNIX Epoch time)
log_level_id         int              Foreign Key (log_levels)         No             3
origin_class         varchar                                           No             "com.me.myapp.SomeClass"
message              text                                              No             "Something log-worthy happened."
throwable_class      varchar                                           Yes            "java.lang.RuntimeException"
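For concreteness, the proposed schema could be expressed as DDL along these lines. The varchar(100) lengths are placeholders, not a recommendation, since the final sizes are still undecided:

```sql
-- Sketch only: VARCHAR(100) lengths are placeholder assumptions.
CREATE TABLE log_levels (
    id   INT         NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(20) NOT NULL,
    tag  VARCHAR(20) NOT NULL
);

CREATE TABLE log_entries (
    id              INT          NOT NULL AUTO_INCREMENT PRIMARY KEY,
    app_name        VARCHAR(100) NOT NULL,  -- placeholder length
    timestamp       BIGINT       NOT NULL,  -- UNIX epoch seconds
    log_level_id    INT          NOT NULL,
    origin_class    VARCHAR(100) NOT NULL,  -- placeholder length
    message         TEXT         NOT NULL,
    throwable_class VARCHAR(100) NULL,      -- placeholder length
    FOREIGN KEY (log_level_id) REFERENCES log_levels (id)
);
```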

The problem child here is log_entries.message. This field will store a lot of information, and for reasons outside the scope of this question, I can't break this information out into better-normalized fields/tables. It's just a big text blob that will store lots of different kinds of data, and again, there's nothing I can do about that.

When I do perform reads (for the purposes of analytics/debugging), these are the queries I plan on using the most (and hence perhaps a basis for creating indexes from):

-- Give me all the ERROR log entries.
SELECT * FROM log_entries le INNER JOIN log_levels ll ON le.log_level_id = ll.id WHERE ll.tag = 'ERROR'

-- All ERROR log entries for a particular user.
SELECT * FROM log_entries le INNER JOIN log_levels ll ON le.log_level_id = ll.id WHERE le.message LIKE '%user: john.doe@example.com%'
    AND ll.tag = 'ERROR'

-- All log entries for a particular user.
SELECT * FROM log_entries WHERE message LIKE '%user: john.doe@example.com%'

-- All unique throwables in the system.
SELECT DISTINCT throwable_class FROM log_entries

-- All messages in a given time range, or after a certain datetime.
SELECT * FROM log_entries WHERE timestamp >= 1373315311 AND timestamp <= 1439065711

-- All log entries for a particular user within a time range.
SELECT * FROM log_entries WHERE message LIKE '%user: john.doe@example.com%' AND timestamp >= 1373315311 AND timestamp <= 1439065711

So I ask:

  • What indexes should I be creating here, and is there any sort of management/housekeeping I should be performing on them (re-indexing, vacuuming, etc.)?
  • Any other optimizations I can be making, in general (again, nothing I can do about log_entries.message)?
    • For instance, I've heard that with some DB servers (not necessarily MySQL), the order of the columns when you create the table can also impact performance.

Thanks in advance!

Best Answer

My rules of thumb for indexing are to

  • first, read the execution plan,

then think about adding (or verifying) an index on

  • every column used in a WHERE clause,
  • every column used in a JOIN condition,
  • every column used in an ORDER BY.

In some cases, one multi-column index will be faster in SELECT statements than several single-column indexes.
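Applying those rules to the queries in the question, a starting set of indexes might look like the following. This is a sketch, not a prescription: verify each candidate against EXPLAIN on your real data and workload, and remember that every secondary index slows down the (frequent) writes somewhat. Note also that InnoDB creates an index on a foreign-key column if one does not already exist:

```sql
-- Join/filter on log level (the FK column is the join key).
CREATE INDEX idx_le_log_level_id ON log_entries (log_level_id);

-- Time-range queries on timestamp.
CREATE INDEX idx_le_timestamp ON log_entries (timestamp);

-- A composite index can serve "level + time range" in one lookup,
-- and also makes idx_le_log_level_id redundant.
CREATE INDEX idx_le_level_ts ON log_entries (log_level_id, timestamp);

-- Lets SELECT DISTINCT throwable_class scan the index, not the table.
CREATE INDEX idx_le_throwable ON log_entries (throwable_class);

-- log_levels is tiny, but this turns WHERE ll.tag = 'ERROR' into a seek.
CREATE INDEX idx_ll_tag ON log_levels (tag);
```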

Then, make sure your WHERE clauses are sargable. Expressions like LIKE '%user: john.doe@example.com%' are not sargable. Unless you extract the user information and store it elsewhere, full table scans are in your future.
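For example, if you can extract the user identifier at write time into its own column (a hypothetical user_email column is assumed below), the leading-wildcard LIKE becomes an indexable equality predicate:

```sql
-- Hypothetical: extract the user into its own indexed column at write time.
ALTER TABLE log_entries ADD COLUMN user_email VARCHAR(100) NULL;
CREATE INDEX idx_le_user_email ON log_entries (user_email);

-- Not sargable: the leading wildcard forces a full table scan.
SELECT * FROM log_entries
WHERE message LIKE '%user: john.doe@example.com%';

-- Sargable: equality on an indexed column can use idx_le_user_email.
SELECT * FROM log_entries
WHERE user_email = 'john.doe@example.com';
```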

When MySQL has to use a filesort, it writes rows to the sort buffer using a fixed-width format. The wider your columns, the longer it's liable to take to sort.

Filesort uses a fixed-length row-storage format similar to that used by the MEMORY storage engine. Variable-length types such as VARCHAR are stored using a fixed length.

The fixed length used for TEXT columns in the sort buffer defaults to 1024 bytes (the max_sort_length setting), and MySQL falls back to a slower sort algorithm when TEXT columns are involved.

[MySQL] normally uses the modified algorithm except when BLOB or TEXT columns are involved, in which case it uses the original algorithm.

[snip]

One problem with this approach is that it reads rows twice: One time when evaluating the WHERE clause, and again after sorting the pair values. And even if the rows were accessed successively the first time (for example, if a table scan is done), the second time they are accessed randomly. (The sort keys are ordered, but the row positions are not.)
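One common workaround for sorting wide rows is a "deferred join" (a sketch; measure with EXPLAIN before adopting it): sort and filter on the narrow primary key first, then fetch the wide TEXT column only for the rows actually returned:

```sql
-- The inner query sorts only narrow columns (id, timestamp);
-- the outer join then fetches the wide message column for just those rows.
SELECT le.*
FROM log_entries le
INNER JOIN (
    SELECT id
    FROM log_entries
    WHERE timestamp BETWEEN 1373315311 AND 1439065711
    ORDER BY timestamp
    LIMIT 1000
) narrow ON le.id = narrow.id
ORDER BY le.timestamp;
```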