I'm designing tables for a new log system and am having trouble figuring out what indexes I'll need (I'm a developer, not a DBA), and am also open to any suggestions/criticisms/etc. for making the tables as "fast" as possible (from my app's point of view).
Background info: the DB itself is a MySQL instance hosted on AWS. It's a shared server and I don't have tuning privileges beyond basic table setup, indexes, key constraints, etc. This is a logging system, so very write-intensive. Although I will write scripts to dump certain log records (errors, etc.), there won't be much reading going on.
The logging system will consist of two tables: `log_entries`, which holds the log records themselves, and `log_levels`, a "static" lookup/reference table referenced by `log_entries`. Below, where I specify the Type as just `varchar`, I haven't yet figured out what size to make the field (`varchar(50)`, `varchar(100)`, etc.). My proposed schema for the two tables is as follows:
`log_levels`:

Column | Type        | Special Info                | Nullable? | Example
=========================================================================
id     | int         | Primary Key, Auto-Increment | No        | 3
name   | varchar(20) |                             | No        | "Info"
tag    | varchar(20) |                             | No        | "INFO"
`log_entries`:

Column          | Type    | Special Info                | Nullable? | Example
==============================================================================
id              | int     | Primary Key, Auto-Increment | No        | 19348
app_name        | varchar |                             | No        | "My App"
timestamp       | bigint  |                             | No        | 1373316599 (UNIX Epoch time)
log_level_id    | int     | Foreign Key (log_levels)    | No        | 3
origin_class    | varchar |                             | No        | "com.me.myapp.SomeClass"
message         | text    |                             | No        | "Something log-worthy happened."
throwable_class | varchar |                             | Yes       | "java.lang.RuntimeException"

(I've written the epoch-seconds column as `bigint`, since MySQL has no `long` column type.)
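As a DDL sketch, the proposed schema might look like the following. The `varchar` sizes here are placeholder guesses, since the question deliberately leaves them undecided:

```sql
-- Sketch of the proposed schema; varchar sizes are placeholders.
CREATE TABLE log_levels (
    id   INT         NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(20) NOT NULL,
    tag  VARCHAR(20) NOT NULL
);

CREATE TABLE log_entries (
    id              INT          NOT NULL AUTO_INCREMENT PRIMARY KEY,
    app_name        VARCHAR(100) NOT NULL,
    `timestamp`     BIGINT       NOT NULL,  -- UNIX epoch seconds
    log_level_id    INT          NOT NULL,
    origin_class    VARCHAR(255) NOT NULL,
    message         TEXT         NOT NULL,
    throwable_class VARCHAR(255) NULL,
    CONSTRAINT fk_log_entries_log_level
        FOREIGN KEY (log_level_id) REFERENCES log_levels (id)
);
```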
The problem child here is `log_entries.message`. This field will store a lot of information, and for reasons outside the scope of this question, I can't break it out into better-normalized fields/tables. It's just a big text blob that will store lots of different kinds of data, and again, there's nothing I can do about that.
When I do perform reads (for the purposes of analytics/debugging), these are the queries I plan on using the most (and hence perhaps a basis for creating indexes from):
-- Give me all the ERROR log entries.
SELECT * FROM log_entries le INNER JOIN log_levels ll ON le.log_level_id = ll.id WHERE ll.tag = 'ERROR';

-- All ERROR log entries for a particular user.
SELECT * FROM log_entries le INNER JOIN log_levels ll ON le.log_level_id = ll.id WHERE le.message LIKE '%user: john.doe@example.com%' AND ll.tag = 'ERROR';

-- All log entries for a particular user.
SELECT * FROM log_entries WHERE message LIKE '%user: john.doe@example.com%';

-- All unique throwables in the system.
SELECT DISTINCT throwable_class FROM log_entries;

-- All messages in a given time range, or after a certain datetime.
SELECT * FROM log_entries WHERE timestamp >= 1373315311 AND timestamp <= 1439065711;

-- All log entries for a particular user within a time range.
SELECT * FROM log_entries WHERE message LIKE '%user: john.doe@example.com%' AND timestamp >= 1373315311 AND timestamp <= 1439065711;
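The query patterns above suggest, as a starting sketch (not a tuned recommendation; the index names are mine):

```sql
-- Sketch: indexes implied by the query patterns above.
CREATE INDEX idx_log_entries_log_level_id ON log_entries (log_level_id);  -- the JOIN column
CREATE INDEX idx_log_entries_timestamp    ON log_entries (`timestamp`);   -- range queries
CREATE INDEX idx_log_entries_throwable    ON log_entries (throwable_class); -- DISTINCT scan
CREATE INDEX idx_log_levels_tag           ON log_levels (tag);            -- WHERE ll.tag = 'ERROR'
-- Note: no B-tree index can help LIKE '%...%' with a leading wildcard.
```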
So I ask:
- What indexes should I be creating here, and is there any sort of management/housekeeping I should be performing on the indexes (re-indexing, vacuuming, etc.)?
- Any other optimizations I can be making in general (again, there's nothing I can do about `log_entries.message`)? For instance, I've heard that with some DB servers (not necessarily MySQL), the order of the columns when you create the table can impact performance as well…?!? Etc.
Thanks in advance!
Best Answer
My rules of thumb for indexing are to index every foreign key column first, then think about adding (or verifying) an index on the columns that appear most often in your WHERE clauses. In some cases, one multi-column index will be faster in SELECT statements than several single-column indexes.
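For example, a sketch: a query that filters on both level and time range can resolve both predicates from one composite index instead of two single-column ones (the index name is mine):

```sql
-- Sketch: one composite index covering level + time-range filters.
CREATE INDEX idx_entries_level_ts ON log_entries (log_level_id, `timestamp`);

-- Both predicates can then be satisfied from the one index:
SELECT * FROM log_entries
WHERE log_level_id = 3
  AND `timestamp` BETWEEN 1373315311 AND 1439065711;
```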
Then, make sure your WHERE clauses are sargable. Expressions like `LIKE '%user: john.doe@example.com%'` are not sargable. Unless you extract the user information and store it elsewhere, full table scans are in your future.

When MySQL has to use a filesort, it writes rows to the sort buffer using a fixed-width format. The wider your columns, the longer it's liable to take to sort.
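One way to make the user lookups sargable, as a sketch (the `user_email` column and its population at write time are my assumptions, not part of the original schema):

```sql
-- Sketch: pull the user out of `message` into its own indexed column.
ALTER TABLE log_entries ADD COLUMN user_email VARCHAR(255) NULL;
CREATE INDEX idx_log_entries_user_email ON log_entries (user_email);

-- Sargable: an equality predicate the index can satisfy,
-- instead of LIKE '%...%' forcing a full table scan.
SELECT * FROM log_entries WHERE user_email = 'john.doe@example.com';
```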
The fixed length for columns of type text defaults to 1024 bytes. MySQL will use a slower algorithm for sorts that include text columns.
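To sidestep that slower path, a sketch: leave `message` out of the SELECT list when you don't need it, so sorted rows stay narrow and no `text` column enters the sort:

```sql
-- Sketch: sorting only narrow columns keeps the fixed-width sort
-- rows small and avoids the slower text-column sort algorithm.
SELECT id, app_name, `timestamp`, log_level_id
FROM log_entries
WHERE `timestamp` >= 1373315311
ORDER BY `timestamp`;
```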
[snip]