In MySQL there is a multi-table DELETE syntax. Your first DELETE will delete rows only from the players table. If you want to delete from multiple tables you have to use something like:
DELETE FROM "players","stats","photos"
USING "players"
LEFT JOIN "stats"
ON "players".id = "stats".player_id
LEFT JOIN "photos"
ON "players".id = "photos".player_id
WHERE "players".born_on < "1950-01-01"
This doesn't address the problem of the long-running DELETE statement, though. In fact the above query will likely take even longer, because now it actually deletes rows from the stats and photos tables as well. The workaround you could use is to split the large DELETE into smaller ones. Since you have a convenient WHERE condition, you can split the deletes on it manually (for example one DELETE for each ten years of players.born_on) and run them in ascending order, that is:
DELETE FROM "players","stats","photos"
USING "players"
LEFT JOIN "stats"
ON "players".id = "stats".player_id
LEFT JOIN "photos"
ON "players".id = "photos".player_id
WHERE "players".born_on < "1930-01-01";
DELETE FROM "players","stats","photos"
USING "players"
LEFT JOIN "stats"
ON "players".id = "stats".player_id
LEFT JOIN "photos"
ON "players".id = "photos".player_id
WHERE "players".born_on < "1940-01-01";
DELETE FROM "players","stats","photos"
USING "players"
LEFT JOIN "stats"
ON "players".id = "stats".player_id
LEFT JOIN "photos"
ON "players".id = "photos".player_id
WHERE "players".born_on < "1950-01-01";
If this is too coarse (i.e. each query takes too long to execute), make the WHERE conditions even more fine-grained, perhaps deleting one year per chunk.
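For instance, a single one-year chunk would look like this (the same join as above, just with a narrower range on born_on):

DELETE FROM `players`, `stats`, `photos`
USING `players`
LEFT JOIN `stats`
  ON `players`.id = `stats`.player_id
LEFT JOIN `photos`
  ON `players`.id = `photos`.player_id
WHERE `players`.born_on >= '1949-01-01'
  AND `players`.born_on < '1950-01-01';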
There is also a --purge option for pt-archiver from Percona Toolkit, which splits the data to be deleted into chunks automatically, but it doesn't seem to support the multi-table case. See example usage of pt-archiver in this presentation.
From your table definitions I see that you don't have an index on players.birth_date (I suppose this is the column you refer to as born_on in your example queries). This makes the decade-chunk approach useless, since every query would have to scan the whole players table. If you can't afford a long table lock while the DELETE finishes, you most likely can't afford to create an index on the birth_date column either.
You could split the data on another column instead; the PRIMARY KEY is a good bet. You can write a script which processes all the players in chunks of 10000 (more or fewer, depending on how long a single DELETE statement takes):
DELETE FROM "players","stats","photos"
USING "players"
LEFT JOIN "stats"
ON "players".id = "stats".player_id
LEFT JOIN "photos"
ON "players".id = "photos".player_id
WHERE "players".born_on < "1950-01-01"
AND "players".id BETWEEN n*10000+1 AND (n+1)*10000
where n is a parameter ranging from 0 to MAX(players.id)/10000. This way you avoid a full table scan (which is certainly painful on a table with 100M rows).
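A minimal sketch of such a script as a MySQL stored procedure (the procedure name and the 10000 chunk size are arbitrary; the tables and columns are the ones from the queries above):

DELIMITER //
CREATE PROCEDURE purge_old_players()
BEGIN
  DECLARE n INT DEFAULT 0;
  DECLARE max_n INT;
  -- highest chunk number that can still contain rows
  SELECT CEIL(MAX(id) / 10000) INTO max_n FROM `players`;
  WHILE n <= max_n DO
    DELETE FROM `players`, `stats`, `photos`
    USING `players`
    LEFT JOIN `stats`
      ON `players`.id = `stats`.player_id
    LEFT JOIN `photos`
      ON `players`.id = `photos`.player_id
    WHERE `players`.born_on < '1950-01-01'
      AND `players`.id BETWEEN n*10000+1 AND (n+1)*10000;
    SET n = n + 1;
  END WHILE;
END //
DELIMITER ;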
You can also estimate the complexity of each DELETE by running EXPLAIN on the equivalent SELECT (older MySQL versions cannot EXPLAIN a DELETE directly):
EXPLAIN SELECT *
FROM `players`
LEFT JOIN `stats`
  ON `players`.id = `stats`.player_id
LEFT JOIN `photos`
  ON `players`.id = `photos`.player_id
WHERE `players`.born_on < '1950-01-01'
  AND `players`.id BETWEEN 1 AND 10000;
If the two families of data are related then I would generally recommend keeping them in the same database - it makes maintaining integrity easier; for instance, taking a reliable and consistent backup is (depending on the database engine, of course) a single operation.
Of course, if the two families of data are only loosely coupled, this need not be a concern, and using different databases or even completely different technologies for each becomes more likely to be a good solution, particularly as you scale out (your first hardware scaling step being to put each family of data on its own (virtual) machine). Remember, though, that if you do split the data over more than one storage technology you are increasing the amount of knowledge needed to maintain the system.
In MSSQL or similar you could break data with significantly different I/O patterns into separate filegroups in the same database and store them on different drives if needed for performance - this way all the data can be dealt with atomically for transaction or backup purposes while still being segregated for performance reasons.
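As a rough T-SQL sketch of that idea (the database, file, and table names here are purely illustrative):

-- add a filegroup and back it with a file on a separate drive
ALTER DATABASE SalesDb ADD FILEGROUP fg_history;

ALTER DATABASE SalesDb
ADD FILE (
    NAME = 'SalesDb_history',
    FILENAME = 'E:\data\SalesDb_history.ndf'
) TO FILEGROUP fg_history;

-- place the table with the different I/O pattern on that filegroup
CREATE TABLE dbo.order_history (
    order_id   INT           NOT NULL PRIMARY KEY,
    order_date DATETIME2     NOT NULL,
    details    NVARCHAR(MAX) NULL
) ON fg_history;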
Best Answer
Yes, the primary key can be any column, but I think you'd be better off using an ID column for the primary key to ensure it's unique (which is technically impossible to guarantee with a timestamp-only primary key). As ypercube suggests, a PK of (timestamp, id), using milliseconds on the time value, may also be a good option when using InnoDB and version 5.6+. You can still create an index on the datetime column, which will give similar performance, though if you're pulling back all rows all the time it's hardly an issue to begin with.
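A minimal sketch of that kind of table definition, assuming a hypothetical readings table (the table and column names are illustrative only):

CREATE TABLE readings (
    recorded_at DATETIME(3)     NOT NULL,   -- fractional seconds require MySQL 5.6+
    id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    value       DECIMAL(10,2)   NOT NULL,
    PRIMARY KEY (recorded_at, id),          -- composite PK: timestamp first, id breaks ties
    KEY idx_id (id)                         -- InnoDB needs the AUTO_INCREMENT column to lead an index
) ENGINE=InnoDB;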
You can use a stored procedure to handle multiple pieces of logic in a single call, along the lines of the sketch below.
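A minimal sketch, reusing the hypothetical readings table above plus an equally hypothetical daily_totals summary table (assumed to have day as its primary key); the two "pieces of logic" here are only placeholders for whatever your actual logic is:

DELIMITER //
CREATE PROCEDURE add_reading(IN p_recorded_at DATETIME(3), IN p_value DECIMAL(10,2))
BEGIN
  -- piece of logic #1: store the raw row
  INSERT INTO readings (recorded_at, value)
  VALUES (p_recorded_at, p_value);

  -- piece of logic #2: maintain a per-day running total
  INSERT INTO daily_totals (day, total)
  VALUES (DATE(p_recorded_at), p_value)
  ON DUPLICATE KEY UPDATE total = total + p_value;
END //
DELIMITER ;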
Or you can use a trigger with similar logic, which fires automatically for every insert on the table; a sketch follows.
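The trigger variant of the same placeholder logic might look like this (again with the hypothetical tables from above):

DELIMITER //
CREATE TRIGGER readings_after_insert
AFTER INSERT ON readings
FOR EACH ROW
BEGIN
  -- same bookkeeping as the procedure, but it runs automatically on every insert
  INSERT INTO daily_totals (day, total)
  VALUES (DATE(NEW.recorded_at), NEW.value)
  ON DUPLICATE KEY UPDATE total = total + NEW.value;
END //
DELIMITER ;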
100,000 rows isn't that many for a relational database, so retrieval time shouldn't be an issue.
Again, it's a small size so pretty much anything works. MySQL seems like an obvious choice if you have previous experience with it.