This is a good "there's more than one way to do it" question.
Start by thinking about your data usage: which fields are going to be read, and which written, most often?
For example, first name and last name are probably written once and rarely changed, but depending on your app, they could be read very often.
Last login date could be written frequently, and you might want to keep every login date.
Things like activation key and registration date, depending on how authentication works in your app, might be written once and then very rarely read.
You should also think about what data is pulled at the same time, and denormalize to make those reads faster. For example, if you always pull username, password, and first name together (but no other contact info), put first_name in the login table too. In other words, think about avoiding joins where this data will be large, and be kind to yourself.
Also think about which fields might appear in the WHERE clause of your queries, and index those fields.
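To make both points concrete, here's a hedged sketch against the login table laid out below (the index name is one I've made up): username is what the WHERE clause filters on, so it gets a unique index, and the denormalized first_name turns the login screen into a single-table read.

CREATE UNIQUE INDEX idx_login_username ON login (username);

SELECT user_id, password, first_name
FROM login
WHERE username = 'jsmith';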
Here's one way I might do it:
Contact Table
(one insert, rare updates, many reads)
+------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------------+------+-----+---------+----------------+
| user_id | int(11) | NO | PRI | NULL | auto_increment |
| first_name | varchar(50) | NO | | NULL | |
| last_name | varchar(50) | NO | | NULL | |
| address | varchar(50) | NO | | NULL | |
| city | varchar(50) | NO | | NULL | |
| state | varchar(50) | NO | | NULL | |
| country | varchar(50) | NO | | NULL | |
| email | varchar(1024) | NO | | NULL | |
+------------+---------------+------+-----+---------+----------------+
Login Table
(one insert, maybe updates, many reads; need first_name to be echoed on login)
+------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------------+------+-----+---------+----------------+
| user_id | int(11) | NO | PRI | NULL | |
| username | varchar(32) | NO | | NULL | |
| password | varchar(32) | NO | | NULL | |
| first_name | varchar(50) | NO | | NULL | |
+------------+---------------+------+-----+---------+----------------+
Login History Table
(many writes, multiple entries per user_id)
+------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------------+------+-----+---------+----------------+
| user_id | int(11) | NO | MUL | NULL | |
| last_login | timestamp | NO | MUL | NULL | |
+------------+---------------+------+-----+---------+----------------+
Signup Table
(one write, rarely read)
+------------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------+---------------+------+-----+---------+----------------+
| user_id | int(11) | NO | PRI | NULL | |
| registration_date| timestamp | NO | | NULL | |
| activation_key | varchar(50) | NO | | NULL | |
+------------------+---------------+------+-----+---------+----------------+
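For reference, here's one way the DESCRIBE output above could be expressed as DDL; the InnoDB engine and the foreign keys back to contact are my assumptions, since neither shows up in the listings:

CREATE TABLE contact (
    user_id    INT           NOT NULL AUTO_INCREMENT PRIMARY KEY,
    first_name VARCHAR(50)   NOT NULL,
    last_name  VARCHAR(50)   NOT NULL,
    address    VARCHAR(50)   NOT NULL,
    city       VARCHAR(50)   NOT NULL,
    state      VARCHAR(50)   NOT NULL,
    country    VARCHAR(50)   NOT NULL,
    email      VARCHAR(1024) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE login (
    user_id    INT         NOT NULL PRIMARY KEY,
    username   VARCHAR(32) NOT NULL,
    password   VARCHAR(32) NOT NULL,   -- presumably a fixed-length hash
    first_name VARCHAR(50) NOT NULL,   -- denormalized from contact
    FOREIGN KEY (user_id) REFERENCES contact (user_id)
) ENGINE=InnoDB;

CREATE TABLE login_history (
    user_id    INT       NOT NULL,
    last_login TIMESTAMP NOT NULL,
    KEY idx_lh_user  (user_id),
    KEY idx_lh_login (last_login),
    FOREIGN KEY (user_id) REFERENCES contact (user_id)
) ENGINE=InnoDB;

CREATE TABLE signup (
    user_id           INT         NOT NULL PRIMARY KEY,
    registration_date TIMESTAMP   NOT NULL,
    activation_key    VARCHAR(50) NOT NULL,
    FOREIGN KEY (user_id) REFERENCES contact (user_id)
) ENGINE=InnoDB;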
Could I set the primary key to be the timestamp?
Yes, the primary key can be any column, but I think you'd be better off using an ID column for the primary key to ensure it's unique (which is technically impossible to guarantee with a timestamp alone). As ypercube suggests, a PK of (timestamp, id) together with millisecond precision on the time value may also be a good option when using InnoDB and version 5.6+.
You can still create an index on the datetime column, which will give similar performance; though if you're pulling back all rows all the time, it's hardly an issue to begin with.
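If you want to try the composite key, here's a hedged sketch of an alternative shape for the login history table (it assumes InnoDB and MySQL 5.6+, which added fractional-second TIMESTAMP precision):

CREATE TABLE login_history (
    user_id    INT          NOT NULL,
    last_login TIMESTAMP(3) NOT NULL,  -- millisecond precision (5.6+ only)
    PRIMARY KEY (last_login, user_id)
) ENGINE=InnoDB;

Since InnoDB clusters rows on the primary key, this keeps the table physically ordered by login time, which suits both "delete the oldest" trims and time-range scans.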
How would I handle keeping the database to a set amount of rows?
You can use a stored procedure to handle multiple pieces of logic, such as
- Insert new row
- Check table row count
- If > 100,000 delete oldest row
Alternatively, you could put the same logic in a trigger so it runs automatically on every insert; but note that in MySQL a trigger can't delete from the table it fires on, so for this pattern the trim step belongs in the insert path (as in the procedure sketched below) or in a scheduled event.
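Here's a minimal sketch of the stored-procedure route (record_login is a name I've invented; it assumes MySQL 5.5.6+, which lets LIMIT take a local variable inside stored programs):

DELIMITER $$
CREATE PROCEDURE record_login(IN p_user_id INT)
BEGIN
    DECLARE surplus INT DEFAULT 0;

    -- 1. Insert the new row
    INSERT INTO login_history (user_id, last_login)
    VALUES (p_user_id, NOW());

    -- 2. Check the table row count against the cap
    SELECT COUNT(*) - 100000 INTO surplus FROM login_history;

    -- 3. If over the cap, delete the oldest rows
    IF surplus > 0 THEN
        DELETE FROM login_history
        ORDER BY last_login ASC
        LIMIT surplus;
    END IF;
END$$
DELIMITER ;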
what would the speed be if I get all the rows at once in a query?
100,000 rows isn't that many in terms of a relational database, so retrieval time shouldn't be an issue.
What database language would be suited for this?
Again, it's a small data set, so pretty much anything will work. MySQL seems like an obvious choice if you have previous experience with it.
Best Answer
No. As defined, a BLOB is "a binary large object that can hold a variable amount of data"; the BLOB type itself holds between 0 and 2^16 - 1 bytes (65,535), while a LONGBLOB can hold up to 4,294,967,295 bytes (4GB). In most engines, blobs are implemented dynamically, meaning a value only takes its actual size plus a few extra bytes for length, addressing, etc. In fact, fixed-row-size formats are normally not compatible with blobs (MyISAM FIXED, the MEMORY storage engine).
However, there are some reasons to avoid storing "files" in a relational database (and in MySQL and InnoDB in particular, mainly due to the overhead on the transaction and binary logs, plus multiversioning and blobs' incompatibility with in-memory temporary tables). The right decision depends on your needs; the most common alternative is to store the files on the filesystem and put only a URL in the database.
If you store those objects inside the database, size will not be your biggest problem (unless they are over 4GB); the performance of complex queries and writes will be. That said, a pure key-value workload can work well with MySQL's model, and there are also specialised document-oriented databases.
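If it helps, here's a hedged sketch of the two layouts side by side (table and column names are mine): one stores the bytes inline as a LONGBLOB, the other keeps only a pointer to where the file actually lives.

CREATE TABLE file_inline (
    file_id INT           NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name    VARCHAR(255)  NOT NULL,
    data    LONGBLOB      NOT NULL   -- up to 4GB stored in the database
) ENGINE=InnoDB;

CREATE TABLE file_pointer (
    file_id INT           NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name    VARCHAR(255)  NOT NULL,
    url     VARCHAR(1024) NOT NULL   -- bytes live on the filesystem or object store
) ENGINE=InnoDB;

The second keeps your transaction and binary logs small, at the cost of the application having to keep the files and the rows consistent itself.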