Postgresql – storing file paths in table

database-designpostgresql

I'm working on a project which requires millions of images to be stored for their relative rows.

I have decided to store the images on the file system and store the image's path in a separate table which has a foreign key to the row which the images belong to.

Since I will be storing millions of images, I'm a bit concerned about keeping track of these images. I will have to add a few computers/hdds to store the images. From what I have read I should be using a distributed file system. Am I correct?

Are there any best conventions I should follow before I start implementing this? I'm thinking about storing the images and giving them a UUID name, and to store the UUID as the path of the image. Then my application ill contact the rest of the path.

Best Answer

I have decided to store the images on the file system and store the image's path in a separate table which has a foreign key to the row which the images belong to.

Why not just store the paths on the images table? Why have a 1:1 relationship?

Since I will be storing millions of images, I'm a bit concerned about keeping track of these images. I will have to add a few computers/hdds to store the images. From what I have read I should be using a distributed file system. Am I correct?

You likely do not need a distributed file system. I wouldn't bother with that. Just start an ZFS array up.

Are there any best conventions I should follow before I start implementing this? I'm thinking about storing the images and giving them a UUID name, and to store the UUID as the path of the image. Then my application ill contact the rest of the path.

I wouldn't do that. You're only talking millions. Just use an int. If you want to store something useful on the row, I would store the sha256 of the image.

Related Solutions

SQL Server – Storing Massive Data from Sensor Array

Yes, there is a pretty big pitfall you're going to run into fairly quickly, and that is with the size and maintenance of the tables. You are somewhat on the right track by saying that you want to put your data into a temporary table daily, and then move it into your permanent table, but you'll soon run into trouble with this scheme.

For example, let's say you want to "roll off" the oldest month's worth of data after two years. In your design, you would have to issue a DELETE statement against your big, big table. This will likely be somewhat slow, depending on the number of indexes you have. Also, it will cause index fragmentation, and the only way to fix that would be to rebuild or reorganize the indexes on this very large table which would also cause performance problems. There are a whole host of other issues with a big single table type design as well. For example, with a big, single table, you can't do FILEGROUP based backups, which means that if you want to have a full backup of your database, it's gonna be BIG, and it's gonna take a LONG time to complete.

What's the solution? Table partitioning. Read about this in depth, in as many places as you can. Basically, partitioning allows you to split up your data onto "tables within tables" -- each partition shares the same schema, and is accessed through the table object, but can be indexed and maintained differently. Partitions are basically tables, cut up by some useful key. In your case it will likely be date. They can be dropped just like (and just as fast as) tables, which means that if you partition your big data tables by date, you can simply drop old partitions instantly, with no adverse effect to the indexes on any of the other partitions. You can put partitions on different filegroups, which means that older partitions can be rolled off, or rolled on to cheaper commodity storage if it's not commonly used. Last but not least, in SQL 2012 you'll be able to create COLUMNSTORE type indexes on your older, read-only partitions, while having a different, more insert-oriented indexing scheme on the active partition where you're inserting all your sensor data.

Hope this helps. You have a good amount of research to do regarding partitioning and partitioning schemes, but hopefully now you know the direction you need to be looking.

P.S.: Oh, and I forgot your bulleted list of questions... Answer 1, 2, and 5. See above. Answer 3: In SQL Server, you can compress on a partition by partition basis, so compress your older partitions aggressively using PAGE compression. But I believe your out-of-row large data types will not be compressed if you do this -- again, you may want to alleviate this problem by normalizing your sensor values. Answer 4: Absolutely not, but if all you want to do is store static data by day and never search on it any other way, compressed flat files may be a much easier way to go.

P.P.S: Oh, and another thing. You don't need your two-table solution to make this all work. Large binary sensor data should be of type VARBINARY(MAX) because its values can be stored "out of row" but still be a column in a single table (see the sp_tableoption documentation). You may want to consider normalizing some of your sensor data out of the binary data you have in the table, though, because your database won't be good for much beyond retrieving chunks of sensor data by time if you don't.

Mysql – big db for managing user files and settings

The first path is certainly the more conventional of the two, absent a compelling reason to break the system down into multiple schemas. Something along these lines:

CREATE TABLE user (
  user_id INT AUTO_INCREMENT PRIMARY KEY NOT NULL ,
  user_name VARCHAR(50) NOT NULL,
  password VARCHAR(32) NOT NULL,
  status_code INT NOT NULL
) ENGINE=InnoDB;

CREATE TABLE user_file (
  file_id INT AUTO_INCREMENT PRIMARY KEY NOT NULL ,
  user_id INT NOT NULL,
  path VARCHAR(2000) NOT NULL,
  INDEX (user_id),
  CONSTRAINT FOREIGN KEY (user_id)
    REFERENCES user(user_id)
) ENGINE=InnoDB;

CREATE TABLE user_settings (
  user_id INT PRIMARY KEY NOT NULL,
  email VARCHAR(150),
  color VARCHAR(10),
  etc VARCHAR(255),
  CONSTRAINT FOREIGN KEY (user_id)
    REFERENCES user(user_id)
) ENGINE=InnoDB;

Best Answer

Related Solutions

SQL Server – Storing Massive Data from Sensor Array

Mysql – big db for managing user files and settings

Related Question