I have a situation where I want to store data like the following.
Path: String
isDirectory: Bool
isSymLink: Bool
linksTo: File
lastmodifiedData: Long
size: Long,
permissions: String
I will be storing this kind of data for many differet directory structures that very often have the same file metadata.
I do not want to have many repeated entries in the database, but I also do not want to have a 5 column Primary key.
I was considering a table, lets call it FILE_BASE
Path: String
isDirectory: Bool
isSymLink: Bool
linksTo: File
and another called FILE_INSTANCE
directory-root:String
fileBase: // should this be an ID, or a multi-column FK
lastmodifiedData: Long
size: Long,
permissions: String
I am trying to understand the best way to go about creating a key for the first table that can be referenced in the second table.
If I use a composite key, then basically all of the columns in the first table are part of the primary key.
If I use surrogate keys, then I need to do a query before every insert to determine if the file(s) base that i want to add already exists in the table, and recover the surrogate ID. If I am dealing with thousands of files per insert, then this could get cumbersome quickly.
I could generate the ID's myself using some sort of UUID algorithm on a concatenation of the table columns, but that could potentially generate some pretty big ID's for many of the file bases, depending on their path in the file system.
I feel like this is just a trade-off between space and time complexity. I also feel like I am overthinking this, as I do not have a lot of database experience. Luckily this is just a test project that will likely get thrown away anyway, but I could use some input, or suggestions on reading so that I can understand what a good implementation would look like.
Is there something obvious that I am missing here?
–Edited: Added directory root identifier to second table.
Best Answer
In order to normalize, you need to do is accommodate for LVM, and partition information. Then you should only have to keep inode, and meta data, (
stat
andlsattr
if you want to be super anal). You may or may not want emulate hierarchy in path. We do in this example. The repetitive meta-data isn't an issue.