How to best store a directory tree in a database

database-designtree

I want to represent my directory structure in some format (currently I'm just using JSON.)

This is how a sample JSON might look. For those curious it was generated using unix tree command: tree /path/to/folder -J --noreport -h.

{
    ...
    "type":"directory",
    "name":"dev",
    "size":4096,
    "contents":[
        {"type":"directory","name":"protocols","size":4096, "contents":[]},
        {"type":"file","name":"architecture.txt","size":4716},
        {"type":"file","name":"exceptions.py","size":31263},
        {"type":"file","name":"models.js","size":101882},
        {"type":"file","name":"proxy.cpp","size":29097},
        {"type":"file","name":"keylogfile.xyz","size":7889},
        {"type":"file","name":"Readme.txt","size":8857},
    ]
    ...
}

So this is just a representation of the entire folder structure of some path as JSON.

I can have many such separate JSON files, each representing a directory tree. There's no co-relation / links between these files.

On running the tree command on standard Windows "C:\" partition, I get a JSON file of ~30 MB in size. So I think we can assume that the max file size that a user would upload will be ~100 MB.


Once the files are stored, these are the operations I plan to make on a file:

  1. Get the entire file.
  2. Given a path, get it's immediate children (akin to doing ls on that path.)
  3. Given a path, get the complete sub tree of the path.
  4. Modify metadata of some item, say change its name or add a new note with it.

2 & 3 are the operations that I expect to happen the most.


Here are the ways of storing this data that I've come up with:

  1. NO DB:

    • Store the file as-is on disk (/home/forest/<uuid>.json)
    • Operation no 1 becomes fast & easy – just send the entire file
    • but the others can get slow because they all involve parsing the entire JSON first and then iterating on it.
  2. NO SQL

    • I've never used NO SQL databases before (only read a few posts about their use-cases etc.)
    • I think op 1 (entire file read) would be fast
    • but no idea if there will be any improvement on the other ops as compared to just using files.
  3. RDBMS

    • I've used relational DBs before but don't think my data has anything to do with tables
    • I did google around though and found that postgres has an ltree type to store hierarchical data, but I'm not sure If that is what I need.
      • If it is, HOW will I get the data in?
  4. Graph Databases?

    • Again, no prior experience with these, just shooting in the dark
    • At the end of the day, a directory is just a tree
    • Instead of creating a vanilla JSON, maybe I could generate a format that one of the Graph DBs can read-in

    • Once I have some graph DB object, maybe all the operations become fast enough.


My question is this: For my use case, what is the best way to store the data?


Replies to comments.

Why do you think you need a database for this?

Depending upon what you are trying to do you might not need to store the data in a db at all.

To be really honest, I don't know if I need databases, I know I want this data stored in a format that allows me to perform the above defined operations reasonably fast.

What are you trying to achieve by putting this into a db? Reporting? Analysis? Will it be used by an application?

I'm doing this for a web application. Once the data is stored in a way I'm satisfied with, I plan to create a Web API (probably JSON based) that performs the operations I've listed above. The data will be sent to the client where it will get displayed on the frontend in some way.

Do you want 1 row per file? What other meta data do you want to store? Size? date? File owner?

Yeah, I want the metadata that's usually associated with files.


A friend asked this, so I'm clearing it up here: I don't just have 1 JSON file (representing a tree). I could have n number of such trees (they are basically uploaded by the user and I expect them to be <100 MB in size.)

Best Answer

I'm a DB noob as well, I do recall postresql having json datatype for storing JSON structures. Maybe you can review the postgresql doc and decide if it works for you.