What databases or styles of database can use git (or another dvcs architecture) to manage cross-site replication

database-designdatabase-recommendation

I'm looking for a database software package for a dictionary (like the hunk(s) of dead tree that you might have somewhere at home). I would like the database to

scale well to a million key-value pairs. The pairs would normally be fewer than 20 characters, and the normally fewer than 1000 characters.
have perfect UTF-8 support.
deal with distributed version control. I would like it to deal well with having lots of branches and doing merges of branches. This could be built-in or provided independently by something like git.
have a well-supported interop with Ruby. I plan to write a desktop program in Ruby that talks to the database. (Ruby would also work for writing the web part if it ever happens.)
be fairly responsive (though I would like it to be fast).
deal well with changes both while just on a desktop and while providing data for something on the web. (Hopefully this project will get to the web part.)
be free software (as defined by RMS) and to be available at no charge.

All of this data would be entered by hand.

It wouldn't hurt if there were something like github for this database. I don't know if there is a database for which someone provides something like that, though.

No one's job, career, or company rides on this decision.

Any suggestions?

Best Answer

For simple key/value storage, you might like to consider Berkeley DB, though I would not think Git would be the best for 'version control'. You can use Ruby bindings with BDB.

Since your edit, I'm left wondering If you want an open-source implementation of something like Amazon S3

Related Solutions

Database design advice

                                                  +------------ --- ---+
                                                  | Assembly options   |
+------------+------------+----------+------------+---+---+---+ --- +--+
| assembly ▼ | unit cost  | quantity | total cost | 1 | 2 | 3 |     |50|
+------------+------------+----------+------------+---+---+---+ --- +--+
| VSD55      | £10'000    | 2        | £20'000    | 1 | 1 |   |     |  |

If somebody handed that quote to me, my first question would be "What's option 1 for the VSD55?" The answer would be "I don't know." That information isn't on the quote. In the unlikely event that person got to field a second question, that question would be "What does it cost?" Again, the answer would be "I don't know." A very disturbing silence would follow immediately, during which the person who handed me the quote would imagine how much better it might feel to be run over by a train.

Options must be line items on the quote, along with their unit price, quantity, and total price. Options must be named, not numbered. They should appear directly under their parent assembly, too, not scattered all over hell and half of Georgia.

If you want a shot at my money, you'd better make it crystal clear what I'm supposed to be getting for my money.

There's nothing (much) wrong with 50 check boxes on a user-interface form. That makes it easy to pick options. But the UI code should read the checkboxes and insert the right information into normalized tables.

Is it more secure to go through a 3rd database to connect two databases using the same login

One thing jumps out here:

The entire process uses the same set of login credentials

Problem

So hypothetical userX (whether some meatsack using Excel, or IIS AppPool Identity) can see some views and code. It doesn't matter what database these views and code are in because userX is setup in 3 databases anyway.

However, you lose ownership chaining like this.

Let's say WebDB.dbo.SomeProc calls PrivateDB.dbo.SomeTable. UserX requires permissions on both objects. If this was OneDB.WebGUI.SomeProc using OneDB.dbo.SomeTable then only the OneDB.WebGUI.SomeProc needs permissions. Permissions on referenced objects with the same owner are not checked.

Note: I haven't looked too deeply at cross database ownership chaining. I only know plain old "ownership chaining" well

Now, as per comments you really have 2 databases that can be combined. Not 3 which was originally implied. However, the intermediate and web can be combined.

The other "private" databases can perhaps be combined, but that'll be a separate issue. See the bottom link for a fuller discussion of "one database or many"

Solution?

If the extra databases are code containers only, then schemas are a better idea.

This sounds like you've used "Database" where you should use "Schema" (in the SQL Server sense, not MySQL sense). I'd have a WebGUI schema, a Helper or Common schema (to replace Intermediate database) and Desktop schema. This way you separate permissions based on clients and just have one database

With one database (in addition to "ownership chaining") you can also start to consider indexed views, SCHEMABINDING (I use it always) and such that can't be done with seperate databases

For more on schemas, see these questions:

Finally, there appears no reason to have separate databases based on "transactional integrity not required". See this question to explain this: Decision criteria on when to use a non-dbo schema vs a new Database