Why do you have 3 databases with the same users?
What happens if one database goes down now?
I'm asking these questions because the Con of the user-source database going down preventing use of the other two databases may not be as big a problem as it sounds.
Additionally, if a database goes down now you will already have consistency problems if users are modified in the other two DBs during that period. I don't think we can give the best advice without more context.
Right now I would create a fourth database for users, make changes only there, and sync across the other databases. You're already distributed-denormalized by changing essentially the same data in three places, and probably already have consistency problems. If fault tolerance is important then this scheme still gives that while solving the multiple entry problem.
I think that having this fourth user/settings-only database rather than using one of your existing ones is a good strategy as it becomes less likely to go down since it isn't tied to other resources or heavily used. I see now that you said you can't do this, but I'm not clear on why. Do the applications on the main database support user editing directly, or is user editing a completely separate function that can be pointed elsewhere?
True, with this "canonical user table" idea--whether you use synonyms or synchronize the data--you can't modify users during a down user DB, but that seems ok to me. Fix the problem and bring it up! One source, one place to edit, one thing to fix if broken. With synchronization, all other databases have temporary copies they can work from, but no editing. Minimizing data duplication across systems is a great and useful goal. It is a serious problem to enter the same data in three places, so do whatever you can to eliminate that.
To address more of your comments, if your user IDs are different across all your applications, then you should seriously consider a planned down time for your two new databases to bring them in sync with the main one (ask a separate question if you need ideas on how to accomplish this, which is tricky but not really THAT hard).
If you analyze your business needs and find that the main database must have the user data that's okay, you still use it as canonical, then choose between synonyms or synchronizing to address how unplanned down times affect all the databases.
One additional Con to using synonyms is that you would no longer be able to have proper FKs to Users in each database.
I think the main thing to consider here is how you store who have seen what. The most convenient way is to store the post-user pairs. Any solution with less granularity will have some cases hard to handle - your second suggestion, if I understood it correctly, would store a group-user pair, plus a flag that all posts are already seen (I imagine that this would be rather the last post ID at the moment of joining the group). This would only work if you can't miss posts, so that the unseen ones form a contiguous group, all having a bigger ID than all already-seen ones.
Since the user-post pairing consists exactly these two fields, which can be, expecting extremely high user and post numbers, 2×8 bytes per row, I wouldn't expect performance problems with 10,000 users and 100,000 posts (provided there are really groups, I mean, not all users belong to all groups and be able to read all posts - in this case, the table would be approximately 30 GB in size (plus the same size for the index necessary for the primary key). This size can only be handled by appropriate hardware and some magic, like archiving old posts as those are queried rarely and so on. If your system exceeds that size, you probably make nice money of it, being able to beef your system up :)
Best Answer
Normal guiding principles are to,
deleted
and then for it to seem like the data is deleted to users, without it really being deleted. Do you really want the data gone forever?delete_date
with current date and delete them" I would suggest storing thedelete_queue_date
instead. The date that the data entered the queue. That's what matters. The date the row actually gets deleted is a function of that. Generally you don't storef(x)
on the table, you storex
.