Based on your comment above, it sounds like when you said everything was stored in separate "databases", you actually meant to say "tables". I'm going to work with that assumption...
I'd suggest a simple table that stores who's been invited to which event, and what their response was.
invitations
-----------
id (primary key for the table)
user_ID (foreign key to users.ID)
event_ID (foreign key to events.ID)
response (could be a simple string, or an ID to a table of possible responses, such as "attending", "not attending", "unsure"...)
date_sent (a date showing when the invitation was sent)
respond_by_date (date that a user must response by)
mode_sent (was the invitation sent by email, text message, other electronic message, regular mail, semaphore flags, etc...)
The above structure will keep track of which user was invited to which event. It will also keep track of the user's response. Other fields such as date_sent
aren't necessary, but could be useful for other purposes. Data for this might look like:
invitations
-----------
id | user_id | event_id | response | date_sent | respond_by_date | mode_sent
--------------------------------------------------------------------------------------
1 | 123 | 5 | attending | 2014-12-01 | 2014-12-05 | email
2 | 456 | 5 | not attnd | 2014-12-01 | 2014-12-05 | email
3 | 456 | 7 | attend+1 | 2015-01-01 | 2014-12-02 | semaphore flags
(This assumes that the user and event tables already exist)
Would this work for your purposes?
Separate the data from the processing. A database contains data; application code does the processing.
In your situation there are effectively two sets of data:
- Raw data (especially for recurring events, such as "4pm every 3rd tues of the month until next June")
- Calendar events, such as the the dozens of entries that the above item implies.
Consider starting with two tables, one for each of those. Then write the messy code that translates the definition of a recurring event into the individual entries. (Note: That is code, not SQL.)
Obviously, if there is the option to change a recurring event, there needs to be a way to make potentially massive changes to the second table. This implies an id for the recurring event that is stored in the second table. A FOREIGN KEY might be the way to go. (I prefer to do such myself.)
There are probably other tables, such as for 'normalizing' users, etc. (Do not normalize dates or datetimes.)
"What's scheduled for this afternoon?" That turns into a range query.
Hint: If something is ongoing (eg vacationing for 2 weeks), break it into multiple daily events (14, midnight to midnight) for the second table. Otherwise the SQL can get messy and inefficient.
How does all that feel?
Best Answer
You should avoid duplicating rows in a database design. You have a many-to-many relationship between events and users, so you will need the "connection" table between them to do that. There will be a one event to many connection relationship and one user to many connection relationship.
The connection table will have FK columns to both event and user, plus any attribute that is useful about that connection (e.g. visibility indicator, date accepted). Make the two FK columns be a unique constraint to make sure there are no duplicates.
You will need another table, perhaps named USER_EVENT_COMMENT, that will also FK to the user and event tables. You can either use an incrementing as a PK or combine the two FK columns with something else (perhaps a date) to make a composite PK. I think the auto increment approach is cleaner.
As to restricting the visibility of the comments, that depends on how you access the database from your application. If all access to the database is the same identity, then your application will need to enforce visibility. If you access with different identities, you can enforce visibility in the database.
As for size, a few billion rows is no big deal. The proper indexes and thoughtful query construction (use bind variables) and you should be fine.