Shared nothing typically refers to hardware, though it could also be used when describing an SOA architecture.
From a hardware perspective, a "shared nothing" Oracle database simply means a machine using local disks, memory, etc.
Oracle RAC (real application clusters) is a shared disk architecture, but not shared memory. Horizontal scaling is then achieved by adding more machines to the cluster.
When talking in SOA terms, "shared nothing" means that each service has a corresponding database which is only accessed by that service. So the ACCOUNTS service accesses the ACCOUNTS_DB, ORDERS service the ORDERS_DB and so on. These databases could be shared nothing from a hardware perspective as well, or use RAC.
Ensuring consistency of data and references which would normally be handled using foreign keys becomes a challenge in SOA shared nothing databases.
Sharding typically refers to partitioning managed at the application level, rather than within the database. For example, you could partition accounts by email address and direct customers with address starting A-C to ACCOUNTS_DB01, D-F ACCOUNT_DB02 and so on. The shard mapping could be a simple range like this, a function on the input or a lookup database stating which database is stored in. The databases would be "hardware shared nothing" in this case as the idea is you use relatively cheap machines which are easily added and replaced.
You could shard your databases at the application level and still have Oracle partitioning at the table level within the database itself. So you could shard your ORDERS database by customer, then partition the orders table by order date as well inside the database.
The downside to both meanings of shared nothing comes if you frequently run queries that have to access several databases. In these cases your joins will be pushed into the application layer rather than the DB layer so are likely to be slower. Good governance is necessary to ensure this doesn't happen.
The problem with your first example is the tri-link table. Is that going to require one of the foreign keys on either report or recommendations to always be NULL so that keywords link only one way or the other?
In the case of your second example, the joining from the base to the derived tables now may require use of the type selector or LEFT JOINs depending on how you do it.
Given that, why not just make it explicit and eliminate all the NULLs and LEFT JOINs?
Reports
----------
ReportID
ReportName
Recommendations
----------
RecommendationID
RecommendationName
ReportID (foreign key)
Keywords
----------
KeywordID
KeywordName
ReportKeywords
----------
KeywordID (foreign key)
ReportID (foreign key)
RecommendationKeywords
----------
KeywordID (foreign key)
RecommendationID (foreign key)
In this scenario when you add something else which needs to be tagged, you just add the entity table and the linkage table.
Then your search results look like this (see there is still type selection going on and turning them into generics at the object results level if you want a single results list):
SELECT CAST('REPORT' AS VARCHAR(15)) AS ResultType
,Reports.ReportID AS ObjectID
,Reports.ReportName AS ObjectName
FROM Keywords
INNER JOIN ReportKeywords
ON ReportKeywords.KeywordID = Keywords.KeywordID
INNER JOIN Reports
ON Reports.ReportID = ReportKeywords.ReportID
WHERE Keywords.KeywordName LIKE '%' + @SearchCriteria + '%'
UNION ALL
SELECT 'RECOMMENDATION' AS ResultType
,Recommendations.RecommendationID AS ObjectID
,Recommendations.RecommendationName AS ObjectName
FROM Keywords
INNER JOIN RecommendationKeywords
ON RecommendationKeywords.KeywordID = Keywords.KeywordID
INNER JOIN Recommendations
ON Recommendations.RecommendationID = RecommendationKeywords.ReportID
WHERE Keywords.KeywordName LIKE '%' + @SearchCriteria + '%'
No matter what, somewhere there is going to be type selection and some kind of branching going on.
If you look at how you would do this in your option 1, it's similar but with either a CASE statement or LEFT JOINs and a COALESCE. As you expand your option 2 with more things being linked, you have to keep adding more LEFT JOINs where things are typically NOT being found (an object that is linked can only have one derived table which is valid).
I don't think there is anything fundamentally wrong with your option 2, and you could actually make it look like this proposal with a use of views.
In your option 1, I have some difficulty seeing why you opted for the tri-link table.
Best Answer
The question I would ask is whether the direct relationship between
Anthology
andComposer
is "important" to the system? There are all kinds of incidental relationships between tangible things that are recorded in any system. However, only certain of these are important for the purposes of the system itself. These are the ones that belong in a relational schema.If anthologies will always be made up of compositions, and compositions will always have composers, then you can always derive the relationship between anthologies and composers using a query. If you did it that way there would be no risk of inconsistent relationships.
This model would look like this:
You would have table definitions something like this:
The good thing about this is that it has no redundant data to get out of synch. The problem is that it doesn't quite fit Shannon's requirement that an anthology be about one composer and that all of the compositions in that anthology must be from the same composer.
Unfortunately, this is not an easy problem to solve with declarative referential constraints. Declarative constraints are great for making sure that everything within a row makes sense. What they aren't built to do is enforce rules between rows.
There is a declarative way to solve this problem, but it involves a trade-off that many people wouldn't like, because it smells an awful lot like violating normalization. Some people would argue (Mike Sherril comes to mind) that this solution doesn't literally violate normalization rules, but people who are less well attuned to the actual rules of normalization will probably look at this solution with skepticism.
So what is this controversial solution? It looks like this:
Note that the primary keys of some of these tables have been modified. Here is the SQL DDL for the solution: (You need to scroll it to the bottom to see the magic.)
Note that the way this works is you have to impose a single composer on an anthology by making the composer part of the anthology's primary key. You do the same thing with composition. Then, when you create an intersection between composition and anthology, you have the composer ID twice. You can then use a check constraint to declaratively enforce that compositions and anthologies have not only a single composer, but the same one.
NOTE: I'm not saying you should do this, I'm just saying you could do this. YMMV etc.