From your question, it doesn't sound like you need true transactional semantics; asynchronous replication would likely be fine.
If that's the case, it's fairly trivial to build a replication system: use triggers inside Oracle to track data changes, then replay those changes into MySQL, with something like Perl (or another scripting language) as the glue code talking SQL to both systems.
This might sound like it would be a lot of overhead, but we've built numerous systems like this for our clients, and the overhead from Oracle's standpoint is pretty low; we've run it on some OLTP systems powering top 1000 websites that had hefty transactional workloads, and Oracle handled it like a champ.
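To make the trigger-and-replay idea concrete, here's a minimal sketch of the replay loop. Everything here is an assumption for illustration: the `change_log` table shape, the `employees` table, and the operation codes are made up, and `sqlite3` stands in for both Oracle and MySQL so the example is self-contained (a real deployment would use the Oracle and MySQL drivers, and MySQL's upsert syntax instead of SQLite's `INSERT OR REPLACE`).

```python
import sqlite3

def replay_changes(source, target):
    """Read pending rows from a trigger-populated change log on the source
    and apply them to the target, marking each one as replayed.

    Assumed change_log columns: change_id, table_name, pk_value,
    op ('I'/'U'/'D'), replayed (0/1). Hypothetical schema for the sketch.
    """
    rows = source.execute(
        "SELECT change_id, table_name, pk_value, op FROM change_log "
        "WHERE replayed = 0 ORDER BY change_id"
    ).fetchall()
    for change_id, table_name, pk_value, op in rows:
        if op == 'D':
            target.execute(f"DELETE FROM {table_name} WHERE id = ?", (pk_value,))
        else:
            # Insert or update: re-fetch the row's current state and upsert it,
            # so replaying the same change twice is harmless (idempotent).
            row = source.execute(
                f"SELECT id, name FROM {table_name} WHERE id = ?", (pk_value,)
            ).fetchone()
            if row is not None:
                target.execute(
                    f"INSERT OR REPLACE INTO {table_name} (id, name) "
                    "VALUES (?, ?)", row
                )
        source.execute(
            "UPDATE change_log SET replayed = 1 WHERE change_id = ?",
            (change_id,)
        )
    source.commit()
    target.commit()
```

The key design choice is that the trigger only logs *which* row changed; the replayer fetches the row's latest state at replay time, which keeps the log small and makes replay idempotent.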
Your question focused on this:
However, my question is regarding the non-sharded data which is common/universal. An example of data like this may be an Inventory table, for instance, or possibly an Employee table, User table, etc.
When you're doing sharding, and you have data that all of the shards need to see, you have to classify that data with a few attributes:
Does it change frequently? In your examples, you listed Inventory, Employee, and User. Typically inventory changes very fast, but Employee records only change periodically (say, a few hundred updates per day).
How much delay can each shard tolerate? Even though the Inventory may constantly be changing, you can typically tolerate a large amount of delay (minutes or even hours) on a table like that. If you're selling unique items with a very limited quantity that you can never restock (think original artworks), then you don't shard that data at all - you only query the original database. However, in most online stores, you're not selling out of every item every day, and you're going to restock things quickly anyway, so you don't really need up-to-the-millisecond counts of inventory. In fact, in most cases, you only need an In-Stock flag that's either 0 or 1, and a central process updates that flag. That way, you don't have to push every up/down bump of item count out to every shard. Employee or User data, on the other hand, may need to be pushed out to every shard nearly instantaneously.
Will you be joining from the sharded tables to the non-sharded ones? Ideally, the answer here is no - you should make two separate queries to get the data, and then join them on the app side. This gets a lot harder from an app perspective, but it gives you the capability to get the freshest data from each source.
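An app-side join like the one described is straightforward: query the shard and the central database separately, then stitch the result sets together in code. The row shapes below (orders carrying a `user_id`, users keyed by `id`) are made up for illustration.

```python
def app_side_join(orders, users):
    """Join order rows from a shard against user rows from the central
    database, without asking either database to do the join."""
    users_by_id = {u["id"]: u for u in users}
    return [
        {**order, "user_name": users_by_id[order["user_id"]]["name"]}
        for order in orders
        if order["user_id"] in users_by_id
    ]
```

The trade-off is exactly as stated: more application code, but each query hits its source independently, so each side returns the freshest data it has.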
Is this original data, or copied? Another way to think of this question: what do you need to back up, and how frequently? Typically in a high-volume sharding environment, you want the backups to be as fast and as small as possible. (After all, you need to protect each node, and you want all of the shards to fail over to DR at the same point in time - not have some shards with newer data than others.) This means the sharded data and the non-sharded data should be in completely separate databases - even if they're on the same server. I may need constant transaction log backups of my sharded (original) data, but I may not need to back up the non-sharded data at all. It's probably easier for me to just refresh my Employees or Users table from the single source of truth rather than back it up on every shard. If all of my data is in a single database, though, I lose that capability.
Now, about your concerns:
"Transactional issues...you will no longer be able to do this easily." Correct. In sharded scenarios, throw the concept of a transaction out the window. It gets worse, too - for the sharded data, you could have one shard up and online, and another shard down temporarily due to a cluster instance failover or restart. You need to plan for failure of any part of the system, at any time.
"Not possible to do cross database referential integrity." Correct. When you split a single table out across multiple servers, you're putting your big boy pants on and telling the database server that you're taking over for tough tasks like point-in-time backups, relationships between tables, and combining data from multiple sources. It's on you and your code now.
"Recoding large areas of the system so that it knows to write common data to the new universal database but read common data from the shards." Correct here as well. There's no easy button for this, but once you've built it into the app, you can scale like crazy. I'd argue the easier way to do this is to split the app's database connections by reads versus writes.
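Splitting connections by reads can start as simply as a router in the data-access layer. This is a hypothetical sketch: the "connections" are placeholders, and the SELECT-prefix check is a deliberately naive heuristic (it would misroute CTEs, for example) just to show the shape of the idea.

```python
import random

class ConnectionRouter:
    """Send writes to the primary and plain reads to replica connections."""

    def __init__(self, writer, readers):
        self.writer = writer      # the primary - all writes go here
        self.readers = readers    # read-only copies (replicas/shards)

    def connection_for(self, sql):
        # Naive heuristic: anything starting with SELECT is a read.
        if sql.lstrip().lower().startswith("select"):
            return random.choice(self.readers)
        return self.writer
```

Once every query passes through a chokepoint like this, adding read capacity is a matter of appending another entry to `readers`.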
"increased database trips." - Yes, if you break the data into multiple servers, the app is going to have to reach out to the network more. The key is to implement caching as well so that some of this data can be stored in lower-cost, higher-throughput, lock-free systems. The fastest query is the one you never make.
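The caching pattern being described is cache-aside: check the cheap store first and only hit the database on a miss. A minimal sketch, assuming a plain dict as the cache (in production this might be memcached or Redis) and a caller-supplied fetch function:

```python
def get_user(user_id, cache, fetch_from_db):
    """Return the user, preferring the cache; populate it on a miss."""
    user = cache.get(user_id)
    if user is None:
        user = fetch_from_db(user_id)   # the query we'd rather never make
        cache[user_id] = user
    return user
```

A real implementation also needs an expiration/invalidation policy, but even this much means repeated reads of common data never touch the database at all.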
I've also laid out more pros and cons of dividing up multi-tenant databases here, such as performance tuning on individual shards, different backup/recovery strategies per shard, and schema deployment challenges.
Best Answer
It sounds like you will be replicating from 1 db server (the web backend) to 1 reporting server, and that the data will all be flowing in one direction. If that's correct, I think transactional replication will be the best fit for your needs - one of the nice things about replication is that you don't have to include all the tables (indeed, you can filter out columns or rows if required). Start by checking that the tables you want to replicate have a primary key defined.