Amazon RDS for PostgreSQL – How Amazon Creates RDS for Postgres

amazon-rds postgresql schema

I am really curious about Amazon's infrastructure. When I create a new RDS instance, it gives me a host, username, password, everything. Behind the curtains, how do they do this? Is it some infrastructure like Docker that runs multiple instances of PostgreSQL? How do I replicate this?

My problem is: I have many users and I would like to let them manage their own tables. I thought that it could be done with schemas, but they can still see the other users' tables, and I don't want that. Any suggestions?

Best Answer

Amazon doesn't talk about this much and the servers are intentionally locked down, so it's hard to be completely sure.

They're EC2 instances that run a custom AMI and have automation tools - in-house, or something like Puppet/Chef/etc. These automation tools communicate with the AWS control panel over web service APIs, SSH push access, etc, and are responsible for managing the PostgreSQL configuration, starting/stopping/reloading the server, etc.

Each EC2 instance runs a single PostgreSQL database server, with its own users, roles, etc.

It's basically just a sealed AWS EC2 instance that you don't have much access to; you just get a locked-down, non-superuser PostgreSQL connection. Nothing magic.

This isn't the only way to do it. Heroku used to use OpenVZ on top of EC2 to partition EC2 instances into smaller containers, for example. I think these days they always have one EC2 instance per database though.

It sounds like what you want is multi-tenant hosting. You have many options for this:

one server per user, with a single PostgreSQL instance on each server (EC2 or Heroku style);
one PostgreSQL instance per user on a single host server;
one database per user on a single PostgreSQL instance;
one schema per user in a single PostgreSQL database;
a single set of tables, with your application limiting access to data within the tables based on enforced WHERE clauses or row-level security policies.

Which to choose depends on trade-offs involving isolation of users, performance, and cost.
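As a rough illustration of the "one schema per user" option, here is a minimal Python sketch that only generates the SQL a provisioning script might send to PostgreSQL. The function names and the tenant name "alice" are hypothetical, and it assumes the statements are run by a role allowed to create roles and schemas. A role has no USAGE privilege on a schema it does not own unless someone grants it, so other tenants cannot query the tables. Table names may still show up in the system catalogs, which is one reason to consider one database per user if you need stronger isolation.

```python
def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def shared_lockdown_sql() -> list[str]:
    """Statements to run once per database (assumption: tenants should not share 'public')."""
    return ["REVOKE ALL ON SCHEMA public FROM PUBLIC;"]


def tenant_setup_sql(user: str) -> list[str]:
    """Statements that give one tenant a login role and a private schema of the same name."""
    ident = quote_ident(user)
    return [
        f"CREATE ROLE {ident} LOGIN;",
        # The tenant owns the schema; no other role gets USAGE on it by default.
        f"CREATE SCHEMA {ident} AUTHORIZATION {ident};",
        # Unqualified table names resolve to the tenant's own schema.
        f"ALTER ROLE {ident} SET search_path = {ident};",
    ]


if __name__ == "__main__":
    for statement in shared_lockdown_sql() + tenant_setup_sql("alice"):
        print(statement)
```

The sketch deliberately leaves out passwords and connection handling; those depend on how your application authenticates its users.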

There aren't currently any convenient canned recipes to do this that I know of, but searching for "multi-tenant postgresql" will help you find more information.

Related Solutions

Mysql – Random write freezes

I think the issue is due to the choice of the clustered index. From MySQL docs, Clustered and Secondary Indexes:

Every InnoDB table has a special index called the clustered index where the data for the rows is stored. Typically, the clustered index is synonymous with the primary key. To get the best performance from queries, inserts, and other database operations, you must understand how InnoDB uses the clustered index to optimize the most common lookup and DML operations for each table.

Also check the answer by @marc_s in this SO question: How to choose the clustered index in SQL Server?, where he mentions:

According to The Queen Of Indexing - Kimberly Tripp - what she looks for in a clustered index is primarily:

Unique

Narrow

Static

And if you can also guarantee:

Ever-increasing pattern

then you're pretty close to having your ideal clustering key!

Now, your clustered index is the (Primary Key):

hash varchar(5) CHARACTER SET latin1 COLLATE latin1_general_cs NOT NULL,

which (let's go through the checklist) is:

Unique (yes, OK)
Narrow (yes, OK)
Static (perhaps, you know that)

but is probably not:

Ever-increasing pattern (No, it probably isn't)

So, what happens when you use a non-ever-increasing clustered index?

I can't answer better than Kimberly L. Tripp: Ever-increasing clustering key - the Clustered Index Debate..........again!

If the clustering key is ever-increasing then new rows have a specific location where they can be placed. If that location is at the end of the table then the new row needs space allocated to it but it doesn't have to make space in the middle of the table. If a row is inserted to a location that doesn't have any room then room needs to be made (e.g. you insert based on last name then as rows come in space will need to be made where that name should be placed). If room needs to be made, it's made by SQL Server doing something called a split. Splits in SQL Server are 50/50 splits - simply put - 50% of the data stays and 50% of the data is moved. This keeps the index logically intact (the lowest level of an index - called the leaf level - is a doubly-linked list) but not physically intact. When an index has a lot of splits then the index is said to be fragmented. Good examples of an index that is ever-increasing are IDENTITY columns (and they're also naturally unique, naturally static and naturally narrow) or something that follows as many of these things as possible - like a datetime column (or since that's NOT very likely to be unique by itself datetime, identity).

Note that despite the mention of SQL-Server, the same concept applies to InnoDB clustered indexes as well. I suppose that the clustered index has 2 issues:

When you insert a new row, it lands at a random location in the index (the "random" hash guarantees that). Sometimes there is no space available at that location. InnoDB always leaves some free space in the index pages, but once that space is filled the index has to be rearranged, and that takes time.
Over time, the rearrangement also fragments the index, which will eventually make other queries and statements slower.
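To make the difference concrete, here is a toy Python simulation of the behaviour described in the quote. It is not how InnoDB or SQL Server is actually implemented: the page capacity of 4, the function names, and the rule that an insert past the end of a full last page simply allocates a new page are all simplifying assumptions. It only counts how often a full page in the middle of the index has to be split 50/50.

```python
import bisect
import random

PAGE_CAPACITY = 4  # hypothetical rows per page, tiny so splits are easy to trigger


def find_page(pages, key):
    """Return the index of the first page whose highest key is >= key, else the last page."""
    for i, page in enumerate(pages):
        if page and key <= page[-1]:
            return i
    return len(pages) - 1


def simulate_inserts(keys, capacity=PAGE_CAPACITY):
    """Insert keys into a toy clustered index and return (pages, split_count)."""
    pages = [[]]  # each page is a sorted list of keys; pages stay in key order
    splits = 0
    for key in keys:
        i = find_page(pages, key)
        page = pages[i]
        if len(page) < capacity:
            bisect.insort(page, key)  # room on the page: just insert
        elif i == len(pages) - 1 and key > page[-1]:
            pages.append([key])  # append at the end: allocate a new page, no split
        else:
            mid = capacity // 2
            left, right = page[:mid], page[mid:]  # 50/50 split of the full page
            pages[i:i + 1] = [left, right]
            splits += 1
            bisect.insort(left if key <= left[-1] else right, key)
    return pages, splits


if __name__ == "__main__":
    n = 1000
    _, seq_splits = simulate_inserts(range(n))
    shuffled = random.Random(42).sample(range(n), n)
    rand_pages, rand_splits = simulate_inserts(shuffled)
    print("sequential splits:", seq_splits)
    print("random splits:", rand_splits, "pages:", len(rand_pages))
```

In this model, ever-increasing keys never trigger a split, while randomly ordered keys do, and the split pages are left partly empty. That is the fragmentation the answer refers to.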

Postgresql – Get postgres snapshot from Amazon RDS

You can't download a snapshot from RDS; you have to use a tool like pg_dump.

This has already been answered multiple times on regular Stack Overflow: https://stackoverflow.com/questions/14916899/download-rds-snapshot
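A minimal sketch of what the pg_dump route could look like when driven from Python. The host name, database name, user, and output file are placeholders, not values from the question, and the function names are hypothetical. It assumes pg_dump is installed locally and that the RDS instance accepts connections from your machine.

```python
import os
import subprocess


def build_pg_dump_command(host, dbname, user, outfile, port=5432):
    """Build the pg_dump argument list for a custom-format dump."""
    return [
        "pg_dump",
        f"--host={host}",
        f"--port={port}",
        f"--username={user}",
        f"--dbname={dbname}",
        "--format=custom",  # compressed archive that pg_restore can read
        f"--file={outfile}",
    ]


def run_pg_dump(command, password):
    """Run pg_dump, passing the password through the PGPASSWORD environment variable."""
    env = dict(os.environ, PGPASSWORD=password)
    subprocess.run(command, env=env, check=True)


if __name__ == "__main__":
    # Placeholder endpoint; substitute the endpoint shown for your own instance.
    cmd = build_pg_dump_command(
        host="mydb.example.us-east-1.rds.amazonaws.com",
        dbname="mydb",
        user="myuser",
        outfile="mydb.dump",
    )
    print(" ".join(cmd))
```

Calling run_pg_dump(cmd, password) would perform the dump. A ~/.pgpass file is an alternative to PGPASSWORD if you prefer not to put the password in the environment.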
