Thursday, September 23, 2010

Postgres 9.0 hot-standby

Postgres 9.0 have a new feature "streaming-replication" + "warm-standby". In few words , it allow asynchronic replication between master and slave (there is a delay , but it is a small one) and the slave can do read-only queries.
synchronios replication is only planned for 9.1

It`s a good time to recap the different sharding/master-slave options and their roles.

At first you develop the code for one DB instance and you assume three things about it:
1. It will never fail - the machine,process and disks are eternal.
2. data-scale: It can grow is size as much as you want - terra-bytes of disks space.
3. query(users) scale : The machine is as fast as you want - It can supports millions of fast queries in parallel.

After few months , you deploy it and then find out that these 3 things are not so easy to achieve. Lets see what are the (Postgres) solutions for them.

query-scale
  1. Scale-Up : Still one machine, but buy the best one money can get - 8 cores,32GB are relatively cheap. But twice as strong machine will cost a lot more the twice the money. If you need x2 or x4 , switch to scale-out.
  2. Scale-Out :
    2.1 Scale reads not writes: (good for mainly read DB)
    Use one master server (read&write) and multiple read-only slave servers which get periodically updated from the master server. The master and slave share the same data.
    Postgres 9.0 supports it out-of-the-box
    2.2 Scale writes by sharding

Fault-tolerance
For transaction-DB where no record can be lost , this is a hard task. For data-warehouse which may loose the latest few minutes of updates , this is much simpler.

No data loss:
  1. Shared-disks : the master writes to a shared storage. When it fails the slave mount it and loads the data. It is a "code-standby" as it can takes few minutes to recover.
  2. pgpool II - slows the updates on the master!
  3. synchronios log shipping (should be in postgres 9.1) - does not (considerably) slow the master. Failover take 15 seconds (10 to detect failure , 5 to start the warm-standby)
Some data loss allowed:
  1. Postgres 9.0 warm-standy & streaming-replication
  2. Slony-1 updates the slave using triggers

Data scale
Start with more disks on the machine , then scale-out with sharding.