Monday, December 28, 2009

Why is cloud hosting of small objects so slow? Amazon S3 and SimpleDB performance

Is it possible to use S3 as scalable storage for small objects?

S3 is perfect for large objects (a 1GB file, say) but quite bad for small objects like those a media server needs (songs, images).
Behind the scenes, S3 uses standard disks with varying levels of replication and striping. Standard disks have poor random access but good sequential throughput: every object read pays a minimum access time, caused by the disk's seek time (see disk throughput and seek-time).
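A back-of-envelope calculation makes the gap concrete. The figures below are assumed, typical numbers for a 2009-era SATA disk, not measurements of S3:

# Assumed ballpark figures for one commodity SATA disk (not S3 specs):
SEEK_MS = 10.0         # average seek + rotational latency
SEQ_MB_PER_SEC = 80.0  # sustained sequential read throughput

# Random reads: every small object pays roughly one full seek.
random_reads_per_sec = 1000.0 / SEEK_MS
print("random reads/sec per disk: ~%d" % random_reads_per_sec)  # ~100

# With 50KB images that is ~100 * 50KB = ~5MB/s of useful data,
# while the same disk streams a single 1GB file at ~80MB/s.
random_mb_per_sec = random_reads_per_sec * 50 / 1024.0
print("effective random throughput: ~%.1f MB/s" % random_mb_per_sec)

So a single spindle delivers roughly 100 small objects per second, no matter how fast its sequential transfer rate is.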
One solution for small files is caching the most-accessed objects in memory, which is what most content delivery networks (like CloudFront) do.
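For illustration, here is a minimal in-memory LRU cache in Python. It is a sketch of the general idea, not CloudFront's actual implementation:

from collections import OrderedDict

class LRUCache:
    # Minimal least-recently-used cache for small objects (a sketch).
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key, fetch):
        if key in self.items:
            self.items.move_to_end(key)     # mark as recently used
            return self.items[key]
        value = fetch(key)                  # miss: fall through to S3/disk
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least-recently-used
        return value

A CDN edge node does essentially this at RAM scale, so only cache misses ever pay a disk seek.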
If you want to retrieve thousands of small objects per second without a cache, the only way to do it fast is to spread them across as many physical disks as possible. With large files the number of disks grows naturally with the data, since a few files fill a whole disk. But with small files the number of disks grows very slowly: 500GB holds millions of images, which may all sit on one machine. That's why S3 is not appropriate for small files.
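To see why, compare the disk count that capacity demands with the disk count that random throughput demands. The workload below is hypothetical, and the figures are assumptions, not S3 internals:

import math

NUM_IMAGES = 10 * 1000 * 1000    # hypothetical photo catalog
IMAGE_KB = 50
DISK_GB = 500
SEEKS_PER_DISK_PER_SEC = 100     # ~10ms per random read

catalog_gb = NUM_IMAGES * IMAGE_KB / (1024.0 * 1024.0)     # ~477GB
disks_for_capacity = int(math.ceil(catalog_gb / DISK_GB))  # 1

target_reads_per_sec = 5000
disks_for_throughput = target_reads_per_sec // SEEKS_PER_DISK_PER_SEC  # 50

print("disks for capacity:   %d" % disks_for_capacity)
print("disks for throughput: %d" % disks_for_throughput)

Capacity justifies a single disk, while the read rate needs fifty spindles' worth of seeks.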
See the read performance of S3.

For really fast, cost-effective random access to small files, the only solution is different hardware: solid-state disks. They are becoming cheaper every year, and their random-access speed is close to their sequential performance.
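The difference is easy to quantify with ballpark latencies; these are assumed era-typical figures, not any vendor's specs:

HDD_SEEK_MS = 10.0    # mechanical seek + rotation
SSD_ACCESS_MS = 0.1   # flash random-read latency

hdd_iops = 1000 / HDD_SEEK_MS    # ~100 random reads/sec
ssd_iops = 1000 / SSD_ACCESS_MS  # ~10,000 random reads/sec

# For random small-object reads, one SSD does the work of ~100 spindles.
print("one SSD ~= %d disks for random reads" % (ssd_iops // hdd_iops))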
Why is there no CDN with this solution yet? Because caching the most-accessed objects is enough for the typical website; the access pattern of most users is not really random.
