Thursday, June 29, 2006 3:31 PM by dodyg
Database war stories
"
Web 2.0 and Databases Part 1: Second Life: Like everybody else, we started with One Database All Hail The Central
Database, and have subsequently been forced into clustering. However,
we've eschewed any of the general purpose cluster technologies (mysql
cluster, various replication schemes) in favor of explicit data
partitioning. So, we still have a central db that keeps track of where
to find what data (per-user, for instance), and N additional dbs that
do the heavy lifting. Our feeling is that this is ultimately far more
scalable than black-box clustering.
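The explicit partitioning described above can be sketched in a few lines. This is only an illustration under stated assumptions, not Second Life's actual code: the names `ShardDirectory` and `PartitionedStore`, the least-loaded placement policy, and the use of in-memory dicts in place of real databases are all hypothetical.

```python
class ShardDirectory:
    """Stands in for the central db: it only records which shard holds each user."""

    def __init__(self, shard_count):
        self.shard_count = shard_count
        self.user_to_shard = {}

    def assign(self, user_id):
        # Hypothetical placement policy: a new user goes to the least-loaded shard.
        if user_id not in self.user_to_shard:
            loads = [0] * self.shard_count
            for shard in self.user_to_shard.values():
                loads[shard] += 1
            self.user_to_shard[user_id] = loads.index(min(loads))
        return self.user_to_shard[user_id]

    def lookup(self, user_id):
        # Raises KeyError for a user that was never assigned.
        return self.user_to_shard[user_id]


class PartitionedStore:
    """N plain dicts stand in for the N databases that do the heavy lifting."""

    def __init__(self, shard_count):
        self.directory = ShardDirectory(shard_count)
        self.shards = [{} for _ in range(shard_count)]

    def put(self, user_id, key, value):
        shard = self.shards[self.directory.assign(user_id)]
        shard.setdefault(user_id, {})[key] = value

    def get(self, user_id, key):
        shard = self.shards[self.directory.lookup(user_id)]
        return shard[user_id][key]


store = PartitionedStore(shard_count=3)
store.put("alice", "inventory", ["hat"])
store.put("bob", "inventory", ["boat"])
store.put("carol", "inventory", [])
```

Because the application owns the directory, it decides where each user's data lives and can move a user by copying the rows and updating one directory entry, which is the control a black-box cluster does not expose.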
Database War Stories #2: bloglines and memeorandum: Bloglines
has several data stores, only a couple of which are managed by
"traditional" database tools (which in our case is Sleepycat). User
information, including email address, password, and subscription data,
is stored in one database. Feed information, including the name of the
feed, description of the feed, and the various URLs associated with the
feed, is stored in another database. The vast majority of data within
Bloglines however, the 1.4 billion blog posts we've archived since we
went on-line, are stored in a data storage system that we wrote
ourselves. This system is based on flat files that are replicated
across multiple machines, somewhat like the system outlined in the
Google File System paper, but much more specific to just our
application. To round things out, we make extensive use of memcached to
try to keep as much data in memory as possible to keep performance as
snappy as possible.
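The read path this implies, checking memory first and falling back to the slower store, is commonly called cache-aside. The sketch below is a rough illustration, not Bloglines' code: `LruCache` is a hypothetical in-process stand-in for memcached, and `PostStore` with its dict of posts is a hypothetical stand-in for the flat-file archive.

```python
from collections import OrderedDict


class LruCache:
    """In-process stand-in for memcached: bounded size, evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def set(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the least recently used entry


class PostStore:
    """Cache-aside reads: try the cache, fall back to the slow store, then populate the cache."""

    def __init__(self, slow_posts, cache):
        self.slow_posts = slow_posts  # assumption: a dict standing in for the flat-file archive
        self.cache = cache
        self.slow_reads = 0

    def get_post(self, post_id):
        cached = self.cache.get(post_id)
        if cached is not None:
            return cached
        self.slow_reads += 1
        post = self.slow_posts[post_id]
        self.cache.set(post_id, post)
        return post


posts = PostStore({1: "first post", 2: "second post", 3: "third post"}, LruCache(capacity=2))
posts.get_post(1)  # miss: read from the slow store
posts.get_post(1)  # hit: served from memory
```

The `slow_reads` counter is there only to make the effect visible: repeated reads of a hot post never touch the slow store until the entry is evicted.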
Database War Stories #3: Flickr: tags are an interesting one. lots of the 'web 2.0' feature set doesn't
fit well with traditional normalised db schema design. denormalization
(or heavy caching) is the only way to generate a tag cloud in
milliseconds for hundreds of millions of tags. you can cache stuff
that's slow to generate, but if it's so expensive to generate that you
can't ever regenerate that view without pegging a whole database server
then it's not going to work (or you need dedicated servers to generate
those views - some of our data views are calculated offline by
dedicated processing clusters which save the results into mysql)."
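The denormalization trade-off in the Flickr quote can be shown with a small sketch. It is a hypothetical illustration, not Flickr's schema: `TagIndex` keeps a normalised photo-to-tags mapping plus a redundant per-tag counter that is updated on every write, so the tag cloud is read from the counter instead of being recomputed by a full scan.

```python
from collections import Counter


class TagIndex:
    def __init__(self):
        self.photo_tags = {}         # normalised data: photo_id -> set of tags
        self.tag_counts = Counter()  # denormalised copy: tag -> number of photos

    def add_tag(self, photo_id, tag):
        tags = self.photo_tags.setdefault(photo_id, set())
        if tag not in tags:
            tags.add(tag)
            self.tag_counts[tag] += 1  # pay a little extra on every write

    def remove_tag(self, photo_id, tag):
        tags = self.photo_tags.get(photo_id, set())
        if tag in tags:
            tags.remove(tag)
            self.tag_counts[tag] -= 1
            if self.tag_counts[tag] == 0:
                del self.tag_counts[tag]

    def tag_cloud(self, limit):
        # Fast path: reads only the precomputed counters.
        return self.tag_counts.most_common(limit)

    def tag_cloud_by_scan(self, limit):
        # Slow path: what a fully normalised design has to do on every request.
        counts = Counter()
        for tags in self.photo_tags.values():
            counts.update(tags)
        return counts.most_common(limit)


index = TagIndex()
for photo_id, tag in [(1, "sunset"), (1, "beach"), (2, "sunset"),
                      (3, "sunset"), (3, "beach"), (3, "cat")]:
    index.add_tag(photo_id, tag)
```

Both methods return the same counts; the difference is that `tag_cloud` costs the same no matter how many photos exist, while `tag_cloud_by_scan` grows with the whole data set, which is the kind of view the quote suggests computing offline.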
Dare