SimpleDB evaluation

[The following post was created a year ago, but never published]

Cloud computing developer Alexander Tolley has evaluated SimpleDB. Here are his findings.

Data Setup:

I built two domains: one very small (20 items), the other with ~100k records (items). These were modeled on the “sweater” demos in the “how to’s”. I created a variable number of attribute values (1-3) for one field. Twenty items had data unique enough to be distinguished from the main data and reliably searchable. These were used to populate the small domain and also the first 10 and last 10 items of the large domain.
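
A minimal sketch of this setup, in Python rather than the Java used for the original tests. The attribute names (`color`, `size`, `marker`), the item naming scheme and the `build_items` helper are all assumptions made for illustration. No SimpleDB call is made; the items are built as plain dictionaries that a loader could then upload.

```python
import random

def build_items(count, marked=(), seed=0):
    """Build `count` items as {item_name: {attribute: [values]}}.

    `marked` lists the indices that receive a distinctive, searchable value.
    """
    rng = random.Random(seed)
    colors = ["red", "blue", "green", "black", "white"]
    marked = set(marked)
    items = {}
    for i in range(count):
        attrs = {
            # One field carries a variable number (1-3) of attribute values.
            "color": rng.sample(colors, rng.randint(1, 3)),
            "size": [rng.choice(["S", "M", "L"])],
        }
        if i in marked:
            # Hypothetical marker that makes the probe items easy to query.
            attrs["marker"] = ["probe"]
        items[f"item{i:06d}"] = attrs
    return items

# Small domain: all 20 items are probe items.
small = build_items(20, marked=range(20))

# Large domain: probe items sit in the first 10 and last 10 positions.
large_n = 100_000
probe_positions = list(range(10)) + list(range(large_n - 10, large_n))
large = build_items(large_n, marked=probe_positions)

print(len(small), len(large))
```

Placing the probe items at both ends of the large domain is one way to check that lookup time does not depend on where an item was inserted.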

Latency Test:

I was interested in the latency of DB lookups for the two domain sizes. I ran simple query lookups for the 20 items in both cases and assumed that any time difference was due to database size.
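
The measurement can be sketched as a small timing harness. This is an illustration in Python, not the original Java test code: `fake_query` is a hypothetical stand-in that only sleeps, and the query strings are opaque placeholders rather than real SimpleDB query expressions. In a real test, `run_query` would wrap the client's actual query call.

```python
import statistics
import time

def average_latency_ms(run_query, queries, repeats=3):
    """Return the mean wall-clock time per call to run_query, in milliseconds."""
    samples = []
    for _ in range(repeats):
        for q in queries:
            start = time.perf_counter()
            run_query(q)
            samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)

def fake_query(expression):
    # Hypothetical stand-in for a SimpleDB query; sleeps to mimic a round trip.
    time.sleep(0.005)
    return []

# One placeholder lookup per probe item.
queries = [f"probe-query-{i}" for i in range(20)]
avg = average_latency_ms(fake_query, queries)
print(f"average latency: {avg:.1f} ms")
```

Running the same harness against both domains and comparing the two averages gives the size comparison described above.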

Result:
There is no measurable difference between the latency for the two domain sizes. In other words, scaling to 100K records had no measurable effect on latency. Average query latency was ~238 ms.

Retrieving Attributes:

Next I wanted to determine how long it would take to retrieve the attributes for the 20 items in each case.  I did this using a simple serial request and also with a threaded approach.

Results:

For the serial retrieves, the average total retrieve time was ~3.5 seconds for both domain sizes (roughly 175 ms per item). Again, there was no difference due to domain size. For threaded retrieves, the result was again the same for both domains, but with a shorter total retrieve time of ~1.25 seconds. Parallel retrieves cut the total time by a factor of about 2.8.
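
The two retrieval strategies can be compared with a sketch like the following. It is written in Python rather than the Java used for the tests, and `fake_get_attributes` is a hypothetical stand-in for one attribute-retrieval request: it sleeps for an arbitrary 20 ms and returns a dummy record. The worker count of 8 is likewise an assumption; the original thread count is not stated.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_get_attributes(item_name):
    # Hypothetical stand-in for one retrieval request; sleeps to mimic latency.
    time.sleep(0.02)
    return {"name": item_name}

def fetch_serial(names):
    """Issue one request after another."""
    return [fake_get_attributes(n) for n in names]

def fetch_threaded(names, workers=8):
    """Issue requests from a thread pool; map() preserves input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_get_attributes, names))

def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

names = [f"item{i:06d}" for i in range(20)]
serial_result, serial_s = timed(fetch_serial, names)
threaded_result, threaded_s = timed(fetch_threaded, names)
print(f"serial {serial_s:.2f}s, threaded {threaded_s:.2f}s")
```

Because each request spends most of its time waiting on the network, threads overlap the waits. The simulated delay does not reproduce the measured figures; it only shows the shape of the comparison.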

Conclusion

1. SimpleDB is effectively scale invariant up to the tested data set size of 100,000 items (~750 bytes/item). The latency is most probably due to marshaling and unmarshaling the requests and responses (Java code running on a 2.0 GHz AMD machine with 2 GB memory).

2.  For my application, this performance is quite adequate (although I still need to test with my application data built out to 1 million items). The multiple attribute values (max 256) per column meet my expected needs and reduce my large table (domain) use from 2 to 1. (I am already looking to reconfigure my schema to see where other tables can be collapsed; this works well for the one-to-many relations of RDBMSs.)
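
As an illustration of point 2, the sketch below folds a one-to-many child table into a multi-valued attribute on the parent items. It is a Python sketch with made-up table contents; the `collapse` helper and the sweater data are assumptions, and the 256-value limit is simply the figure quoted above.

```python
from collections import defaultdict

MAX_VALUES = 256  # per-attribute value limit quoted in the text above

def collapse(parents, children, attribute):
    """Fold child rows (parent_id, value) into a multi-valued attribute.

    parents:  {parent_id: {column: value}}
    children: iterable of (parent_id, value) rows from the child table
    """
    values = defaultdict(list)
    for parent_id, value in children:
        if value not in values[parent_id]:  # skip duplicate values
            values[parent_id].append(value)

    items = {}
    for parent_id, columns in parents.items():
        vals = values.get(parent_id, [])
        if len(vals) > MAX_VALUES:
            raise ValueError(f"{parent_id}: too many values for {attribute}")
        item = {name: [value] for name, value in columns.items()}
        if vals:
            item[attribute] = vals
        items[parent_id] = item
    return items

# Two relational tables: sweaters, and a child table of sweater colors.
sweaters = {"s1": {"name": "Cardigan"}, "s2": {"name": "Pullover"}}
sweater_colors = [("s1", "red"), ("s1", "blue"), ("s2", "green")]

items = collapse(sweaters, sweater_colors, "color")
print(items["s1"])
```

A parent with more child rows than the limit would still need a second domain or a different layout, which is why the helper raises instead of truncating.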

3.  There are some more issues I would like to address in a separate email, but at this stage I am going to port my app to use SimpleDB as the DB storage mechanism and probably S3 for the large metadata files.

[As an aside, I was encouraged to see a contributor posting Erlang code to access SimpleDB.  I am very interested in using Erlang for the server code, especially when I get closer to using EC2 and really want to use the power of parallelizing my code for performance.]

