Jul 31 2009

SimpleDB evaluation

[The following post was created a year ago, but never published]

Cloud Computing developer Alexander Tolley has made an evaluation of SimpleDB. Here are his findings.

Data Setup:

I built 2 domains: one very small (20 items), the other with ~ 100k records (items). These were modeled on the “sweater” demos in the “how to’s”. I created a variable number of attribute values (1 to 3) for one field. 20 items had data that was unique enough to be separated from the main data and reliably searchable. This was used to populate the small domain and to populate the first 10 and last 10 items of the large domain.

Latency Test.

I was interested in the latency for the DB lookup for the 2 domain sizes.  I tested with simple query lookups for the 20 items in both cases, and assumed any time differential was due to database size.

Result:
There is no measurable difference between the latency for the 2 domain sizes. In other words, scaling to 100K records has no effect on latency. Average query latency ~ 238ms.

Retrieving Attributes:

Next I wanted to determine how long it would take to retrieve the attributes for the 20 items in each case.  I did this using a simple serial request and also with a threaded approach.

Results:

For the serial retrieves, average total retrieve time ~ 3.5 seconds in both domain sizes.  Again no difference due to domain sizes.  For threaded retrieves, the same result between domains, but with shorter retrieve times – ~ 1.25 secs.  Parallel retrieves significantly reduce latency.
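The serial versus threaded comparison can be sketched as follows. This is a minimal Python illustration, not the Java code used in the tests above: `get_attributes` is a hypothetical placeholder that only sleeps to simulate a round trip, and the latency value and worker count are assumptions chosen to keep the example fast.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Assumed stand-in for one request/response round trip, in seconds.
SIMULATED_LATENCY_S = 0.01


def get_attributes(item_name):
    """Hypothetical placeholder for a SimpleDB attribute lookup."""
    time.sleep(SIMULATED_LATENCY_S)
    return {"item": item_name}


def retrieve_serial(item_names):
    """Fetch items one after another; total time is roughly n * latency."""
    return [get_attributes(name) for name in item_names]


def retrieve_threaded(item_names, workers=10):
    """Fetch items concurrently; map() keeps results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(get_attributes, item_names))


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


items = [f"item{i}" for i in range(20)]
serial_result, serial_time = timed(retrieve_serial, items)
threaded_result, threaded_time = timed(retrieve_threaded, items)
print(f"serial:   {serial_time:.3f}s")
print(f"threaded: {threaded_time:.3f}s")
```

Because each lookup spends most of its time waiting, overlapping the waits in threads shortens the total even though each individual request is no faster.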

Conclusion

1. SimpleDB is effectively scale invariant at the data set size tested: 100,000 items (~ 750 bytes/item). Latency is most probably due to marshaling and unmarshaling the requests and responses (Java code – running on AMD 2.0 GHz, 2 GB memory).

2.  For my application, this performance is quite adequate (although I need to test-build my application data with up to 1 million items).  The multiple attribute values (max 256) per column meet my expected needs and reduce my large table (domain) use from 2 to 1.  (I am already looking to reconfigure my schema to see where other tables can be collapsed; this works well for the one-to-many relations of RDBMSs.)

3.  There are some more issues I would like to address in a separate email, but at this stage I am going to port my app to use SimpleDB as the DB storage mechanism and probably S3 for the large metadata files.

[As an aside, I was encouraged to see a contributor posting Erlang code to access SimpleDB.  I am very interested in using Erlang for the server code, especially when I get closer to using EC2 and really want to use the power of parallelizing my code for performance.]


Jul 23 2009

SPIN Selling and Traction

If you ever need to sell something to an organization (or even to an individual), and I’m sure you have at some point, Neil Rackham’s SPIN Selling and Steve Browne’s Traction offer excellent advice.

Traction provides a powerful acid test to evaluate your likelihood of succeeding in your sales, of “closing your sale”. The test boils down to three questions to which you must be able to provide answers. If you can, you will probably land the sale. If you cannot, you probably won’t.

The questions are: Why buy at all? Why buy from you? Why buy now?

As a salesperson (even a temporary one), it is your job to lead your prospect to the answers. If you cannot find an answer for the client to Why buy at all?, then he doesn’t have any use for your product and you are wasting his and your own time. If you cannot find an answer to Why buy from you?, then you can blame your marketing department for not having created a differentiator from your competition – and if you don’t lose the sale to the competition, you’ll lose much of the profit. And if you cannot find an answer to Why buy now?, then the prospect has no reason to go forward with the sale and will stall the order indefinitely in favor of other tasks.

SPIN Selling provides a powerful persuasive framework, built from scientific experimentation. It is a series of techniques that uncover implied needs, develop them into explicit needs, then lead the prospect to imagine the consequences of not satisfying those needs. SPIN is an acronym for each of the steps.

Situation questions come first. You uncover the facts about the prospect’s situation that you couldn’t get by doing your homework ahead of time.

Problem questions come second. You ask about the prospect’s difficulties, dissatisfactions, or problems with the current situation. These are Implicit Needs.

Implication questions come third. You ask about the consequences, effects, or implications of the prospect’s situation.

Need-payoff questions come fourth and last. These are questions about the value, importance, or usefulness of a solution to the prospect’s problem. The Implicit Needs have been converted to Explicit Needs.

You should then present the benefits of your product: how it meets the Explicit Needs of the prospect. This is much better than objection handling, which is mere confrontation built on the assumption that if you beat your opponent at the objection game, he will purchase your product.

Bear these in mind next time you are selling something to someone – Traction’s Acid test and SPIN Selling’s sales process – and hopefully you’ll find more success!


Jul 15 2009

Building web applications – a Core / Context analysis

I’m not a fan of web-based site creation tools. You get locked in, you have to learn skills specific to the vendor which are of no further use, and if you bring in outside expertise, you’ll end up paying for the time they spend learning to use the platform. But there is space for them, for rapid prototyping, and temporary sites. In my opinion it is best to get someone to do it for you, unless the knowledge gained from resources spent adds to your own marketable value.

This is pretty much Geoffrey Moore’s Core and Context approach to managing businesses, applied to web development. Core is sustainable competitive advantage, and commands a premium in the marketplace; context is everything else. One’s Context can be another’s Core. For example, Payroll is most likely your context, but it is ADP’s core. You should externalize your context to a service provider whose core it is, as their marginal cost is likely lower than your full cost, they gain from economies of scale, etc.

Context is not necessarily unimportant. It can be mission critical, like website uptime or even payroll. Core is measured in Excellence, Context by Good enough. So if your business does marketing consulting, then a Good enough website is what you want. If you do Web-based CRM, then you want Excellent. In the former example, clients won’t pay you more if your website has the latest Scriptaculous effects, but they will if your counsel is truly awesome. In the latter, clients will love your product and be willing to pay more for it if it is excellent.

However, do note that these are not rules, but guidelines. If you are strapped for cash and have a lot of free time, things are different.

Part two on building web applications.

If your website is Core to you, then bear in mind the following:

Use other people’s code. Don’t reinvent the wheel. By this I don’t mean steal other people’s code, but use open source or licensed software for the components you can. Not only do you not have to write and maintain the code, you also get future improvements for free. This again applies the Core and Context methodology inside the application itself. You have to find out what parts of the application are Core and which are Context. This also means that you have to develop your integration skills, your capacity to bring parts together and make them work.

Use other people’s services. A corollary to the above is to look for web services and APIs, as well as cloud computing technology (there was an excellent VLab session on this: http://www.vlab.org/article.html?aid=188), to replace components of your application. Hardware, updates, and trips to the datacenter are probably not your idea of fun. Get a company like Cloud in Code to help (disclaimer: my company).

Iterate often. At the crux of the web 2.0 generation, and a key benefit of being agile, is the ability to iterate through the measure / conduct experiments / adapt process quickly. This means building your application to make experimentation as easy as possible. This includes the user interface, for which I recommend Appcelerator.


Apr 23 2009

How the Conficker worm gained in perceived threat

Everybody on the net got scared of the Conficker worm, and it got much press, including the New York Times.

Bruce Schneier wrote about Conficker, and this quote caught my attention:

Conficker’s April 1st deadline was precisely the sort of event we humans tend to overreact to. It’s a specific threat, which convinces us that it’s credible.

The deadline is an honest signal according to the handicap principle. By mentioning a specific date, the worm is exposing itself. This exposure is more costly in terms of credibility if the worm does not subsequently perform, and therefore adds to its perceived menace.


Apr 17 2009

Conspicuous consumption is a perfect candidate for taxation

It is interesting to see Conspicuous Consumption in the light of the Handicap Principle, and how it could lead to more efficient taxing.

Conspicuous consumption is the lavish spending on goods and services acquired mainly for the purpose of displaying income or wealth, as a means of attaining or maintaining social status.

How does it work? Buy a new hand bag for $1,000. Or buy a $200,000 car. Or a $10,000,000 yacht. Whatever it is, make sure your neighbors see it.

In doing so, you are displaying that you can afford to squander $1,000 on a hand bag. If you were poor, or tight on financial resources, you couldn’t. This behavior is costlier to poor individuals, to whom $1,000 represents a much larger portion of their income, than to better off ones. In other words, it is a behavior that costs more for someone with less of a trait than for someone with more of it.

Sound familiar? That’s because conspicuous consumption is an honest signal. It is a signal of wealth, and of the social status that accompanies it.

A perfect candidate for taxation

Since cost is not an issue for conspicuous consumption (quite the contrary, cost is a key element), it is a perfect candidate for being taxed. By heavily taxing goods that signal wealth (let’s call these luxury goods), you make them even harder to attain, and hence more exclusive and attractive.

Countries like Denmark are very smart to have high taxes on luxury cars, for example. Likewise, the European Union has two rates for VAT: a normal one that may not fall below 15%, and a reduced one that may not fall below 5%. The reduced rate applies to basic necessities, and the normal rate to the rest. Some US states have no or close to no taxes on food, because food represents a large portion of a poor person’s budget, and taxing food taxes them a disproportionate amount.

I would like to see that taken a step further, and see different rates for different categories of purchases. We want to use taxation to improve people’s welfare in general. To do so, we must tax bad behaviors, and subsidize good ones (as a side-note, I’ve often heard that taxing is punishing, not sharing. As such, taxing income is punishing people for creating wealth). If we combine this with the other goal of taxing the wealthy more than the poor, we end up with a matrix of items that should be taxed and subsidized differently.

Take alcohol. I think we can agree that drinking alcohol is a bad behavior (antioxidants can be found elsewhere, if beneficial at all). However, a uniform tax on alcohol disproportionately affects poor people over better off ones. But it also happens to be that poor people drink more beer, and wealthy people drink more wine. Therefore a good tax system should tax wine at a higher rate than beer.

Or take food. Some foods are more nutritious than others, while some are better for a particular diet. Red meat should be taxed higher than fish (CO2 emissions per kg of red meat > fish), and even within these categories, some products should be taxed higher than others. For instance, salmon should be taxed higher than sardines.

While this system would be quite complex, it is becoming increasingly feasible as we move towards a computerized and networked economy. Tax revisions could be pushed out to supermarket chains, then updated on their electronic pricing systems. Or why not have an API for value added tax? Maybe a bit too much, but technically feasible.

Another interesting thought is to make the amount someone pays out in taxes an honest signal about wealth.

So Governments should pay attention to goods and services that signal wealth, whether a byproduct (behavior) or a deliberate status-seeking choice, and tax them accordingly.


Apr 14 2009

Paper versus Pixel

People who know me know that I am a proponent of the paperless office, and even the diskless office (aka. the cloud office, which involves using a combination of software as a service and cloud software for everything). I use a variety of tools to achieve this, Google supplying a large amount of these, and Mozilla a key one with Weave.

However, in writing my thesis, I had to switch to paper for the mathematics / game theory model. It was just that much quicker to write new equations and solve them. While limited, paper for math certainly is more agile.


Apr 11 2009

Conditions for Honest Signals

In this blog post, we study the conditions under which a signal can be trusted when sent to a distrustful party.

The behavior of animals in the wild is often puzzling. Why do babies cry so loud? Why do gazelles jump vertically when they see a predator? It turns out these are primitive forms of communication in the form of signals. Gazelles signal their health and fitness, babies signal their hunger or fear.

But… How do these signals come about?

A branch of mathematics called Game theory provides an insightful framework for understanding how these behaviors come to be, and why they are the way they are. Seen through the lens of Game theory, the previous examples are forms of signaling games.

A signaling game, as defined in Wikipedia, is a dynamic game in which two players, the sender (S) and the receiver (R), interact. The sender has a certain type τ, which is given by nature. The sender knows his own type while the receiver does not know the type of the sender. Based on his knowledge of his own type, the sender chooses to send a message from a set of possible messages M = {m1, m2, m3,…, mj}. The receiver observes the message but not the type of the sender. Then the receiver chooses an action from a set of feasible actions A = {a1, a2, a3,…., ak}. The two players receive payoffs dependent on the sender’s type, the message chosen by the sender and the action chosen by the receiver.

An example from nature

Let’s consider a lion chasing a gazelle. The observer would notice that in such cases, the gazelle, upon detecting the lion, will start stotting. By doing so, it is signaling its fitness and probable ability to outrun the lion. The lion can then decide not to chase the gazelle, and wait for another (better) opportunity.

Both the lion and the gazelle have an interest in avoiding unsuccessful chases. Both lose energy during the chase, and the gazelle loses out doubly as the time spent running is time not spent grazing. Therefore evolution puts pressure on these species to develop something in order to avoid this particular outcome. This something is a signal-producing capability, that provides orchestration by communication.

Framing the game

In this example, the signal sender S is the gazelle, and the signal receiver R is the lion. The signal is either stotting m1 or no stotting m0, so M = {m0, m1}. The lion’s feasible actions are chase a1 or ignore a0, so A = {a0, a1}. The payoffs are the chance to catch for the lion and the chance to get caught for the gazelle, which are functions of the gazelle’s fitness τ, the action aj taken by the lion, and the signal mi emitted; hence Plion = f(τ, aj, mi) and Pgazelle = g(τ, aj, mi). We can now draw the payoff matrix:

figure 1: payoff matrix in a signaling game

where m0 = 0 and m1 = m.
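As a rough aid for reading such a matrix, the four (action, message) cells can be enumerated in a few lines of Python. The layout below is an assumption for illustration only; it simply pairs each lion action with each gazelle message and writes out the symbolic payoffs f and g defined above.

```python
from itertools import product

# Message and action sets from the framing above.
MESSAGES = {"m0": "no stotting", "m1": "stotting"}
ACTIONS = {"a0": "ignore", "a1": "chase"}


def payoff_cells():
    """Map each (action, message) pair to symbolic (lion, gazelle) payoffs."""
    cells = {}
    for action, message in product(ACTIONS, MESSAGES):
        lion = f"f(τ, {action}, {message})"
        gazelle = f"g(τ, {action}, {message})"
        cells[(action, message)] = (lion, gazelle)
    return cells


for (action, message), (lion, gazelle) in payoff_cells().items():
    label = f"{ACTIONS[action]} / {MESSAGES[message]}"
    print(f"{label:<22} lion: {lion}   gazelle: {gazelle}")
```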

Conditions for honest signaling

Now that we have framed the example, we can analyze the different strategies available to the players. Three factors influence the payoff outcome, as we have seen: the gazelle’s fitness, the signal it chooses to send (as defined by the type of game, and since there would be no use in sending it if it had no impact), and the lion’s reaction.

Let’s examine the conditions under which signaling is beneficial to both parties. That is, how does signaling drive out wasteful unsuccessful chases, which use up energy for nothing, through evolutionary pressure?

For this, we must establish relationships between fitness, energy expenditure due to chasing or fleeing, and emitting the signal. We’ll work with the following relationships for the fitness cost of the signal Csignal and the fitness cost of a chase Cchase:

Csignal = h(τ, m) and Cchase = k(τ)

If the signal mi has an effect on fitness that is not related to τ, then

Csignal = h(τ, m) = h(τ – m)

and

Pgazelle= g(τ, aj, mi) = g(h(τ, mi), aj)  = g(τ – mi, aj)

If we furthermore assume that a0 has no effect on payoff, then

g(τ, a0) = g(τ) and g(τ – m, a0) = g(τ – m)

We can then establish the following payoff matrix:

figure 2: gazelle payoff matrix if signal has an invariant effect on payoff

Lion payoff is calculated with function f instead of g.

We can now calculate the difference in fitness with and without the signal for the lion and the gazelle. Let’s assume that f and g are linear functions.

If the lion does not chase, then

Δfitness= g(τ) – g(τ – m) = g(m)

but if it does, then

Δfitness= g(τ  – k(τ)) – g(τ – k(τ) – m) = g(m)
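The two Δfitness results can be checked numerically. The sketch below is only an illustration under assumed values: g is taken to be the linear function g(x) = 2x, the chase cost is taken to be k(τ) = 0.3τ, and τ = 10, m = 1 are arbitrary. Exact fractions are used so the equality holds without rounding.

```python
from fractions import Fraction


def g(x):
    """Assumed linear gazelle payoff function, g(x) = 2x."""
    return Fraction(2) * x


def k(tau):
    """Assumed fitness cost of a chase, k(τ) = 0.3τ."""
    return Fraction(3, 10) * tau


def delta_no_chase(tau, m):
    """Fitness difference between not signaling and signaling, no chase."""
    return g(tau) - g(tau - m)


def delta_chase(tau, m):
    """Same difference when the lion chases and the chase costs k(τ)."""
    return g(tau - k(tau)) - g(tau - k(tau) - m)


tau, m = Fraction(10), Fraction(1)
print(delta_no_chase(tau, m) == g(m))  # signal costs g(m) without a chase
print(delta_chase(tau, m) == g(m))     # and the same g(m) with a chase
```

With a linear g, the cost of the signal is the same g(m) whether or not the lion chases, which is what the two equations above state.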

The lion will chase the gazelle if its payoff for chasing is larger than its payoff for not chasing:

f(τ – k(τ)) > f(τ)

f(τ – k(τ)) – f(τ) > 0

and, since f is assumed linear,

–f(k(τ)) > 0

The gazelle will emit the signal if its payoff for emitting the signal is larger than for not emitting.

In the case the lion chooses to chase, then

g(τ – k(τ) – m) > g(τ – k(τ))

g(τ – k(τ) – m) – g(τ – k(τ)) > 0

–g(m) > 0

which cannot hold when the signal has a positive cost g(m).

In the case the lion chooses not to chase, the calculation is very much the same.

As we can see, it does not make sense for the gazelle to send a message in this particular case. This is because we are analyzing a single occurrence of a signaling game. In reality, the lion and gazelle face a repeated game. For example, Tit for Tat is not a useful strategy for single-shot prisoner’s dilemma games, but it performs very well in repeated ones.

What do we learn from this?

That Game theory provides us with a framework for understanding the theoretical underpinning of phenomena observed in nature, and helps build models of prediction for anticipation and comprehension.


Apr 10 2009

Signal Oriented Marketing

First rule of Fight Club (or Marketing in this case) is to have a catchy label.

So I propose Signal Oriented Marketing. It is actually close to the innovation of Object Oriented Programming. The key innovation was the message passing between objects, as much as the objects themselves.

Looking at marketing from the perspective of signals can be extraordinarily useful.

Take Market Research. One of the criticisms is that segmentation is often done using attributes that can be gotten at rather than segmentation by care abouts. It is segmentation by demographics rather than by job to be done.

I am proposing a third way: you look at how you are going to signal the attributes of your offering and you segment by that. If you can devise an honest signal to a subset of potential consumers, that is what you should use as your segment. If you cannot, don’t add the feature set.

Specialized language is often a good avenue. If I say “ay, there’s the rub” in a discussion, you home in on it as the crux of the problem. We have established that we are both fairly well educated outside the domain at hand.

What does this imply? Don’t shy away from specialized language in advertising. Using plain, well-understood language is for losers.

Red Bull used this signaling approach in establishing themselves. They deliberately sought out honest signals. Like limiting their product to certain clubs and bartenders that “saw” the benefits. They had the product delivered via special carriers, like FedEx 12 cans sent overnight to a club, etc. Things that are expensive and hard to fake.

The result was that people believed the signal and as such, bought into the benefit of Red Bull. It obviously has an element of the scarcity phenomenon as per Cialdini’s Influence, but there is more to it, and analyzing this from the perspective of signaling is a better approach.


Apr 5 2009

The Hidden Cost of Customization

If you’re like me, you probably have a bunch of addons for Firefox. I have between 5 and 20, depending on whether I’m running the latest beta (3.1 beta 3), or the stable version.

The problem with this, as I have just experienced, is that you not only increase the chance of failure, but you also increase the cost of repair. Let me explain.

Something happened to my Firefox installation, and anything typed in the location bar would grind Firefox to a halt, then eventually crash it. I tried re-installing it, but the problem remained. I then upgraded to Firefox 3.1 beta, without any further success. Since Firefox has such a large install base, I assumed someone else would have the same problem, and eventually there would be a fix. The plan was to use Safari (4 beta) until then.

However, 3.0.9 came out and when I installed it, Firefox didn’t work any better. I resolved to do some sleuthing, and eventually figured the problem out, and solved it. The issue was caused by a mix of Google Gears, Adblock, PasswordMaker, and some others.

What’s to learn from this?

Customization is costly. Maintenance costs and instability grow quickly with the level of customization: if each component has a 1% chance of breaking per month, and you have 20 components, then you have an 18% chance of at least one failure per month (1 – .99^20). Bear that in mind when you customize anything.
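The arithmetic above is easy to reproduce for other component counts. This short Python sketch assumes, as the example does, that each component breaks independently with the same monthly probability; the function name and the list of counts are illustrative only.

```python
def failure_probability(components, p_break=0.01):
    """Chance that at least one of `components` independent parts breaks."""
    return 1 - (1 - p_break) ** components


for n in (1, 5, 10, 20, 50):
    print(f"{n:>2} components: {failure_probability(n):.1%}")
```

For 20 components this gives roughly 18%, matching the figure in the text. If failures are correlated (for example, several addons breaking on the same browser update), the independence assumption no longer holds and the number is only a rough guide.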

You also lose in security: the more you customize your software, the more lengthy and intricate it becomes to upgrade to the next version. This means that you run software that is always a few patches behind, and you become an easy target for hackers. The vulnerabilities of the version you use are public and easy to exploit.

Rising Probability of Similar Error

Furthermore, the more you customize, the more you reduce the userbase of similar installations. If you use default Firefox on default Ubuntu, then whatever error you encounter, millions of others will as well, and you can count on a patch to soon follow. If you use a highly modified version of Firefox on Ark Linux, you’ll have to do the detective work and patching yourself.

A corollary to this is that it makes an excellent argument for buying Dell when considering using Linux. You’ll have far fewer problems as the userbase is much larger. If you use a Fujitsu LifeBook 7010, with a tiny userbase, you’ll have many driver issues.

Analogy to Firefox addons

Coming back to my Firefox addon example, I could have waited years until this problem was found and solved. I would have had to wait until someone else using Firefox AND Gears AND Adblock AND PasswordMaker AND Mac OS X found the error, came up with a fix, and shared it with all. This means that you just can’t wait bugs out if you aren’t in the mainstream. Whereas you can if you use plain vanilla Firefox on Ubuntu.

EDIT: Apparently, this is Oracle’s approach. Paraphrasing them, they advise to “stick with us; we will do it all, and make sure everything works together, and do not customize as you cannot then take advantage of our innovations”. This goes in the same direction as what I argue in this post. Customization is costlier than it appears.


Apr 2 2009

Revevol.eu competition earns me a Netbook

Some time ago, a friend suggested I enter the YouOnTheWeb competition organized by Revevol.eu. All I needed to do was give my name, phone number, and email address to gain entry. So I did. The Jury would then google for all participants, and award the one with the best online presence with a Samsung NC10 Netbook.

I was told Monday that I had won, much to my surprise.

I am to meet the Revevol team on Friday next time I am in Paris, and there will be some picture-taking and formalities. I look forward to it, as Revevol is doing something similar to what I am doing: promoting the use of the Cloud in the enterprise. They provide services in adopting Google Apps and other SaaS vendors. I, with Cloud in Code, provide services in adopting Cloud Computing.

So I went home with my brand new Samsung NC10 netbook, wondering what I was going to do with it. It had Windows XP pre-installed. I thought to myself that I would keep it that way, since all I need is Firefox. 4 restarts later, I started downloading Ubuntu Netbook Remix out of frustration. It is amazing how three service packs later, Windows XP is still crap. However the Ubuntu page dedicated to the NC10 mentions some functionality lacking until the next kernel, so I’ll wait for 9.04 due this month (Jaunty Jackalope).

Incidentally, Mozilla Labs announced the general beta availability of Weave, a Firefox addon that lets you sync your Firefox profile across multiple computers (only available on Firefox 3.1 beta). This was a perfect occasion to install it on both my MacBook Pro and the NC10. And it works, allowing me to get the same results in the Smart Location bar, sync open tabs, and keep the same bookmarks and passwords (I use a unique password for every site, a hash of the site url concatenated with a master password, which is a pain on mobile devices). Works great, highly recommended if you have multiple computers.
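The per-site password scheme mentioned in passing can be sketched in a few lines of Python. The post does not say which hash function, encoding, or output length is used, so SHA-256, hex output, and a 16-character truncation are all assumptions here, and `site_password` is a hypothetical name.

```python
import hashlib


def site_password(site_url, master_password, length=16):
    """Derive a per-site password from the site URL and a master password.

    Assumptions: SHA-256, UTF-8 encoding, hex digest truncated to `length`.
    """
    material = (site_url + master_password).encode("utf-8")
    digest = hashlib.sha256(material).hexdigest()
    return digest[:length]


# Example values are placeholders, not real credentials.
print(site_password("https://example.com", "my master password"))
```

The same inputs always yield the same password, so nothing needs to be stored, but it also has to be recomputed on every device, which is likely why it is painful on mobile. A single fast hash is also easy to brute-force if the master password is weak; a slower key-derivation function would be a safer choice in practice.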