Building Out Distributed Apps (Big Data)

Yesterday, I attended a webinar by O’Reilly on how to reduce the pain of building out distributed applications. The focus was on scalability, which makes sense, since this is why you would want to distribute your applications.

Apart from the host’s unfortunate resemblance to Little Lord Fauntleroy, there was some interesting observations to be made. To wit:

Engineers versus Ops

When there’s an issue affecting your customer in large systems, it is most likely an engineering issue, especially in emerging products. You need to staff up on Engineering talent for your projects at a much greater rate than Ops.

Data is not always relational

Data these days is more than OLAP stuff. Things being captured and crunched include data graphs, key-value pairs, etc. So, something non-SQL based might be called for as a datastore. Only a handful of SQL features are used in most large data projects. As the data sets get larger, SQL gets less useful.

Real-time versus Batch Processing

Something to consider. How is your data being created, in one-sy/two-sy fashion online, or in large grabs of data. This will affect your basic understructure.

Cost of Research

It is very easy to under-estimate the cost of research when moving into a new area. Executive management wants hard numbers to be able to plan and manage costs, but anybody who’s developed new systems knows that costs tend to be unpredictable because you just don’t know what you don’t know yet.

What is your experience involving Big Data and Distributed Applications?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s