Yesterday, I attended a webinar by O’Reilly on how to reduce the pain of building out distributed applications. The focus was on scalability, which makes sense, since this is why you would want to distribute your applications.
Apart from the host’s unfortunate resemblance to Little Lord Fauntleroy, there were some interesting observations to be made. To wit:
Engineers versus Ops
When an issue affects your customers in a large system, it is most likely an engineering issue, especially in emerging products. You need to staff up on engineering talent for your projects at a much greater rate than on Ops.
Data is not always relational
Data these days is more than OLAP tables. Things being captured and crunched include graphs, key-value pairs, and the like, so a non-SQL-based datastore might be called for. Most large data projects use only a handful of SQL features, and as the data sets get larger, SQL gets less useful.
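To make the contrast concrete, here is a minimal sketch of the two non-relational models mentioned above: a key-value store and a graph, both modeled with plain Python dicts as stand-ins for real systems (think Redis or a graph database). The names and data are illustrative, not from the webinar.

```python
# Key-value model: lookups by exact key, no schema, no joins.
kv_store = {}
kv_store["user:42:name"] = "Ada"
kv_store["user:42:last_login"] = "2011-06-01"

def kv_get(store, key, default=None):
    """O(1) lookup by exact key -- the only query the model supports."""
    return store.get(key, default)

# Graph model: nodes with edges, stored as an adjacency mapping.
graph = {
    "Ada":   {"follows": ["Grace"]},
    "Grace": {"follows": ["Ada", "Barbara"]},
}

def followers_of(g, person):
    """Traverse edges directly instead of joining tables."""
    return [node for node, props in g.items()
            if person in props.get("follows", [])]
```

Note that neither model needs (or offers) most of SQL: there are no joins, no ad-hoc predicates, just key lookups and edge traversals, which is exactly why these stores partition and scale more easily.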
Real-time versus Batch Processing
Something to consider: how is your data being created, one or two records at a time online, or in large grabs? This will affect your underlying architecture.
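The distinction can be sketched in a few lines. This is a hedged illustration of the two ingestion styles, not any particular framework's API; the function names and chunk size are made up for the example.

```python
def process_stream(records):
    """Handle each record as it arrives; state must be kept incrementally."""
    total = 0
    for r in records:
        total += r          # e.g. a running sum, updated per record
    return total

def process_batch(records, chunk_size=3):
    """Accumulate records into large chunks and process each chunk at once."""
    records = list(records)
    chunks = [records[i:i + chunk_size]
              for i in range(0, len(records), chunk_size)]
    return [sum(chunk) for chunk in chunks]
```

The streaming version must produce answers from partial data as it goes; the batch version can wait, sort, and scan whole chunks, which suits very different infrastructure underneath.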
Cost of Research
It is very easy to underestimate the cost of research when moving into a new area. Executive management wants hard numbers in order to plan and manage costs, but anybody who has developed new systems knows that costs tend to be unpredictable, because you don't know what you don't know yet.
What is your experience involving Big Data and Distributed Applications?