Building Out Distributed Apps (Big Data)

Yesterday, I attended a webinar by O’Reilly on how to reduce the pain of building out distributed applications. The focus was on scalability, which makes sense, since this is why you would want to distribute your applications.

Apart from the host’s unfortunate resemblance to Little Lord Fauntleroy, there were some interesting observations to be made. To wit:

Engineers versus Ops

When there’s an issue affecting your customer in large systems, it is most likely an engineering issue, especially in emerging products. You need to staff up on Engineering talent for your projects at a much greater rate than Ops.

Data is not always relational

Data these days is more than OLAP stuff. Things being captured and crunched include data graphs, key-value pairs, etc. So, something non-SQL based might be called for as a datastore. Only a handful of SQL features are used in most large data projects. As the data sets get larger, SQL gets less useful.

Real-time versus Batch Processing

Something to consider: how is your data being created? Does it arrive in one-sy/two-sy fashion online, or in large grabs of data? The answer will affect the basic underlying structure of your system.

Cost of Research

It is very easy to under-estimate the cost of research when moving into a new area. Executive management wants hard numbers to be able to plan and manage costs, but anybody who’s developed new systems knows that costs tend to be unpredictable because you just don’t know what you don’t know yet.

What is your experience involving Big Data and Distributed Applications?

SharePoint 2010 – Why is Text Layout Always Disabled?

SharePoint 2010 uses the feature-rich ribbon, similar to the one introduced in the Microsoft Office 2007 suite. This ribbon provides users with many capabilities, such as text editing and content publishing. Among these features is the ability to change the format of a rich text control, adding headers, footers and columns.

When designing a publishing site, I had hoped to use this neat feature to keep Page Layout designs simple and easy to update for content contributors. However, I noticed that I could not use this feature of the ribbon, as it was always disabled.

Researching the problem, I had a hard time finding out why this button was disabled. I checked security settings and the control properties in my Page Layout aspx file, but nothing seemed to solve this dilemma.

It turned out, after asking on a SharePoint forum, that the Text Layout feature is only available for wiki sites. The Publishing Portal and Enterprise Wiki are built off of the publishing framework, which replaces the Text Layout button with the Page Layout button.

So custom layouts will have to be designed specifically as Page Layouts. If a layout requires 3 columns of data, then a Page Layout has to be created that allows 3 columns of data. Lesson learned; I am providing this as a tip for anyone else having a hard time figuring out why they can’t use the Text Layout button in their Publishing site.

Distributed Capture & Document Capture

Capture is only a part of the ECM universe, but a crucial part nonetheless. Once a document is captured into an Enterprise Content Management system, it must be stored, perhaps put into a workflow process, archived, and made available for retrieval. Retrieval is in many ways the main thrust of an ECM system (no point putting it in there if you can’t ever see it again); retrieval is dependent on the index values associated with the document, which brings us back to capture.

Capture is the process of getting documents (and their data) into the system. Distributed Capture is the mechanism by which documents from a variety of locations (near and far) enter the system. The easiest way to do this is to utilize the file system. When different offices (or locations — work from home, anyone?) of a company are on the same network, specific locations on the shared file system can be designated for various purposes. Different directories can be used to input different kinds of documents.

I thought we were going to be paperless by now

This type of taxonomy works okay for existing electronic documents (Word files, spreadsheets, PDFs, etc.), but what about hard copy, the seemingly ubiquitous paper which exists in our so-called paperless office? Well, it needs to be scanned in. You want documents classified in a consistent manner, and the metadata (index values and other interesting info about the document) to be as accurate and as consistent as possible.

Consistency is key. When setting up a company-wide ECM system, it is a key success indicator that everybody follows the same set of procedures and guidelines for getting documents into the system. This can be accomplished by having a distributed capture system available.

The company I work for makes and sells a distributed capture system today. As we go through our roadmap discussions for where we want to take the product to solve customers’ future problems, we developers have to grapple with some fundamental issues, mainly: what is the best technology to use as a platform?

It’s easy to imagine using the web to provide distributed document capture throughout your enterprise. You have centrally managed web servers. Everyone has a web browser on their computer (and cell phone, for that matter). In fact, anyone who’s ever attached a document using an html-based email program has already exercised the base technology necessary for a distributed capture system. One key advantage of Distributed Capture is that you get rid of paper at the source; take a moment to think about the implications of that. It’s okay, I’ll wait.

What else is needed…
There are two main improvements to simply uploading a document by way of a web page. One is the acquisition of the paper document, the other is the user-experience and business process to build into the hosting program. I’ll go into the physical acquisition in a later post, but the user-experience of a distributed capture system has to provide two things to be successful. It must be Dead Simple to Use and it must provide the functionality necessary to get good data into the system.

Our checking with users shows again and again that a single button is an attractive interface, with more functionality exposed as needed. One key question developers raise is which technology to build the interface in.

Technology: HTML
Pros: Standards compliant, supported by all browsers.
Cons: Primarily a static user interface. AJAX can add some Zing to the interface, but is problematical in certain situations (back-button, anybody?).

Technology: Flash
Pros: Ubiquitous; Flash player in something like 90% of all browsers.
Cons: Began life as an animation scripting language, although ActionScript 3.0 is more sophisticated. IDE support is poor. Hard to get my head wrapped around the timeline model.

Technology: Silverlight
Pros: Microsoft integration and toolset. Microsoft has an army of developers working on tools and technologies; big changes in how Microsoft handles internet computing are emerging.
Cons: Current market adoption is a little slow. Microsoft talks the big talk about cross-platform now, but has a history of embracing, extending, then co-opting technology (in my opinion).

Technology: JavaFX
Pros: Ubiquitous. Many very good VM’s out there. Java itself is well suited to backend, server-side development.
Cons: UI is not Java’s strong-suit; AWT ring a bell?

Technology: Platform Specific Code
Pros: Leverage native functionality, look and feel.
Cons: Lots of code bases to implement and maintain. Cross-platform toolkits and libraries tend to dumb-down the functionality to the lowest-common denominator.

I’m sure anybody reading this has ideas of their own about the pros and cons of the platforms listed out, and perhaps other ideas to add to the list. I welcome your comments.

IPM Process Scripts Tips and Tricks

Process workflow scripts: the very mention can make a developer shudder. However, scripts are the unsung heroes of the Process workflow and have been an integral part of ImageSource’s Customer Solutions. While the Oracle IPM Process workflow now supports using .NET dlls as script events, that was not always the case. VBScript code was originally used to handle workflow events, and that option is still around. There are a few reasons why you might still need to code a process script in VBScript: it is an upgrade of legacy code, the system is a pre-7.7 version of IPM, or the required custom functionality is so small that writing a script is much quicker than a full-blown .NET module.

Below are some helpful tips and tricks if you find yourself writing a VBScript workflow event.

Write Logs That Make Sense
While flagging parts of the code as A, B and C might make sense to you while developing the script, trying to remember what they meant years down the line (like for an enhancement or bug fix) will be a futile exercise. Better yet, try explaining cryptic log messages to the Systems Engineer or another developer. After they’ve finished giving you annoyed looks, it might be time to make the logging messages understandable to another human being. When writing logs, output something useful like current field values, SQL parameters or the result of conditional operations. Just make sure not to log sensitive data, and provide an option to turn logging off when it is not needed.
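As a rough sketch of the advice above, a small logging helper might look like the following. The WriteLog name, the LOGGING_ENABLED switch and the log file path are all made up for illustration, and writing to a text file through Scripting.FileSystemObject is an assumption about what your script host allows; your installation may have its own preferred logging target.

```vbscript
' Hypothetical settings: adjust or replace for your own environment.
Const LOGGING_ENABLED = True
Const LOG_FILE = "C:\process\example_script.log"

' Appends one readable line to the log: timestamp, step name, detail.
' Does nothing when logging is switched off.
Sub WriteLog(sStep, sDetail)
    Dim oFSO, oFile
    If Not LOGGING_ENABLED Then Exit Sub

    Set oFSO = CreateObject("Scripting.FileSystemObject")
    ' 8 = ForAppending, True = create the file if it does not exist
    Set oFile = oFSO.OpenTextFile(LOG_FILE, 8, True)
    oFile.WriteLine Now & " [" & sStep & "] " & sDetail
    oFile.Close

    Set oFile = Nothing
    Set oFSO = Nothing
End Sub

' Example call (iId is assumed to hold the record ID being processed):
' WriteLog "Select from DATA_TABLE_1", "ID=" & iId
```

A message that names the step and shows the values involved is far easier to read later than a bare "A" or "B" marker.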

Error Handling Is Key
By default, IPM scripts use the following option: On Error Resume Next. This means that if an error occurs in the code, execution simply continues with the next statement, and the error details stay in the Err object until they are handled, cleared or overwritten by a later error. Normally, error handling code is put after code like SQL calls, file system operations and usage of the IPM SDK. However, sometimes developers can get lazy and forget to put error handling in the code where it would be useful, only to have the error handled later on with a log message that does not make sense. For example, take a look at the following code:

oCon.Open "File Name=C:\process\connection.udl"
oRS.Open "select * from DATA_TABLE_1 where ID=" & iId, oCon, Readonly, Readonly
If Not oRS.EOF Then
    iValue = oRS("VALUE_1")
End If
oRS.Close

oCon.Execute  "insert into DATA_TABLE_2 (VALUE_1) values (" & iValue & ")"
If Eval("Err.Number <> 0") Then
     objExecutionContext.ErrorDescription = objExecutionContext.ErrorDescription & _
          "Example Script: Error inserting data into DATA_TABLE_2 - " & Err.Description & vbCrLf
     oCon.Close
     Set oCon = Nothing
     Exit Sub
End If

If an error were to occur at either the open connection or select data steps, the error log would still report that there was an error inserting data into DATA_TABLE_2. A better solution would have error handling checks after the open connection and SQL select lines of code.
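One way the extra checks could look is sketched below. This is a fragment, not a complete script: it reuses the names from the example above (oCon, oRS, iId, objExecutionContext, Readonly), is meant to sit inside the same Sub, and uses a plain Err.Number test rather than the Eval form. The message wording is only illustrative.

```vbscript
' Check immediately after opening the connection.
oCon.Open "File Name=C:\process\connection.udl"
If Err.Number <> 0 Then
    objExecutionContext.ErrorDescription = objExecutionContext.ErrorDescription & _
        "Example Script: Error opening database connection - " & Err.Description & vbCrLf
    Set oCon = Nothing
    Exit Sub
End If

' Check immediately after the select, before reading any fields.
oRS.Open "select * from DATA_TABLE_1 where ID=" & iId, oCon, Readonly, Readonly
If Err.Number <> 0 Then
    objExecutionContext.ErrorDescription = objExecutionContext.ErrorDescription & _
        "Example Script: Error selecting from DATA_TABLE_1 - " & Err.Description & vbCrLf
    oCon.Close
    Set oCon = Nothing
    Exit Sub
End If
```

With a check after each step, the log names the step that actually failed instead of blaming the later insert.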

Remember To Set Objects
Working in the .NET environment, you tend to forget some of the nuances of writing unmanaged code. One of these is remembering to use Set when assigning objects in VBScript. Forgetting it can be the source of much unneeded frustration when the code continually errors out without a very helpful reason why. Keep in mind that whenever you assign an object reference rather than a primitive value, you must use Set.
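A minimal illustration of the difference, using Scripting.Dictionary purely as a convenient example object (the variable names are hypothetical):

```vbscript
Option Explicit

Dim sName, iCount, oDict

' Primitive values (strings, numbers, dates) are assigned directly.
sName = "invoice"
iCount = 3

' Object references must be assigned with Set.
Set oDict = CreateObject("Scripting.Dictionary")
oDict.Add sName, iCount

' Leaving out Set, as in the commented line below, does not store the
' object reference. Depending on the object, the assignment itself raises
' an error or leaves a non-object value in the variable, so later method
' calls fail with errors such as "Object required".
' oDict = CreateObject("Scripting.Dictionary")

Set oDict = Nothing
```

Under On Error Resume Next the failed assignment passes silently, which is why the resulting error tends to surface several lines later.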

Clean Up After Yourself
Another thing to keep in mind when writing scripts is to clean up your variables when you are done using them. This is especially important for preventing memory leaks in the script, which might have to run 24/7. Always remember to set your objects to Nothing, to close all SQL connections and recordsets, and even to clear string values when they are no longer needed.
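A cleanup routine along these lines can be shared by every exit path of a script. ReleaseAdoObjects is a hypothetical helper name, and the sketch assumes the two arguments are ADO Recordset and Connection variables, as in the earlier example.

```vbscript
' Hypothetical helper: closes an ADO recordset and connection if they are
' open, then releases both references. VBScript passes arguments ByRef by
' default, so the caller's variables are set to Nothing as well.
Sub ReleaseAdoObjects(oRS, oCon)
    If IsObject(oRS) Then
        If Not oRS Is Nothing Then
            If oRS.State <> 0 Then oRS.Close    ' 0 = adStateClosed
        End If
    End If
    Set oRS = Nothing

    If IsObject(oCon) Then
        If Not oCon Is Nothing Then
            If oCon.State <> 0 Then oCon.Close  ' 0 = adStateClosed
        End If
    End If
    Set oCon = Nothing
End Sub

' Example call at the end of the script, and before each early Exit Sub:
' ReleaseAdoObjects oRS, oCon
```

Checking State before calling Close avoids raising a new error when the object was never opened.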

Be Careful With String Concatenation
VBScript string concatenation has an N-squared cost: each concatenation copies the whole string built so far, so repeatedly concatenating within a loop is a very expensive operation. While okay for small strings, it is not recommended for building long strings, such as from database information. The suggested alternative to the built-in concatenation is to preallocate memory for the concatenation operation. A more detailed solution can be found at Microsoft’s support site.
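The sketch below uses a different but commonly used technique from the preallocation approach mentioned above: it collects the pieces in an array and calls Join once at the end. The function name is hypothetical, and it assumes oRS is an open ADO recordset containing the named field.

```vbscript
' Hypothetical helper: returns all values of one recordset column as a
' single string, one value per line, without using & inside the loop.
Function RecordsetColumnToText(oRS, sField)
    Dim aParts(), nCount

    nCount = 0
    ReDim aParts(63)                        ' initial room for 64 values

    Do Until oRS.EOF
        If nCount > UBound(aParts) Then
            ' Double the array size when it is full.
            ReDim Preserve aParts(UBound(aParts) * 2 + 1)
        End If
        aParts(nCount) = oRS(sField) & ""   ' & "" turns Null into an empty string
        nCount = nCount + 1
        oRS.MoveNext
    Loop

    If nCount = 0 Then
        RecordsetColumnToText = ""
    Else
        ReDim Preserve aParts(nCount - 1)   ' trim unused slots
        RecordsetColumnToText = Join(aParts, vbCrLf)
    End If
End Function
```

The full string is assembled only once, in the Join call, so the loop no longer recopies an ever-growing string on each pass.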

Hopefully this advice will help you in the world of IPM Process script writing.

Richard Franzen
Developer
ImageSource, Inc.

When In Doubt, Ask the User: And You Should Always be in Doubt

It’s a fact of life: Developers don’t always understand the end-user experience. Software programmers, as a group, are more comfortable designing and writing code against a written specification.

“The spec says to put the log file in directory XYZ, so that’s where I’ll put the log file.”

We are all sometimes guilty of this kind of thinking: “Well, the project plan doesn’t require a UI for the config file, so maybe I can get away without one, even though I know it will make life easier for everybody but me (because I have to create it)…” but it will always return to bite you in the backside. If there’s a concern in the back of your mind that you are maybe not doing the right thing, don’t just rely on the written documentation, go check with somebody.

The developer doesn’t always have direct access to the customers (and sometimes for very good reasons), but on each and every project we undertake, there should be a person (or even a group in a large organization) who is the Customer Advocate.

The role of the Customer Advocate could be performed by the Project Manager, the QA Dept, Tech Support,  the Product Marketing group, the Program Manager, or even, *gasp*, the customer. This role is crucial in helping make informed decisions about the product you are working on; this person is helping the customer “scratch” the “itch” which caused them to want to buy your product or engage your services.

Don’t just assume that your idea is going to work, check with somebody. A sample size of two is infinitely better than just you and your keyboard.

What brought all this to mind was a simple poll my boss sent out on LinkedIn, asking folks which additional vendor we might invite to our annual ECM Conference (NEXUS 2009). Instead of just guessing, he put out a poll.

Anybody who’s taken a Statistics class could argue with the methodology, but the point of this particular poll is not to predict a presidential election, but rather to solicit input and get some guidance.

This kind of thing can only make your software better.

If you want to see the LinkedIn Poll, you can try it out here

Martin O. Waldron
Program Manager, SW Development
ImageSource, Inc.

Make It Dead Simple

The whole point of ECM (Enterprise Content Management) is to manage electronic content, meaning you have to have a way to put information in and to get it back out. You will also need a way to control (restrict or grant) access to the data. The data going in to storage must be findable again.

The success of your ECM solution is predicated on the validity of the metadata which goes in with the content. Simply put, metadata describes the content you are storing in a way which allows you to find it again.

Back in the day, my fellow propeller-heads and I used to joke about Write-Only storage, meaning that data could be written to a disk, but never read, which of course renders it useless. Just as useless as Write-Only storage is content which is unfindable or, just as bad, content which matches too many criteria, which also makes it hard to use.

When getting content into the system, it is imperative that good, solid metadata is entered into the system along with the data.

What’s the best way to get good metadata?

  • Automated capture
    Grabbing data directly off the content being inserted. This can be done on scanned-in images, using Capture Software. Today, this software is getting quite sophisticated and can read handwritten data as well as recognize printed text, specialized bar codes, and images.
    If data is being inserted in electronic form, such as through web services, there is likely already metadata associated with the content.
  • User Input
    Sometimes you must let the users enter the data; in this case, you must keep it dead simple to be effective.
    Keep the user interface simple — only ask for the data you actually need
    Provide lookups for data to restrict the domain of possible results
  • Validation
    This adds a separate step, but can greatly increase the accuracy of the input.

Your suggestions?

 

Martin O. Waldron
Program Manager, SW Development
ImageSource, Inc.
