Java Grid Processing

Many people believe that distributed (or grid-based) computing has only recently become possible on home machines. This isn’t true.

Java has offered the basic tools for distributed processing since its earliest days (even when it was still called Oak and ran on the *7 research system in 1992). The key piece of technology for doing this is the Java Applet (or, in Oak, the agent). The distributed processing system described here suits tasks whose datasets can be broken down into small, independently processable pieces.
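As an illustration of "breaking down" a dataset, the sketch below assumes a hypothetical job that searches a numeric range and divides it into fixed-size work units that can each be handed to a different machine. The class name `RangePartitioner`, the `WorkUnit` record and the chunk sizes are invented for this example; it needs Java 16 or later for records.

```java
import java.util.ArrayList;
import java.util.List;

public class RangePartitioner {

    /** One independently processable subset: the half-open range [from, to). */
    public record WorkUnit(int id, long from, long to) {}

    /** Splits [start, end) into consecutive units of at most chunkSize values each. */
    public static List<WorkUnit> partition(long start, long end, long chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        List<WorkUnit> units = new ArrayList<>();
        int id = 0;
        for (long from = start; from < end; from += chunkSize) {
            // The last unit is truncated so that no unit extends past end.
            units.add(new WorkUnit(id++, from, Math.min(from + chunkSize, end)));
        }
        return units;
    }

    public static void main(String[] args) {
        // Hypothetical job: one million candidate values in units of 250,000.
        partition(0, 1_000_000, 250_000).forEach(System.out::println);
    }
}
```

Smaller units let slow machines contribute, at the cost of more HTTP round trips to the co-ordinating server.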

An applet can make HTTP requests to the server it was downloaded from. It can use this ability to fetch a subset of the data to be processed from a co-ordinating server (or set of servers) whenever it is ready for more work. This means you can write an applet that repeats the following loop until the user terminates it:

  1. Download a subset of the data from the co-ordinating server using HTTP.
  2. Process the subset.
  3. Send the result to the co-ordinating server using HTTP.
  4. Repeat from step 1.
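The four steps above can be sketched as a plain Java worker loop. The Applet API (`java.applet`) was deprecated in Java 9 and marked for removal in Java 17, so this sketch uses an ordinary class with `java.net.http.HttpClient` (Java 11+); inside an applet the same loop would run on a background thread started from `start()` and halted from `stop()`. The `Coordinator` interface, the `/work` and `/result` paths, the `"id:payload"` wire format and the "204 means no work" convention are all assumptions made for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;
import java.util.function.UnaryOperator;

public class GridWorker {

    /** Hypothetical view of the co-ordinating server. */
    public interface Coordinator {
        /** Returns a work unit as "id:payload", or empty when no work is available. */
        Optional<String> fetchWork() throws IOException, InterruptedException;
        void submitResult(String id, String result) throws IOException, InterruptedException;
    }

    /** HTTP-backed coordinator; the endpoint paths are assumptions, not a fixed protocol. */
    public static class HttpCoordinator implements Coordinator {
        private final HttpClient client = HttpClient.newHttpClient();
        private final URI base; // e.g. the URL the page was served from

        public HttpCoordinator(URI base) { this.base = base; }

        @Override
        public Optional<String> fetchWork() throws IOException, InterruptedException {
            HttpResponse<String> resp = client.send(
                    HttpRequest.newBuilder(base.resolve("work")).GET().build(),
                    HttpResponse.BodyHandlers.ofString());
            // Assumed convention: anything other than 200 (e.g. 204) means "no work now".
            return resp.statusCode() == 200 ? Optional.of(resp.body()) : Optional.empty();
        }

        @Override
        public void submitResult(String id, String result) throws IOException, InterruptedException {
            client.send(
                    HttpRequest.newBuilder(base.resolve("result?id=" + id))
                            .POST(HttpRequest.BodyPublishers.ofString(result)).build(),
                    HttpResponse.BodyHandlers.discarding());
        }
    }

    private volatile boolean running = true;

    /** Called when the user closes the page (an applet would call this from stop()). */
    public void stop() { running = false; }

    /** Runs the fetch/process/submit loop and returns the number of units completed. */
    public int run(Coordinator coordinator, UnaryOperator<String> task)
            throws IOException, InterruptedException {
        int completed = 0;
        while (running) {
            Optional<String> work = coordinator.fetchWork();         // 1. download a subset
            if (work.isEmpty()) break; // simplification: a real worker would sleep and poll again
            String[] parts = work.get().split(":", 2);
            if (parts.length < 2) throw new IOException("malformed work unit: " + work.get());
            String result = task.apply(parts[1]);                    // 2. process it
            coordinator.submitResult(parts[0], result);              // 3. send the result back
            completed++;                                             // 4. repeat
        }
        return completed;
    }
}
```

Keeping the transport behind an interface means the loop can be exercised against an in-memory coordinator before any server exists.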

Because an applet only runs while the web page hosting it is open, that page should display a prominent warning explaining that closing the window will stop the computer from participating in the grid.

The only other component that needs to be written is the co-ordinating server. This can be very simple: a Servlet or CGI script that accesses a database recording the status of each subset of the data (unprocessed, being processed, or processed with a result available).
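The status bookkeeping the co-ordinating server needs can be sketched with an in-memory table standing in for the database. `SubsetStore` and its method names are hypothetical; in a Servlet, `doGet` might call `allocate()` and `doPost` might call `complete()`. An in-memory map loses all state when the server restarts, which is one reason to keep this table in a database as described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** In-memory stand-in for the co-ordinator's status table (hypothetical names). */
public class SubsetStore {

    public enum Status { UNPROCESSED, BEING_PROCESSED, PROCESSED }

    private final Map<Integer, Status> status = new LinkedHashMap<>();
    private final Map<Integer, String> results = new LinkedHashMap<>();

    public SubsetStore(int subsetCount) {
        for (int id = 0; id < subsetCount; id++) status.put(id, Status.UNPROCESSED);
    }

    /** Hands out the first unprocessed subset and marks it as being processed. */
    public synchronized Optional<Integer> allocate() {
        for (Map.Entry<Integer, Status> e : status.entrySet()) {
            if (e.getValue() == Status.UNPROCESSED) {
                e.setValue(Status.BEING_PROCESSED);
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty(); // nothing left to hand out
    }

    /** Stores a result and marks the subset as processed. */
    public synchronized void complete(int id, String result) {
        if (status.get(id) != Status.BEING_PROCESSED)
            throw new IllegalStateException("subset " + id + " was not allocated");
        results.put(id, result);
        status.put(id, Status.PROCESSED);
    }

    public synchronized Status statusOf(int id) { return status.get(id); }

    public synchronized Optional<String> resultOf(int id) {
        return Optional.ofNullable(results.get(id));
    }
}
```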

When writing such a system, it is recommended that the following features be included:

  1. Result Timeout - If no result has been received from an applet after a set amount of time, the system should act as if that applet will never finish processing the subset, and make the subset available for allocation again. Choose the timeout carefully so that slow machines can still participate without having their subsets timed out before they can finish processing them.
  2. Result Verification - A client may return an incorrect result. To minimise the chance of this causing problems, each subset should be processed by at least two separate systems. If the results submitted by the different applets do not match, either the discrepancy needs further investigation or the subset needs to be sent out to more applets until a “majority” agreed result is reached.
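Both recommendations can be sketched for a single subset as below. Everything here is hypothetical: the `VerifiedSubset` class, the worker id strings, and the rule that a result counts as agreed once `quorum` workers return the identical value (a simple stand-in for majority voting). Time is passed in as `now` (milliseconds) rather than read from the clock, so the logic is deterministic to test; a server would pass `System.currentTimeMillis()`. Comparing results by string equality assumes every client serialises its result identically, which may not hold for floating-point output across different machines.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical per-subset tracker: result timeouts plus agreement between independent workers. */
public class VerifiedSubset {

    private final long timeoutMillis;  // generous, so slow machines are not timed out
    private final int quorum;          // identical results required (at least 2)
    private final Map<String, Long> leases = new HashMap<>();            // worker id -> allocation time
    private final Map<String, String> resultsByWorker = new HashMap<>(); // worker id -> submitted result

    public VerifiedSubset(long timeoutMillis, int quorum) {
        if (quorum < 2) throw new IllegalArgumentException("quorum must be at least 2");
        this.timeoutMillis = timeoutMillis;
        this.quorum = quorum;
    }

    /** Gives the subset to a worker at time now; a worker may not verify its own result. */
    public synchronized boolean allocate(String worker, long now) {
        if (leases.containsKey(worker) || resultsByWorker.containsKey(worker)) return false;
        leases.put(worker, now);
        return true;
    }

    /** Result Timeout: drops leases older than the timeout and returns the affected workers. */
    public synchronized List<String> expire(long now) {
        List<String> expired = new ArrayList<>();
        leases.entrySet().removeIf(e -> {
            boolean timedOut = now - e.getValue() > timeoutMillis;
            if (timedOut) expired.add(e.getKey());
            return timedOut;
        });
        return expired;
    }

    /** Accepts a result only from a worker that still holds a lease (late results are ignored). */
    public synchronized boolean submit(String worker, String result) {
        if (leases.remove(worker) == null) return false;
        resultsByWorker.put(worker, result);
        return true;
    }

    /** Result Verification: the value reported by at least quorum workers, if any. */
    public synchronized Optional<String> agreedResult() {
        return tally().entrySet().stream()
                .filter(e -> e.getValue() >= quorum)
                .map(Map.Entry::getKey)
                .findFirst();
    }

    /** True when, even if every outstanding lease returned the leading value, quorum is not reached. */
    public synchronized boolean needsMoreWorkers() {
        if (agreedResult().isPresent()) return false;
        int best = tally().values().stream().mapToInt(Integer::intValue).max().orElse(0);
        return leases.size() < quorum - best;
    }

    private Map<String, Integer> tally() {
        Map<String, Integer> counts = new HashMap<>();
        for (String r : resultsByWorker.values()) counts.merge(r, 1, Integer::sum);
        return counts;
    }
}
```

A co-ordinator would periodically call `expire()` for each in-progress subset and allocate the subset to another worker whenever `needsMoreWorkers()` returns true.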