Overview

Product Name: GriF - Grid Framework
Version: 4.2 GP (General Purpose) - January 2013
Virtual Organization: COMPCHEM (http://compchem.unipg.it)
Main Developer: Carlo Manuali (carlo@unipg.it)
Platform: Multi-platform JAVA based (tested on Mac OS X 10.6.x and 10.8.x, Scientific Linux 5.x, Windows XP and 7)

GriF is a SOA (Service-Oriented Architecture) Grid Framework aimed at running multi-purpose applications on both the EGI Grid and various HPC platforms. The SOA-based organisation of GriF consists of two JAVA servers (YR and YP) and a JAVA client (YC).

The first server, YR (Yet a Registry), is based on the standard UDDI (Universal Description, Discovery, and Integration) protocol. Users query YR to find the appropriate YPs. The second server, YP (Yet a Provider), uses the Simple Object Access Protocol (SOAP), the XML-based messaging format used for inter-service communication over HTTP or HTTPS. YP holds the Grid Services of the VO. Both YR and YP use WSDL (Web Services Description Language) to describe the services provided and to reference self-describing interfaces, which are available as platform-independent XML documents.

The JAVA multi-platform client, YC (Yet a Consumer), does not require users to hold Grid Certificates: security is initially ensured by the fact that only VO members can access GriF, and they must enter their username and password in YC. The selected YP also takes care of running the jobs on the associated User Interface (UI), of managing their status and of notifying the users upon completion, relying on a Robot Certificate strategy. YC is loosely coupled to the Grid Middleware and implements all the extensions and protocols mentioned above in order to interface correctly with the Grid Services offered. It supports the management of large result files (even Gigabytes in size) and enables Quality-based Single, Parameter Study and Workflow job approaches.

Requirements and Installation

1. A Java(TM) Runtime Environment >= 1.6 (mandatory)

For example, you can verify your JAVA version with the following command:

# java -version
java version "1.6.0_37"

Please note that all these instructions assume that the 'java' command is in your path. If it isn't there, then you should either specify the complete path to the 'java' command or update your PATH environment variable.
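
The same check can also be done from JAVA itself. The following is only an illustrative sketch and is not part of GriF: the class name JavaVersionCheck and its methods are hypothetical, and it assumes that the 'java.version' system property has one of the usual forms ("1.6.0_37" for old releases, "11.0.2" or "17" for recent ones).

```java
final class JavaVersionCheck {

    // Minimum feature version required by this README (1.6, i.e. "6").
    static final int MINIMUM_FEATURE_VERSION = 6;

    // Extracts the feature version from a java.version string:
    // old scheme "1.x.y_z" gives x, new scheme "x.y.z" gives x.
    // Unusual version strings may throw NumberFormatException.
    static int featureVersion(String version) {
        String[] parts = version.split("[._+-]");
        int first = Integer.parseInt(parts[0]);
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    static boolean meetsMinimum(String version) {
        return featureVersion(version) >= MINIMUM_FEATURE_VERSION;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        System.out.println("java.version = " + version);
        System.out.println(meetsMinimum(version)
                ? "OK: this runtime satisfies the >= 1.6 requirement"
                : "Too old: a runtime >= 1.6 is required");
    }
}
```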

2. An Internet connection that allows outbound HTTP connections on port 8080. Please note that a degraded Internet connection can introduce considerable delays in all interactions between GriF and the Grid.

3. A utility to open your job results, which are returned as a compressed TAR archive (.tar.gz).

4. No installation steps are required. This software is ready for use as it is.

Limitations and Documents

1. Some errors encountered while using YC may depend on the actual status of the Grid Middleware and of the network connection. You may occasionally experience problems (e.g. when retrieving 'Done' results or when running Grid jobs) that do not depend on GriF. In any case, reports on any kind of problem and/or malfunction are welcome.

2. More information on the YC implementation is available in the 'docs/' directory (Javadoc) shipped with this package.

Usage and Description

1. Start YC with the command:

# java -jar GriF.jar

2. Enter your login and password (if you do not have a login and a password please contact us). After YC is loaded, five different panels called 'Run Applications', 'Manage Single Jobs', 'Manage Multiple Jobs', 'Settings' and 'Contacts' are displayed. Please note that multiple instances of YC are allowed.

3. Within the 'Run Applications' panel, run the desired binary application (or shell script) on the EGI Grid after uploading it by pressing the 'Upload' button. You also have the option of choosing among the applications already offered by COMPCHEM to its users. Next, the related input must be provided either as a single plain text file or, for multiple input files, as a compressed TAR or ZIP file, in which case the .tar.gz or the .zip extension, respectively, is required (Single Job type). If instead you are going to run a Parameter Study job, the input has to be provided in a compressed format as follows:
a) when the original input of the application to be run is based on a single file, you need to specify in the compressed input archive the different input file names (one for each subjob). For example: in_file-0.txt, in_file-1.txt, [...], in_file-N.txt;
b) when the original input of the application to be run is based on multiple files, you need to specify in the compressed input archive different directories (one for each subjob), each containing the set of input files. Accordingly, the same file names are allowed in different sets. For example, for an application based on 2 input files to be distributed 3 times you can specify: inputdir-0/in_file-0.txt, inputdir-0/in_file-1.txt, inputdir-1/in_file-0.txt, inputdir-1/in_file-1.txt, inputdir-2/in_file-0.txt and inputdir-2/in_file-1.txt.
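
The two layouts above can be sketched as follows. This is only an illustration and not part of GriF: the class ParameterStudyArchive, its method names, the output file name 'input.zip' and the placeholder file contents are hypothetical, and the sketch assumes that the entries are expected at the top level of the archive (the text above does not say so explicitly). It assumes Java 8 or later on the machine where the archive is prepared.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class ParameterStudyArchive {

    // Case a): one input file per subjob (in_file-0.txt, in_file-1.txt, ...).
    static List<String> singleFileEntries(int subjobs) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < subjobs; i++) {
            names.add("in_file-" + i + ".txt");
        }
        return names;
    }

    // Case b): one directory per subjob, each holding the same set of file names.
    static List<String> multiFileEntries(int subjobs, int filesPerSubjob) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < subjobs; i++) {
            for (int j = 0; j < filesPerSubjob; j++) {
                names.add("inputdir-" + i + "/in_file-" + j + ".txt");
            }
        }
        return names;
    }

    // Writes a ZIP archive containing one placeholder file per entry name.
    static void writeZip(Path target, List<String> entryNames) throws IOException {
        try (OutputStream out = Files.newOutputStream(target);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (String name : entryNames) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write(("placeholder input for " + name + "\n")
                        .getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // 2 input files distributed 3 times, as in the example of case b).
        List<String> entries = multiFileEntries(3, 2);
        writeZip(Paths.get("input.zip"), entries);
        for (String entry : entries) {
            System.out.println(entry);
        }
    }
}
```

In real use, the placeholder contents would be replaced by the actual input files of each subjob.
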
You can also combine the 'Parameter Study' option with the 'Workflow' one. In particular, the following actions are required in order to run multiple Workflows (each starting from a different initial input) on the Grid at the same time:
a) Upload your application in the form of a Shell Script containing the sequential, conditional or iterative path that your programs (used by the Workflow) have to follow;
b) Select your multiple input (corresponding to different inputs for your first program in the Workflow) as for the 'Parameter Study' running modality;
c) Select 'Workflow' as Job Type;
d) Upload all the binary programs forming the Workflow in the form of a compressed TAR or ZIP package.
Before running, you can choose whether or not to use GriF Ranking. With GriF Ranking, specific functions guarantee two running days for the Grid job and apply quality algorithms to the queues available to the VO, in order to obtain more reliable jobs and results. Without it, reliability is not guaranteed, but you can choose between a guarantee of three running days and a guarantee of three Gigabytes of RAM for your Grid job. Moreover, a Parameter Study job can also use the so-called 'HPC Ranking' option. In this case, all your subjobs are submitted to the same CE queue, namely the one with the largest number of available CPUs at that moment (with one running day guaranteed). Please note that this option does not provide the same level of reliability as the 'pure' GriF Ranking mentioned above. Finally, you can also choose to avoid any kind of Ranking. Then, by pressing the 'Start' button, you distribute the job on the Grid. Do not worry if this step takes some time (up to 30-35 seconds, or even 45 when using the 'HPC Ranking' option): this is normal, especially when you choose not to use GriF Ranking. Please also note that: your application is allowed to produce more than one result file (for each Single job or subjob); for normal users, the maximum number of subjobs allowed for each (Parameter Study) Grid job is 15 (500 when you adopt the 'HPC Ranking' option); and when you use multiple inputs, they are considered in alphabetical order.
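
The subjob limits and the alphabetical ordering mentioned above can be checked before submission with a sketch like the following. It is only illustrative and not part of GriF: the class SubjobPlan and its methods are hypothetical, and it assumes that "alphabetical order" means plain character-by-character string ordering. Under that assumption, in_file-10.txt would be considered before in_file-2.txt, so zero-padding the index is one way to keep alphabetical and numeric order aligned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class SubjobPlan {

    static final int MAX_SUBJOBS_DEFAULT = 15; // limit stated for normal users
    static final int MAX_SUBJOBS_HPC = 500;    // limit stated for 'HPC Ranking'

    static int maxSubjobs(boolean hpcRanking) {
        return hpcRanking ? MAX_SUBJOBS_HPC : MAX_SUBJOBS_DEFAULT;
    }

    static boolean withinLimit(int subjobs, boolean hpcRanking) {
        return subjobs >= 1 && subjobs <= maxSubjobs(hpcRanking);
    }

    // Zero-padded input name, wide enough for the largest index (total - 1).
    static String paddedName(int index, int total) {
        int width = String.valueOf(total - 1).length();
        return String.format(Locale.ROOT, "in_file-%0" + width + "d.txt", index);
    }

    // Returns a sorted copy using plain String ordering (an assumption about
    // how the inputs are ordered; the README only says "alphabetical").
    static List<String> alphabetical(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println("12 subjobs, no HPC Ranking: " + withinLimit(12, false));
        System.out.println("40 subjobs, no HPC Ranking: " + withinLimit(40, false));
        System.out.println("40 subjobs, HPC Ranking:    " + withinLimit(40, true));

        // Without padding, "in_file-10.txt" sorts before "in_file-2.txt".
        System.out.println(alphabetical(
                Arrays.asList("in_file-2.txt", "in_file-10.txt", "in_file-1.txt")));

        // With padding, alphabetical and numeric order coincide.
        List<String> padded = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            padded.add(paddedName(i, 12));
        }
        System.out.println(alphabetical(padded));
    }
}
```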

4. Within the 'Manage Single Jobs' panel, manage your pending 'Single' Grid jobs.
The main window shows the list of your pending jobs, which can be refreshed by pressing the 'Refresh' button. By selecting a job you can see its description on the right. Please note that 'Done' jobs are returned at the top of the list. The remaining jobs are ordered by status and then by submission date & time. The actual status of your jobs is updated by YP at regular intervals. When you cancel a job (by pressing the 'Delete' button) or retrieve its results, it disappears from the list. You can retrieve results only for 'Done' jobs, by pressing the 'Get Results' button. After retrieval, the related results are automatically purged from the Grid. Accordingly, remember to save them in a safe place.

5. Within the 'Manage Multiple Jobs' panel, manage your pending 'Multiple' (Parameter Study) Grid jobs.
The main upper window shows the list of your pending jobs, which can be refreshed by pressing the 'Refresh' button. By selecting a job you can see its main description on the right and its subjobs in the window below (which also reports the execution queue assigned to each of them). Here too, please note that 'Done' subjobs are returned at the top of the list and that the remaining ones are ordered by status and then by submission date & time. The actual status of each subjob is updated by YP at regular intervals. When a subjob has failed or is taking too long to finish, you can re-schedule it by pressing the 'Re-Send' button. This feature allows you to always complete your Parameter Study experiment. In the same way, you can re-schedule all the failed subjobs or all the pending ones (Submitted, Waiting, Ready, Scheduled and Running) by clicking on the 'All Failed' or the 'All Pending' button below, respectively (in both cases a new Grid job formed by those subjobs is created). When you cancel a job (by pressing the 'Delete' button), it disappears from the list and all of its subjobs are deleted as well. You can retrieve job results only when at least one subjob is 'Done'. Accordingly, for each 'Done' subjob, you can retrieve its results by pressing the 'Get It' button. Moreover, you can also gather all the available results by pressing the 'Get All' button. Please consider that this action also makes GriF stop the related job together with all of its unfinished subjobs, which then disappear from both lists. In any case, each single or multiple result is automatically purged from the Grid after retrieval. Accordingly, remember to save the results in a safe place.
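
The grouping behind the 'All Failed' and 'All Pending' buttons can be pictured with the sketch below. It is only an illustration of the selection described above, not the actual YC or YP code: the class SubjobSelection, the enum constants and the method select are hypothetical names, and the sketch only models which subjobs would be picked, not the creation of the new Grid job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

final class SubjobSelection {

    // Status names taken from the text above; the constant names are hypothetical.
    enum Status { SUBMITTED, WAITING, READY, SCHEDULED, RUNNING, DONE, FAILED }

    // The statuses that the text lists as "pending".
    static final Set<Status> PENDING = EnumSet.of(
            Status.SUBMITTED, Status.WAITING, Status.READY,
            Status.SCHEDULED, Status.RUNNING);

    static final Set<Status> FAILED = EnumSet.of(Status.FAILED);

    // Returns the positions of the subjobs whose status belongs to 'wanted'.
    static List<Integer> select(List<Status> statuses, Set<Status> wanted) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < statuses.size(); i++) {
            if (wanted.contains(statuses.get(i))) {
                indices.add(i);
            }
        }
        return indices;
    }

    public static void main(String[] args) {
        List<Status> subjobs = Arrays.asList(
                Status.DONE, Status.FAILED, Status.RUNNING,
                Status.WAITING, Status.FAILED);
        System.out.println("All Failed  would re-send subjobs: " + select(subjobs, FAILED));
        System.out.println("All Pending would re-send subjobs: " + select(subjobs, PENDING));
    }
}
```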

6. Within the 'Settings' panel, search for different YPs and then set your favourite one by a simple copy & paste.
For example, this can be useful when more than one YP is returned and you want to switch to another one after YP server errors. Moreover, you can examine the currently active sets of ranked CE queues for Single and Multiple Jobs, respectively, and also find out which CE queue is currently available (and its number of free CPUs) for the next Parameter Study job using the 'HPC Ranking' option.

7. Within the 'Contacts' panel, send messages to us. Any kind of report, question, feedback or other information is welcome.

8. Other minor functions:
a) You can check the health of the whole system (Database, Web Services, UI connection, LFC subsystem and Queues availability) by pressing the 'System Status' button ('Run Applications' panel), for example after changing the type and/or address of your network connection.
b) You can check the last job run on the Grid by pressing the 'Check the status on the Grid in real-time' button ('Run Applications' panel), for example to verify immediately that your Grid job has been taken in charge by the EGI Grid.
c) You can save YC messages by pressing the 'Save' button ('Run Applications', 'Manage Single Jobs' and 'Manage Multiple Jobs' panels), for example when you want to send us log information or store useful data returned by YC.
d) You can clear YC messages by pressing the 'Clear' button ('Run Applications', 'Manage Single Jobs', 'Manage Multiple Jobs' and 'Settings' panels).
e) You can exit from YC by pressing the 'Logout' button (all panels).

Future Work

In order to improve GriF, the following activities are already planned for the next releases:

1. Writing an interface enabling the compilation of Grid applications;
2. Providing an automated first evaluation for new CE queues;
3. Managing encrypted results;
4. Registering distinct values for real and virtual memory in GriF accounting (at present only the total memory is considered).

Note: Any kind of collaboration is welcome and can be proposed by email.