Slicer3:Grid Interface UseCases
Use Cases
Here are two base use cases that we can consider as an initial step in moving the Grid Interface (Grid Wizard) into NA-MIC DBP analyses. Both are "deliverables" for NAMIC, but on different time frames: the first use case should be completed, documented, demonstrated, and in the hands of users by early August; the second by mid-to-late September.
EM Segmentation
The EM (Expectation Maximization) Segmenter is an algorithm that performs a two-step iterative optimization procedure to separate "stuff" from "non-stuff" (where usually "stuff" is "white matter" and "non-stuff" is everything that isn't); a rough sketch of the basic iteration appears after the list below. The algorithm itself is slow, but frequently needs to be run on many data sets as an initial step in some larger scientific analysis. One of the main problems with EM segmentation is that configuring a set of parameters for a segmentation run is, mildly put, more of an art than a well-defined procedure. We propose that Slicer's main function here is as an exploratory data analysis platform:
- User loads an image into Slicer
- User goes through a several step process in configuring and initializing the EM algorithm
- User performs EM segmentation on a subsection of a single volume
- User goes back to step 2, until satisfied that the EM iterations are converging to a useful (though local) optimum.
- User saves a "parameter file" [need more info here about what this is] and then performs the EM segmentation algorithm on the whole volume.
- User looks wistfully at a directory, and wishes that she could just "run it on all those files", rather than repeating the process from step 1.
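For readers unfamiliar with the algorithm, here is a minimal sketch of the two-step iteration, assuming a plain Gaussian intensity model per class and a 1-D array of voxel intensities; the actual EM Segmenter adds, among other things, atlas priors and a hierarchy of tissue classes, which this sketch omits.

  import numpy as np

  def em_segment(intensities, k=2, n_iter=20):
      """Toy EM: intensities is a 1-D array (e.g. a flattened volume)."""
      x = intensities.astype(float)
      # Crude initialization: evenly spaced means, shared variance, flat priors.
      means = np.linspace(x.min(), x.max(), k)
      variances = np.full(k, x.var())
      priors = np.full(k, 1.0 / k)
      for _ in range(n_iter):
          # E-step: posterior probability of each class for each voxel.
          resp = np.empty((k, x.size))
          for c in range(k):
              gauss = np.exp(-(x - means[c]) ** 2 / (2 * variances[c]))
              resp[c] = priors[c] * gauss / np.sqrt(2 * np.pi * variances[c])
          resp /= resp.sum(axis=0, keepdims=True)
          # M-step: re-estimate class parameters from the posteriors.
          for c in range(k):
              w = resp[c]
              means[c] = (w * x).sum() / w.sum()
              variances[c] = (w * (x - means[c]) ** 2).sum() / w.sum()
              priors[c] = w.mean()
      return resp.argmax(axis=0)  # hard label per voxel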
There are a couple of problems with these last two steps, including the general warning from Slicer 101: don't run this on a machine with less than 2 GB of RAM, and even that might not be enough sometimes. This is an obvious case where we can benefit from performing the initial, exploratory analysis locally and then running the batch analysis "in the large" on a cluster.
What we propose as a software deliverable comes in four parts:
- A configured GridWizard system for the NAMIC cluster that a user can install into his or her own workspace (users can reconfigure for other clusters, but this may not be terribly friendly)
- An EM Segmenter Slicer command-line module that allows a user to perform a single segmentation on a cluster, or a batch segmentation on a cluster
- Documentation on how to use both of above
- A roll, RPM, or tarball of the EMSegmenter for 32-bit and 64-bit Linux clusters so that it can easily be installed on many compute nodes simultaneously
In these use cases, we presume that the user is, virtually speaking, inside the SPL proper and does not need to contend with the gatekeeper's S/Key password system. We also assume the following "preamble" to each use case:
- User loads an image into Slicer
- User configures EM Segmenter
- User performs small segmentation locally
The usage we propose for the EMSegmenter comes in two parts: simple and batch (the "mega-mode" below). "Batch" here simply means processing multiple files simultaneously; in either case, the job physically runs on a batch-oriented processing system.
- Simple mode
- Preamble...
- User launches Segmentation on a cluster (specifically, the NAMIC cluster)
- Window pops up with the task to be performed
- User reviews the "task list" and clicks "run"
- Job starts
- Job output ends up in an sshfs-mounted directory
- User can reload the results into Slicer and check them for accuracy.
- Mega-mode
- Preamble...
- User chooses a directory
- User enters a file glob (i.e., a filter) into a parameter box in Slicer; the glob applies to files in the chosen data directory but is not recursive, and it may not adhere to strict POSIX globbing rules (? or [] might not be implemented); a sketch of this expansion appears after the mode descriptions below
- User chooses "launch"
- Window pops up with the list of tasks that need to be performed
- User reviews the task list, and clicks "run"
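To make the "mega-mode" expansion concrete, here is a minimal sketch of turning a directory plus a non-recursive file glob into a task list. The function and parameter names are illustrative only, not the module's actual API, and the fnmatch module used here happens to support ? and [] even though the module described above may not.

  import fnmatch
  import os

  def build_task_list(data_dir, pattern, parameter_file):
      """Return one (input, output, parameters) task per matching file."""
      tasks = []
      for name in sorted(os.listdir(data_dir)):      # non-recursive by design
          if not fnmatch.fnmatch(name, pattern):     # shell-style glob
              continue
          in_path = os.path.join(data_dir, name)
          out_path = os.path.join(data_dir, "seg_" + name)
          tasks.append((in_path, out_path, parameter_file))
      return tasks

  # Hypothetical example: all files matching "case*.nrrd" in the chosen directory.
  # tasks = build_task_list("/data/namic/study01", "case*.nrrd", "em_params.mrml")

Each entry in the returned list would correspond to one row in the task-list window described above.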
In either case, a future version should include some additional features in the "job manager" window that pops up when the user wants to run batch jobs:
- Ability to monitor jobs
- Ability to inspect job outputs (PBS stderr streams, etc)
- Ability to inspect the job artifacts (parallelization scripts)
- Ability to choose the scheduling algorithm (and all that implies): do it all statically, or on a first-come-first-served basis.
- Ability to select multiple clusters, connected remotely
Many of these features are already present in GridWizard, but validation that they work with the above use case needs to be done _after_ the software is successfully demonstrated.
Note: See more information about the newer EMSegmenter.
SPHARM-PDM: Spherical Harmonic-based Brain Shape Analysis
The SPHARM-PDM based shape analysis aims at analyzing group shape differences in sub-cortical structures, such as the caudate or hippocampus. The overall procedure first converts a binary segmentation into a surface mesh and then computes a parametric shape description using spherical harmonic basis functions, encoding the x, y, and z coordinates of the surface separately. Correspondence between objects is established by the first-order ellipsoid together with the spherical parametrization obtained via an equal-area transform. An icosahedron subdivision of the spherical parametrization then yields a sampled triangulation with inherent correspondence between objects. The sampled surfaces of two groups are finally compared via a multivariate Hotelling T^2 group mean difference, evaluated with a non-parametric permutation test.
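As a rough illustration of the final statistical step, the sketch below computes a two-sample Hotelling T^2 statistic and a permutation p-value for a single vertex, assuming each group is an array of corresponding 3-D vertex positions (one row per subject). This is a simplification of what StatNonParamPDM actually does; in particular it ignores the correction for multiple comparisons across vertices.

  import numpy as np

  def hotelling_t2(group_a, group_b):
      """group_a, group_b: arrays of shape (n_subjects, 3) for a single vertex."""
      na, nb = len(group_a), len(group_b)
      diff = group_a.mean(axis=0) - group_b.mean(axis=0)
      # Pooled covariance of the 3-D vertex coordinates.
      cov = ((na - 1) * np.cov(group_a, rowvar=False) +
             (nb - 1) * np.cov(group_b, rowvar=False)) / (na + nb - 2)
      return (na * nb) / (na + nb) * diff @ np.linalg.solve(cov, diff)

  def permutation_pvalue(group_a, group_b, n_perm=1000, rng=None):
      """Non-parametric p-value: how often a random relabeling beats the observed T^2."""
      rng = rng or np.random.default_rng(0)
      observed = hotelling_t2(group_a, group_b)
      pooled = np.concatenate([group_a, group_b])
      count = 0
      for _ in range(n_perm):
          perm = rng.permutation(len(pooled))
          a = pooled[perm[:len(group_a)]]
          b = pooled[perm[len(group_a):]]
          count += hotelling_t2(a, b) >= observed
      return (count + 1) / (n_perm + 1)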
A key step in the SPHARM-PDM processing is the initial mapping of an arbitrary binary segmentation onto a surface of spherical topology and then onto spherical coordinates. The software that performs this is part of the SPHARM 1.7 toolkit, using the commands SegPostProcess and GenParaMesh. The SPHARM-PDM description is then computed from this information via ParaToSPHARMMesh, and the statistical analysis is computed via the tool StatNonParamPDM.
The goal of our activities is to take a large list of segmentations and process them through the first three steps in parallel. The time savings are obvious: 60 volumes acquired throughout a day can be processed in parallel in under two hours on a moderately sized cluster, whereas on a single processor the same work could take more than five days.
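The parallelization is trivial because each subject is processed independently. The sketch below simply enumerates one three-step command chain per input segmentation, which a scheduler such as =gwiz-run= or PBS could then fan out across nodes; the file extensions and the arguments handed to the SPHARM tools are placeholders, not their real option syntax.

  import glob
  import os

  def spharm_chains(input_dir, output_dir):
      """One SegPostProcess -> GenParaMesh -> ParaToSPHARMMesh chain per subject."""
      chains = []
      for seg in sorted(glob.glob(os.path.join(input_dir, "*.gipl"))):  # extension is illustrative
          stem = os.path.join(output_dir, os.path.splitext(os.path.basename(seg))[0])
          chains.append([
              # Placeholder arguments; see the SPHARM-PDM documentation for the real options.
              ["SegPostProcess", seg, stem + "_pp.gipl"],
              ["GenParaMesh", stem + "_pp.gipl", stem + "_para.meta", stem + "_surf.meta"],
              ["ParaToSPHARMMesh", stem + "_surf.meta", stem + "_para.meta", stem + "_SPHARM"],
          ])
      return chains  # each chain runs sequentially; different chains run in parallel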
Initial command-line based interaction model
SPHARM-PDM shape analysis is quite different from EMSegmenter. Here, we can rely on two things:
- The user does not need to start Slicer in order to start a job
- The user is comfortable with command-line applications
This is an ideal situation for =gwiz-run=, a command-line application scheduler.
The software we propose for this use case comes in three parts:
- A configured GridWizard for the NAMIC cluster, which can be installed by a user into his or her own workspace
- A software package (RPM, tarball, or Rocks roll) for 32- and 64-bit clusters to install the SPHARM-PDM software
- A set of "template" commands that the user can apply to process large numbers of images
As in the EMSegmenter use case above, we assume that the user is virtually inside the SPL and does not have to contend with the gatekeeper login system, or is using the BIRN cluster. Similarly, the cluster will need to have an =sshfs= filesystem mounted to some remote data repository (or the cluster itself needs to be the repository). The usage we propose for SPHARM is as follows:
- User gathers a large set (>= 25, <= 500) of files to process
- User picks a set of configuration parameters
- User goes to a NAMIC wiki page listing a "template command"
- User runs the =gwiz-gui= application
- User cut-and-pastes the template command into the GUI and edits it as appropriate
- User reviews task list, optionally chooses to change the scheduling algorithm's parameters, clicks "Go"
The "nice-to-have's" from EMSegmenter apply here as well.
GUI-based interaction model for clinical investigators
The above scenario is really only a short-term vision; what we really need is a GUI that lets users select the datasets and parameters, after which the user clicks a 'run' button and everything is automatically distributed to the grid and run. We are currently developing exactly such a module (external) for Slicer. The module first lets you select the binary segmentations of multiple subjects. Then group information and subject variables are provided using a table interface. Once the table is complete, all computations are performed automatically via a BatchMake script, which sends the jobs to a distributed computation grid (via Condor).
The advantages of this scheme are:
- The user does not need to be comfortable with command line tools
- Easy cross platform usage
- Data entry for group comparison and correlational analysis is straightforward
- Online help for parameter selection (in case default parameters need to be changed)