Documentation/Labs/MultiDimensional Data Management
Contents
- 1 Andras Initial proposal
- 2 Jim
- 3 Andras
- 3.1 Let’s assume that the “multi” dimensional is arranged into timepoints corresponding to “visits”. Then in your terminology, a “branch” would correspond to a visit and contained in that branch would be all the data for that visit: Ultrasound images, CT images, models, transforms, etc. Is this correct?
- 3.2 Are you trying to generalize the Multivolume concept (which is a single node that manages a time varying volume and provides mechanisms to view a single time point)?
- 3.3 You make reference to memory issues several times. Would it help if Slicer could be told to only load the meta-data for a volume until the pixel data is truly needed?
- 4 Jim
- 5 Andras
- 5.1 Is there any linking of data between different MultidimensionalDataItems?
- 5.2 How would you model your ultrasound data you referenced? One MultidimensionalDataItem with N ScalarVolumes and N Transforms
- 5.3 What if the MultiVolume were extended to support a transform per time point? Does that address your major need?
- 6 Jim
- 7 Andras
- 8 Jim
- 9 Andriy
- 10 Kevin
Andras Initial proposal
The main idea is to represent multidimensional data in Slicer in a generic way (for any number of dimensions, any kind of dimensions, any type of nodes). The N-dimensional data would be stored in a two-level tree structure: each branch would correspond to a specific parameter combination (“coordinate” in the N-dimensional space), and each branch would contain a list of nodes (“value” at that coordinate). Selection/filtering/browsing in such a structure is trivial. We may add indexing for each dimension for performance optimization.
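To make the two-level structure concrete, here is a minimal sketch in plain Python (hypothetical class and method names, not actual MRML API): the root stores the dimension definitions, each branch is keyed by one parameter combination, and browsing/filtering along any dimension is a simple lookup.

```python
# Hypothetical sketch of the proposed two-level tree (not actual MRML API).
# The root stores the dimension names; each branch ("item") is keyed by one
# parameter combination and holds the list of nodes valid at that coordinate.

class MultidimensionalData:
    def __init__(self, dimensions):
        # dimensions: e.g. ["visit", "time", "treatment"]
        self.dimensions = dimensions
        self.items = {}  # (value, value, ...) -> list of node names

    def add_item(self, coordinates, nodes):
        # coordinates: one value per dimension, e.g. {"visit": 1, "time": 10.0}
        key = tuple(coordinates[d] for d in self.dimensions)
        self.items[key] = list(nodes)

    def get_item(self, coordinates):
        key = tuple(coordinates[d] for d in self.dimensions)
        return self.items.get(key, [])

    def filter_items(self, **fixed):
        # Browse along any dimension by fixing the values of the others.
        result = []
        for key, nodes in self.items.items():
            coords = dict(zip(self.dimensions, key))
            if all(coords[d] == v for d, v in fixed.items()):
                result.append((coords, nodes))
        return result

data = MultidimensionalData(["visit", "time"])
data.add_item({"visit": 1, "time": 10.0}, ["CT", "Transform1"])
data.add_item({"visit": 1, "time": 11.0}, ["CT", "Transform1"])
data.add_item({"visit": 2, "time": 10.0}, ["US", "Transform2"])
print(len(data.filter_items(visit=1)))  # 2 items at visit 1
```

The per-dimension indexing mentioned above would replace the linear scan in `filter_items` with a lookup table per dimension; the sketch omits that optimization.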
In addition to the data model, some related components would be developed as well:
- Importers for specific file formats
- Storage nodes for specific file formats
- Editor module (to allow the user to manually construct multidimensional data sets from existing nodes)
- Browser module (to allow browsing of a multidimensional data set along any dimension by copying the nodes in a selected branch to the scene)
- Batch processing module (to apply a CLI with the same parameters to each node of a multidimensional data set)
It seems that there are two main design options:
Option A: Keep all data in the scene
Create new MRML hierarchy node types (a MultidimensionalData node for the root, MultidimensionalDataItem nodes for the branches) to build the tree structure.
Pro:
- any node/module can refer to any multidimensional data element at any time
Con:
- the scene may become huge, leading to performance problems (the only issue is the qMRMLSceneModel, which is created for each node selector widget or node tree view widget; each widget observes all the potential nodes; full update of a single widget may take seconds)
- the scene may become huge, leading to out-of-memory issues (difficult to implement delayed, on-demand loading in the current scene architecture as it is expected that everything is available in memory)
- most of the nodes have to be hidden, otherwise node selectors would become unusable (if we had volume data for 100 time points then we would have 100 volumes in the slice viewer's volume selector); however, this introduces issues:
Remarks
- making the HideFromEditors property of nodes dynamically changeable introduces a significant slowdown (due to qMRMLSceneModel, loading and batch processing of large scenes is slowed down by up to 30-50%; we may need to redesign the whole Qt MVC concept; maybe have only one model for the scene instead of one per widget, or centralize the node observation in some other way)
- hidden nodes are not saved (except if we save as .mrb file, but the current implementation is very hacky and storage nodes are not always created correctly)
We’ve already implemented this option for one parameter dimension: there is a browser node, editor node, and importer for tracked ultrasound data (https://subversion.assembla.com/svn/slicerrt/trunk/MultiDimension/). Without making HideFromEditors dynamic, scene performance is usually quite good: the refresh rate during browsing is about 10-20 fps for a single-slice volume and 4 transforms (for about 500 time points). However, when some module GUIs are open it sometimes slows down to 1-2 fps. Due to the above limitations, saving only works in .mrb format and nodes cannot be shown/hidden dynamically (a node is copied into the scene to show it).
Option B: Keep nodes within private “sub-scenes”
Have only one new MRML node for root node and store all the other MRML nodes privately (by default the nodes in the branches would not be included in the scene). It could work similarly to scene views management, where nodes are stored outside the scene and copied into the main scene on request.
Pro:
- scene is small, no performance issues
- delayed loading is possible (it’s enough to load the volume from disk when it is requested to be included in the main scene), so no memory issues for large data sets
Con:
- node contents must be copied from sub-scene to make it available in the main scene (may be slow if it’s completely generic, as shallow copy function in MRML is not yet available)
- saving is complicated (similar to scene views, but even more complex, as it’s a two-level tree instead of just one; scene views saving implementation is already quite messy and actually broken in Slicer – doesn’t work if nodes are changed or deleted)
- editing is complicated (needs sync between the node copy in the main scene and in the sub-scene)
- allowing other nodes/modules to refer to individual nodes is difficult (as nodes may not always be available in the main scene)
Remarks
For option A the hidden nodes management should be fixed (and probably the Qt scene model MVC classes should be redesigned); for option B the scene views management should be fixed and extended.
Do you have any preference for option A or B? Any other ideas?
Jim
Your description is a little abstract. Let me ask a few questions.
Let’s assume that the “multi” dimensional is arranged into timepoints corresponding to “visits”. Then in your terminology, a “branch” would correspond to a visit and contained in that branch would be all the data for that visit: Ultrasound images, CT images, models, transforms, etc. Is this correct?
Or, are you saying a “branch” refers to a specific type of data, for instance a model. And the data within that branch is all the “versions” of that model across all the timepoints?
Are you trying to generalize the Multivolume concept (which is a single node that manages a time varying volume and provides mechanisms to view a single time point)?
You make reference to memory issues several times. Would it help if Slicer could be told to only load the meta-data for a volume until the pixel data is truly needed?
Andras
Let’s assume that the “multi” dimensional is arranged into timepoints corresponding to “visits”. Then in your terminology, a “branch” would correspond to a visit and contained in that branch would be all the data for that visit: Ultrasound images, CT images, models, transforms, etc. Is this correct?
Yes, in one branch we would have all the nodes related to that visit. Another example, with three dimensions (visit, time, treatment):
<MultidimensionalData Parameters="visit,int;time,float,sec;treatment,string">
  <MultidimensionalDataItem visit="1" time="10.0" treatment="asdf">
    ...nodes in the scene... (CT, models, transforms, etc.)
  </MultidimensionalDataItem>
  ...
  <MultidimensionalDataItem visit="1" time="11.0" treatment="asdf">
    ...nodes in the scene...
  </MultidimensionalDataItem>
</MultidimensionalData>
Are you trying to generalize the Multivolume concept (which is a single node that manages a time varying volume and provides mechanisms to view a single time point)?
Yes, the multivolume is a very limited, specific case. For example, we cannot use the multivolume for ultrasound, because only the pixel data can change between time points (image origin and direction cannot) and it doesn’t support transforms changing in time.
You make reference to memory issues several times. Would it help if Slicer could be told to only load the meta-data for a volume until the pixel data is truly needed?
The main challenge is to avoid any performance degradation, but loading all data into memory could be problematic as well. On-demand loading of voxel data for volumes and deformation fields, and of polydata for models, would certainly help; that is the main idea of on-demand (or delayed) loading. It could probably be implemented for regular scene nodes, but it may be tricky to make fully backward-compatible, and it may be non-trivial to implement an unloading strategy. We can easily guarantee backward compatibility if we only use on-demand loading for nodes that are in private sub-scenes and not in the main scene.
Jim
Is there any linking of data between different MultidimensionalDataItems? i.e. can a model in one MultidimensionalDataItem be identified as the same model (but at a different time point) in another MultidimensionalDataItem?
How would you model your ultrasound data you referenced? One MultidimensionalDataItem with N ScalarVolumes and N Transforms?
What if the MultiVolume were extended to support a transform per time point? Does that address your major need?
Whereas the MultiVolume manages multiple time points of a volume, your design is not just a generalization to other data types. It looks to be combining two concepts.
For example, a first level generalization of MultiVolume could be MultiObject which would manage multiple time points of any MRML type, with a time and a transform associated with the time point.
But you are proposing something even more general than that. Correct?
Just trying to wrap my head around what you are proposing relative to where we are now.
Andras
Is there any linking of data between different MultidimensionalDataItems?
No, each item is independent, the same way as scene views are independent. Basically we have a list of scenes. The main difference compared to scene views is that you identify the internal scenes not by name but by a set of parameter values (visit, time, ...), and you can quickly get the contents of any sub-scene into the main scene.
How would you model your ultrasound data you referenced? One MultidimensionalDataItem with N ScalarVolumes and N Transforms?
N sub-scenes. Each contains a volume and 4 transforms.
What if the MultiVolume were extended to support a transform per time point? Does that address your major need?
It would be enough for one specific problem, today. It would not work for several other projects that we have (dose accumulation, treatment response evaluation, etc.). Supporting all node types (including future and non-core nodes) and the relationships between them (display, storage, transformation, etc.) may all be needed - and all of this is provided if we store scenes instead of just selected nodes.
Jim
I am still confused by your Ultrasound example.
I think you are saying for the Ultrasound example that you will have N MultidimensionalDataItems (you threw in the word sub-scenes which is not in the vernacular). Each will have a volume and 4 transforms. So different timepoints are different MultidimensionalDataItems, all within one MultidimensionalData (which represents the entire sequence). Is this correct?
I understand that modifying Multivolume doesn’t solve all of tomorrow’s problems. But at the same time, I am having difficulty seeing how your proposal maps to those same problems.
Can we lay out a few examples (Ultrasound, Dose Accumulation, Treatment Response) where for each example we define
1. How many MultidimensionalData objects would there be?
2. How many MultidimensionalDataItem objects would there be in each MultidimensionalData object?
3. What type of visualization/processing do you want to do between MultidimensionalDataItems?
4. What type of visualization/processing do you want to do between MultidimensionalData objects?
Andras
I use “sub-scene” to distinguish the private scenes (stored privately inside the MultiDimensionalData node) from the “main scene” (the usual scene). Note that if we are considering option A, then the sub-scenes are stored as hidden branches in the main scene, not in separate scene objects.
Ultrasound:
See a short screencast about how it works now (implemented with design option A, with 78 time points):
http://screencast.com/t/gXPYTArNgVO
<MRML …>
  <Crosshair…> <Selection…> <Interaction…> <View…> <Slice…> <Layout…> …
  <MultidimensionalData Parameters="time,float,sec">
    <MultidimensionalDataItem time="10.0">
      <Volume id="vtkMRMLScalarVolumeNode1" … />
      <LinearTransform id="vtkMRMLLinearTransformNode1" … />
      <LinearTransform id="vtkMRMLLinearTransformNode2" … />
      <LinearTransform id="vtkMRMLLinearTransformNode3" … />
      <LinearTransform id="vtkMRMLLinearTransformNode4" … />
    </MultidimensionalDataItem>
    <MultidimensionalDataItem time="10.2">
      <Volume id="vtkMRMLScalarVolumeNode1" … />
      <LinearTransform id="vtkMRMLLinearTransformNode1" … />
      <LinearTransform id="vtkMRMLLinearTransformNode2" … />
      <LinearTransform id="vtkMRMLLinearTransformNode3" … />
      <LinearTransform id="vtkMRMLLinearTransformNode4" … />
    </MultidimensionalDataItem>
    ...
    <MultidimensionalDataItem time="95.6">
      <Volume id="vtkMRMLScalarVolumeNode1" … />
      <LinearTransform id="vtkMRMLLinearTransformNode1" … />
      <LinearTransform id="vtkMRMLLinearTransformNode2" … />
      <LinearTransform id="vtkMRMLLinearTransformNode3" … />
      <LinearTransform id="vtkMRMLLinearTransformNode4" … />
    </MultidimensionalDataItem>
  </MultidimensionalData>
</MRML>
What type of visualization/processing do you want to do between MultidimensionalDataItems?
- Replay the acquisition: image contents and slice and tools pose is displayed in 2D and 3D views (as shown on the screencast)
- Processing: feature extraction and surface reconstruction, spatial and temporal calibration, etc.
Dose accumulation / Treatment response evaluation:
<MRML …>
  <Crosshair…> <Selection…> <Interaction…> <View…> <Slice…> <Layout…> …
  <MultidimensionalData Parameters="patient,string;fraction,int">
    <MultidimensionalDataItem patient="pt123" fraction="1">
      <Volume id="vtkMRMLScalarVolumeNode1" Name="CT" … />
      <Volume id="vtkMRMLScalarVolumeNode1" Name="MRI" … />
      <Volume id="vtkMRMLScalarVolumeNode1" Name="CBCT" … />
      <Volume id="vtkMRMLScalarVolumeNode1" Name="CBCT" … />
      <Contour id="vtkMRMLContourNode1" Name="Prostate" … />
      <Contour id="vtkMRMLContourNode1" Name="Rectum" … />
      …
    </MultidimensionalDataItem>
    <MultidimensionalDataItem patient="pt123" fraction="2">
      <Volume id="vtkMRMLScalarVolumeNode1" Name="CT" … />
      <Volume id="vtkMRMLScalarVolumeNode1" Name="MRI" … />
      <Volume id="vtkMRMLScalarVolumeNode1" Name="CBCT" … />
      <Volume id="vtkMRMLScalarVolumeNode1" Name="CBCT" … />
      <Contour id="vtkMRMLContourNode1" Name="Prostate" … />
      <Contour id="vtkMRMLContourNode1" Name="Rectum" … />
      …
    </MultidimensionalDataItem>
    …
    <MultidimensionalDataItem patient="pt888" fraction="6">
      <Volume id="vtkMRMLScalarVolumeNode1" Name="CT" … />
      <Volume id="vtkMRMLScalarVolumeNode1" Name="MRI" … />
      <Volume id="vtkMRMLScalarVolumeNode1" Name="CBCT" … />
      <Volume id="vtkMRMLScalarVolumeNode1" Name="CBCT" … />
      <Contour id="vtkMRMLContourNode1" Name="Prostate" … />
      <Contour id="vtkMRMLContourNode1" Name="Rectum" … />
      …
    </MultidimensionalDataItem>
  </MultidimensionalData>
</MRML>
What type of visualization/processing do you want to do between MultidimensionalDataItems?
1. Quick browsing: switch between patients and fractions with two sliders
2. Motion compensation: register a selected volume (e.g., CBCT) in each MultidimensionalDataItem to a reference volume and store the registration result in the same MultidimensionalDataItem
3. Dose summation: compute the weighted sum of the motion-compensated dose volumes in all the MultidimensionalDataItems
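The dose-summation step above could be sketched as follows (plain Python with illustrative values; in practice each voxel array would come from the motion-compensated dose volume stored in one MultidimensionalDataItem):

```python
# Hypothetical sketch of weighted dose summation over all items (fractions).
# Each dose volume is represented as a flat list of voxel values; all
# fractions are assumed to already be motion-compensated onto the same
# reference geometry.

def accumulate_dose(dose_volumes, weights):
    # dose_volumes: one voxel array per fraction (same geometry assumed)
    # weights: one weight per fraction
    if len(dose_volumes) != len(weights):
        raise ValueError("one weight per dose volume is required")
    num_voxels = len(dose_volumes[0])
    total = [0.0] * num_voxels
    for volume, w in zip(dose_volumes, weights):
        for i in range(num_voxels):
            total[i] += w * volume[i]
    return total

# Three fractions of a tiny 4-voxel "volume", equally weighted:
fractions = [[1.0, 2.0, 0.0, 1.0],
             [1.5, 1.5, 0.5, 1.0],
             [0.5, 2.5, 0.5, 1.0]]
print(accumulate_dose(fractions, [1.0, 1.0, 1.0]))  # [3.0, 6.0, 1.0, 3.0]
```

The batch processing module mentioned in the initial proposal would supply the per-item volumes; only the final accumulation is shown here.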
What type of visualization/processing do you want to do between MultidimensionalData objects?
Each MultidimensionalData node is independent, so I wouldn’t do any joint processing of multiple MultidimensionalData nodes. We can of course imagine modules to split/merge MultidimensionalData nodes, remove selected dimensions, etc.
Jim
In the second example, you combine data from different patients under one MultidimensionalData node. That kinda muddies the water a little.
I see where you are going.
The generality will be a challenge to support at the CLI level. Maybe we can pass a “group” object to the CLI based on this design. Assuming we need to support these at the CLI level.
Let’s ignore the scene view performance until we get further down the thought process. I presume the benefits of this approach over the existing hierarchy node capabilities are that
1. It is more explicit. Presumably a MultidimensionalDataItem knows what it contains, rather than being implicitly defined in the scene graph.
2. It is more explicit in its generality. The parameters over which the “dimension” is defined can be changed. Whereas with existing hierarchy nodes, we would either have separate hierarchy node types or cache things in the key:value dictionary on the hierarchy node.
I got a little lost in the screencast when you started applying the transform. Was the first part of the video just showing the data without being transformed? Then you selected one of the transforms to apply? And it applied the corresponding transform to the appropriate volume as the time axis was changed?
Would you anticipate being able to handle transform hierarchies within a MultidimensionalDataItem? i.e. the transform relationship would already be established in the item container?
Andriy
The description and discussion were very helpful.
I am planning to join the hangout, but here are a few more items for the discussion.
1. We should definitely consider that it will be necessary to support the presence of multiple MultidimensionalData (MDD) items in the scene at the same time. If each item is a visit, assessment of treatment response will require visualizing individual visits side by side. I assume this could be handled by Csaba's patient hierarchy, and the MDD would be an optional approach for organization, used where appropriate as decided by the application? Perhaps we could "pin" individual MDD items to remain in the scene.
2. In applications like DCE MRI processing and visualization, it is necessary to sample data in the image across time. Currently, multivolume node data is organized so that the 4th dimension (time or something else) is the fastest. Do you see MDD addressing this use case as well? Or do you see that parallel representations can be created (e.g., a multivolume node that mirrors the content of the MDD along a certain dimension)?
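The layout point can be illustrated with a small sketch (plain Python, hypothetical helper names): with the time axis fastest-varying, the per-voxel time curve is one contiguous run of the buffer, which is what makes DCE-style time sampling cheap in the multivolume representation.

```python
# Hypothetical sketch of the multivolume memory layout described above:
# time (the 4th dimension) is the fastest-varying axis, so all time
# samples of one voxel are stored contiguously in the flat buffer.

def flat_index(x, y, z, t, nx, ny, nz, nt):
    # Time-fastest ordering: consecutive t values are adjacent in memory.
    return ((z * ny + y) * nx + x) * nt + t

def time_curve(data, x, y, z, nx, ny, nz, nt):
    # With time fastest, the curve is one contiguous slice of the buffer.
    start = flat_index(x, y, z, 0, nx, ny, nz, nt)
    return data[start:start + nt]

# Tiny 2x1x1 volume with 3 time points, stored time-fastest:
nx, ny, nz, nt = 2, 1, 1, 3
data = [10, 11, 12,   # voxel (0,0,0): t=0,1,2
        20, 21, 22]   # voxel (1,0,0): t=0,1,2
print(time_curve(data, 1, 0, 0, nx, ny, nz, nt))  # [20, 21, 22]
```

In the MDD representation, by contrast, the time points would live in separate items, so a mirrored multivolume (or an equivalent contiguous cache) would be needed for fast per-voxel sampling.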
3. I do not think that the possibility of having too many nodes in a node selector should be a design consideration. I think we can work around this by creating specialized modules and restricting the set of permissible nodes for a given selector based on node attributes - I think this is already possible.
4. How is the proposed architecture related to patient hierarchies? Going back to Ron's email, there were two items: MDD and hierarchies. I thought hierarchies are very close to integration, while MDD was an exploratory item. Is your plan to reorganize patient hierarchies and integrate with MDD, or are they somewhat separate? I would prefer if they could be used independently.
Andras, I like your approach of identifying specific use cases to evaluate this design. I suggest we add a few more applications (DCE MRI is one), more datasets, and try to map them to MDD to get a better hands-on idea of how it will work in practice.
Kevin
Hi,
A while ago we had a discussion on 4D data support in slicer and I have provided a few use cases in the discussion along with some design options for 4D data [1]. Just thought it might be helpful for today’s hangout.
The one thing I would like to add to today’s discussion is the option of separating transforms and data (images). The transforms can be seen as a special type of data that can also be organized under the multidimensional hierarchy. A transform can then be applied to a multidimensional data set just as a regular transform is applied to a scalar volume node.
I will not be available for today’s hangout discussion.
Thanks, Kevin
[1] https://www.assembla.com/spaces/slicerrt/wiki/Slicer_4D_data_representation_and_use_cases