Discussion of issues related to data movement, staging, and data-aware workflow scheduling with LSF.
Hi Dayanand,
There is a lot of scope for research in this area. The issues of interest include (but are not limited to):
- How does one manage the storage used by jobs that produce large data sets? This covers both transient and permanent data sets, and includes issues such as space management and selecting storage volumes to improve the efficiency of the job (e.g. do you choose local or remote storage elements?).
- In environments where not all storage volumes are shared and datasets are spread across the infrastructure, how do we select hosts for jobs that use particular datasets? If a dataset is local to a host, that host would obviously be preferred over one that has to stage in the file from another node or access it over NFS or the like, especially for large datasets.
- If datasets are mostly read-only, how does one manage local "dataset caches" on individual compute nodes to balance the space they consume against the ability to send jobs to multiple compute nodes (thus increasing throughput)?
- In the context of workflow, what are the best ways for users to express their requirements for storage or datasets?
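
To make the host-selection question concrete, here is a minimal sketch in Python. It is not LSF code and does not use any LSF API; the `Host` class, the `staging_cost` and `pick_host` functions, and the dataset names and sizes are all hypothetical, made up for illustration. It assumes the scheduler already knows which datasets are local to each host and how big each dataset is, and it simply prefers the host that would need the fewest bytes staged in.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    # Hypothetical host record: a name plus the datasets already on local disk.
    name: str
    local_datasets: set = field(default_factory=set)


def staging_cost(host, needed, sizes):
    """Bytes that would have to be staged in to run the job on this host."""
    return sum(sizes[d] for d in needed if d not in host.local_datasets)


def pick_host(hosts, needed, sizes):
    """Return the host with the least data to stage in (ties go to the first)."""
    return min(hosts, key=lambda h: staging_cost(h, needed, sizes))


# Made-up example: three hosts, two datasets with sizes in bytes.
hosts = [
    Host("node1", {"genome_a"}),
    Host("node2", {"genome_a", "genome_b"}),
    Host("node3"),
]
sizes = {"genome_a": 40 * 10**9, "genome_b": 5 * 10**9}

best = pick_host(hosts, {"genome_a", "genome_b"}, sizes)
print(best.name)
```

A real scheduler would have to weigh this cost against host load, queue wait time, and free local disk space, which is where the research questions start.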
These are just a few of the open issues. One can also imagine a whole new set of issues if you want to manage datasets across multiple data centres and thus need to replicate data over wide area networks.
What research areas are you interested in?
-- Chris
I am also interested in research within this field.