
Thread: Data movement, staging and data-aware workflow scheduling with LSF

  1. #1
    vbseo is offline Member
    Join Date
    March 16th, 2008

    Data movement, staging and data-aware workflow scheduling with LSF

    Discussion for Data movement, staging and data-aware workflow scheduling with LSF related issues.

  2. #2
    dayanandkg is offline Junior Member
    Join Date
    June 11th, 2009

    Help

    Quote Originally Posted by vbseo View Post
    Discussion for Data movement, staging and data-aware workflow scheduling with LSF related issues.

    Is there any scope to do more research in this area? Kindly let me know more about this. I am exploring the possibility of doing my research in this area.

    Regards
    Dayanand

  3. #3
    csmith is offline Junior Member
    Join Date
    March 20th, 2008

    Hi Dayanand,

    There is a lot of scope for research in this area. The types of issues that are of interest include (but are not limited to):

    - how does one manage storage used by jobs that produce large data sets? This covers both transient and permanent data sets, and includes issues such as space management and the selection of storage volumes to improve job efficiency (e.g. do you choose local or remote storage elements?).

    - in environments where not all storage volumes are shared and datasets are spread across the infrastructure, how do we select hosts for jobs that use particular datasets? If a dataset is local to a host, that host would obviously be preferred over staging in the file from another node or accessing it over NFS or the like, especially for large datasets.

    - if datasets are mostly read-only, how does one manage local "dataset caches" on individual compute nodes to balance the need for space against the ability to send jobs to multiple compute nodes (thus increasing throughput)?

    - in the context of workflow, what are the best ways to have users express the requirements for either storage or datasets?
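
    To make the host-selection question a bit more concrete, here is a rough Python sketch of one possible data-aware placement policy: prefer the free host that would need the least stage-in time. This is not LSF code and does not use any LSF API. The Host class, the stage_in_seconds and pick_host functions, and the 100 MB/s default bandwidth are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical description of a compute node (not an LSF structure)."""
    name: str
    # Dataset name -> size in bytes, for datasets already on local disk.
    local_datasets: dict = field(default_factory=dict)
    free_slots: int = 1
    # Assumed sustained bandwidth for staging data onto this host.
    stage_in_bytes_per_sec: float = 100e6


def stage_in_seconds(host, needed):
    """Estimate the time to copy every needed dataset the host is missing."""
    missing = sum(size for name, size in needed.items()
                  if name not in host.local_datasets)
    return missing / host.stage_in_bytes_per_sec


def pick_host(hosts, needed):
    """Return the free host with the lowest estimated stage-in time, or None."""
    candidates = [h for h in hosts if h.free_slots > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda h: stage_in_seconds(h, needed))
```

    A real scheduler would also have to weigh queue wait time, cache eviction and shared-filesystem access against the stage-in cost. The sketch ignores all of that and only shows the locality preference.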

    These are just a few issues that exist. One can also imagine a whole new set of issues if you want to manage datasets between multiple data centres and thus need to replicate data over wide area networks.

    What research areas are you interested in?

    -- Chris

  4. #4
    alkalinewater is offline Junior Member
    Join Date
    May 15th, 2010

    I am also interested in research within this field.
