
Thread: MapReduce in Symphony

  1. #1
    csmith is offline Junior Member
    Join Date
    March 20th, 2008

    MapReduce in Symphony

    MapReduce is a programming model that has been around for a while
    (especially in functional-style languages), but it was popularized
    more recently by Google (Google Research Publication: MapReduce) and
    by the Hadoop project as a way to process large data sets on the grid.

    Given the popularity and ease of use of the MapReduce programming
    model, we decided to implement a C++ MapReduce API layer on top of
    Platform Symphony's current C++ API, to give users who already know
    the MapReduce concepts a more familiar way to program. With this new
    API, developers no longer need to write their own data serialization
    classes, as they do with the current SOAM API. They can express their
    inputs as key/value pairs and get their outputs back in the same
    format.

    For example, to send the message "Hello Grid" with the Symphony API,
    you need to write your own serialization code to create the message,
    and create your own connection, session and tasks to process it. With
    the new MapReduce API, serialization, connection management, session
    management and task processing are all handled automatically by the
    MapReduce library, so you can focus on your own business logic. The
    code looks something like this:

    Client Code:

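    // mapMsg, jobconf and numMaps are assumed to be declared earlier.
    // The input to the map tasks is a single key/value pair.
    mapMsg.setKey("Skater");
    mapMsg.setValues("Hello Grid!");

    // Attach the input and choose how many map tasks to run.
    jobconf.setMapMSG(&mapMsg);
    jobconf.setNumMapTasks(numMaps);

    // Name the application and its map and reduce types.
    jobconf.setAppName("HelloMapReduceApp");
    jobconf.setMapType("maptype");
    jobconf.setReduceType("reducetype");

    // Submit the job and run it.
    ToolRunner runner;
    runner.run(jobconf);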


    Mapper Code:

    class HelloMapper : public MapService
    {
    public:
        HelloMapper(void) { }
        ~HelloMapper(void) { }

        // Called once per input pair; emits the same key with a reply
        // appended to the value.
        void runMapper(std::string key, std::string values, OutputCollector& outPut)
        {
            outPut.collect(key, values + "Response!");
        }
    };


    Additionally, you can write a reducer to process your map output in
    the distributed environment, which becomes useful when the output
    from the mapper is large and you want to take advantage of the grid
    for that step as well.

    Reducer Code:

    class WordCounterReducer : public ReduceService
    {
    public:
        WordCounterReducer(void) { }
        ~WordCounterReducer(void) { }

        void runReducer(std::string key, std::vector<std::string> values, OutputCollector& outPut)
        {
            // Each value is one occurrence of the key, so the count is
            // simply the number of values received.
            long sum = static_cast<long>(values.size());
            outPut.collect(key, sum);
        }
    };


    So in the SOA grid, MapReduce is implemented as two separate services:
    Map and Reduce. Their inputs and outputs are just key/value pairs,
    which developers process within their own implementations of these
    services.
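
    To make the flow of key/value pairs concrete, here is a minimal
    single-process sketch of the same idea in plain C++. It does not use
    the Symphony MapReduce library at all: LocalCollector, mapWords,
    groupByKey and countWords are hypothetical names invented for this
    illustration, and the grouping step that the grid would perform
    between the two services is done with a std::map.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the library's OutputCollector: it simply
// records every emitted key/value pair in memory.
struct LocalCollector {
    std::vector<std::pair<std::string, std::string>> pairs;

    void collect(const std::string& key, const std::string& value) {
        pairs.emplace_back(key, value);
    }
};

// Map step: emit (word, "1") for every whitespace-separated word.
void mapWords(const std::string& line, LocalCollector& out) {
    std::istringstream in(line);
    std::string word;
    while (in >> word) {
        out.collect(word, "1");
    }
}

// Grouping step: collect all values emitted for the same key.
std::map<std::string, std::vector<std::string>>
groupByKey(const LocalCollector& mapped) {
    std::map<std::string, std::vector<std::string>> grouped;
    for (const auto& kv : mapped.pairs) {
        grouped[kv.first].push_back(kv.second);
    }
    return grouped;
}

// Reduce step: the count for a key is the number of values it received.
std::map<std::string, long> countWords(const std::vector<std::string>& lines) {
    LocalCollector mapped;
    for (const auto& line : lines) {
        mapWords(line, mapped);
    }
    std::map<std::string, long> counts;
    for (const auto& entry : groupByKey(mapped)) {
        counts[entry.first] = static_cast<long>(entry.second.size());
    }
    return counts;
}
```

    Calling countWords with the lines "hello grid" and "hello symphony"
    yields a count of 2 for "hello" and 1 for each of the other words. On
    the grid, the map and reduce steps would run as separate services
    rather than as function calls in one process.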

    We have made the MapReduce API and examples available in source format
    from the HPC Community download area here.

    There are two examples in the source package: WordCounter is the
    classic MapReduce example given by Google in its paper, and
    PiEstimator is a more traditional Symphony application that
    estimates the value of Pi using a Monte Carlo method.
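
    For readers who have not seen the Monte Carlo approach, the sketch
    below shows how it splits into map and reduce steps. This is not the
    PiEstimator code from the source package; countHits and estimatePi
    are hypothetical names, the seeds are arbitrary, and the "map tasks"
    are run in a plain loop instead of on the grid.

```cpp
#include <cstdint>
#include <random>

// One "map task": draw `samples` random points in the unit square and
// count how many fall inside the quarter circle of radius 1.
std::uint64_t countHits(std::uint64_t samples, std::uint32_t seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    std::uint64_t hits = 0;
    for (std::uint64_t i = 0; i < samples; ++i) {
        const double x = unit(rng);
        const double y = unit(rng);
        if (x * x + y * y <= 1.0) {
            ++hits;
        }
    }
    return hits;
}

// "Reduce": sum the hits from all map tasks. The quarter circle covers
// pi/4 of the square, so pi is roughly 4 * hits / samples.
double estimatePi(unsigned numMaps, std::uint64_t samplesPerMap) {
    std::uint64_t totalHits = 0;
    for (unsigned task = 0; task < numMaps; ++task) {
        // Give each task its own seed so the tasks draw different points.
        totalHits += countHits(samplesPerMap, 1000u + task);
    }
    const double totalSamples =
        static_cast<double>(numMaps) * static_cast<double>(samplesPerMap);
    if (totalSamples == 0.0) {
        return 0.0;  // nothing was sampled, so there is no estimate
    }
    return 4.0 * static_cast<double>(totalHits) / totalSamples;
}
```

    Because each map task only returns a hit count, the reduce step has
    very little data to combine, which is why this kind of problem
    spreads across a grid so easily.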

    We hope that you can use MapReduce for your applications!

  2. #2
    Guillaume is offline Junior Member
    Join Date
    May 21st, 2008

    A big part of Hadoop's implementation of MapReduce is the DFS.
    What plays that role in your API? I only see the common data as possibly having a similar role...

  3. #3
    csmith is offline Junior Member
    Join Date
    March 20th, 2008

    At this stage, we just wanted to show that the MapReduce programming model could be implemented on top of Symphony. We're also considering how to bring data management functionality to the API. One option could be to use HDFS from Hadoop itself. Another could be to use the Kosmos DFS as the file system layer. This would be key to making MapReduce in Symphony actually usable IMHO, but it hasn't been done yet.
