Research Topics and Sponsored Projects: Discussion on various research topics and sponsored projects
Add the MapReduce API for SOAM

Provide a MapReduce API in SOAM.

--------------------------------------------------------------------------------

Overview

MapReduce is a software framework implemented by Google to support parallel computations over large (multiple petabyte) data sets on unreliable clusters of computers. The framework is largely derived from the map and reduce functions commonly used in functional programming, although their actual semantics in the framework are not the same. (Source: Wikipedia, "MapReduce".)

There are a lot of developers who are familiar with the MapReduce programming model. We want to support this model in SOAM.

Work Items

1. Define a MapReduce API in SOAM.
2. Implement the C++ MapReduce API in SOAM.
3. Using the SOAM MapReduce API, create some examples for customers.
4. Add the data-aware logic in SOAM.
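For readers less familiar with the functional-programming roots mentioned in the overview, here is a minimal sketch of "map" and "reduce" using only the C++ standard library. The function names `mapLengths` and `sumLengths` are made up for illustration and are not part of SOAM or of any proposed API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// "map": apply the same function to every element independently.
// Here each word is mapped to its length. (Hypothetical helper.)
std::vector<std::size_t> mapLengths(const std::vector<std::string>& words) {
    std::vector<std::size_t> lengths(words.size());
    std::transform(words.begin(), words.end(), lengths.begin(),
                   [](const std::string& w) { return w.size(); });
    // One output element per input element.
    assert(lengths.size() == words.size());
    return lengths;
}

// "reduce": fold all mapped values into a single result.
// Here the lengths are summed. (Hypothetical helper.)
std::size_t sumLengths(const std::vector<std::size_t>& lengths) {
    return std::accumulate(lengths.begin(), lengths.end(), std::size_t{0});
}
```

Because each element is mapped independently, the map step can in principle be spread across many machines, which is the property the MapReduce framework builds on.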
One of the key aspects of the MapReduce model is its file system capabilities. Google has GFS, and Hadoop has its own Hadoop filesystem. For C++, the Kosmos File System (KFS) might be a useful way of addressing the data problem. See http://kosmosfs.sourceforge.net/. Something to consider as you explore this.

Last edited by Khalid; August 14th, 2008 at 04:52 PM.
We have implemented a MapReduce API based on the current Symphony API. Download Page

The benefit of this new API is that developers do not need to write their own data serialization classes, as they do with the current SOAM API. They can express their inputs as key/value pairs and get their outputs back in the same format. Additionally, customers do not need to manage the session and connection objects; these are all handled by the Symphony MapReduce API.

Client Code

    mapMsg.setKey("Skater");
    mapMsg.setValues("Hello Grid!");
    jobconf.setMapMSG(&mapMsg);
    jobconf.setNumMapTasks(numMaps);
    jobconf.setAppName("HelloMapReduceApp");
    jobconf.setMapType("maptype");
    jobconf.setReduceType("reducetype");
    ToolRunner runner;
    runner.run(jobconf);

Mapper Code

    class HelloMapper : public MapService
    {
    public:
        HelloMapper(void) { }
        ~HelloMapper(void) { }

        // Appends a fixed reply to the incoming value and emits it under the same key.
        void runMapper(std::string key, std::string values, OutputCollector& outPut)
        {
            outPut.collect(key, values + "Response!");
        }
    };

Additionally, you can write a reducer to process your map output in the distributed environment. This becomes useful when the output from the mapper is large and you want to take advantage of the grid.

Reducer Code

    class WordCounterReducer : public ReduceService
    {
    public:
        WordCounterReducer(void) { }
        ~WordCounterReducer(void) { }

        // Counts how many values were collected for this key and emits the count.
        void runReducer(std::string key, std::vector<std::string> values, OutputCollector& outPut)
        {
            long sum = 0;
            while (!values.empty())
            {
                values.pop_back();
                ++sum;
            }
            outPut.collect(key, sum);
        }
    };

Last edited by Qiang Xu; August 14th, 2008 at 04:50 PM.
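To make the key/value flow described in this post concrete, here is a small self-contained sketch that runs the map, group-by-key, and reduce phases in a single process using only the standard library. It does not use the Symphony MapReduce API; `mapWords`, `groupByKey`, `reduceCounts`, and `wordCount` are names invented for this illustration, and a real grid deployment would distribute the map and reduce calls across service instances.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using KeyValue = std::pair<std::string, long>;

// Map phase: emit (word, 1) for every whitespace-separated word in a line.
std::vector<KeyValue> mapWords(const std::string& line) {
    std::vector<KeyValue> out;
    std::istringstream in(line);
    std::string word;
    while (in >> word) {
        out.emplace_back(word, 1L);
    }
    return out;
}

// Shuffle phase: collect all values emitted under the same key.
std::map<std::string, std::vector<long>> groupByKey(const std::vector<KeyValue>& pairs) {
    std::map<std::string, std::vector<long>> grouped;
    for (const auto& kv : pairs) {
        grouped[kv.first].push_back(kv.second);
    }
    return grouped;
}

// Reduce phase: sum the values collected for one key.
long reduceCounts(const std::string& key, const std::vector<long>& values) {
    assert(!key.empty());  // the map phase never emits an empty word
    long sum = 0;
    for (long v : values) {
        sum += v;
    }
    return sum;
}

// Driver: runs the three phases in order, all in-process.
std::map<std::string, long> wordCount(const std::vector<std::string>& lines) {
    std::vector<KeyValue> emitted;
    for (const auto& line : lines) {
        std::vector<KeyValue> part = mapWords(line);
        emitted.insert(emitted.end(), part.begin(), part.end());
    }
    std::map<std::string, long> result;
    for (const auto& entry : groupByKey(emitted)) {
        result[entry.first] = reduceCounts(entry.first, entry.second);
    }
    return result;
}
```

The grouping step is what lets a reducer receive a key together with all of its values, which is the shape of the `runReducer` signature shown in the post.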
Hello guys:

We have done more research in this area.

Target

• Enhance the performance of Symphony when processing large data.
  – By introducing a DFS into Symphony, the customers' data can be kept on the compute nodes. This reduces the time spent passing data to the SIM through SOAM messages.
• Let each task run where its preferred data resides.
  – By enhancing the task and session API, customers can select a compute node according to their data or resource requirements.

Plan

HDFS vs. KFS

According to the comparison at http://code.google.com/p/hypertable/wiki/KFSvsHDFS, KFS performs better than HDFS. The problem is that KFS is at an early stage of implementation: the system is not yet stable, and there are not enough samples showing how to use it.

Last edited by Qiang Xu; August 14th, 2008 at 05:33 PM.
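As a rough illustration of the data-aware placement idea in the targets above, the sketch below picks a compute node for a task by preferring nodes that already hold the task's data block and falling back to the least-loaded node otherwise. This is only one possible policy under assumed inputs; `NodeInfo` and `pickNode` are hypothetical names and do not correspond to any existing Symphony or DFS interface.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical per-node bookkeeping.
struct NodeInfo {
    std::set<std::string> localBlocks;  // data blocks stored on this node
    int runningTasks = 0;               // current load on this node
};

// Returns the name of the node a task should run on.
// Preference order: a node that holds blockId locally, then lower load.
std::string pickNode(const std::map<std::string, NodeInfo>& nodes,
                     const std::string& blockId) {
    assert(!nodes.empty());
    bool found = false;
    std::string best;
    bool bestLocal = false;
    int bestLoad = 0;
    for (const auto& entry : nodes) {
        const bool local = entry.second.localBlocks.count(blockId) > 0;
        const int load = entry.second.runningTasks;
        const bool better = !found
            || (local && !bestLocal)
            || (local == bestLocal && load < bestLoad);
        if (better) {
            found = true;
            best = entry.first;
            bestLocal = local;
            bestLoad = load;
        }
    }
    return best;
}
```

With this policy a busy node that holds the data still wins over an idle node that does not, which trades load balance for less data movement; whether that trade is right would depend on data size and network cost.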
While I am not familiar with the specific DFS you are proposing, I think it is an excellent idea to provide a DFS for the service tasks to access large datasets. Otherwise, the client may become a storage and communication bottleneck for some applications. Taking this a bit further, how about a data fabric for the service tasks? Something like GemFire would provide better fault tolerance and provide caching as well.

-- Cheers, Peter (Peter Strazdins, Australian National University)