Ajith

Using Memory Mapped Files

by Ajith on July 22nd, 2008 at 09:39 PM
Use Case

I recently ran into a situation in which I needed my Symphony services running on the same host to share some intermediate data. There was a large amount of data and I didn’t want each service instance to have its own private copy, since private copies would use up all the free memory on the host. I also wanted all processes to be able to both read and write this memory. I had heard about memory mapped files but never had a chance to use them. This seemed like a good opportunity to try them out.

What are Memory Mapped Files?

A memory mapped file is just a file that’s mapped into virtual memory (the process’s private window into physical memory). To create a memory mapped file, first create the file, then map all or part of it using C API functions like UNIX’s mmap() or Microsoft Windows’s CreateFileMapping() and MapViewOfFile(). The mapping call reserves a block of addresses in the process’s virtual memory corresponding to the size of the mapped region. No physical memory is allocated until the virtual memory is read or written. Once mapped, the file can be accessed randomly using ordinary memory operations such as dereferencing pointers, instead of seek, read and write calls.
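As a rough illustration of the create-then-map sequence, here is a minimal sketch using Python's standard mmap module, which wraps mmap() on UNIX and CreateFileMapping() on Windows. The file name, the 4096-byte size and the function name map_and_access are assumptions made for the example, not part of the original setup.

```python
import mmap
import os
import tempfile


def map_and_access():
    """Create a file, map all of it, and touch it with plain indexing."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "data.bin")  # hypothetical file name

        # The file has to exist and be non-empty before it can be mapped.
        with open(path, "wb") as f:
            f.write(bytes(range(256)) * 16)  # 4096 bytes

        # Length 0 maps the whole file; the default mapping is shared and writable.
        with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as view:
            before = view[1000]            # random read, no seek() or read()
            view[1000:1005] = b"hello"     # random write, no seek() or write()
            after = view[1000:1005]
            size = len(view)

        return before, after, size
```

Indexing the mapping plays the role that dereferencing a pointer plays in C.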

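The mapped pages are backed by the kernel’s file cache rather than by private process memory, so every process that maps the same file shares the same physical pages. The mapping itself still occupies addresses in the process’s own virtual address space, so a large mapping does count against a 32-bit process’s 2 GB virtual limit; mapping only part of the file at a time keeps the view small. If the file is unmapped from virtual space in one process, it is still cached in physical memory for other processes. Modified pages stay in the cache until the operating system writes them back to the file, after which they can be evicted. This works in a similar way to the paging file. The main drawback of this lazy write is what happens if the host crashes: any modified contents that have not yet been written to disk, for example through an explicit flush with msync() or FlushViewOfFile(), are lost.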
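The explicit flush mentioned above can be sketched as follows. In Python's mmap module the flush() method asks the operating system to write modified pages back to the file. The file name and the function name write_and_flush are made up for the example, and the sketch cannot demonstrate crash safety, only where the call goes.

```python
import mmap
import os
import tempfile


def write_and_flush():
    """Modify a mapped file and explicitly flush the change to the backing file."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "state.bin")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(b"\x00" * mmap.PAGESIZE)

        with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as view:
            view[0:7] = b"durable"
            # Without this call the write-back happens whenever the OS decides.
            view.flush()

        # Read the file through the ordinary file API to see the stored bytes.
        with open(path, "rb") as f:
            return f.read(7)
```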

The operating system reads a file into physical memory in page-sized chunks (typically 4 KB). This reading is driven by the process’s access to pages that are not yet resident. If the process reads or writes virtual memory that doesn’t have physical memory backing it, a page fault occurs, a physical page is assigned and the corresponding 4 KB part of the file is read into it.
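To make the page granularity concrete, the following small sketch works out which pages a byte range falls on, and therefore which pages would have to be faulted in to access it. The helper name pages_touched is hypothetical, and the 4096-byte default is the typical page size assumed in the text; the real value on a given host is available as mmap.PAGESIZE.

```python
def pages_touched(offset, length, page_size=4096):
    """Return the indices of the pages covered by bytes [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return range(first, last + 1)
```

For example, a 200-byte record at offset 10000 lies entirely on page 2, while the same record at offset 4000 straddles pages 0 and 1 and needs both of them in memory.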

Improved performance is achieved through deferred reading and lazy writing of the file. Less physical memory is required if the process only needs to access a few 4 KB chunks of the file. In addition, files mapped into memory can be shared by other processes running on the same host.

Using Memory Mapped Files

There are three main models for using memory mapped files to share memory:
  1. Create one file and grow the file when you need to add data
  2. Create a large file and add data until there’s not enough space left
  3. Create one file for each data entry
The benefit of option 1 is that there is only one file (unless the file size exceeds the available virtual memory). The problem is that other processes that map the memory will need special code to extend their mappings. Extending a mapping may cause the whole virtual memory block to be relocated, which invalidates all current references into the original memory.
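One way to picture option 1 is the sketch below, which grows the backing file and then maps it again. It is only an assumption about how the "special code" might look: the helper name grow_and_remap, the file name and the sizes are invented for the example. The old mapping is closed first, which is exactly the point where existing references into it stop being valid.

```python
import mmap
import os
import tempfile


def grow_and_remap(f, old_view, new_size):
    """Close the old mapping, extend the file, and return a new, larger mapping."""
    old_view.close()          # anything still pointing into old_view is now invalid
    f.truncate(new_size)      # extend the backing file
    return mmap.mmap(f.fileno(), new_size)


def demo_grow():
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "growing.bin")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(b"\x00" * 4096)

        with open(path, "r+b") as f:
            view = mmap.mmap(f.fileno(), 0)
            view[0:3] = b"abc"
            view = grow_and_remap(f, view, 8192)
            try:
                # Earlier data survives because it lives in the file, not the view.
                return len(view), view[0:3]
            finally:
                view.close()
```

Every other process sharing the file would need to notice the new size and perform the same remapping.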

The benefit of option 2 is that the virtual memory block has a fixed size and won’t get relocated, so there are no worries about invalidating current references. The drawback is that a fixed buffer is not optimal, as we rarely know in advance how much memory we will need.
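Option 2 could be sketched as a fixed-size region with a small header that records how many bytes are in use. The layout (an 8-byte counter at the start of the file) and the names create_region, append and demo_fixed_region are assumptions chosen for illustration; the post does not prescribe any particular layout.

```python
import mmap
import os
import struct
import tempfile

HEADER = struct.Struct("<Q")  # assumed layout: bytes used so far, at offset 0


def create_region(path, capacity):
    """Create the backing file at its final size, filled with zeros."""
    with open(path, "wb") as f:
        f.write(b"\x00" * capacity)


def append(view, payload):
    """Copy payload into the next free bytes and return the offset it was stored at."""
    (used,) = HEADER.unpack_from(view, 0)
    start = HEADER.size + used
    end = start + len(payload)
    if end > len(view):
        raise MemoryError("not enough space left in the mapped region")
    view[start:end] = payload
    HEADER.pack_into(view, 0, used + len(payload))
    return start


def demo_fixed_region():
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "region.bin")  # hypothetical file name
        create_region(path, 64)
        with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as view:
            first = append(view, b"abc")
            second = append(view, b"de")
            try:
                append(view, b"x" * 100)   # larger than the remaining space
                overflowed = False
            except MemoryError:
                overflowed = True
            return first, second, overflowed
```

The mapping never moves, so offsets handed out by append stay valid, but the region simply fills up once the capacity chosen at creation time is reached.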

Option 3 is the most flexible and memory efficient. We only map a file into virtual space when we need to read or write it. The only drawback is that we end up with one backing file for every buffer we create. I chose option 3.
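A minimal sketch of option 3 might look like the following, with one backing file per entry that is mapped only for the duration of a read or a write. The ".buf" file naming scheme and the helper names put_entry and get_entry are assumptions for the example and are not taken from the actual implementation.

```python
import mmap
import os
import tempfile


def put_entry(directory, key, payload):
    """Create a backing file sized for payload and fill it through a mapping."""
    if not payload:
        raise ValueError("an empty file cannot be mapped")
    path = os.path.join(directory, key + ".buf")  # assumed naming scheme
    with open(path, "wb") as f:
        f.write(b"\x00" * len(payload))
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as view:
        view[:] = payload
    # The mapping is released here; only the file remains.


def get_entry(directory, key):
    """Map the entry's file read-only just long enough to copy its contents out."""
    path = os.path.join(directory, key + ".buf")
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view[:]


def demo_entries():
    with tempfile.TemporaryDirectory() as directory:
        put_entry(directory, "a", b"first")
        put_entry(directory, "b", b"second entry")
        files = sorted(os.listdir(directory))
        return get_entry(directory, "a"), get_entry(directory, "b"), files
```

Each entry is exactly as large as its data, at the cost of one file per entry in the directory.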

Once memory is shared, we run into synchronization issues. Any process requiring write access will have to hold an inter-process mutex that blocks other readers and writers while the update is in progress.
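The locking idea can be sketched with an advisory file lock standing in for the process mutex. This is an assumption made for the example: it uses fcntl.flock, which is available on UNIX-like systems only, and the lock file, the counter layout and the function names are all hypothetical. Readers would need to take the same lock for it to block them as well.

```python
import fcntl
import mmap
import os
import struct
import tempfile
from contextlib import contextmanager

COUNTER = struct.Struct("<Q")  # assumed layout: one 64-bit counter at offset 0


@contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path for the duration of the block."""
    with open(lock_path, "a+b") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # waits until no other process holds it
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def increment_shared_counter(data_path, lock_path):
    """Read, increment and store the counter in the mapped file under the lock."""
    with exclusive_lock(lock_path):
        with open(data_path, "r+b") as f, mmap.mmap(f.fileno(), 0) as view:
            (value,) = COUNTER.unpack_from(view, 0)
            COUNTER.pack_into(view, 0, value + 1)
            return value + 1


def demo_counter():
    with tempfile.TemporaryDirectory() as directory:
        data_path = os.path.join(directory, "counter.bin")
        lock_path = os.path.join(directory, "counter.lock")
        with open(data_path, "wb") as f:
            f.write(b"\x00" * mmap.PAGESIZE)
        return [increment_shared_counter(data_path, lock_path) for _ in range(3)]
```

Without the lock, two processes could read the same old value and one of the increments would be lost.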

I’ll get into the coding details of the implementation in my next blog post.

Summary

Memory mapped files can optimize file access by using lazy read and write operations. Accessing a file through memory operations avoids a kernel call to the file system on every access, which makes it faster once the pages are resident. Mapped files are backed by the kernel’s file cache, but each view still occupies the process’s own virtual address space, so on a 32-bit system it counts against the 2 GB user process limit. Mapping may offer little benefit if the process simply walks through the memory sequentially, as the first read or write of each new 4 KB page causes a page fault and a file access.

Memory mapped files are useful for sharing memory between processes. Each process maps a view of the same file and can immediately access data written by another process.
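The sharing behaviour can be tried out with a small sketch in which a parent process keeps a file mapped while a child process maps the same file and writes into it. The file name, the child script and the function name share_with_child are assumptions for the example; it relies on both mappings being shared mappings of the same file on the same host.

```python
import mmap
import os
import subprocess
import sys
import tempfile

# Script run by the child process: map the file named on the command line and write to it.
CHILD_CODE = """
import mmap, sys
with open(sys.argv[1], "r+b") as f, mmap.mmap(f.fileno(), 0) as view:
    view[0:5] = b"child"
"""


def share_with_child():
    """Map a file, let a separate process write into it, and read the result."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "shared.bin")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(b"\x00" * mmap.PAGESIZE)

        with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as view:
            # The parent's mapping stays open while the child runs.
            subprocess.run([sys.executable, "-c", CHILD_CODE, path], check=True)
            # No re-read of the file is needed; the bytes are visible in the view.
            return view[0:5]
```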

References

Memory-mapped file - Wikipedia, the free encyclopedia

Updated July 25th, 2008 at 10:48 PM by Ajith

Categories
SymphonyDE
