Adi Levin's Blog for programmers

May 31, 2009

What is a process?

Filed under: Multithreading — Adi Levin @ 9:17 pm

To understand multithreading, it is important to understand the nature of processes and threads. A process is a running instance of a program. The same program can be run by several processes at once. For example, a developer may use several instances of Visual Studio in parallel. Each instance runs the same program, but in the context of a different process.

A process contains the resources needed to run a program. The most important of them are:

  1. Virtual address space
  2. Executable code (images of the EXE and DLLs)
  3. Handles to Windows objects (Windows, Controls, Threads, Files, Events etc…)
  4. At least one thread.

A thread is an entity that can be scheduled for execution. Whenever a process starts, it is created with one thread, called the primary thread. When the primary thread returns from the program’s entry function, the process ends, together with any other threads still running in it. A process may have many threads, all of which share the resources of the process.
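To make the idea of shared resources concrete, here is a minimal sketch in which several threads of one process read the same array through the same addresses. It uses std::thread from standard C++ rather than the Win32 thread functions, which is my choice to keep the example portable; the function name sum_with_threads is made up for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sums `data` using `thread_count` threads (hypothetical helper).
// Every thread reads the same vector through the same addresses;
// nothing is copied from one thread to another.
long long sum_with_threads(const std::vector<int>& data, std::size_t thread_count) {
    assert(thread_count > 0);
    // One result slot per thread, so no two threads write to the same element.
    std::vector<long long> partial(thread_count, 0);
    std::vector<std::thread> workers;
    workers.reserve(thread_count);

    // Size of the slice handled by each thread, rounded up.
    const std::size_t slice = (data.size() + thread_count - 1) / thread_count;
    for (std::size_t t = 0; t < thread_count; ++t) {
        const std::size_t begin = std::min(t * slice, data.size());
        const std::size_t end = std::min(begin + slice, data.size());
        workers.emplace_back([&data, &partial, t, begin, end] {
            partial[t] = std::accumulate(data.begin() + begin, data.begin() + end, 0LL);
        });
    }
    // Wait for all threads before reading their results.
    for (std::thread& worker : workers) {
        worker.join();
    }
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

The worker threads receive only references and indices. They can use them directly because they live in the same process and therefore in the same address space.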

Virtual Address Space

I should say a few words about the meaning of virtual address space. For a long time I did not understand the differences between physical memory and virtual memory, and the term address space didn’t mean anything to me. After asking different people for more explanations, I realized that many other people don’t have a better understanding either.

First of all, it is important to distinguish between memory and address space. Address space is a range of addresses that can be used as pointers to data. When you allocate an array of numbers, you allocate a consecutive range of addresses in which you can write and read data. But the data itself is stored either in memory or on the disk (page file), depending on the way that the operating system manages the memory. The operating system maps the address of every valid pointer to a place in memory or in the page file. Moreover, at the hardware level, data at addresses from the process’s address space can be held in the processor cache for quick retrieval. A programmer is almost always unaware of where the data really resides. A programmer works with addresses, and not directly with physical memory.
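The statement that an array occupies a consecutive range of addresses can be checked with a few lines of code. This is only a sketch; byte_distance is a name I made up for the illustration. It measures how far, in bytes, the address of element i is from the address of element 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Distance in bytes between the address of element 0 and the address of
// element i of the same array (hypothetical helper).
std::ptrdiff_t byte_distance(const std::vector<double>& values, std::size_t i) {
    assert(i <= values.size());
    const char* first = reinterpret_cast<const char*>(values.data());
    const char* ith = reinterpret_cast<const char*>(values.data() + i);
    return ith - first;
}
```

For an array of doubles, element i sits exactly i * sizeof(double) bytes after element 0. The code never learns, and does not need to know, whether those bytes currently reside in RAM, in the page file or in the processor cache.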

In Win32, pointers are 32 bits long (4 bytes), which means they can only have 2^32 different values, i.e. 4GB of address space. By default, the upper 2GB of addresses are reserved for the operating system. This means that our program only has 2GB of address space left, and this range holds both the executable code (images of the EXE file and of the DLLs, which are loaded when they are needed) and the data.
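The arithmetic behind these numbers is short enough to write down. In the sketch below, the names address_count, pointer_bits_of_this_build and kGiB are mine and exist only for this illustration.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// One gigabyte in the binary sense: 2^30 bytes.
constexpr std::uint64_t kGiB = std::uint64_t{1} << 30;

// Number of distinct addresses that a pointer of `pointer_bits` bits can hold.
std::uint64_t address_count(unsigned pointer_bits) {
    assert(pointer_bits < 64);  // 2^64 itself does not fit in a 64-bit integer
    return std::uint64_t{1} << pointer_bits;
}

// Pointer width of the program this code is compiled into
// (32 in a Win32 build, 64 in a 64-bit build).
unsigned pointer_bits_of_this_build() {
    return static_cast<unsigned>(sizeof(void*) * CHAR_BIT);
}
```

With these definitions, address_count(32) / kGiB gives 4, and half of that range, address_count(31) / kGiB, gives the 2GB discussed above.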

The fact that the address space is only 2GB long has nothing to do with the amount of physical memory on the machine. If the physical memory (RAM) is only 256MB, our program can still allocate arrays whose total volume approaches 2GB, as long as the page file is large enough to back them, but accessing the data in these arrays will result in paging (copying data back and forth between physical memory and the page file on the disk). If the physical memory is 4GB, our program still won’t be able to exploit more than 2GB of it.

When an allocation (e.g. a call to malloc) fails, this doesn’t necessarily mean that we ran out of physical memory. It also doesn’t have to mean that we requested more than 2GB of address space in total. In a 32-bit program, it often means that we failed to find a consecutive range of addresses of the required size, inside our 2GB address space, that has not been allocated yet. If your program performs a lot of allocations and deallocations, the address space can become fragmented, i.e. even if you’re only taking a total of 1GB of virtual address space, it can be spread over the range of 2GB in a way that doesn’t leave free consecutive ranges of more than 1MB. In this situation, any allocation of an array larger than 1MB will fail: malloc returns NULL, and the C++ new operator throws std::bad_alloc.
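Since malloc reports failure by returning a null pointer, the result has to be checked before it is used. Here is a minimal sketch of that check; MallocBlock, try_allocate and fill are names I introduced for the illustration, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

// Releases memory obtained from malloc when the owning pointer goes away.
struct FreeDeleter {
    void operator()(void* p) const noexcept { std::free(p); }
};
using MallocBlock = std::unique_ptr<unsigned char, FreeDeleter>;

// Requests one consecutive range of `bytes` addresses (hypothetical helper).
// Returns an empty pointer if no such range could be obtained.
MallocBlock try_allocate(std::size_t bytes) {
    return MallocBlock(static_cast<unsigned char*>(std::malloc(bytes)));
}

// Writes `value` into each of the first `bytes` bytes of the block.
void fill(MallocBlock& block, std::size_t bytes, unsigned char value) {
    assert(block != nullptr);  // the caller must have checked for failure
    std::memset(block.get(), value, bytes);
}
```

A caller would write something like `MallocBlock b = try_allocate(n); if (!b) { /* fall back to smaller blocks */ }`. The check tells you only that the request failed; it does not tell you whether fragmentation or a real shortage of memory was the cause.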

When your program suffers from such allocation problems, you don’t necessarily have to reduce your memory consumption, but you may be forced to break your long arrays into smaller blocks. Another solution is to move to 64 bits. 64-bit programs (e.g. on a 64-bit edition of Windows Vista) use 64 bits (8 bytes) to represent pointers, so a pointer can take 2^64 different values, which is a huge number. Windows makes only part of that range available to a process, but it still amounts to terabytes of address space. This means that fragmentation is very unlikely to prevent such programs from allocating large consecutive arrays.
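Breaking a long array into smaller blocks can be sketched as follows. This is one possible design under my own assumptions (a fixed size, elements of type double, a class called ChunkedArray), not a recipe taken from any particular library.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// A fixed-size array of doubles that is stored as many small blocks
// instead of one large consecutive range of addresses (hypothetical class).
class ChunkedArray {
public:
    ChunkedArray(std::size_t size, std::size_t block_size)
        : size_(size), block_size_(block_size) {
        assert(block_size > 0);
        const std::size_t block_count = (size + block_size - 1) / block_size;
        blocks_.reserve(block_count);
        for (std::size_t b = 0; b < block_count; ++b) {
            // Each block is a separate allocation, so it only needs
            // block_size * sizeof(double) consecutive bytes. Elements start at 0.
            blocks_.push_back(std::make_unique<double[]>(block_size));
        }
    }

    std::size_t size() const { return size_; }
    std::size_t block_count() const { return blocks_.size(); }

    // Element i lives in block i / block_size, at offset i % block_size.
    double& operator[](std::size_t i) {
        assert(i < size_);
        return blocks_[i / block_size_][i % block_size_];
    }

private:
    std::size_t size_;
    std::size_t block_size_;
    std::vector<std::unique_ptr<double[]>> blocks_;
};
```

The price is one extra division and one extra pointer lookup per access, and the data can no longer be handed to code that expects a single plain array. In exchange, no single allocation needs more than one block of consecutive addresses.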

A good link about Windows memory management

If you want to learn more about memory-related terms, such as Virtual Memory, Page File, Working-Set of a process etc…, take a look at this link: http://shsc.info/WindowsMemoryManagement. Also, refer to the MSDN article on Memory Management in Windows NT.


1 Comment »

  1. Am I back again in operating systems course?
    Thanks for refreshing my mind.

    Comment by Doron Osovlanski, September 6, 2009 @ 2:16 pm

