Adi Levin's Blog for programmers

June 1, 2011

Incremental coding

Filed under: programming — Adi Levin @ 5:15 pm
Tags: ,

As I keep developing my programming skills, I am always looking for ways to make my work more efficient, and to write bug-free code. In particular, I use what I like to call incremental coding.

The idea is that when you modify a feature in your program or add a new feature, you should take small steps, and perform testing and code review, and commit to the source-control after each step. I always prefer this over writing a lot of code that doesn’t work, and then entering a long debugging phase.

When your development is done in small steps, code review becomes much more efficient. When there only few changes in each step, you can almost prove the correctness of the code by reviewing the differences from the previous version. If there are lots of differences in each step, code review is still important, as it can detect certain issues, but it can no longer serve as a proof of correctness. 

The other advantage of incremental coding is that unit testing becomes more efficient, because of the same reason. When the changes are small, it is easier to design unit-tests or integration tests that prove the correctness of the code.

And what I also like about this approach is that your program works all of the time. You don’t enter a long period of time in which the program doesn’t work. You get to see the effect of every little change separately, and this helps find bugs.

I do as much incremental coding as possible, and from my experience, it keeps the number of bugs low, and shortens the development cycle significantly.

Advertisements

December 13, 2010

Determinism

Filed under: Multithreading,programming — Adi Levin @ 6:30 pm
Tags: ,

Multithreaded computations are sometimes non-deterministic, meaning that they produce different results in different runs that have the exact same input. In some cases, this is a result of insufficient synchronization between threads, causing Race Conditions (A Race Condition is actually defined as a critical non-deterministic behavior). In other cases the non-deterministic behavior is not critical, but still we would like to avoid it, because debugging  a non-deterministic program is more problematic.

In the following, I discuss two common reasons for non-detemrinism:

Order of floating-point computations

Let’s consider a multithreaded program that performs summation of an array of floating point numbers. Suppose that each thread computes the sum of a part of the array, and at the end the partial sums are summed together. The order at which the numbers are summed is imporant. In floating-point arithmetic, (a+b)+c is not always equal to a + (b+c). Since the number of running threads can be different from one run to another, and the timing of threads is unpredictable, the order in which the summation is done may be non-deterministic, depending on how the program is written.

One way to avoid this problem, if given the choice, is to use integer arithmetic instead of floating-point.

Another way is to divide the task into a constant number of partial sums that do not depend on the number of threads, then store the intermediate result of each task in an array, and finally sum-up that array in one thread. This way, the order at which numbers meet each-other is not random.

Order of insertion to a container

In a fork-join multithreading model, it is often useful to collect the outcome of tasks in a thread-safe container. After the multithreaded fork-join part is completed, the results in the container are used as input to the next part of the program. The problem is that insertion into the container can happen in an unexpected order. This causes the next part of the program to behave in a non-deterministic way.

One way to solve this problem is to add a sort operation after fork-join part. Simply sort the data in the container, and this will guarantee a consistent input to the next stage of the program.

Another way to solve this is to write the code in a way that each task result knows a-priori its place in the container – which is independent of timing and of the number of running threads.

July 19, 2009

Debugging tips and tricks

Filed under: programming — Adi Levin @ 1:06 pm
Tags: , , ,

What do you do when you encounter a bug that is really hard to debug? Here are a few tips & tricks:

Breakpoints with counter

If your program consistently throws an exception at a specific line, you want to break before the crash happens, and monitor how the program behaves. But just putting a breakpoint is not good, because it will stop whenever you reach that line, and not necessarily in the iteration that throws the exception. To know where to stop, place a breakpoint with a counter just before the suspicious line, set the counter to a high number, and wait for the program to crash. Then, the debugger will show you how many times you passed through the counter, and you’ll use that number to set the target to your breakpoint. In the next run, the breakpoint will stop exactly when you wanted.

Breakpoints with tracing

In Visual Studio, you can instruct the debugger to print values of certain expressions as well as the thread ID and even the call-stack, whenever it reaches a breakpoint. Right-click the breakpoint and choose “when hit…”. The result will appear in the “output” window in the IDE.

Divide and conquer

Suppose your program consistenly throws an exception at a specific line, because of corrupted data (for example, using a pointer that has been deleted). Suppose that you don’t even know the reason, but you know that somewhere before that line, the data got corrupted. So, the reason for the bug could be far away from the line where the exception is thrown. To detect which line of code caused the crash, use a divide-and-conquer approach, if possible.

Let’s regard the program as a series of commands, and suppose the the exception is thrown at line 10,000. You know that the reason for the bug happens somewhere between line 1 and line 10,000. Start by commenting-out lines 5,000-9,999 (assuming it is legal to do so). If the exception is still thrown, you know that the reason for the bug is between line 1 and 4,999. Otherwise, it is between 5,000 and 10,000. Uncomment, and continue by commenting-out half of the suspicious code, until the suspicious area is small enough.

Minidump files in Visual Studio

If your program crashes from time to time, but you cannot reconstruct the bug on your debugger, it is important to get as much information as possible from each crash. You need to write a side application that monitors your application for exceptions, and writes the call-stack and other needed information when an exception occurs.

The best way to do this, is to write a program that attaches to your application as a debugger using the DebugActiveProcess command (see MSDN for documentation). In this debugger program run a function that constantly waits for debug events, using WaitForDebugEvent. A debug event includes a thrown exception, threads being created or deleted, and other events. When you get the debug event, you can check if it is an exception, and get the type of exception (e.g. access violation or division by zero) and thread ID from which the exception was thrown. You should output this information to a text file. Next, you should also output a dump-file that will include the call-stack of all running threads using the command MiniDumpWriteDump.

The result is a “.dmp” file that you can load into Visual Studio, and see the running threads and call-stacks at the time the exception was thrown. To do this, you need to place the dmp file near the executable, and make sure you have PDB files for the relevant executables (EXE and DLL) that you wish to debug. The PDB files should be next to the DLLs. You also need to have the source-code available. Of course, source-code and PDB files should be exactly the ones that were used and created when building the specific version of the application for which the dmp file was created.

If your application does not throw a lot of exceptions on a regular basis, then running such a process that montiors exceptions in your application does not slow it down.It is safe to even distribute your application with this process attached. This way, when a customers complain about a crash, you can ask them to send you the dmp file (or do it automatically), andany other information that your process generates automatically, and you’ll be able to see excatly where your program crashed.

It is important to configure Visual Studio to save debug information (PDB files) even in the release version, for this purpose. It doesn’t slow down the application. You don’t need to distribute the PDB files – you only need to store them on your side. You’ll need one copy of the entire collection of PDB files for each build.

Beeping

Breakpoints are sometimes not useful, because they block execution, and thereby change the timing of the application. I find it useful to place “Beep” commands at certain places in the code, so that I hear a beep, at a different tone, every time certain lines of code are executed. This way I use my hearing to monitor the execution of my application.

Logging

For crashes that are hard to reproduce, it is sometimes necessary to keep a detailed log of the operations that preceeded the crash. Writing a log-file isn’t trivial, especially in a multithreaded application. You need your logger to be able to write lines to the log-file from different threads, without slowing down the application. Simply writing a line to an open file is not good, because when the application crashes, the file will not be closed properly, so the log-file will be missing the last lines, which are the most important ones for debugging.

A good way to write to a log file is via a second process, called a “logger process“. In the application that you wish to debug, perform logging by posting messages to the logger process, using the non-blocking command PostMessage. Use the three parameters of the message – message identifier, wparam and lparam, to send the important information that you want to write to the log file. In the logger process, the window procedure or message loop should handle the incoming messages by writing the appropriate data to the log file.

The data that should appear in a log-file includes the time, thread ID, and the string you want to write. There should be a mapping between message identifier and the string that appears in the log-file. To transfer the mapping to the logger process, you can use WM_COPYDATA.

If you need to pass more information than fits in the three message parameters, consider using named pipes for communicating with the logger proceses.

Alternatively, you can implement a logger not as a different process, but as a dedicated thread inside your application. You can use APCs (the command QueueUserAPC) to send log requests to the logger thread. A dedicated thread is more efficient than accessing the log-file from different threads, because it doesn’t require explicit synchronization (i.e. mutex, semaphore or critical section). The downside of this approach is that you have to close the log-file every time you add a line to it (because you want to close properly upon crash), and this means slowing the application.

Pageheap

If you suspect data corruption due to memory buffer overrun, I propose that you run your application with the pageheap flag turned on. Pageheap is a flag that Windows can apply globally or to a specific executable. It tells Windows to change the way it does heap allocations. It allocates buffers on the heap such that they are follows by a reserved page. If your program tried to access an entry outside the valid range of the buffer, it will throw an exception of type access violation, and you can use your debugger to understand why it happened.

To use it, you will need gflags.exe, which is part of Windows Debugging Tools. You can download a 32 bit version and a 64 bit version for free. After installing the Winbdows Debugging Tools, run gflags.exe (as in global flags), to see the flags that can be turned on and off.

Turning on Pageheap will slow-down your application, but not as much as other memory monitoring applications do (such as Bounds Checker and Rational Purify).

June 17, 2009

What makes one programmer better than others

Filed under: programming — Adi Levin @ 6:53 pm
Tags: , ,

Programming speed and quality

What is the goal of a programmer? It is to create high quality software. Even though it is hard to define quality, it is not hard to assess the value of any module, but it can only be done over time, by observing how useful that module is and how well it responds to the varying requirements. Therefore, an experienced programmer or team-leader working closely with other programmers can easily assess the contribution of each programmer based on the modules that they have written. It is not uncommon to find that a certain programmer is 10 times more productive than his/her colleague – in some cases even 100 times more!

When facing a complaint regarding the poor quality of their programs, many programmers blame their managers or the circumstances for not giving them enough time. “If I had more time – I could do a much better job”, suggesting that the speed of programming conflicts with quality; “It’s true that we should have done that, but there was no time – we had much more urgent problems that needed to be solved” – admitting failure to foresee problems and prevent them from happening. Managers who are not familiar with programming often fail to correctly assess the value of a programmer. The simple claim that speed conflicts with quality appears to be obviously correct.

Many do not realize that speed of programming goes hand in hand with software quality. A fast programmer will finish his project sooner than expected, leaving more time for certain activities that will improve software quality: Testing, adding desired functionality, and doing infrastructure work from which other programmers may benefit as well. A fast programmer can quickly make progress and show partial results, thereby enabling decision makers to evaluate the design and to change it to better fit the requirements, before it is too late to change the design. This way, the final product is better suited to the requirements.  Writing reusable code and investing on infrastructure enhances both the speed of programming (because less code needs to be written for new modules) and the quality of the program (because code that is being used a lot is easier to debug – since bugs in it will appear more often than in code that is rarely being used).

It is not surprising, then, to see a programmer that is exceptionally fast and at the same time known for producing high quality code. It is also common to meet a programmer who is exceptionally slow, and at the same time produces really bad code, that tends to be thrown away in time, and replaced by better code.

That said, I should also stress that a programmer should not hurry too much. Each programming project has its appropriate pace, which is not dictated by product dead-lines. As explained above, writing code too slowly is not good for quality. But writing code too fast is also dangerous, if the programmer skips certain critical stages in the process, such as unit-testing, documentation, analysis and understanding of the requirements.

In conclusion, high quality of software requires to understand the right pace in which the project should develop. A skilled programmer should require more time when needed, and make the optimal use of the given time, by working efficiently and not too slowly.

Sucessful programming

Having said that, I am not suggesting that a programmer’s value should be measured by the speed of programming, in the sense of number of lines of code written per day. This is not a good measure because certain modules contribute much more than other modules, without proportion to the size of their code. A routine that is used by many other routines can have a high degree of contribution; A routine that requires less maintenance (or none at all) contributes more than a routine that needs a lot of maintenance; A routine that contains bugs can have a negative contribution to the product.

Over time, a better measure of the success of a programmer or a programming team, is the ratio between the time they spend putting out fires (treating emergency situations), and the quality-time spent working on actual enhancements. It is very important to monitor this ratio. If it deteriorates, it signifies a problem that may get worse, until it becomes extremely difficult and costly to make changes in the software.

The real challenge is not to write a program, but to write a program that will prove useful and satisfying for many years to come. This is very demanding, because it requires to foresee requirement changes and to enable easy modification of working code. It is challenging, but can be achieved by a talented and professional team of programmers.

Since a program grows and changes over time, a programmer should take measures to make it as easy as possible to understand it and modify it in the future. Therefore, communication among programmers (through good naming, coding conventions and documentation) is as important as writing code that works. Your program should be understood by people – not only by the computer.

The principle of proximity

This can be achieved by adopting certain habits and conventions. In particular, the principle of proximity – related things in the code should be as close as possible to one-another, or easy to find. For example, a declaration of a class or variable should be as close as possible to where it is being used. Functions should be kept short, such that they won’t require endless scrolling to go through them. The principle of proximity also says that documentation inside the source code is the most important part of the system documentation, because it is always there when you need it. You don’t need to look for it. It is also much easier to update comments inside the source code, when making modifications, than to update the related description of the software (such as SRS, SDD) which is written in a separate document and in different terms. In an ideal situation, the code does not require explanations at all, because it is self-documenting, due to the wise choice of names and an intuitive choice of functions, classes and modules.

The principle of visibility

Another important principle in code construction is the principle of visibility, which says that all important information should be made explicit. You don’t get a higher score for keeping secrets. On the contrary – a program should be written in a way that makes the motivation and the meaning in it as visible as possible. Good naming is crucial here. A variable that represents an angle in degrees can be called “ang_degrees”. It is then obvious for a programmer using that variable, that it needs to be converted to radians before computing its cosine. Similarly, a point in screen space and a point in 3D model coordinates should be distinguished from one-another by their names (or even their types) in a way that makes it obvious that they are different creatures, and cannot be added, subtracted or compared to one-another. Such explicit naming saves a lot of time on searching and debugging. In the same spirit, coding conventions are also helpful, because they make it easier for team members to share their code, and find their way around the code of their colleagues.

Team work and Interdependence

Effective team work is a crucial factor in the efficiency of individual programmers. Many programmers strive to be independent – to be able to work on their own with as little support as possible from their colleagues. A more effective team work is achieved when programmers are interdependent. Interdependence means that team members have mutual access to the resources (time, knowledge, source code) of their colleagues. “Interdependence is a higher value than Independence”. Programmers double and triple their efficiency by working together. As an interdependent programmer, you should:

  • Share information and discuss problems informally. Tell other team members about things you’ve learned and how you solved problems. Even after completing a project, tell others about it – even if it is outside their area of responsibility.
  • Ask others to review your code – don’t just present a block diagram. People should have access to your code and have the capability of modifying it if needed.
  • Do not defend yourself against bug reports. Don’t make excuses, saying that a certain “bug” is a “feature”. The people who report bugs to you are really helping you, and you should thank them.
  • Assign the highest priority to interaction with co-workers. Do not arrive late to meetings. Postponing meetings, not showing up, or leaving in the middle of a meeting wastes the time of several people.
  • Be accessible (available) to co-workers. If you help them willingly, they will help you when you need them.
  • Give credit to others for good work that they have done. This will reduce the tension of competition inside the team.

Programmers should share information freely, and not treat pieces of code as their private territory. The important thing is to provide the best service – not to prove that you never make mistakes. Make it your goal to provide the best service to others.

Effective team work is at its best when the team is as small as possible. At the same time, the knowledge of team members should be as diverse as possible. When hiring a new programmer – prefer one that also brings with him knowledge that is different from the knowledge of other team members. It is often preferable to hire a programmer that has good programming experience, but has no experience in the specific domain or technologies of the company. Such a programmer will be highly motivated, and will contribute a fresh point of view.

The value of knowledge

The software industry differs from other engineering fields in the speed in which ideas turn into products. Because of that, the knowledge of programmers is the most important resource. However, the knowledge of a programmer when he is hired to the job is less important to his success than his learning capabilities. Software technologies change all of the time. It is a never ending challenge for a team of programmers to always evolve – never stop studying and learning. Good programmers Study and learn all of the time; Express themselves clearly, in a way that is easy for people to understand; Distinguish between what they know and what they don’t know. There is no one who knows everything. People should feel safe to say “I don’t know” and not be lazy to seek the appropriate knowledge.

Quailty-time with the computer

The work of a programmer includes design, review, writing code, testing and debugging. But this is not enough for really good programming. You need to get to know your program, and you need to get to know the computer. Why is that? Because in many cases, the programs are so complicated, that you don’t really know what goes on inside of them.

Even if you fully understand the routine that you wrote – do you know how many times it is called, and at what circumstances? Do you know what parameters are sent to it?  Every line of code that you write should be stepped-through using the debugger, at least once. Follow the execution of your code with your eyes, and see if it proceeds as expected. Insert breakpoints in every branch of the code, to make sure that you visit it at least once. Use a profiler to measure the performance and to build the call-graph of your functions. A profiler Profiling is extremely important for code optimization. Once you find the bottle-necks of your application, you know what you need to do in order to make it run faster. If you’ve never used profiling before, you will be surprised by what it can make you find out about your own code.

When you use library functions, or functions that were writtens by other people in your company, do you really know what they are capable of ? Or are you just “copying-and-pasting”? You should look at their documentation, or, in some cases, even go through some of their source code.

Quality-time with the computer is necessary for learning new technologies. If you hear of some kind of functionality or function library that interests you, you should find the time to play around with it. Download the library, write a program that uses it, get it to work. This is the best way to learn.

Another opportunity for quality-time occurs when you find yourself looking at old pieces of code that you wrote. Perhaps you’ll find out that some of it is unnecessary, or can be replaced by more modern tools, or needs documentation. Don’t hesitate to make changes in old and running code if it makes it better.

Quality-time is when you sit alone infront of your computer, paying attention to the finest details, absorbing information, playing around.

June 16, 2009

4 things worth doing at work

Filed under: programming — Adi Levin @ 3:41 pm
Tags: , , ,

A programmer should invest his working hours, doing these four things:

1. Develop software

This includes programming, debugging, designing, reviewing code, etc… – all the activities for which you get paid at the end of the month.

2. Learn

You have to learn new technologies, as well as study carefully the old technologies that you are currently using without a full understanding. In software engineering, the distance between an idea and its usage in the product, is very small. Learning quickly pays off.

3. Teach

Anything you’ve learned, before you got this job, or during this job, could be of interest and of use to your colleagues. Some people think it is strategically better to keep information and knowledge to themselves – making them irreplaceable. This is a bad policy, because of three reasons: (a) This is disrespectful, and people want to get respect. Also, it shows that you don’t trust other people – why should they trust you then? (b) It is bad for the organization. A company’s best interest is to increase the knowledge of all programmers – not just one. (c) Everyone is replaceable.

4. Build relationships with colleagues

If you want to be able to strengthen your influence on the product and on the methods that other people use at work, or on your work-place in general, you need to have a good relationship with people around you. Be a friend. Listen, and understand others, before you expect them to listen to you. Find opportunities to accept the ideas of colleagues or subordinates – let them know you trust them.

Some programmers do not pay enough to the three last items in this list. Do you?

Create a free website or blog at WordPress.com.