Having done much of my programming in Basic, I have been sweetly oblivious to the importance of case in naming variables. In Basic, NumNodes is the same as numnodes, numNodes and NUMNODES. As a result, you can type any combination that suits you and all is well. Your eye doesn’t even notice the differences after a while. That all changes when you program in C, where case matters. I am translating some meshing code from PowerBasic to C for my next blog entry, and the gcc compiler is giving me tons of errors like:
HalfMesher.c: In function `createMesh':
HalfMesher.c:242: error: `OldNF' undeclared (first use in this function)
HalfMesher.c:254: error: structure has no member named `Faces'
And that is after going through the code line by line looking for such things!
Compound Names
This got me thinking about naming conventions for variables and functions. There is an amazing amount written on the topic, and fervent religious battles have been fought. Still, a few patterns emerged. In the C world, there is a tendency to use all lower case for simple variable and function names. For compound names, like CellVolume, there are several options:
- Run together: cellvolume
- Underscores: cell_volume
- Lower mixed case: cellVolume
- Upper mixed case: CellVolume
Mixed case is one of several names for capitalizing the first letter of each word. (Others are camel case, medial capitals, InterCaps.) In lower mixed case, this is applied only to the words after the first. In the list above, the first two combining methods seem the most common for C programmers, followed by lower and finally upper mixed case.
I have to say that I don’t care for the run-together approach because it can be ambiguous for some word combinations. For instance, consider the variable cellslip. It could be interpreted as cell_slip or cells_lip. As you read the code, you need to stop and decipher this variable name. That may only take a moment, but it is an extra burden on the reader. The underscore method addresses this issue nicely, but adds to the bulk of variables and can clutter the code. Mixed case gives you the benefits of word separation without the extra bulk. Because C programmers have tended to prefer lower case, lower mixed case seems like the best approach for compound names.
Other Issues
It turns out there are many more issues that come up with naming conventions. For instance, terseness. The C examples in Kernighan and Ritchie are filled with one- and two-letter variable names. Their code is compact, but it can be hard to read. At the other extreme, there is the natural language approach of Kari Laitinen. Here is a snippet of code taken from his site:
cout << "\n Please, type in two integers separated "
     << "with spaces: " ;
cin >> first_integer_from_keyboard
    >> second_integer_from_keyboard ;
sum_of_two_integers = first_integer_from_keyboard +
                      second_integer_from_keyboard ;
It is very readable, but that is partly because it is also very simple. If you are solving complex equations, with lots of variables in them, your code will be highly cluttered if you follow this approach. Which is easier to read:
y = a*x*x + b*x + c;
or
Answer_to_problem_7 = First_Coefficient*x_variable*x_variable + Second_Coefficient*x_variable + Third_Coefficient;
The example above introduces the importance of idioms. Even very terse names can convey meaning if they follow a common idiom. Any mathematically aware person will recognize the above equation as that of a second-order polynomial. For this case, single-letter names are fine. Similarly, using i and j as index names for loops is a common idiom in programming, and so is fine too.
What is the optimum length for variable names? Steve McConnell discusses this in Code Complete. He references a 1990 study by Gorla, Benander and Benander that suggests names in the range of 10 to 16 characters were the easiest to debug.
Another important point is that care is required with global variables. Local variables can be understood in the context of the function they reside in. Global variables are generally without context. As you read code, they pop up randomly, and you need to understand them. Thus, their names must be descriptive enough to stand alone, within reason.
Naming Convention
Here are the rules I have come up with for myself when programming in C. Following these should make my life easier and, I hope, make my code easier to read.
- Single word variable names are lower case: face
- Compound variable names use lower mixed case: theLongName
- Functions follow the same convention as variables
- Macros are upper case: MAX
- Typedefs are upper case and end in _TYPE for complex types: FACE_TYPE
- Names for lists or collections typically end in ’s’: cells
- Global variables start with g and are more detailed: gDiscretizationScheme
- If it helps readability, pointers can be prefixed with p: pFace
- Abbreviations are to be minimized
- Idioms are used to shorten names where it makes sense