Algebraic Mesher in Python

July 14th, 2008

This year I have tried to make it a habit to track how much time I spend on numerical programming. I’d like to put in 10 hours or so a week and this lets me know if I am doing that. Anyway, I went back to my records from February when I was developing the Algebraic Mesher in C. It took me 20.9 hours to complete, with about half of that time spent in a frustrating effort to track down numerous memory problems. The promise of scripting languages is they allow you to develop programs much more efficiently, at the price of slower run time performance. So I decided to create the mesher in Python and see how long it took me.

Method

Four months had elapsed since I finished the C version and I couldn’t recall how I had programmed it. All I remembered is it was a miserable experience. Thus it seemed that I could create a Python version from scratch without undue influence from the previous effort. To get started, I reviewed the equations linking the ID numbers for vertices, faces and cells from earlier, but nothing more. I didn’t look at the C code or my follow up blog entry on the experience.

So I opened up the IDLE editor and looked at a perfectly blank page. I wasn’t sure where to start. Then I thought: I need a cell object. So I made a very simple one:

class cellclass:
    def __init__(self,CID):
        self.CID = CID

All it does is store the ID number of the cell. I made a list of cells and stored some numbers. So far so good. I then made a vertex object, and created a list of vertices:

verts=[]
for i in range(NumRows+1):
    for j in range(NumCols+1):
        VID = (NumCols+1)*i + j
        x = j*delx; y = i*dely
        temvert = vertclass(VID,x,y)
        verts.append(temvert)

In this way I added features incrementally, in baby steps, without really any planning. Proceeding along in this fashion I had the mesh generator finished in 1.5 hours. I couldn’t believe how fast it went. The code wasn’t especially pretty – nearly everything is lumped in one routine – but it seemed to run fine.

Testing

I wanted to make sure I tested the code as thoroughly as I had tested the C version, so I opened up the C code at this point, reviewed the tests there, and replicated them, one by one in Python. With the C version, every test revealed new bugs which had to be laboriously tracked down and fixed. With the Python version, the code passed the first four tests the first time. The fifth test found a bug where I was writing multiple copies of vertices to the face objects. I fixed that in a couple of minutes.

In a little more than 2 hours I had a working mesher. And it was fun to write. The code just seemed to flow out of my finger tips (and I am not especially skilled at Python). Yes, this code probably should be rewritten into a more modular style. And perhaps it could be made more ‘Pythonic’. So maybe another hour or two are needed to really polish it up. But there is something extremely satisfying about getting a rough version working with so little effort.

One major disappointment with the C version of the program was the memory problems. I had spent some time developing a generic list structure in C, so I could make my programming in that language less painful. The list and FOR_EACH macro were supposed to make C more like Python. The thought was good, but the experience was not.

Stats

The table shows the time and lines of code needed for each version of the code. From Prechelt’s study, we expect Python to be 3.8x faster and 3.6x shorter than C/C++. Roughly the speed up in development time mirrors that in lines of code. In my case, the speed up was nearly 9x, while the needed lines of code was about 2x less. This may be due to the inordinate amount debugging time the C code required. In any case the general trend holds: Python is a lot faster to develop in.


.            Time            --Lines of code --
Lang      (hrs)   Mesher   Test Code  Total
-------------------------------------------------------------------------------------
 C         20.9    299      163         462
Python      2.4    124       89         213
-------------------------------------------------------------------------------------
Ratio       8.7x   2.4x     1.8        2.2x

Run Times

The big downside of fast development time is slow execution time. Python does all kinds of nice things for you (like memory management), but that comes at a price. To measure this, I had both the C and Python meshers create a million cell mesh and timed them. Here are the results:


.            Time      RAM
Lang      (sec)     (GB)
------------------------------------------------------------------------------------
 C            6.9   0.366
Python      344     1.33
------------------------------------------------------------------------------------
Ratio       50x     3.6x

This test was run on an Intel Core2 Duo (T7300) at 2 GHz, with 2GB of RAM. I think much of the time, for both programs, went into memory allocation. This was corroborated when I tried compiling the C version with various optimizations and got no change in the timing. There may some ways to allocate memory in bigger chunks, and thus speed up the code.

There are tools for speeding up the execution of Python programs. Tools like Psyco and Pyrex can offer dramatic improvements, at least under some conditions. That might be a good experiment for next time.

File: PyAlgMesher.py

Programmer Efficiency

April 14th, 2008

Programming is a slow process. You write a few lines, test them, find some errors and write a few more lines. The C programming language seems especially slow to me. Being statically typed, you must declare every variable. This takes time. Being a low level language, each statement typically doesn’t do a whole lot. And since its compiled, you periodically must leave the editor and compile and test it. But what really eats the time is the very ugly mistakes you can easily make in C. If you forget to allocate some memory, there is no pleasant reminder to set you back on track. Instead you get a cryptic message and mysterious crash. Thus a lot of time is spent tracking such issues.

Another thing that eats a lot of time is the whole pass by value convention of C. Do I pass var, *var, **var, &var, etc? When the data structures are complex and interconnected it can be darn confusing. You pretty much have to draw it all out on a piece of paper.

When caught up in the frustrations of C programming I often think: there must be a better programming language. Recently I did an internet search to see if anyone has studied programmer efficiency as a function of programming language. While I didn’t find much, there is a very interesting study by Lutz Prechelt at Karlsruhe University: Are Scripting Languages Any Good? A Validation of Perl, Python, Rexx, and Tcl against C, C++, and Java

Prechelt’s Study

In this study Prof. Prechelt gave programmers a specific programming task. They had to convert phone numbers into words from a German dictionary using a particular encoding scheme. The programmers had different levels of experience and came from different backgrounds, but Prechelt was able to extract meaningful results from this experiment.

Prechelt looked at many different things, like memory consumption, run time, and reliability of the programs. In those measures, the scripting languages did surprisingly well. Of course a well written C program will be faster than a well written Python program, but in this study there was a huge variability in the results and many of the Python programs were as fast as many of the C/C++ or Java programs. However, our interest is in programmer efficiency. Look at this graph:

Line of Code

It shows that the scripting languages (TCL, Rexx, Python and Perl) required an average of 98 lines of code to complete the program (not including comments). The non-scripting languages (C, C++, Java) needed around 277 lines. That is a huge difference. If you look at the shortest programs, the results are even more dramatic. The shortest C/C++ program was 150 lines, while the shortest Python script was 42 lines. That is 3.6 times more code for the C/C++ program.

There is a rule of thumb in programming that it takes about the same amount of time to program a line of code regardless of language. If true, this suggests that the script writers completed their task more quickly than the compiled language programmers. That was indeed the case. Here is the plot from Prechelt’s paper:

Programming Time

Sure enough, the same trend is seen. The average scripting time was 3.8 hours while the non-scripting languages are at 14.3 hours. The minimum reported times were 1.5 hours for Python and 10.5 hours for C/C++. We probably shouldn’t put much weight on the minimum values, since the times were self reported. But the averages are worth comparing. The scripting programs were developed 3.8x faster. That is amazing when you think about it. A task that might take a week of slogging in C could be done in a little over a day with Python.

Amazing, but True?

So how believable are these results? If we trust Prof Prechelt’s methodology, they are believable for the phone-code problem, but do they apply to numerical programming? There is a hint in the paper:

The shorter program length of the script programs can
be explained by the fact that most of the actual search is
done simply by the hashing algorithm used internally by
the associative arrays. In contrast, the non-script programs
require most of the elementary steps of the search process
to be coded explicitly by the programmer. This is further
pronounced by the effort (or lack of it) for data structure
and variable declarations.

This suggests that the scripting languages were more efficient because they had high level tools, like associative arrays, that were well suited to the phone-code problem. One could argue that had suitable tools existed in the non-scripting languages, they would have enjoyed similar efficiencies.

There is a slight problem with this argument. Namely some of the non-scripting languages do have such tools, but the programmers didn’t use them:

It is an interesting observation that despite the existence
of hash table implementations in both the Java and
the C++ class libraries none of the non-script programmers
used them (but rather implemented a tree solution by hand),
whereas for the script programmers the hash tables built
into the language were the obvious choice.

The implication of this is that it is not just the presence of high level tools that gives scripting languages their efficiency; it is also the ease with which you can use those tools. Python is a good example of this. After you import a module, you can access that module’s documentation from the command line:

>>>import somemodule
>>>help(somemodule)

Not only is this very handy, it also encourages one to explore what is available in modules. In addition, the language is easy to read, so you can peruse the source code for additional insights. This greatly lowers the threshold for reusing other people’s code, and that is a great productivity booster.

What about Speed?

Scripting languages are slow, typically 10x to 100x slower than compiled languages, in terms of execution speed. Is it really helpful to create a numerical program in record time if it takes two months to run? Scripting advocates point to the Pareto principle: 20% of the routines take 80% of the execution time. Thus it makes sense to code everything in the scripting language, see where the bottlenecks are, and rewrite those in C, C++ or Fortran. Sounds good. I would like to try some experiments to see for myself.

Equation Tee Shirts

March 26th, 2008

Last summer, my daughter got a job at CafePress, a company that enables you to make custom tee shirts and other items, and sell them online. There are literally thousands and thousands of designs people have come up with. It is fascinating to scroll through all of the offerings. Whatever your political, religious or philosophical bend, there is a tee shirt for you. Need a humorous saying? They have them, by the truck load. Need a pithy commentary on the cruelty of life? It’s there. Just about anything you can imagine is there.

With this huge assortment, I figured there would be tons of interesting equation tee shirts available. There are, if you are a fan of Maxwell’s equations. You can find many variations of this classic:

Maxwell’s Equations

but there wasn’t much else. I didn’t see any fluid dynamics or CFD offerings, so I decided to create my own. So far I have made five of them. Here are the images:

White pink Blue

Green Yellow

I have created an “online store” to house these offerings. You can reach it at www.cafepress.com/basicnumerics. Once you upload an image, you can put it on a wide variety of products. I can chose several styles of shirt (both men and women), a coffee mug and a couple of bags.

Making Equations

When I decided to make the shirts, I figured it would take 30 minutes. I would fire up Microsoft Word, create an equation with Equation Editor, put a cute saying under it, color it, save it as an image and upload it. It didn’t work out that way. I hit snags at every step. First, I couldn’t resize the equations to the appropriate size without Word running out of memory. I am still puzzled by this, but I was able to repeat it several times. The continuity equation would display fine with a 12 point font, but when I scaled it up to 72 point, Word would start displaying out of memory errors and lock up. Additionally, I couldn’t color the equations using Word’s Equation Editor.

It is possible that the more feature rich MathType would work better, but I didn’t have access to that. Instead, I downloaded OpenOffice v2.3 and used the Math module to create my equations. There were no memory problems, but I still couldn’t colorize the equations. Instead, I used Paintshop Pro for this. CafePress recommends that tee shirt images be 2000 by 2000 pixels, with a resolution of 200 dpi. The math module has no way to export a graphic image, so I made the equations quite large on the screen and did a “PrintScreen” to copy the image to the clipboard. On a large monitor, you can get a 1900 pixel wide image this way. I then used Paintshop Pro for the rest.

Recipe

Here is the recipe I used to prepare the images:

  • Make the equation in Open Office Math using 24 pt font.
  • Open equation on a monitor with at least 1900 pixels of width. Use the magnifier in Open Office to blow it up to full screen. Hit PrintScreen
  • Open Paintshop Pro and create a 2000×2000 image at 200 dpi, with transparent background.
  • Paste the equation in as a New Image, crop it, copy it and paste as a Layer into the 2000×2000 image
  • Use the Background Eraser tool to remove the white background around the equation
  • Use the Color Replacer tool to replace the black in the equations with blue at an RBG value of (32, 72, 254)
  • Create a vector layer and use Text tool to create a saying to accompany the equations. Set it to a font of 72, 108 or more (whatever fits). Use the font Freestyle Script. (This is set while the text is selected in the text window.)
  • Set text to the same color. (Set background color to the same RGB of 32, 72, 254).
  • Place the phrase about halfway in window and the equation about ¼ of the way down.
  • Save file as a pspimage file (native Paintshop format) and as a png file.
  • Upload the png file to CafePress.

The recipe is a bit convoluted, and I am sure there are more efficient ways, but it does work.

Creating the Shirts

Once the images are uploaded to CafePress, its easy to put them on products. You simply click on the items you want and add them to your “store”. I chose a variety of shirts, a mug and a couple of bags. Check them out at www.cafepress.com/basicnumerics.

Algebraic Mesher

March 13th, 2008

In looking for an alternative to the half edge meshing scheme, I hit upon the idea of using algebraic relations between cells, faces and vertices in order to build a mesh. Those relationships were described in the previous post. Here I want to describe how I implemented this in C.

Data Structures

Like my Half Edge mesher, the MESH_TYPE data structure consists of lists of all the vertices, faces and cells in the mesh.

struct mesh_type {       //contains all mesh info
    ALIST cells;   //cell list
    ALIST faces;   //face list
    ALIST verts;   //vert list
};

ALIST is the generic array I had defined earlier. It consists of a small data structure that includes a pointer to an array on the heap containing the actual data. In this way, the array can be resized and bounds checking can be done. The figure shows the mesh data structure.

Mesh Data Structure

Up to this point, the structures are the same as in the Half Mesh program. Where things change is in the data structures for the cells, faces and vertices. For instance a face structure looks like

struct face_type {
    ALIST verts;  //list of ptrs to face verts
    ALIST cells;  //list of ptrs to faces 1 or 2 cells
};

Each face in the list of faces contains lists of pointers to elements in these three master lists. Trying to draw all the relationships can get confusing. This pictures gives an idea for the faces of a cell

Cell Data Structure

Each cell in the master list contains a list of four face pointers. These pointers point to the four faces in the master list where this cell’s faces reside. Imagine how confusing that picture would be if I were to try and draw all the connections between cells, vertices and faces?

Accessing the Data

Suppose you want to average the x and y coordinates all the vertices of a cell, to make sure it matches the cell center. (This is one of the tests I used to check the meshing code.) This can be done easily using the new improved FOR_EACH macro from the generic list include file:

double totX, totY,dx,dy;
VERT_TYPE **ppVert;
FOR_EACH(pCell, pMesh->cells)
{
    totX = 0; totY = 0;
    FOR_EACH(ppVert, pCell->verts)
    {
        pVert = *ppVert;
        totX += pVert->x[0];
        totY += pVert->x[1];
    }
    //Find diff betw cell center and vert ave
    dx = pCell->x[0] - totX/4;
    dy = pCell->x[1] - totY/4;

The outer loop is over the cells in the master list of cells for the mesh. The way the FOR_EACH works, is it puts a pointer to the current cell into pCell on each pass through the loop. This allows us to dereference the fields of the cell. It is the inner loop, over the cell vertices, where confusion can set in. The vertex list in a cell structure holds pointers to the actual vertices residing in the master vertex list. Since FOR_EACH returns a pointer to whatever is in the list, it returns a pointer to a pointer. That is why I defined the variable ppVert. In order to gain access to the x and y position of the vertex, we dereference ppVert, to get a pointer to the vertex. From their we can get the position. These dereferencing subtleties were not at all clear to me as I wrote this. In addition, I hadn’t always allocated memory for my lists. The result was the memory debugging nightmare I mentioned previously.

Assessment

The entire motivation for this project was to generate a meshing program that had some of the elegance of the half edge mesh, while being better suited for CFD applications. Was it successful? It is hard for me to say at this point. In terms of code length, the two implementations are nearly identical. Thus one is not a lot clumsier than the other. One thing I find troubling about the algebraic meshing approach is you can’t look at the code and tell if its right. When you see a line like

pVert = getItem(verts,CID+i+1);  //SE vert

how do you know that pVert is really the south east vertex of cell CID? You need to sketch out the grid and work out the relationships. Also, I am still reeling a bit from all the memory problems I had. The half edge mesh routine came together a lot more easily.

All in all, I would say the jury is still out on this one. But now that I have created it, I hope to use it to do some work with Immersed Boundaries.

Download

If you want to play with the code to generate meshes, you can download it here. Put the files ArrayType.inc and AlgMesher.c in the same folder and compile and run:

gcc -Wall AlgMesher.c
./a.exe

When main runs, it will call a test function that tests five different aspects of a simple 100×100 mesh. It writes out the results to a text file. In this way, I tried to make the code relatively bug free. Enjoy.

Meshing by the Numbers

March 8th, 2008

We saw earlier that the half edge data structure is a very elegant way to create a mesh. However, as I pointed out recently, this approach does have a couple of drawbacks for general CFD applications. Are there other elegant approaches worth considering? One interesting possibility is to use the algebraic relationships between vertices, faces and cells to generate the connections between them. If the mesh is rectangular, and you number the components systematically, you can work out these relationships.

Cell and Vertex Numbering

Cells & Vertices

Lets say we number our cells from left to right, starting with the bottom row, as in the figure. Any given cell can be expressed in terms of its column number and row number:

CID = N_cols*i + j

The Cell ID (CID) is found by multiplying the number of columns in the grid (N_cols) by the current row number i, and adding the current column number j. We are following the C convention of starting our numbering from zero. We can number the vertices in the same way, starting in the lower left corner. Without too much trouble, we can see that if you know the ID of the cell, you can work out the ID of the south west vertex of that cell:

VID_SW = CID + i

where VID_SW is the ID number of the southwest vertex. The equation works out this way because any row of N cells has N+1 vertices.

The ID for the southeast vertex is just one more bigger than the southwest:

VID_SE = CID + i + 1

To get the index for the northern vertices, we need to add a row of vertices to the index of the corresponding southern vertex:

VID_NW = CID + i + (N_Cols + 1)
VID_NE = CID + i + 1 + (N_Cols + 1)

Face Numbering

Faces

The numbering scheme for the faces is a bit trickier. You can just name them sequentially starting from the south west corner, but I think its a bit easier if you create two lists: a list of horizontal faces (i.e., the north and south faces) and a list of vertical faces (i.e., the east and west faces). The figure shows the idea, with the horizontal faces numbered in black and the verticals in blue. With this approach, the southern faces have the same numbering as the cells, thus

FID_S = CID

The northern face ID numbers can be found by taking the southern face ID and adding the number of columns in the grid:

FID_N = CID + N_cols

For the east and west faces, each row has one more face than there are cells. This is the same as the vertex numbering, thus

FID_W = CID + i
FID_E = CID + i + 1

In practice, we don’t really want to have two lists of faces. It is more convenient if we can loop through a single list. This can be achieved by appending the east-west list onto the north-south list. Once that is done, the indices for the east and west faces will need to be increased by the number of horizontal faces:

FID_W = N_HFaces + CID + i
FID_E = N_HFaces + CID + i + 1

where N_HFaces = N_Cells + N_cols.

Implementation

I have written a C program to implement these ideas, but it was a torture. I made heavy use of my generic list structure and in the process hit a bunch snags. C is notorious for giving you the power to shoot yourself in the foot, and I dare say that I don’t have many toes left. I made a ton of memory mistakes and it took me more than 10 hours of debugging to find and correct them all. In the process, I made a number of improvements to the generic list routines (including writing a bunch of documentation), so hopefully this won’t happen again. I’ll discuss the implementation more in the next post.

Rethinking the Half Edge Mesh

March 1st, 2008

In earlier discussions (here and here), I extolled the benefits of the half edge mesh. It is an elegant way of compactly storing 2d mesh information. Moreover, it allows you to easily do adjacency tests (e.g., what are all the edges touching this vertex?). However, upon further reflection, two weaknesses have occurred to me:

  1. It is lacking for the finite volume method.
  2. Its not extendable to 3d

I don’t know that these are show stopper considerations, but they do give one pause. Lets look at them in more detail.

Finite Volume

The key tenet of the finite volume method is the balance of fluxes leaving and entering each cell face. This is what makes this approach intrinsically conservative, unlike the finite element method.

At some point in the execution of a finite volume code, we loop over the faces in the domain and calculate the fluxes crossing those faces. Because each pair of cells share one face, we know that the material leaving cell A must flow into cell B. But the half edge data structure doesn’t have faces; it has half faces. There is no single entity that connects both cells. So how do we achieve this conservation? The simplest way is probably to create a full face data structure that each half face points to. We can then calculate the fluxes on this face as normal. While this adds additional complexity to the scheme and reduces its elegance, it should be perfectly doable.

3d

What really makes me doubt the viability of the half edge approach is in generalizing it to 3d. In 2d, the vertices, edges and cells are connected together in a simple fashion. For instance, the edges around a cell are connected, one to the next, in a chain. This makes it easy to create a linked list of edges to go around the cell.

Face linkage in 2d, 3d

In 3d, each cell is surrounded by six faces. There are many ways to connect these faces together in a linked list. The figure shows one of them. That is workable, but how do you get an adjacency test out of it? For instance, lets say you want to know all the faces that surround a vertex. I can’t figure out any clever way of looping through the half faces to achieve this.

Next Steps

I am working on another mesh generator that uses algebraic relationships between cells, faces and vertices to establish the connections between these entities. It has a certain elegance to it as well. Unfortunately, in implementing it, I hit all kinds of memory and pointer problems that took awhile to work out.

Imprinting a Polygon

February 9th, 2008

Lets say we have a rectangular mesh, perhaps created with our half mesh tool, and we want to imprint upon it some polygon shape. For instance, we may want to model the temperature or fluid flow in such a polygon and need a mesh in that shape. By imprinting the polygon onto the mesh, we can create a simple stair step mesh that can be used for modeling.

Stair Step Mesh

What is the criteria for determining if a cell is inside the polygon or not? Clearly any cell that is fully inside the polygon would be considered “in”. What about one where one of the vertices is outside? Or two are outside? For the figure above, the imprinting follows a simple rule: if the cell center is in the polygon, then the cell is inside.

Shooting Rays

To create our stair step mesh, we need to cycle through the cells in the rectangular mesh and check each cell center to see if its inside. The classic way to determine if a point lies inside a polygon is to shoot a ray from this point and see how many times it crosses the polygon. If its an even number, the point is outside. An odd number of crossings means its inside. The Wikipedia has a nice article on this here.

Even-Odd Rule

Hitting a vertex – A rare Event?

There is a subtlety with this scheme. Your ray may strike one of the vertices of the polygon. In that case, you will likely get double counting. It will appear that your ray has crossed both polygon segments N and N+1. As a result, you may conclude exactly the wrong result. My initial reaction to this situation was: it could happen, but will be extremely rare. Wrong! Both meshes and polygons are created by humans. And humans tend to like round numbers. It is much more likely you will have a vertex point at (1,1) than at (1.0001, 0.9999). Similarly for the cell positions. In my own testing, I hit this problem right away.

The solution is found in how we think about a segment. Rather than thinking of a segment as two points and the line that connects them, we can think of it as just one point and the line. The figure shows the idea.

Vertex Problem

This can be implemented with a judicious use of the > and >= operators.

Implementations

A quick search of the internet reveals many implementations of this odd-even crossing approach. Bob Stein’s article compares a few implementations, looking at lines of code, number of if statements and so on. The routines he looks at range from 44 to 91 lines of code and include from 2 to 19 if statements, and 0 to 2 divides. When I first implemented this, some years ago, my code ran 70 lines and included 7 if statements and 3 divides. In my defense, my code also checked how close a point was to a segment and included the point if it was within some tolerance. Still, it seems a bit bulky compared to some of the implementations Stein looked at. He was justifiably proud of his implementation, which is only 44 long with no divides.

A bit more looking on the web reveals the incredibly tiny 12 line routine of W. Randolph Franklin. It contains only one if and one divide. I tried it out for a couple of cases and it does seem to capture the essense of the method without a single wasted character. Here is the Franklin’s routine:

int pnpoly(int npol, float *xp, float *yp, float x, float y)
{
  int i, j, c = 0;
  for (i = 0, j = npol-1; i < npol; j = i++) {
    if ((((yp[i]<=y) && (y<yp[j])) ||
         ((yp[j]<=y) && (y<yp[i]))) &&
        (x < (xp[j] - xp[i]) * (y - yp[i]) / (yp[j] - yp[i]) + xp[i]))

      c = !c;
  }
  return c;
}

(Note this code is Copyright © 1970-2003, Wm. Randolph Franklin )

Stair Step Mesh

With a routine like this, we can easily create a stairstep mesh. You can envision a program something like this:

int main()
{
    MESH_TYPE *pMesh;
    CELL_TYPE *pCell;
    int i;

    //Make a triangle
    float xp[10] = {0., 1.0, 1.0};
    float yp[10] = {0., 0., 1.0};

    //Create mesh
    pMesh = createMesh(0., 0., 1.0, 1.0, 0.1, 0.1);

    //Determine which cells in mesh are 'live'
    FOR_EACH(i, pCell, *(pMesh->cells))
    {
        pCell->live = pnpoly(3, xp, yp, pCell->x, pCell->y,)
    }
    return 0;
}

For this to work, we need to add a few more fields to the cell structure defined previously. Specifically, we need the cell center coordinates and a flag stating whether the cell is live or dead.

Some additional work would be required to actually use this stair step mesh in a CFD code. For instance, it is not efficient to keep dead cells in our cell list. Every time we loop through the cells, we will need an if statement: if pCell->live .... Also, we have not identified the boundary faces or made any effort to transfer the boundary conditions from the polygon segments to the edge faces of the stair step mesh. These are tasks for another day.

Half Edge Mesh Implementation

February 2nd, 2008

In a previous post, I discussed the elegance of the Half Edge data structure for storing mesh information. This time, we will implement such a scheme in C. The goal is simple: create a program that meshes a 2d rectangle with a structured grid, while storing the information in the Half Edge data structure.

If we know the length and width of the rectangle, and the cell size, we can compute the number of rows and columns in our mesh. We can then loop over these rows and columns, creating cells, faces and vertices and linking them together. As we create these mesh elements, we will store them in lists, using the generalized arrays we developed earlier.

We will build the mesh row by row. The first step is to create the south most row of vertices. As we construct a row of cells, we will be building them on top of the vertices of the previous row. So for the very first row cells, we must first create a row of verts. Once this is done, we enter a pair of nested for loops where we loop over the rows and columns of the mesh. Here is the basic idea


for (i=0; i<numRows; i++)
    {
        Prepare the row
        for (j=0; j<numCols; j++)
        {
           Set corner vertices of cell
           Create new cell
           Create faces of cell
           Make remaining connections
           Hook up the pairs of WF and NF faces
           Save a few items to help with next cell or next row
         }
        wrap up this row
    }

Once the first row of verts is created, we begin creating cells. For the first cell of a row, we must create a north west vertex. For all other cells on that row, the north east vertex of the previous cell becomes the north west vertex for the new cell. We then need only create one vertex per cell, specially the north east one. This is shown in Step 2 of the figure. With the four vertices of the cell defined, we can then create the four faces of the cell (step 3), then create the cell and link everything together. Its then on to the next cell. In this step by step fashion we build up the mesh.

MeshSteps

A subtlety

There is a subtlety I didn’t mention in the previous post. There are 8 half faces that touch a vertex. How do you know which one to store in the vertex? For an interior vertex it doesn’t really matter, as long as its consistent with the scheme for cycling through the faces around the vertex. Specifically, if you store a face pointing toward the vertex, then you cycle through the faces with the command


face = face->pair->nextF;

If you pick a face that points toward the vertex, then you use


face = face->nextF->pair;

For vertices at the edge of the domain, greater care is needed. Not every face has a pair near the edge, so a command like


face = face->pair->nextF;

can lead to errors. Also, we need to pick the edge that allows us to go fully around the vertex. In the picture below, this can only be satisfied for vertex V by picking face F.

Faces around a vertex

In general, we need to pick the most counter clockwise edge, relative to the vertex. (That is because our circular linked list of faces is organized in a counter clockwise direction.)

More on Lists

In writing this simple mesher, I made use of the generalized arrays developed earlier. In the process of using these, I came across various things that didn’t work that well, so I changed them. For instance, the listAppend function originally returned an index to the data array where the new data was written. It turned out to be more convenient to return a pointer to that data.

I also realized that the creation and destruction functions were overly restrictive. They assume the user has already created the list object and only needs the data array allocated. There are times however, when the user would like both allocated and deallocated for him. So there are now two sets of creation and destruction functions:


void createListData(ALIST *list, long maxEl, size_t elSize)
void destroyListData(ALIST *list)

void createList(ALIST **list, long maxEl, size_t elSize)
void destroyList(ALIST *list)

This provides greater flexibility, but also adds complexity. Speaking of complexity, the astute reader will note in the source code that I did not use these flexible arrays for the lower and oldNF arrays. Instead I used fixed arrays, dimensioned to 1000 elements. A flexible array would be perfect here, since it will auto-expand to however many rows are needed. But the extra hassle didn’t seem worth it for some something so simple.

If I had used these flexible arrays, setting and getting values would have looked like:


LL = *(VERT_TYPE *)getItem(lower, j);
setItem(lower, j) = UL;

instead of


LL = lower[j];
lower[j] = UL;

That is a serious loss of readability. When getting the item from the list, you have to remember that getItem returns a pointer to the data element, so you must dereference it. To be proper, you need to cast the pointer first. That makes for some pretty darn ugly code. I could use a macro to clean it up. Perhaps something like:


#define GET_ITEM(pList, index, datatype) *(datatype *)getItem(pList, index);

I will have to experiment with it.

Wrap Up

If you want to play with the code to generate a half edge mesh, you can download it here. Put the files ArrayType.inc nd HalfMesher.c in the same folder and compile and run:


gcc -Wall HalfMesher.c
./a.exe

When main runs, it will call a test function that tests different aspects of a simple 10×10 mesh. It writes out the results to a text file. This is how I made sure the code was relatively bug free.

Half Edge Mesh - Intro

January 26th, 2008

When I created the mesh generator for BFlow, I used a straightforward but unimaginative set of data structures for storage. A mesh is made up of cells, faces, edges and vertices. Unstructured meshes have arbitrary relationships between these (unlike a structured mesh), and thus those relationships must be stored.

In a 2d, a mesh looks like this:

mesh

My BFlow data structures to hold the information for a mesh look like this:


TYPE CellType
    node(4) AS LONG   '1=SW,2=SE,3=NE,4=NW (1st 4 are the corners)
    Face(4) AS LONG   '1=East, 2=west, 3=North, 4=South
    nb(4) AS LONG     'Neighbor cells. 1=East, 2=west, 3=North, 4=South
END TYPE

TYPE FaceType
    node(2) AS LONG  'ptrs to two nodes of face
    cell(2) AS LONG  'ptrs to neighbor cells
END TYPE

TYPE VertType
    face(4) AS LONG  'pointer to touching faces, 1=East,2-West,3=North,4=South
    cell(4) AS LONG  'pointer to touching cells, 1=SW,2=SE,3=NE,4=NW
END TYPE

(These are Basic declarations, but they would look very similar in C.) You can see that everything is stored. A cell can have four faces and four nodes. Pointers to them are stored in that cell structure. A vertex can be touched by four faces. Pointers to all those faces are stored for each vertex.

There is a very elegant alternative storage scheme called the Half Edge Mesh. Amazingly little information is stored, but it provides everything you need! The key idea is that rather than store edges (or faces in 2d), you store half edges (half faces). While an ordinary face is defined by its two end points, a half face is directed and defined by just a single point. The picture shows the idea.

mesh2

This probably looks like a step in the wrong direction, if we are trying to simplify things. Yet, this change is really helpful. The data structure for a half face looks like:


struct face_type {
    VERT_TYPE *vert;  //vertex at the end of the half-face
    FACE_TYPE *pair;  //Oppositely oriented adjacent half-face
    CELL_TYPE *cell;  //cell that this face belongs to
    FACE_TYPE *nextF;  //next half-face around the cell
};

The vertex and cell structures are given below:


struct vert_type {
    double x;
    double y;
    FACE_TYPE *face;  //Ptr to a touching face
};

struct cell_type {
	FACE_TYPE *face;   //south face of cell
};

The cell structure seems especially spartan, at least compared with what I started with. How do you access the faces of the cell? Or its vertices? It turns out to be very easy:


FACE_TYPE* face = cell->face;
do
{
    // ...some face calcs...
    face = face->nextF;
} while (face != cell->face);

The cell faces are effectively organized into a circular linked list. Thus the above code works equally well for a quad mesh as a polyhedral mesh.

Accessing all of the vertices around the cell is nearly as easy:


VERT_TYPE *vert;
FACE_TYPE *face = cell->face;
do
{
    vert = face->vert;
    // ...some vert calcs...
    face = face->next;
} while (face != cell->face);

Now lets try something trickier. Say you needed to loop over all of the cells surrounding a vertex? This could come up in a CFD model, for instance, when interpolating a field value from the cell centers to the vertices. Here is the code:


CELL_TYPE *cell;
FACE_TYPE* face = vert->face;
do
{
    cell = face->cell;
    // ...cell calcs...
    face = face->nextF->pair;
} while (face != vert->face);

This takes advantage of the fact that pairs of faces are oriented in opposite directions. In the figure below, face F1 is pointed upward. Its pair face, F2, is pointed downward. Each face stores a pointer to the vertex that it is pointing to. F1 stores a pointer to V1, while F2 stores a pointer to V2.

Mesh3

You can trace out logic by picking one of the faces pointing toward vertex V2, then following the steps. It is very elegant.

Compared to my original scheme, the Half Edge approach is both more efficient (in terms of memory use) and more flexible (since its not limited to quad meshes). In my next entry, I will explore this in more detail.

Naming Conventions in C

January 12th, 2008

Having done much of my programming in Basic, I have been sweetly oblivious to the importance of case in naming variables. In Basic, NumNodes is the same as numnodes, numNodes and NUMNODES. As a result, you can type any combination that suits you and all is well. Your eye doesn’t even notice the differences after awhile. That all changes when you program in C. I am translating some meshing code from PowerBasic to C for my next blog entry and the gcc compiler is giving me tons of errors like:


HalfMesher.c: In function `createMesh':
HalfMesher.c:242: error: `OldNF' undeclared (first use in this function)
HalfMesher.c:254: error: structure has no member named `Faces'

And that is after going through the code line by line looking for such things!

CompoundNames

This got me thinking about naming conventions for variable names and functions. There is an amazing amount written on the topic and feverous religious battles have been fought. Still, a few patterns emerged. In the C world, there is a tendency to use all lower case for simple variable and function names. For compound names, like CellVolume, there are several options:

  • Run together: cellvolume
  • Underscores: cell_volume
  • Lower mixed case: cellVolume
  • Upper mixed case: CellVolume

Mixed case is one of several names for capitalizing the first letter of each word. (Others are camel case, medial capitals, InterCaps.) In lower mixed case, this is applied only to the words after the first. In the list above, the first two combining methods seem the most common for C programmers, followed by lower and finally upper mixed case.

I have to say that I don’t care for the run together approach because it can be ambiguous for some word combinations. For instance consider the variable cellslip. It could be interpreted as cell_slip or cells_lip. As you read the code, you need to stop and decipher this variable name. That may only take a moment, but it is an extra burden on the reader. The underscore method addresses this issue nicely, but adds to the bulk of variables, and can clutter the code. Mixed case gives you the benefits of word separation, without the extra bulk. Because C programmers have tended to prefer lower case, lower mixed case seems like the best approach for compound names.

Other Issues

It turns out there are many more issues that come up with name conventions. For instance, terseness. The C examples in Kernighan and Ritchie are filled with one and two letter variable names. Their code is compact, but it can be hard to read. At the other extreme, there is the natural language approach of Kari Laitinen. Here is a snippet of code taken from his site:


   cout <<  "n Please, type in two integers separated "
        <<  "with spaces:  " ;

   cin  >>  first_integer_from_keyboard
        >>  second_integer_from_keyboard ;

   sum_of_two_integers  =  first_integer_from_keyboard  +
                           second_integer_from_keyboard  ;

It is very readable, but that is partly because it is also very simple. If you are solving complex equations, with lots of variables in them, your code will be highly cluttered if you follow this approach. Which is easier to read:


y = a*x*x + b*x + c;

or


Answer_to_problem_7 = First_Coefficient*x_variable*x_variable
                    + Second_Coefficient*x_variable
                    + Third_Coefficient;

The example above introduces the importance of idioms. Even very terse names can convey meaning if they follow a common idiom. Any mathematically aware persion will recognize the above equation as that of a second order polynomial. For this case single letter names are fine. Similarly, using i and j as index names for loops is a common idiom in programming, and so is fine too.

What is the optimum length for variable names? Steve McConnell discusses this in Code Complete. He references a 1990 study by Gorla, Benander and Benander that suggests names in the range of 10 to 16 characters where the easiest to debug.

Another important point is that care is required with global variables. Local variables can be understood in the context of the function they reside in. Global variables are generally without context. As you read code, they pop up randomly, and you need to understand them. Thus, their names must be descriptive enough to standalone, within reason.

Naming Convention

Here are the rules I have come up with for myself when programming in C. Following these should both make my life easier, and hopefully make my code easier to read.

  • Single word variable names are lower case: face
  • Compound variable names use lower mixed case: theLongName
  • Functions follow same convention as variables
  • Macros are upper case: MAX
  • Typedefs are upper case and end in _TYPE for complex types: FACE_TYPE
  • Names for lists or collections typically end in ’s’: cells
  • Global variables start with g and are more detailed: gDiscretizationScheme
  • If it helps readability, pointers can be prefixed with p: pFace
  • Abbreviations are to be minimized.
  • Idioms are used to shorten names where it makes sense