Memory Management in C

Two things that really drive me crazy when coding in C are error handling and memory management. Here I want to discuss memory management techniques.

When I first started working in C lifetimes ago, I didn't worry about memory allocation too much. I probably used fixed size allocation. In fact, the thought of having to use malloc probably scared the pants off me. But malloc isn't the only way to allocate memory, is it? There's static allocation, whose size is fixed when you compile, and there are variable length arrays and alloca (which are nearly the same thing, and both typically live on the stack). Every time you need to allocate some memory or call a function, you need to think about these things. Who is going to allocate the memory -- the caller or the callee? Is it going to be on the stack or the heap? Who's going to free it? What kind of data structure will hold the memory? How do you deal with freeing the memory when there's an error?
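To make those options a little more concrete, here is a minimal sketch of three of them side by side. The function names (use_static, sum_with_vla, make_heap_ints) are made up for illustration, and the variable length array assumes a compiler that supports them, since they are optional as of C11.

```c
#include <stdlib.h>
#include <string.h>

/* Static storage: the size is fixed at compile time and the buffer
   lives for the whole run of the program. */
static char static_buf[64];

/* Returning this pointer is fine because the storage is static. */
const char *use_static(void)
{
    strcpy(static_buf, "static");
    return static_buf;
}

/* Automatic storage sized at run time (a variable length array).
   The array is gone when the function returns, so only the sum leaves.
   n must be greater than 0 and small enough to fit on the stack. */
int sum_with_vla(size_t n)
{
    int vla[n];
    int total = 0;

    for (size_t i = 0; i < n; i++) {
        vla[i] = (int)i;
        total += vla[i];
    }
    return total;
}

/* Heap storage: valid until someone calls free on it.
   The caller owns the result and must free it. Returns NULL on failure. */
int *make_heap_ints(size_t n)
{
    int *p = malloc(n * sizeof *p);

    if (!p)
        return NULL;
    for (size_t i = 0; i < n; i++)
        p[i] = (int)i;
    return p;
}
```

Only the heap version hands the caller something it has to clean up, which is exactly where the ownership questions start.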

The very first thing that you should know is what not to do. When you write a function and allocate some memory on the stack, do not ever keep a reference to that memory once the function returns.

char *foo(void)
{
    char b[]="hello world";
    return b;
}

That string is on the stack. When the function returns, you will have a pointer to memory that isn't yours, and using it is undefined behavior. I had to make this mistake a few times before I learned it, and it still tempts me. The trouble is, introductory texts on programming don't tell you these things. When you're learning a language that's close to the hardware, it's important to understand what's really going on. So, what are your options here? You could try this.

char *foo(void)
{
    char *b = malloc(sizeof(char)*(11+1));
    strcpy(b,"hello world");
    return b;
}

This memory is ours until the process quits or calls free on it. But now the caller needs to free it. It's also unsafe because I didn't check to see if malloc returned NULL. What if you need to allocate memory for multiple variables? You can't return multiple pointers, but you might try this.

void foo(int *a, char *b)
{
    *a = 10;
    strcpy(b,"hello world");
}

int main(void)
{
    int *a;
    char *b;
    
    if(!(a=malloc(sizeof(int))))
        return 1;
    if(!(b=malloc(sizeof(char)*(256))))
        return 1; //a is not freed here; the OS reclaims it only because main is exiting
    foo(a,b);
    free(a); //just for demonstration
    free(b);
    return 0;
}

It seems less likely now that we would cause a memory leak since malloc and free are both called in the same function. Of course, there is still the opportunity. Not to mention we have made the higher level code ugly. I would prefer to relegate the ugliness to the lower level functions and keep the high level ones clean since people might be reading and using them. You might clean it up a little.


int main(void)
{
    int a;
    char b[256];
    
    foo(&a,b);
    return 0;
}

Now that memory is on the stack. But we can still use it in foo as it remains valid until main returns. There is nothing wrong with this, but you have to be very careful about putting too much memory on the stack. You can request as much heap space as you want until malloc returns NULL to tell you there is no more available. You won't get a warning when you run out of stack space; the program typically just crashes. And programs are generally only given a relatively small amount by default (commonly 8 MiB on Linux and 1 MiB on Windows). Not to mention you might accidentally keep a pointer to that memory and later try to access it after the function that allocated it has returned. Of course, you could always allocate the memory in the callee as I did before if you're really determined.

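int foo(int **a, char **b)
{
    if(!(*a=malloc(sizeof(int))))
        return -1;
    if(!(*b=malloc(sizeof(char)*(11+1)))) {
        free(*a); //don't leak a when b fails
        *a = NULL;
        return -1;
    }
    **a = 10;
    strcpy(*b,"hello world");
    return 0;
}

int main(void)
{
    int *a;
    char *b;
    
    if(foo(&a,&b))
        return 1;
    free(a); //the caller owns both blocks now
    free(b);
    return 0;
}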

What have I done here? If you declare a pointer in the calling function and allocate the memory there, the callee can access the memory that is being pointed to. But the callee can't change what the caller's pointer is pointing to, because C passes the pointer by value and the callee only gets a copy. This is what would happen if you tried.

char *b_caller=NULL; //b_caller --> NULL
char *b_callee;
b_callee = b_caller; //b_callee --> NULL
b_callee = malloc(sizeof(int));
/* Now b_caller --> NULL (still) and b_callee --> memory for integer */

You see? The b_callee pointer did not change the b_caller pointer. All this did was allocate some memory for an integer and point b_callee to it. The pointer in the caller is still NULL. You need pointers to pointers.

char *b_caller=NULL; //b_caller --> NULL
char **b_callee;
b_callee = &b_caller; //b_callee -->  b_caller --> NULL

*b_callee = malloc(sizeof(int));
/* You now need to dereference b_callee everywhere it is used so that
you are referring to b_caller.  What you have after this statement is
//b_callee --> b_caller --> memory for integer 
Thus you have changed b_caller so that it no longer points to NULL. */
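
The two fragments above can be turned into real functions. This is only a sketch; the names alloc_by_value and alloc_by_address are mine and purely illustrative.

```c
#include <stdlib.h>

/* Receives a COPY of the caller's pointer. Assigning to p changes only
   the copy, so the caller never sees the new address. We free the block
   here because nobody else will ever be able to. */
void alloc_by_value(int *p)
{
    p = malloc(sizeof *p);
    free(p);
}

/* Receives the ADDRESS of the caller's pointer, so *pp is the caller's
   pointer itself. Returns 0 on success, -1 if malloc fails. */
int alloc_by_address(int **pp)
{
    *pp = malloc(sizeof **pp);
    if (!*pp)
        return -1;
    **pp = 10;
    return 0;
}
```

If the caller starts with int *a = NULL, then after alloc_by_value(a) the pointer a is still NULL, while after a successful alloc_by_address(&a) it points to an int holding 10 that the caller must free.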

As another example look at it this way.

b_callee --> b_caller1 --> 100

I can change the number using either variable.

*b_caller1 = 99;  //b_caller1 --> 99
**b_callee = 98;  //b_caller1 --> 98

Or I could change what b_callee points to. But that's not what we want.

b_callee --> b_caller1 --> 100
             b_caller2 --> 80

b_callee = &b_caller2;

b_callee -|   b_caller1 --> 100
          |-> b_caller2 --> 80 //**b_callee --> 80

That was fun. But we need to add another variable, c. Oops, and another, d. Darn it, foo is used 97 times. We now have a maintenance nightmare. What is needed is a data structure. A carefully thought out data structure will solve our problems. And even a not carefully thought out one will help a lot. Let the code be a slave to the data. The data (and the interface) will dictate the structure of the code.

struct foovar {
    int *a;
    char *b;
};

struct foovar *new_fv(void)
{
    struct foovar *self;

    if(!(self=malloc(sizeof(struct foovar))))
        return NULL;
    if(!(self->a=malloc(sizeof(int)))) {
        free(self); //don't leak the struct
        return NULL;
    }
    if(!(self->b=malloc(sizeof(char)*(11+1)))) {
        free(self->a); //undo the earlier allocations
        free(self);
        return NULL;
    }
    *(self->a) = 10;
    strcpy(self->b,"hello world");
    return self;
}

int main(void)
{
    struct foovar *fv;
    
    if(!(fv=new_fv()))
        return 1;
    printf("%d %s\n",*(fv->a),fv->b);
    free_fv(fv); //frees a foovar
    return 0;
}
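
The listing above calls free_fv without showing it. A minimal sketch of what it might look like, assuming the struct owns both a and b and that passing NULL should be harmless (the same way it is for free):

```c
#include <stdlib.h>

struct foovar {
    int *a;
    char *b;
};

/* Releases everything a foovar owns, then the foovar itself.
   Members go first, because once fv is freed we can no longer
   read fv->a or fv->b. */
void free_fv(struct foovar *fv)
{
    if (!fv)
        return;
    free(fv->a); /* free(NULL) is a no-op, so unset members are fine */
    free(fv->b);
    free(fv);
}
```

Every new member that owns memory gets one more line here, and nothing in main has to change.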

And now we have something reminiscent of object oriented code. Notice, I can add c and d to the structure (object) without changing anything else. The memory allocation is all handled in one place. The ugliness is hidden. I didn't put lots of memory on the stack. And you can easily write more functions that can be applied to struct foovar.

fv = new_fv();
res = action1(fv);
res = action2(fv);
res = action3(fv);
free_fv(fv);

If you like, you can still pass a foovar to new_fv and have the memory allocated this way. You will still need to use pointers to pointers unless you allocate the structure on the stack (which will be less memory than allocating a and b on the stack). But this will allow you to return error codes from the functions. Now you can do all sorts of crazy things with foovar, like make it a hash table that can store linked lists, trees, and hash tables of pointers to arrays. And main doesn't care about the memory allocation. As far as it's concerned, it might as well be using the simple malloc/free idiom. The object foovar handles its own memory allocation. And this brings me to the next topic.
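
For the variant where the caller passes in a foovar, here is a rough sketch with the structure itself on the stack. The names init_fv and clear_fv are hypothetical, as is the choice of 0 and -1 for the return codes.

```c
#include <stdlib.h>
#include <string.h>

struct foovar {
    int *a;
    char *b;
};

/* Fills in a struct the caller already owns.
   Returns 0 on success, -1 if an allocation fails. */
int init_fv(struct foovar *fv)
{
    fv->a = malloc(sizeof *fv->a);
    fv->b = malloc(sizeof "hello world"); /* 11 characters + '\0' */
    if (!fv->a || !fv->b) {
        free(fv->a); /* free(NULL) is a no-op */
        free(fv->b);
        fv->a = NULL;
        fv->b = NULL;
        return -1;
    }
    *fv->a = 10;
    strcpy(fv->b, "hello world");
    return 0;
}

/* Releases the members but not the struct, since the caller
   owns that storage. */
void clear_fv(struct foovar *fv)
{
    free(fv->a);
    free(fv->b);
    fv->a = NULL;
    fv->b = NULL;
}
```

The caller declares struct foovar fv; on its own stack, calls init_fv(&fv), checks the result, and calls clear_fv(&fv) when done. Only two pointers sit on the stack, and no pointer to a pointer is needed.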
