3

I'm just starting to program in c, so i don't have much experience with manual memory management. Coming from python, syntax like the following is very pleasant:

import numpy as np

b = np.array([1,2,3,4,5])
a = np.zeros(b.size)

c = np.sum(np.dot(a,b), np.add(a,b))

So I'm trying to recreate the ease of use to make a library akin to numpy, in c. I know this is in no way practical, but its in part to learn c, and for a numerical methods class I'm currently taking. So far, this is what I've come up with:

/*
array type with size and data
*/
typedef struct array
{
    size_t size;
    float data[];
} array;

/*
array constructor
*/
array *array_init(size_t size, float *to_arr)
{
    array *arr = malloc(
        sizeof(array) + sizeof(float) * size
    );

    if (arr==NULL){
        printf("Couldn't allocate memory");
    }

    if (to_arr!=NULL)
    {
        for (size_t i = 0; i < size; i++)
        {
            arr->data[i]=to_arr[i];
        }   
    }
    
    arr->size=size;
    return arr;
}

/*
array constructor
filled with a float n
*/
array *array_like(size_t size, float n)
{
    array *arr = malloc(
        sizeof(array) + sizeof(float) * size
    );

    if (arr==NULL){
        printf("Couldn't allocate memory");
    }

    for (size_t i = 0; i < size; i++)
    {
        arr->data[i]=n;
    }   
    
    arr->size=size;
    return arr;
}


/*
array destructor
*/
void array_del(array *arr)
{
    free(arr);
}

alongside several functions of a similar style

/*
element-wise array multiplication
*/
array *a_mul(array *a, array *b)
{
    array *tmp = array_init(a->size, NULL);

    for (int i = 0; i < a->size; i++)
    {
        tmp->data[i] = a->data[i] * b->data[i];
    }

    return tmp;
    
}

But I've come to realize that when doing something like

array *a = array_init({3,(float[]){1.0,2.0,3.0}});
array *b = array_init({3,(float[]){1.0,2.0,3.0}});

array *result = a_mul(a_mul(amul(a,b),b),a);

Memory is leaked, as the pointers I've allocated are lost after the function goes out of scope. So the question is, how do i go about it? Is there any best practice going forward? Or do I have to accept that c code is just going to be way more tedious to write?

I've tried looking at libraries like Eigen or even something simpler such as linmath, but it's all header files, and i can't seem to find anywhere where something like malloc() is used.

So far, I've thought of freeing each of the inputs of the function after the operation is done, like

/*
element-wise array multiplication
*/
array *a_mul(array *a, array *b)
{
    array *tmp = array_init(a->size, NULL);

    for (int i = 0; i < a->size; i++)
    {
        tmp->data[i] = a->data[i] * b->data[i];
    }
    
    free(a);
    free(b);
    return tmp;
    
}

I've also looked into the concept of an arena, but i do not know if it would be a better choice. I've also checked another post where a dynamic stack is suggested.

So if anyone has had a similar problem, or knows of a memory management pattern that would be better suitable, it would help a lot.

5
  • 1
    'Or do I have to accept that c code is just going to be way more tedious to write'. Well, C doesn't have a garbage collector, so yes, it will be more tedious to write. But that doesn't mean you will need to do complicated thing. In this situation, what you want is your fonction "a_mul" knowing if he need to clean up the argument he receive. You can do a new function "a_mul_free" that will alway free the arg he receive, or add a new parameter. Commented Feb 7 at 8:40
  • Be careful : your function doesn't take into account if a and b have different size. Currently, if b is smaller than a, you will do out-of-bound access of b. Commented Feb 7 at 8:43
  • Be careful (bis) : Your 'array_init' function doesn't return anything when malloc fail. It will lead to an access ("arr->size=size") while "arr" is NULL, so it will sigsev. If you want to chain your function call, you will have to take into account that maybe your argument are NULL. Commented Feb 7 at 8:52
  • Also, understand that (float[]){1.0,2.0,3.0} is a Compound Literal. There is no memory being allocated. It is a temporary object used to initialize to the data member of the struct. The problem is there is no storage allocated for the struct or the data member when initialization takes place. The problem with a_mul() is that no memory was ever allocated for either a or b so calling free() on each will cause bad things to happen. Commented Feb 7 at 9:01
  • @DavidC.Rankin sorry, I realized the array_init function was not used in my example, corrected it in the post Commented Feb 7 at 18:15

2 Answers 2

3

I like this problem - as a general rule I like to allocate and deallocate memory in the same scope, othervise it becomes hard to determine which piece of code has the responsibility to deallocate.

Looking at your float array I see that you are trying to allocate an entire structure using one malloc, This will not work

/*
array type with size and data
*/
typedef struct array
{
    size_t size;
    float *data; // Change the type to a generic pointer to a float
} array;

/*
array constructor
*/
array array_init(size_t size, float *to_arr)
{

    // Allocate the object on the stack
    array arr = {.size = size}; 

     // Allocate the float array on the heap
    arr.data = malloc(sizeof(arr.data[0]) * arr.size); 

    // this could be done with one call to memcpy
    for (size_t i = 0; i < size; i++)
    {
        arr.data[i]=to_arr[i];
    }   
    
    return arr; 
}

/*
array constructor
filled with a float n
*/
array array_like(size_t size, float n)
{
    // Allocate the object on the stack
    array arr = {.size = size}; 

     // Allocate the float array on the heap
    arr.data = malloc(sizeof(arr.data[0]) * arr.size); 

    // This could be done using memset
    for (size_t i = 0; i < size; i++)
    {
        arr.data[i]=n;
    }   
    
    return arr;
}

These calls now return a structure, which will be returned by value, and the structure in return contains a float array which is passed arround by reference.

When I look at:

/*
element-wise array multiplication
*/
array a_mul(array *a, array *b)
{
    array tmp = array_init(a->size, 0.0);

    for (int i = 0; i < a->size; i++)
    {
        tmp->data[i] = a->data[i] * b->data[i];
    }
    
    return tmp;
    
}

I would change the return value to once again be parsed arround by value, and the call site is responsible for cleaning up the array.

I hope this made some sense:

  • Allocate the structs and the containing array seperatly(when the arrays have arbitrary sizes)
  • Let the calling code 'Own' the lifetime of the allocations, do not call "free" on its arguments.
Sign up to request clarification or add additional context in comments.

5 Comments

Mmmm, technically, float data[]; is a FAM (flexible array member), which can be allocated for along with the scruct in a single allocation. See C11 Standard - 6.7.2.1 Structure and union specifiers(p18) But note the limitations in Paragraph 3 of that section.
And how would one go about chaining the functions and freeing the array initialized inside when defining the operations like that? Even using something like a_mul((array[]){a_mul(&a,&a)}, &a), memory is still leaked. Or is chaining bad practice?
@AlbertoMéndez-DettmerMuñoz I'm not sure, I think you would have to do some hackary to get away with that. maybe keep a list of allocations in a context that you track or something, essentially you would need some thing like a reference counting to be able to track who has the pointer, and when to let go of it.
@DavidC.Rankin, that sounds interesting, and like the sort of thing you should never do :)
@MartinKristiansen Is it inevitable that tracking the allocations would be hacky? Or is there an established way to do so? Would it be too bad for performance?
2

You are thinking along the correct lines, but you must preserve a pointer to every block of memory allocated in order to free() that memory when done with it. You have identified the problem, you just haven't figured out how to fix it.

In addition to the problem with preserving pointers, you also have the following issues:

  • You cannot use a brace-initializer containing a Compound-Literal to initialize a pointer. array *a is a pointer, not a struct. You are attempting to initialize the pointer address (the value held by a pointer) to 3 and the compound-literal (float[]){1.0,2.0,3.0} is just a meaningless excess-initializer as far as the compiler is concerned.
  • Your float data[] member is a FAM (flexible array member). See C11 Standard - 6.7.2.1 Structure and union specifiers(p18) , but also see limitations on its use in C11 Standard - 6.7.2.1 Structure and union specifiers(p3).
  • You can allocate for the struct + FAM in a single allocation, but you must preserve the resulting pointer.
  • You cannot assign NULL to the data[] member individual elements in a_mul() by calling array *tmp = array_init(a->size, NULL);, just use 0. instead of NULL with array_like() to zero initialize the memory for data[], e.g. array *tmp = array_like (a->size, 0.);,
  • use size_t in your for() loop instead of int to prevent comparison between signed and unsigned values.

Correcting those issues, you can do what you are attempting similar to the following:

#include <stdio.h>
#include <stdlib.h>

/*
 * array type with size and data
 */
typedef struct array
{
    size_t size;
    float data[];
} array;

/*
 * array constructor
 */
array *array_init(size_t size, float *to_arr)
{
    array *arr = malloc (sizeof(array) + sizeof(float) * size);

    if (arr == NULL){
        fputs ("Couldn't allocate memory\n", stderr);
        return NULL;
    }

    for (size_t i = 0; i < size; i++)
    {
        arr->data[i]=to_arr[i];
    }   
    
    arr->size = size;
    
    return arr;
}

/*
 * array constructor
 * filled with a float n
 */
array *array_like(size_t size, float n)
{
    array *arr = malloc (sizeof(array) + sizeof(float) * size);

    if (arr == NULL){
        fputs ("Couldn't allocate memory\n", stderr);
        return NULL;
    }

    for (size_t i = 0; i < size; i++)
    {
        arr->data[i] = n;
    }   
    
    arr->size=size;
    
    return arr;
}

/*
 * array destructor
 */
void array_del(array *arr)
{
    free (arr);
}

/*
 * element-wise array multiplication
 */
array *a_mul(array *a, array *b)
{
    array *tmp = array_like (a->size, 0.);
    
    if (tmp == NULL) {
      perror ("a_mul-malloc");
      return NULL;
    }

    for (size_t i = 0; i < a->size; i++)
    {
        tmp->data[i] = a->data[i] * b->data[i];
    }

    return tmp;
}

void prn_array (array *a)
{
  for (size_t i = 0; i < a->size; i++) {
    printf (i ? ", %4.1f" : "[ %4.1f", a->data[i]);
  }
  puts (" ]");
}

int main (void) {

  array *a = array_init (3, (float[]){1.0,2.0,3.0});
  array *b = array_init (3, (float[]){1.0,2.0,3.0});
  
  /* validate every allocation */
  if (a == NULL || b == NULL) {
    return 1;
  }
  
  /* output initial arrays */
  puts ("a and b:\n");
  prn_array (a);
  prn_array (b);
  
  /* multiply individual results to save pointer to free */
  array *res1   = a_mul (a,b);        /* validation omitted for brevity */
  array *res2   = a_mul (res1, b);    /*   "  */
  array *result = a_mul (res2, a);    /*   "  */
  
  /* output intermediate and final result */
  puts ("\nres1, res2 and result:\n");
  prn_array (res1);
  prn_array (res2);
  prn_array (result);
  
  /* now free the structures and data */
  array_del (a);
  array_del (b);
  array_del (res1);
  array_del (res2);
  array_del (result);
}

note: with single-allocation single-free, just typing free(a) is a lot shorter than array_del (a), but that is completely up to you.

Note in main() each of the pointers are preserved, e.g.

int main (void) {

  array *a = array_init (3, (float[]){1.0,2.0,3.0});
  array *b = array_init (3, (float[]){1.0,2.0,3.0});
  ...
  /* multiply individual results to save pointer to free */
  array *res1   = a_mul (a,b);        /* validation omitted for brevity */
  array *res2   = a_mul (res1, b);    /*   "  */
  array *result = a_mul (res2, a);    /*   "  */
  ...

You must do incremental multiplication, but that's the price you pay to preserve the pointers. (note: technically you could previously declare res1 and res2 and then do assignment within parenthesis in nested calls, e.g. a_mul((res2 = a_mul((res1 = amul(a,b)),b)),a), but that becomes an unreadable error-prone mess quickly, but it can be done...)

Example Use/Output

Compiling the code above and running it will produce the following output:

$ ./array-data-struct
a and b:

[  1.0,  2.0,  3.0 ]
[  1.0,  2.0,  3.0 ]

res1, res2 and result:

[  1.0,  4.0,  9.0 ]
[  1.0,  8.0, 27.0 ]
[  1.0, 16.0, 81.0 ]

Memory Use/Error Check

In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.

It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.

For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.


$ ./array-data-struct
==70268== Memcheck, a memory error detector
==70268== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==70268== Using Valgrind-3.24.0 and LibVEX; rerun with -h for copyright info
==70268== Command: ./bin/array-data-struct
==70268==
a and b:

[  1.0,  2.0,  3.0 ]
[  1.0,  2.0,  3.0 ]

res1, res2 and result:

[  1.0,  4.0,  9.0 ]
[  1.0,  8.0, 27.0 ]
[  1.0, 16.0, 81.0 ]
==70268==
==70268== HEAP SUMMARY:
==70268==     in use at exit: 0 bytes in 0 blocks
==70268==   total heap usage: 6 allocs, 6 frees, 1,124 bytes allocated
==70268==
==70268== All heap blocks were freed -- no leaks are possible
==70268==
==70268== For lists of detected and suppressed errors, rerun with: -s
==70268== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Always confirm that you have freed all memory you have allocated and that there are no memory errors.

Look things over and let me know if you have questions.

3 Comments

And what about arenas or a dynamic stack? The thing is, i would really like to not use intermediate variables, as they make the math itself less readable. I wouldn't mind spending a lot more time designing the memory management part, if there is a more complex approach suitable, and not too inefficient
Also, valgrind hasn't been able to detect the leak, but the -fsanitize=address flag in clang always finds them.
In your case there was no leak because there was no allocation on a or b (only failed initialization, and if it got that far a crash when you attempted to free() a memory address not previously allocated by malloc, calloc or realloc). You cannot get by without intermediate variables. You can come up with schemes to minimize there visibility, like a global queue of pointers that hold allocated pointers and you call free on when done. You would add to the queue internally in your functions, like a_mul(), but you have to preserve the pointer in order to free() it later -- period.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.