7

I need to get a string checksum or hash (or something equivalent) using just the C preprocessor, if possible.

The use case is as follows: I'm doing error logging on an embedded device with very limited memory and cpu. I would like to define a LogError() macro which inserts hash(__FILE__) and __LINE__ in a circular buffer (as 16bit numbers). But hash(__FILE__) needs to be compiled to a constant; if the actual filenames are stored as strings in the program that would use too much memory. The hash can be calculated using any method.

It is possible to #define FILE_ID with some unique number at the top of every file, and use that when logging, but that is not the preferred solution, it has a bit of a maintenance cost. Is there a better method?

10
  • 2
    In C, expressions like "foo.c"[0]+9*"foo.c"[1] are not constant expressions, but when used in code, may still actually compile to constants. Commented Feb 22, 2015 at 5:08
  • What if two files have the same hash? Commented Feb 22, 2015 at 5:12
  • @immibis: I guess I'd have to take my chances; there aren't anywhere near 62k files. Less worried about that than about someone accidentally defining FILE_ID to the same number in several files, if using the manual method... I would probably write a python script to update the IDs in that case. Commented Feb 22, 2015 at 5:15
  • 3
    This is not possible. Your compiler may or may not fold your hash function invocation into a constant; I would rather bet on it not being able to do so. The solution with FILE_ID is actually very low maintenance if you use something like -DFILE_ID=$$(myhashpgm $<) in your makefile. Commented Feb 22, 2015 at 5:39
  • 2
    This question is quite awkward since using the pre-processor to hash a string is useful in some cases - but almost certainly NOT a good solution for this use-case. I added own answer, but wonder if your detailed use-case should be a separate question on calculating logging ID's for primitive architectures. Commented Oct 4, 2017 at 10:41

3 Answers 3

7

The question "How to calculate the hash of a string literal using only the C preprocessor?" is valid, however I think you're adding a red-herring by including details about __FILE__ and logging ID's.

This means anyone answering needs to solve the problem you describe, or answer the question on hashing a string with the pre-processor (which may not be a good solution in your particular case!).

As it happens, __FILE__ expands to variable, not a literal string (GCC at least), so you will need to define the filename as a constant. You could use the build system to pass along a define for each for example.

As others have pointed out you can calculate the hash and pass this in via the build-system, although this avoids the question about hashing a sting literal.


Whatever the case, this question comes up when I searched for using the pre-processor for hashing, and none of the answers cover this, so heres is an answer that covers the string hashing part.

This is possible, albeit quite verbose

/**
 * Implement compile-time string hashing on string literals.
 *
 * This macro implements the widely used "djb" hash apparently posted
 * by Daniel Bernstein to comp.lang.c some time ago.  The 32 bit
 * unsigned hash value starts at 5381 and for each byte 'c' in the
 * string, is updated: ``hash = hash * 33 + c``.  This
 * function uses the signed value of each byte.
 *
 * note: this is the same hash method that glib 2.34.0 uses.
 */

#define SEED 5381

#if 0
// correct but causes insane expansion
#  define _SH(e, c) (((e) << 5) + (e) + (unsigned char)(c))
#elif defined(__GNUC__)
// Use statement-expression extension
#  define _SH(e, c) ({ unsigned int _e = (unsigned int)(e); (_e << 5) + _e + (unsigned char)c; })
#else
// use an inline function, the compiler will be able to optimize this out.
static inline unsigned int _SH(unsigned int e, unsigned char c)
{
    unsigned int _e = (unsigned int)e;
    return (_e << 5) + _e + (unsigned char)c;
}
#endif

#define _SH_1(a) _SH(SEED, (a)[0])
#define _SH_2(a) _SH(_SH_1(a), (a)[1])
#define _SH_3(a) _SH(_SH_2(a), (a)[2])
#define _SH_4(a) _SH(_SH_3(a), (a)[3])
#define _SH_5(a) _SH(_SH_4(a), (a)[4])
#define _SH_6(a) _SH(_SH_5(a), (a)[5])
#define _SH_7(a) _SH(_SH_6(a), (a)[6])
#define _SH_8(a) _SH(_SH_7(a), (a)[7])
#define _SH_9(a) _SH(_SH_8(a), (a)[8])
#define _SH_10(a) _SH(_SH_9(a), (a)[9])
#define _SH_11(a) _SH(_SH_10(a), (a)[10])
#define _SH_12(a) _SH(_SH_11(a), (a)[11])
#define _SH_13(a) _SH(_SH_12(a), (a)[12])
#define _SH_14(a) _SH(_SH_13(a), (a)[13])
#define _SH_15(a) _SH(_SH_14(a), (a)[14])
#define _SH_16(a) _SH(_SH_15(a), (a)[15])
#define _SH_17(a) _SH(_SH_16(a), (a)[16])
#define _SH_18(a) _SH(_SH_17(a), (a)[17])
#define _SH_19(a) _SH(_SH_18(a), (a)[18])
#define _SH_20(a) _SH(_SH_19(a), (a)[19])
#define _SH_21(a) _SH(_SH_20(a), (a)[20])
#define _SH_22(a) _SH(_SH_21(a), (a)[21])
#define _SH_23(a) _SH(_SH_22(a), (a)[22])
#define _SH_24(a) _SH(_SH_23(a), (a)[23])
#define _SH_25(a) _SH(_SH_24(a), (a)[24])
#define _SH_26(a) _SH(_SH_25(a), (a)[25])
#define _SH_27(a) _SH(_SH_26(a), (a)[26])
#define _SH_28(a) _SH(_SH_27(a), (a)[27])
#define _SH_29(a) _SH(_SH_28(a), (a)[28])
#define _SH_30(a) _SH(_SH_29(a), (a)[29])
#define _SH_31(a) _SH(_SH_30(a), (a)[30])
#define _SH_32(a) _SH(_SH_31(a), (a)[31])

// initial check prevents too-large strings from compiling
#define STRHASH(a) ( \
    (void)(sizeof(int[(sizeof(a) > 33 ? -1 : 1)])), \
    (sizeof(a) == 1) ? SEED : \
    (sizeof(a) == 2) ? _SH_1(a) : \
    (sizeof(a) == 3) ? _SH_2(a) : \
    (sizeof(a) == 4) ? _SH_3(a) : \
    (sizeof(a) == 4) ? _SH_3(a) : \
    (sizeof(a) == 5) ? _SH_4(a) : \
    (sizeof(a) == 6) ? _SH_5(a) : \
    (sizeof(a) == 7) ? _SH_6(a) : \
    (sizeof(a) == 8) ? _SH_7(a) : \
    (sizeof(a) == 9) ? _SH_8(a) : \
    (sizeof(a) == 10) ? _SH_9(a) : \
    (sizeof(a) == 11) ? _SH_10(a) : \
    (sizeof(a) == 12) ? _SH_11(a) : \
    (sizeof(a) == 13) ? _SH_12(a) : \
    (sizeof(a) == 14) ? _SH_13(a) : \
    (sizeof(a) == 15) ? _SH_14(a) : \
    (sizeof(a) == 16) ? _SH_15(a) : \
    (sizeof(a) == 17) ? _SH_16(a) : \
    (sizeof(a) == 18) ? _SH_17(a) : \
    (sizeof(a) == 19) ? _SH_18(a) : \
    (sizeof(a) == 20) ? _SH_19(a) : \
    (sizeof(a) == 21) ? _SH_20(a) : \
    (sizeof(a) == 22) ? _SH_21(a) : \
    (sizeof(a) == 23) ? _SH_22(a) : \
    (sizeof(a) == 24) ? _SH_23(a) : \
    (sizeof(a) == 25) ? _SH_24(a) : \
    (sizeof(a) == 26) ? _SH_25(a) : \
    (sizeof(a) == 27) ? _SH_26(a) : \
    (sizeof(a) == 28) ? _SH_27(a) : \
    (sizeof(a) == 29) ? _SH_28(a) : \
    (sizeof(a) == 30) ? _SH_29(a) : \
    (sizeof(a) == 31) ? _SH_30(a) : \
    (sizeof(a) == 32) ? _SH_31(a) : \
    (sizeof(a) == 33) ? _SH_32(a) : \
    0)
// last zero is unreachable

// only for comparison
unsigned int strhash_func(const void *str)
{
    const signed char *p;
    unsigned int h = 5381;

    for (p = str; *p != '\0'; p++) {
        h = (h << 5) + h + (unsigned int)*p;
    }

    return h;
}

/* -------------------------------------------------------------------- */
#include <stdio.h>

#define TEST_STR1 "Hello World"
#define TEST_STR2 "Testing 123"
int main(void)
{
    unsigned int A = STRHASH(TEST_STR1);
    unsigned int B = STRHASH(TEST_STR2);

    printf("String hash: const %u <- '%s'\n", STRHASH(TEST_STR1), TEST_STR1);
    printf("String hash: const %u <- '%s'\n", STRHASH(TEST_STR2), TEST_STR2);
    printf("String hash: dyn %u <- '%s'\n", strhash_func(TEST_STR1), TEST_STR1);
    printf("String hash: dyn %u <- '%s'\n", strhash_func(TEST_STR2), TEST_STR2);

#if defined(__GNUC__)
    printf("Is this known at compile time?, answer is: %d\n", __builtin_constant_p(A));
#endif
}

Note, for some reason Clang 5.0 prints answer is: 0, however on closer inspection is does in fact know the value at compile time, __builtin_constant_p just doesn't seem to work as GCC does.

Sign up to request clarification or add additional context in comments.

Comments

0

You're asking too much of the preprocessor.

Better just build each file with a 16bit checksum of its name defined as preprocessor macro, as @n.m suggests.

One trifling difficulty for this solution is laying your hands on a 16bit checksum utility. The GNU cksum tool is just 32-bit.

But FreeBSD has a better one that lets you choose 16 or 32 bits. If your development system is a Debian derivative then you can get it by:

sudo apt-get install freebsd-buildutils

Then run:

dpkg-query -L freebsd-buildutils

to see where the FreeBSD cksum has been installed. In my case, it is:

/usr/bin/freebsd-cksum

but you might find differently.

You direct FreeBSD cksum to produce a 16bit checksum by passing the option -o 1. You can check its man page.

Take care when generating the preprocessor macro that defines a filename checksum that you define the checksum as a 16 bit unsigned int, as that's what you want. Should it come out as a plain decimal numeral, for instance, it would default to signed int, which could cause you surprises.

Here's a sketch of the solution with GNU make:

main.c

#include <stdio.h>
#include <stdint.h>

int main(int argc, char **argv)
{
    printf("%hu\n",FILE_CHKSUM);
    return 0;
}

Makefile

.phony: all clean

SRCS = main.c
OBJS = $(SRCS:.c=.o)
# Take the 1st word of the output of `echo <filename> | freebsd-cksum -o 1`
FILE_CHKSUM = $(word 1,$(shell echo $(1) | freebsd-cksum -o 1))

all: prog

%.o:%.c
    $(CC) $(CPPLAGS) $(CFLAGS) -DFILE_CHKSUM='((uint16_t)$(call FILE_CHKSUM,$<))' -c -o $@ $< 

prog: $(OBJS)
    $(CC) $(CFLAGS) $(LDFLAGS) -o $@ $(OBJS) $(LDLIBS)

clean:
    rm -f $(OBJS) prog

Running make:

cc   -DFILE_CHKSUM='((uint16_t)3168)' -c -o main.o main.c 
cc   -o prog main.o

Run prog:

./prog
3168

Comments

0

How to do this in CMake ? I can not figure out how to run script for every source file, and the result place as a value for -DFILE_CHEKSUM

Method with STRHASH, has a big limitation - it takes maximum of 32 characters as input, and when I have longer names, it just does not compile. I think that computing it before build and passing it as -D is beter approach, but I just can not manage to do it in cmake :(

1 Comment

never mind, I used ``` static inline unsigned int strhash() { const char *p = FILE; // ignore clang-diagnostic-static-local-in-inline - expected behaviour in this case static unsigned int h = 5381; // NOLINT if(h!=5381) { return h; } for (;*p != '\0'; p++) { h = (h << 5) + h + (unsigned int)*p; } return h; } ``` so after calculating it once in a file, I will just return fast

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.