19

A program that accesses an illegal pointer to pointer does not crash with SIGSEGV. This is not a good thing, but I’m wondering how this can be, and how the process survived for many days in production. It is bewildering to me.

I have tried this program on Windows, Linux, OpenVMS, and Mac OS, and none of them has ever complained.

#include <stdio.h>
#include <string.h>

void printx(void *rec) { // I know this should have been a **
    char str[1000];
    memcpy(str, rec, 1000);
    printf("%*.s\n", 1000, str);
    printf("Whoa..!! I have not crashed yet :-P");
}

int main(int argc, char **argv) {
    void *x = 0; // you could also say void *x = (void *)10;
    printx(&x);
}
4
  • 10
    This is undefined behaviour, so not crashing is a perfectly fine result. Use a proper memory checking tool if you want to debug this sort of thing. Commented Jul 25, 2013 at 7:59
  • It's just that I have passed a pointer to pointer, and when memcpy tries to dereference the pointer in the printx() function and copy some 1000 garbage bytes, it should have crashed. Commented Jul 25, 2013 at 8:00
  • Memory checkers such as valgrind try to report this kind of thing. Otherwise there is no warranty. Commented Jul 25, 2013 at 8:11
  • See: Hotel. Commented Jul 25, 2013 at 12:27

5 Answers

29

I am not surprised by the lack of a memory fault. The program is not dereferencing an uninitialized pointer. Instead, it is copying and printing the contents of memory beginning at a pointer variable, and the 996 (or 992) bytes beyond it.

Since the pointer is a stack variable, the program is printing memory near the top of the stack for a ways down. That memory contains the stack frame of main(): possibly some saved register values, a count of program arguments, a pointer to the program arguments, a pointer to a list of environment variables, and a saved instruction pointer for main() to return to, usually in the C runtime library startup code. In all implementations I have investigated, the stack frames below that have copies of the environment variables themselves, an array of pointers to them, and an array of pointers to the program arguments. In Unix environments (which you hint you are using) the program argument strings will be below that.

All of this memory is "safe" to print, except some non-printable characters will appear which might mess up a display terminal.

The chief potential problem is whether enough stack memory is allocated and mapped to prevent a SIGSEGV during the access. A segmentation fault could happen if there is too little environment data, or if the implementation puts that data elsewhere so that there are only a few words of stack here. I suggest confirming that by clearing out the environment variables and re-running the program.

This code would not be so harmless if any of these C runtime conventions did not hold:

  • The architecture uses a stack
  • A local variable (void *x) is allocated on the stack
  • The stack grows toward lower numbered memory
  • Parameters are passed on the stack
  • main() is called with arguments. (Some light duty environments, like embedded processors, invoke main() without parameters.)

In all mainstream modern implementations, all of these are generally true.


8 Comments

@pavan.mankala: You are welcome. Indeed I have dabbled writing and maintaining several compilers, and spent quite a bit of time dealing with the interface to calling main(), mostly to streamline a limited memory environment.
+1: dereferencing a null pointer is guaranteed to give a segfault in most operating systems. There's software that depends on this.
@DevSolar, what is purely academic is stating that whatever happens is undefined behavior, with no explanation given for the actual observed behavior. I would claim that it's good engineering and good computer science to be able to explain what a machine does in any given situation, even when or especially when the language specification leaves the responsibility for some decision to the implementors of the compiler and the operating system. Our programs don't execute in a vacuum isolated from real issues.
@Joni: So you think it's a good explanation saying that pointers are 32 or 64 bit, parameters passed on stack, stacks extend downwards, that there's a pointer to environment variables alongside argc / argv, return address saved on the stack, yadda yadda, without even so much as qualifying that statement as "...on Linux and Windows"? Blissfully assuming that there is such a thing as SIGSEGV on the target machine, and that an illegal access won't crash the whole OS? For every single one of those statements, I know a system where that assumption won't hold.
@DevSolar, Many of those things can be deduced from the question. For example, the OP already mentions SIGSEGV, implying familiarity with a Unix system of some kind, and later mentions two examples: Linux and OS X. As to pointers being 32/64 bit, it doesn't seem a necessary assumption, but tends to be the case for machines where you can run Windows and OS X. As to there actually being a stack and stacks extending downwards, that's how compilers on these platforms usually organize memory, again a fairly safe thing to assume. But yes, the answer could be better if it qualified the assumptions.
17

Illegal memory access is undefined behaviour. This means that your program might crash, but is not guaranteed to, because exact behaviour is undefined.

(A joke among developers, especially when facing coworkers that are careless about such things, is that "invoking undefined behaviour might format your hard drive, it's just not guaranteed to". ;-) )

Update: There's some hot discussion going on here. Yes, system developers should know what actually happens on a given system. But such knowledge is tied to the CPU, the operating system, the compiler etc., and generally of limited usefulness, because even if you make the code work, it would still be of very poor quality. That's why I limited my answer to the most important point, and the actual question asked ("why doesn't this crash"):

The code posted in the question does not have well-defined behaviour, but that just means that you can't rely on what it does, not that it should crash.

4 Comments

No illegal memory access occurs here. Though the program carelessly accesses stack memory in bulk, there is nothing particularly wrong with this as long as there is enough stack data. Your answer is correct as far as it goes, but it does not address the code above.
@wallyk: I don't know about C11, but the C99 standard does not mention "stack" anywhere. Depending on how your implementation handles the stack, reading 1000 bytes will either make you head off beyond x, argv and argc into nothingness (illegal), or you're trespassing from x into str, and using memcpy() on overlapping memory areas is by definition undefined behaviour. Either way, illegal code. And you, sir, no offense intended, are just the type of coder meant in the "formatting hard drives" line: the type that doesn't find immediate fault with code like this because it might work.
I am pretty sure the C11 and C99 standards address language features, not implementation details. However, stacks are pretty tried and true. Mainframes I used long ago did not have them, but structured languages implemented stacks for their convenience. There is no memory overlap for memcpy(): the 1000 byte destination is allocated and its source is elsewhere. (Indeed, I have written disk formatting code, but it was always predictable and intentional.)
@wallyk: Again, this code might work on machine A and machine B, and together A and B might account for 99% of all systems worldwide, but the code posted by the OP does invoke undefined behaviour, which is defined as "behaviour not defined by the language standard". Because it's not defined, a cautious developer should never rely on what actually happens on system X, because it will break some day. I respect your hands-on experience, but I strongly feel that too many aspiring coders are shown too much "under the hood" stuff, with too few "don't touch this" warnings applied.
9

If you dereference an invalid pointer, you are invoking undefined behaviour. Which means, the program can crash, it can work, it could cook some coffee, whatever.


4

When you have

int main(int argc, char **argv) {
    void *x = 0; // you could also say void *x = (void *)10;
    printx(&x);
}

You are declaring x as a pointer with value 0, and that pointer lives in the stack since it's a local variable. Now, you are passing to printx the address of x, which means that with

memcpy(str, rec, 1000);

you are copying data from above the stack (or in fact from the stack itself), to the stack (because the stack pointer address decreases on each push). The source data is likely to be covered by the same page table entry as you are copying just 1000 bytes, so you get no segmentation fault. However, ultimately, as already written, we are talking about undefined behavior.

Comments

2

It would very probably crash if you wrote to an inaccessible area. But you are only reading, so it can be OK. The behaviour is still undefined, though.

2 Comments

@pavan.mankala: That is not what is happening. Please see my answer.
int main(void) { return *(int *)rand(); } – segmentation fault.
