
WHEN 4 + 1 EQUALS 8: AN ADVANCED TAKE ON POINTERS IN C
by: Sven Gregori
April 19, 2018

In our first part on pointers, we covered the basics and common pitfalls of
pointers in C. If we had to break it down into one sentence, the
main principle of pointers is that they are simply data types storing
a memory address, and as long as we make sure that we have
enough memory allocated at that address, everything is going to
be fine.

In this second part, we are going to continue with some more
advanced pointer topics, including pointer arithmetic, pointers with
another pointer as underlying data type, and the relationship
between arrays and pointers. But first, there is one particular
pointer we haven’t talked about yet.

The one proverbial exception to the rule that pointers are just
memory addresses is the most (in)famous pointer of all:
the NULL pointer. Commonly defined as the preprocessor macro
(void *) 0, NULL can be assigned like any other pointer value.

    // regular referencing, ptr1 points to address of value
    int *ptr1 = &value;
    // regular pointer, ptr2 points to address of value as well
    int *ptr2 = ptr1;
    // uninitialized pointer, ptr3 points to unknown location
    int *ptr3;
    // NULL pointer, ptr4 points to (void *) 0
    int *ptr4 = NULL;

While it looks like NULL is just pointing to address zero, it is
really a special marker telling both the compiler and the reader that
the pointer isn’t pointing to any valid data, but quite literally to
nothing. Dereferencing such a pointer is formally undefined behavior,
but on most platforms it fails predictably, typically with a
segmentation fault. If we kept the pointer uninitialized, anything
could happen when we dereference it, with a segmentation fault being
one of the better outcomes.

It is always good practice to initialize otherwise uninitialized
pointers with NULL. This makes the pointer’s state explicit, and it
helps us too: checking if (ptr != NULL) lets us easily determine
whether a pointer has a valid value yet or not. And since any value
other than 0 is evaluated as true in C, we can write it even shorter
as if (ptr).
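
As a minimal sketch of that check in practice, the following helper only dereferences its argument when there is something to dereference. The name value_or and its fallback parameter are made up for this illustration.

```c
#include <stddef.h>

/* Hypothetical helper: returns the int that ptr points to, or
 * fallback when ptr is NULL and there is nothing to dereference. */
int value_or(const int *ptr, int fallback)
{
    if (ptr) {              /* same as: if (ptr != NULL) */
        return *ptr;
    }
    return fallback;
}
```

Calling value_or(NULL, -1) takes the fallback branch, while passing the address of an actual int returns that int.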

POINTER ARITHMETIC
Other than NULL, the concept remains that pointers are simply
memory addresses, in other words: numbers. And like any other
number, we can perform some basic arithmetic operations with
them. But we wouldn’t talk about it if there weren’t more to it, so
let’s see for ourselves what happens when we add 1 to a couple of
different pointer types.
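    // struct foo is assumed to be defined elsewhere
    char *cptr = (char *) 0x1000;
    int *iptr = (int *) 0x2000;
    struct foo *sptr = (struct foo *) 0x3000;

    // %zx prints a size_t in hex, and %p expects a void *
    printf("char 0x%02zx %p %p\n", sizeof(char), (void *) cptr, (void *) (cptr + 1));
    printf("int 0x%02zx %p %p\n", sizeof(int), (void *) iptr, (void *) (iptr + 1));
    printf("struct 0x%02zx %p %p\n", sizeof(struct foo), (void *) sptr, (void *) (sptr + 1));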

We have three different pointer types, and for each we print the
size of the underlying type as a hexadecimal number, the address the
pointer currently holds, and that address incremented by one:
    char 0x01 0x1000 0x1001
    int 0x04 0x2000 0x2004
    struct 0x10 0x3000 0x3010

Unlike regular numbers, adding 1 to a pointer will increment its
value (a memory address) by the size of its underlying data type. To
simplify the logic behind this, think of pointer arithmetic the same
way you think about array indexing. If we declare an array of ten
integers, int numbers[10], we have a variable that has reserved
enough memory to hold ten int values. With int taking up 4 bytes,
numbers is 40 bytes in total, with each entry 4 bytes apart. To
access the fifth element, we simply write numbers[4] and don’t need
to worry about data type sizes or addresses. With pointer arithmetic,
we do the exact same thing, except the array index becomes the
integer we add to the pointer before dereferencing it:
*(numbers + 4).
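
To make the equivalence concrete, here is a small sketch with two helpers whose names are invented for this example. The first reads an element through pointer arithmetic, the second measures how many bytes an int pointer actually moves when we add a count to it. It assumes base + count stays within the array or one element past its end.

```c
#include <stddef.h>

/* Hypothetical helper: reads element `index` of an int array using
 * pointer arithmetic instead of the [] operator. */
int element_at(const int *numbers, size_t index)
{
    return *(numbers + index);   /* identical to numbers[index] */
}

/* Hypothetical helper: number of bytes covered by adding `count`
 * to an int pointer. */
size_t byte_offset(const int *base, size_t count)
{
    const char *start = (const char *) base;
    const char *end = (const char *) (base + count);
    return (size_t) (end - start);   /* count * sizeof(int) */
}
```

With int numbers[10], element_at(numbers, 4) yields the same value as numbers[4], and byte_offset(numbers, 4) yields 4 * sizeof(int).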

Apart from adding an integer to a pointer, we can also subtract one,
and as long as they’re the same type, we can subtract a pointer
from another pointer. In the latter case, the result will be the
number of elements of the pointer’s underlying data type that fully
fit in the memory area between the two pointers.
    int *iptr1 = (int *) 0x1000;
    int *iptr2 = (int *) 0x1008;
    // %td prints a ptrdiff_t, %zu prints the size_t that sizeof yields
    printf("%td\n", (iptr2 - iptr1));
    printf("%zu\n", sizeof(iptr2 - iptr1));

Since an int is four bytes here, we can fully fit two of them in the
8 byte offset, so the subtraction will output 2. The result of such
a subtraction is of type ptrdiff_t, a platform-dependent signed
integer type defined in stddef.h. The second printf() therefore
doesn’t report the offset between the two pointers, but the size of
ptrdiff_t itself, for example 8 bytes on a typical 64-bit platform.
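
A typical use of pointer subtraction is turning a pointer back into an array index. The sketch below uses a hypothetical index_of helper. Note that both pointers in the subtraction point into the same array, which is the only case where the C standard defines the result.

```c
#include <stddef.h>

/* Hypothetical helper: returns the index of the first element equal
 * to `wanted`, or -1 if there is none. */
ptrdiff_t index_of(const int *numbers, size_t count, int wanted)
{
    const int *end = numbers + count;   /* one past the last element */

    for (const int *p = numbers; p != end; p++) {
        if (*p == wanted) {
            return p - numbers;         /* distance in elements, not bytes */
        }
    }
    return -1;
}
```

For an array holding 7, 11, 13 and 17, searching for 13 returns 2, regardless of how many bytes an int occupies.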

That’s pretty much all there is to know about the basics of pointer
arithmetic. Trying anything other than addition with an integer, or
subtraction with either an integer or another pointer of the same
type, such as multiplying a pointer or adding two pointers together,
will result in a compiler error.

POINTER CAST AND ARITHMETIC
The beauty of pointers is that we can cast them to any other
pointer type, and if we do so during an arithmetic operation, we
add plenty of flexibility in comparison to array indexing. Let’s see
how the rules apply if we cast an int * to a char * and add 3 to
it.
    int value = 123;
    int *iptr = &value;
    char *cptr1 = (char *) (iptr + 3);
    char *cptr2 = (char *) iptr + 3;
    // %p expects a void *, hence the casts
    printf("iptr %p\ncptr1 %p\ncptr2 %p\n", (void *) iptr, (void *) cptr1, (void *) cptr2);

For simplicity, let’s pretend value is located at address 0x1000, so
we will get the following output:
    iptr 0x1000
    cptr1 0x100c
    cptr2 0x1003

We can see a clear difference between those two additions, which
is caused by C’s operator precedence. When we assign cptr1, iptr is
still an int * at the time of the addition, resulting in an address
offset to fit three ints, i.e. 12 bytes. But when we assign cptr2,
we don’t use parentheses, and the operator precedence leads to a
higher priority for the cast operation. By the time the addition is
performed, iptr is already a char *, resulting in a three byte
offset.

Keep in mind that we don’t have any allocated memory
beyond value’s size, so we shouldn’t dereference cptr1.
Dereferencing cptr2 on the other hand will be fine, and will
essentially extract the fourth byte of value. If for some reason you
wanted to extract whatever resides 11 bytes into a struct array’s
third element and turn it into a float, *((float *) ((char *)
(struct_array + 2) + 11)) will get you there, provided the platform
tolerates reading a float from such an unaligned address.
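
As a hedged sketch of the same idea, the two helpers below (both names are invented here) pick a single byte or a float out of an arbitrary object. The second one swaps the direct cast-and-dereference for memcpy(), which is not what the expression above does but avoids trouble on platforms that refuse unaligned float reads.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the byte at `offset` within an object.
 * Reading through unsigned char * is allowed for any object type. */
unsigned char byte_at(const void *object, size_t offset)
{
    return *((const unsigned char *) object + offset);
}

/* Hypothetical helper: reads a float starting `offset` bytes into an
 * object. memcpy() copies the bytes out instead of dereferencing a
 * possibly misaligned float pointer. */
float float_at(const void *object, size_t offset)
{
    float result;

    memcpy(&result, (const char *) object + offset, sizeof(result));
    return result;
}
```

In both helpers the cast happens before the addition, so the offset counts single bytes, just like with cptr2 above.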

Incrementing While Dereferencing


Another typical thing we do with pointers is dereference them. But
what happens if we increment and dereference a pointer in the
same expression? Once again, it’s mostly a question of operator
precedence and how generous we are with parentheses. Taking
both prefix and postfix increment into account, we end up with four
different options:
    char buf[MUCH_BYTES];
    char *ptr = buf;

    // increment ptr and dereference its (now incremented) value
    char c1 = *++ptr;   // ptr = ptr + 1; c1 = *ptr;
    // dereference ptr and increment the dereferenced value
    char c2 = ++*ptr;   // *ptr = *ptr + 1; c2 = *ptr;
    // dereference current ptr value and increment ptr afterwards
    char c3 = *ptr++;   // c3 = *ptr; ptr = ptr + 1;
    // dereference current ptr value and increment the dereferenced value afterwards
    char c4 = (*ptr)++; // c4 = *ptr; *ptr = *ptr + 1;

If you’re not fully sure about the operator precedence, or don’t
want to wonder about it every time you read your code, you can
always add parentheses and avoid ambiguity, or enforce the
execution order as we did in the fourth case, (*ptr)++. If you want
to sneak subtle bugs into a codebase, leaving out the parentheses and
testing the reader’s attention to operator precedence is a good bet.
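
Two of these forms show up constantly in real code. The sketch below, with helper names made up for illustration, uses *dst++ = *src++ to copy bytes and (*buf)++ to modify the bytes themselves.

```c
#include <stddef.h>

/* Hypothetical helper: copies `count` bytes from src to dst. */
void copy_bytes(char *dst, const char *src, size_t count)
{
    while (count-- > 0) {
        *dst++ = *src++;   /* copy one byte, then advance both pointers */
    }
}

/* Hypothetical helper: adds one to each of the `count` bytes in buf. */
void increment_each(char *buf, size_t count)
{
    while (count-- > 0) {
        (*buf)++;   /* increment the value buf points to ... */
        buf++;      /* ... then move on to the next byte */
    }
}
```

Dropping the parentheses in increment_each would turn (*buf)++ into *buf++, which advances the pointer and leaves every byte untouched.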

A common use case for incrementing while dereferencing is
iterating over a “string”. C doesn’t really know the concept of an
actual string data type, but works around it by using a
null-terminated char array as an alternative. Null-terminated means
that the array’s last element is one additional NUL character to
indicate the end of the string. NUL, not to be confused with the NULL
pointer, is simply ASCII character 0x00 or '\0'. As a consequence, a
string of length n requires an array of size n + 1 bytes.

So if we look through a string and find the NUL, we know we have
reached its end. And since C evaluates any value
that’s 0 as false, we can implement a function that returns the
length of a given string with a simple loop:
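    // note: this shares its name with the standard library's strlen(),
    // so pick a different name if string.h is included
    int strlen(const char *string) {
        int count = 0;
        while (*string++) {
            count++;
        }
        return count;
    }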

With every loop iteration, we dereference string’s current
memory location to check if its value is NUL, and
increment string itself afterwards, i.e. move the pointer to the
next char’s address. For as long as dereferencing yields a
character with a value other than zero, we increment count, and we
return it at the end.

As a side note, the pointer manipulation happens and stays inside
that function. C always uses call by value when passing parameters to
a function, so calling strlen(ptr) will create a copy of ptr when
passing it to the function. The copy references the same address to
begin with, but incrementing it leaves the original pointer
unchanged.
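
The same effect can be seen in a small sketch with a hypothetical skip_spaces helper. It moves its own copy of the pointer forward and hands the new position back as return value, while the caller’s pointer stays where it was.

```c
/* Hypothetical helper: returns a pointer to the first character that
 * is not a space. `text` is a local copy of the caller's pointer, so
 * advancing it here has no effect outside this function. */
const char *skip_spaces(const char *text)
{
    while (*text == ' ') {
        text++;
    }
    return text;
}
```

After const char *rest = skip_spaces(input), rest points past the leading spaces while input still points to the very first character.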

POINTERS AND ARRAYS


Coming back to arrays, we’ve seen earlier how pointer arithmetic
and array indexing are closely related and how buf[n] is identical
to *(buf + n). The reason that both expressions are identical is
that in C, an array used in an expression decays into a pointer to
its first element, &array[0]. So whenever we pass an array to a
function, we really just pass a pointer to the array’s element type,
which means the following two function declarations will be
identical:
    void func1(char buf[]);
    void func2(char *buf);

However, once an array decays into a pointer, its size information
is gone. Calling sizeof(buf) inside either of those two functions
will return the size of a char * and not the array size. A common
solution is to pass the array size as an additional parameter to the
function, or to mark the end of the data with a dedicated delimiter,
the way null-terminated strings do.
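
A minimal sketch of the first solution could look like this. The ARRAY_LEN macro and the sum function are names chosen for this example, not part of any library.

```c
#include <stddef.h>

/* Number of elements in an actual array. Only valid where the array
 * itself is in scope, not on a pointer it has decayed into. */
#define ARRAY_LEN(array) (sizeof(array) / sizeof((array)[0]))

/* Hypothetical helper: sums `count` ints. `values` is just an int *
 * in here, so the element count has to come in as a parameter. */
long sum(const int values[], size_t count)
{
    long total = 0;

    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}
```

The caller still knows the real array, so it can write sum(numbers, ARRAY_LEN(numbers)). Using ARRAY_LEN(values) inside sum would silently compute the wrong number.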

Multi-dimensional Arrays and Pointers


Note that the array-to-pointer decay happens only once, to the
outermost dimension of the array. char buf[] decays to char
*buf, and char buf[][N] decays to char (*buf)[N], a pointer to an
array of N chars, but not to char **buf. However, if we have an
array of pointers declared in the first place, char *buf[], then it
will decay into char **buf. As an example, we can declare C’s main()
function with either a char *argv[] or a char **argv parameter.
There is no difference, and it’s mainly a matter of taste which one
to choose.
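
To see what a two-dimensional array turns into when passed to a function, here is a sketch with a hypothetical sum_grid helper and an assumed inner dimension COLS. The parameter is a pointer to a whole row, so indexing with two subscripts keeps working.

```c
#include <stddef.h>

#define COLS 3   /* hypothetical inner dimension */

/* Hypothetical helper: sums a two-dimensional array with COLS columns.
 * The parameter could equally be written as int grid[][COLS]; either
 * way it is a pointer to an array of COLS ints. */
int sum_grid(int (*grid)[COLS], size_t rows)
{
    int total = 0;

    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < COLS; c++) {
            total += grid[r][c];
        }
    }
    return total;
}
```

Because the pointer’s underlying type is an entire row, incrementing grid by one moves it COLS * sizeof(int) bytes ahead.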

Note that all this applies only to already declared arrays. Once an
array is declared, pointers give us an alternative way to access
it, but we cannot replace the array declaration itself with a
simple pointer because the array declaration also reserves
memory.

POINTERS TO POINTERS
As we have well established, pointers can point to any kind of data
type, which includes other pointer types. When we declare char
**ptr, we declare nothing but a pointer whose underlying data
type is just another pointer, instead of a regular data type. As a
result, dereferencing such a double pointer will give us a char *
value, and dereferencing it twice will get us to the actual char.

The other way around, &ptr gives us the pointer’s address, just
like with any other pointer, except the address will be of type char
***, and on and on it goes. As stated earlier, C uses call by value
when passing parameters to a function, but adding an extra layer of
pointers can be used to simulate call by reference.
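
A short sketch of that simulated call by reference: the hypothetical swap_strings function below receives the addresses of two string pointers and can therefore change what the caller’s pointers point to.

```c
/* Hypothetical helper: swaps two string pointers in place. Each
 * parameter is the address of one of the caller's pointers. */
void swap_strings(const char **first, const char **second)
{
    const char *tmp = *first;   /* one dereference yields a const char * */

    *first = *second;
    *second = tmp;
}
```

Called as swap_strings(&a, &b), the function exchanges the addresses stored in a and b. The strings themselves are neither copied nor modified.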

Double Pointer Memory Arrangements


Let’s return to main()’s argv parameter, which we use to retrieve the
command line arguments we pass to the executable itself. In
memory, those arguments are stored one by one as null-terminated
char arrays, along with an additional array of char * values storing
the address of each of those char arrays. To illustrate this, let’s
print each and every address we can associate with argv.
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv) {
        int i;

        for (i = 0; i < argc; i++) {
            printf("&argv[%d] %p with argv[%d] at %p len %zu '%s'\n",
                   i, (void *) &argv[i], i, (void *) argv[i], strlen(argv[i]), argv[i]);
        }
        // print once more to see what is stored after the arguments
        printf("&argv[%d] %p with argv[%d] at %p\n", i, (void *) &argv[i], i, (void *) argv[i]);

        return 0;
    }

Along with argv, we get argc passed to main(), which tells us the
number of entries in argv. And as a reminder about array
decay, argv[i] is equal to &argv[i][0].

Simplifying the addresses, the output will look like this:


    $ ./argv some arguments
    &argv[0] 0x1c38 with argv[0] at 0x2461 len 6 './argv'
    &argv[1] 0x1c40 with argv[1] at 0x2468 len 4 'some'
    &argv[2] 0x1c48 with argv[2] at 0x246d len 9 'arguments'
    &argv[3] 0x1c50 with argv[3] at (nil)
    $

We can see that the array argv points to is located at address
0x1c38, and that its entries point to the argument strings, which
are stored one after another starting from address 0x2461. Since
incrementing a pointer is always relative to the size of its
underlying data type, incrementing argv adds the size of a pointer to
the memory offset, here 8 bytes.

Another thing we can see is a NULL pointer at the very end
of argv. This follows the same principle as the null-termination of
strings, indicating the end of the array. That means we don’t
necessarily need the argument counter parameter argc to iterate
through the command line arguments; we could also just loop
through argv until we find the NULL pointer.

Let’s see how this looks in practice by rewriting our previous
example accordingly. To leave argv itself unaffected, we copy it to
another char ** variable.
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv) {
        int i;
        char **ptr = argv;

        (void) argc; // not needed, the NULL entry marks the end of argv
        for (i = 0; *ptr; i++, ptr++) {
            printf("&argv[%d] %p with argv[%d] at %p len %zu '%s'\n",
                   i, (void *) ptr, i, (void *) *ptr, strlen(*ptr), *ptr);
        }
        printf("&argv[%d] %p with argv[%d] at %p\n", i, (void *) ptr, i, (void *) *ptr);
        return 0;
    }

Whether we access argv via array indexing or pointer arithmetic,
the output will be identical.
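
The same NULL-terminated walk works for any array of strings that ends with a NULL entry. As a sketch, a hypothetical count_entries function could recover the number of arguments without ever looking at argc:

```c
#include <stddef.h>

/* Hypothetical helper: counts the entries of a NULL-terminated array
 * of strings, such as argv, without consulting argc. */
size_t count_entries(char **entries)
{
    size_t count = 0;

    while (*entries++) {   /* read the current entry, then advance */
        count++;
    }
    return count;
}
```

Since entries is a copy of the caller’s pointer, the caller’s own argv is left untouched, so no extra variable is needed here.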

TO BE CONTINUED
To summarize our second part on pointers in C: pointer arithmetic
always happens relative to the underlying data type, operator
precedence needs to be considered or tackled with parentheses,
and pointers can point to other pointers, which can point to yet
other pointers, as deep as we want.

In the next and final part, we are going to have a look at possibly
the most exciting and most confusing of pointers: the function
pointer.
