The document discusses implementing a basic version of the printf function in C to demonstrate how pointers can be used to extract parameter values from the stack. It describes how the first version prints the format string literally. The second version adds the ability to print one decimal argument by using a pointer to locate the argument's memory location on the stack and print its value when the corresponding format specifier is encountered. Future versions will handle additional format specifiers and multiple arguments. Pointers are the key to extracting values from the stack to implement printf's core functionality.
Goal: Reinforce our understanding of pointers by solving a real problem that requires them (it can't be solved without pointers).

Overview: In this case study we'll exploit our knowledge of stack frames on Visual Studio for x86 and write some C code that reads parameter values directly from the stack. This C code will be specific to Visual Studio on x86 platforms (i.e., it is NOT portable code and is NOT a good example of how to solve this type of problem generally), but it will very nicely demonstrate how we can draw diagrams of the internal state of our program variables and then use those diagrams to write correct, non-trivial code using pointers.

C/C++ language constructs and concepts demonstrated: character strings, int* pointers, char* pointers, char** pointers
Background

Printf is a function we use all the time, and we rarely (if ever) give any thought to how it works. We know it performs output (to the screen), and that it will format and output just about anything we want it to. For example, we can output something simple like:

    printf("Hello World!\n");

Or, we can output something more complex like:

    printf("The circuit's impedance is %f + %fj\n", real_imp, imag_imp);
Somehow, printf has to determine both what values we want to display and how we want those values formatted. As programmers we know to use the % sequences (such as %f in the example above) to describe what and how we want things formatted, but how does printf do its thing?

In this case study we'll write our own printf function. We'll only implement a small fraction of the functionality provided by the Standard C library's printf, but we'll cover most of the important aspects of printf. As we go through this case study we'll discover that pointers are the key to being able to extract the values from the stack. We'll do some pointer arithmetic to calculate the address of the values we want on the stack, we'll declare pointers of the correct type to ensure that we read these values correctly, and we'll even spend a little bit of time (very little) discussing how we format those values for output.

The putchar function

We're not going to build the entire functionality of printf from scratch. Our starting point will be the putchar function. Putchar is a very simple function (from the Standard C library) that outputs a single ASCII character. For example, if you invoke putchar(65), then you will get the letter A displayed on the screen. That's because in the ASCII table, row 65 is assigned the character A. Of course, displaying a character actually requires a fairly complex bit of hardware and software to light up all the right pixels on your display, not to mention the esoteric art of designing character fonts: cool stuff, but way outside the bounds of what we need to learn in EE312.

Printf version 1, just the basics

We're ready to jump in and get started. Printf, in its most basic usage, is actually quite simple. Given an invocation like:

    printf("Hello World!\n");
All we need to do is output each of the ASCII characters one at a time using putchar. A loop will take care of that easily.

    #include <stdint.h> // uint32_t, int32_t
    #include <stdio.h>  // putchar

    void printf_v1(char fmt[]) {
        uint32_t k = 0;

        while (fmt[k] != 0) {
            putchar(fmt[k]);
            k += 1;
        }
    }
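To check what the loop above produces without watching the screen, here is a minimal sketch of the same loop that stores each character in a buffer instead of calling putchar. The name copy_v1 and its out and cap parameters are assumptions made for this illustration; they are not part of the case study.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the case study: the same loop as
   printf_v1, but each character is stored in out[] instead of being sent
   to putchar. cap is the size of out[] in bytes. Returns the number of
   characters stored (not counting the terminating zero). */
size_t copy_v1(const char fmt[], char out[], size_t cap) {
    size_t k = 0;

    if (cap == 0) {
        return 0; /* no room, not even for the terminating zero */
    }
    while (fmt[k] != 0 && k + 1 < cap) {
        out[k] = fmt[k]; /* where printf_v1 would call putchar(fmt[k]) */
        k += 1;
    }
    out[k] = 0; /* terminate the copy */
    return k;
}
```

Calling copy_v1("the number is %d\n", buf, sizeof buf) with a 32-byte buf leaves the format string in buf verbatim, %d included, with the single new line character (value 10) as its last character.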
Code review

In this incarnation of printf, we have just a single parameter. I've named the parameter fmt, which is shorthand for "format string". I like that name because the first argument provided to printf is the formatting instructions for this output operation: it tells us what characters are to be displayed, and also contains the %f, %d and similar formatting instructions for all the other outputs. I've followed my own personal style of declaring fmt using the array declaration syntax (i.e., the []), even though I know full well that fmt is actually a pointer. I use the [] syntax whenever I have a parameter that points to an array. I use the * syntax to declare parameters that point at single variables. Of course, the compiler doesn't care, and both char fmt[] and char* fmt mean the same thing.

You'll also note that I'm using a while loop rather than a for loop. That's another element of my style, where I try to use "for" when the iteration takes place over a well defined range (e.g., from 1 to 10), and where I try to use "while" when the iteration continues until something special happens. In this case, I don't know in advance how many characters are in the format string, so I use a while loop that continues until it detects the terminating zero at the end of the format string.

Finally, you may have noticed that I describe the terminating zero as 0. That, of course, is what it is: the number zero. Some programmers prefer to write that zero using the syntax '\0'. Please don't get confused, 0 and '\0' are the same thing. For that matter, 0 and 0x0 are the same thing too. Just be careful not to confuse '\0' and '0', which are quite different ('0' is actually the number 48).

This version of printf is very limited. Printf_v1 simply prints the format string verbatim. If you try something like:

    printf("the number is %d\n", 42);
then you'll see "the number is %d" as your output. Note that the \n is handled just fine. That's because '\n' is really just the number 10 (row 10 in the ASCII table is "line feed", i.e., new line). That's worth repeating, just to be sure. '\n' is NOT a \ character followed by the letter n. It is a single character representing the new line operation. That character happens to be 10 in the ASCII table (line feed). However, the %d formatting code is completely ignored by printf_v1. That's because we didn't even attempt to take care of this case. Time to move on to version 2.

Printf version 2, one decimal argument

Our first big challenge comes when we try to make printf extract a value from the stack. Consider the invocation:

    printf_v2("the number is %d\n", 42);
In this invocation we have two arguments passed to printf. The first (as always) is a format string. The second is the value 42 (on our platform this is a 32-bit signed integer). We understand how stack frames work, so we can certainly imagine how these two arguments would be arranged on the stack.
In the diagram, I've illustrated the stack frame for a myPrintf function that has one formal parameter (i.e., one parameter that is declared in the parameter list), but has been supplied with two actual arguments. I've also illustrated the stack frame so that it contains two local variables: the variable k, which is used as before to index into the format string, and a new variable p, which is a pointer. This diagram corresponds to the following function:

    void printf_v2(char* fmt, ...) {
        uint32_t k;
        int32_t* p;
    }
Note that the parameter list for printf_v2 has one formal parameter (declared as a pointer this time, but as I'm sure you remember, array parameters and pointer parameters are the same thing; I'm using pointer syntax in this case to remind you that the actual argument will be an address). After the declaration of fmt I have the C/C++ ellipsis expression: ...

The ellipsis means that printf_v2 is a function that can accept extra arguments. If you declare a function with an ellipsis, then you can call that function with as many extra arguments as you wish. The extra arguments can be any type (characters, integers, strings, floats, etc.). You can also have zero extra arguments. For this case, the extra argument is 42.

Now that we've become familiar with the terrain, we have three problems we have to solve: (1) we need to locate the memory location where the extra argument is stored, (2) we need to determine when the argument is supposed to be printed (i.e., where the %d is inside the format string), and (3) we need to actually format the output in decimal.

The first problem is by far the most interesting: finding the memory location with the 42 in it. Our diagram actually makes this pretty easy. There are a couple of ways I can go about finding this address. In the first method, I can make the pointer p point at the variable k, and then I'll increment p by 4. In the second, I'll make the pointer p point at the parameter fmt and increment p by 1. I actually like the second strategy better, but let's start with the first.

In the diagram below, I've gone ahead and removed the stack frame for main (it's really not interesting to us), and I've added the actual array of characters for the format string. Note that (just as it always is), the array argument is not actually on the stack with the parameters. The real argument is the address of the first character in the array (in our case fmt is a pointer to the letter t).
You'll notice that the fmt parameter points to the first character in our array, and also that the array ends with two numbers. The first number is 10, which is the actual value of '\n' (newline). In ASCII, the new line control code is entry number 10 in the table. Sometimes students will get confused and think that '\n' is actually two characters. It's not. The new line character is just that, a single character, which happens to have ASCII value 10. The second number is the zero which marks the end of the string. I've also assumed that I've executed the statement p = &k; setting p to be equal to the address of k, in other words, making p point to k.
The diagram shows the addresses that result from the pointer arithmetic p + 1, p + 2, etc. Recall that Visual Studio uses eight bytes of storage on the stack to implement function return (four bytes for the return address plus another four bytes to store the copy of the old frame pointer). Based on the diagram we can clearly see that p + 3 points at the first parameter (fmt), and p + 4 points at the memory location which contains the second argument, 42. Since this argument is one of the extra arguments that are permitted by our printf_v2(char* fmt, ...) declaration, the argument has no name. The ONLY way we can access this argument is by calculating its address. Using a diagram, we can easily calculate the pointer arithmetic expression to find this address, resulting in the code shown below.
[Diagram: the format string "the number is %d" followed by 10 and 0, with the stack locations labeled p, p+1, p+2, p+3 and p+4]

    void printf_v2(char* fmt, ...) {
        uint32_t k = 0;
        int32_t* p = &k;
        p = p + 4;

        while (fmt[k] != 0) {
            if (fmt[k] != '%') {
                putchar(fmt[k]);
                k += 1;
            } else { // fmt[k] is the beginning of an escape sequence, e.g., %d
                /* I'm just going to assume %d for now */
                int32_t x = *p;
                displayDecimal(x);
                k = k + 2; // we add 2 to skip the % and the d, and then resume our loop.
            }
        }
    }
There are a couple of things worth noting. First of all, this version of printf is far from done. One big mistake is that it always assumes that % is followed by d. As a result, the function doesn't work for %c, %f, %s or any other escape sequence. Also, the function is limited to working with only a single extra argument. If there's more than one %d in the format string, then the function just prints the same argument over and over (the pointer p never moves, so each time we go to print an argument we always print the same one). Still, the function works just fine for our simple example:

    printf_v2("the number is %d\n", 42);

One other thing worth noting is the use of the function displayDecimal. This function takes the integer argument and converts that argument into a sequence of ASCII characters. As humans we often forget that this step is even necessary. We instinctively think of something like 42 as being a number, even when it quite clearly is a sequence: '4' followed by '2'. In our computer program, we have to manually identify and then output each character that makes up the number. A simple function to do that is shown below.

    void displayDecimal(int32_t x) {
        if (x == 0) { // special case for 0
            putchar('0');
            return;
        }

        if (x < 0) { // special case for negative values
            putchar('-');
            x = -x; // fall through and display the absolute value
                    // (note: -x overflows for the most negative int32_t; ignored here)
        }

        /* we can now assume x > 0 */
        /* extract the digits in x from least to most significant */
        char digits[10]; // int32_t is at most about 2 billion, so at most 10 characters
        uint32_t num_digits = 0; // the actual number of digits
        while (x != 0) {
            uint32_t d = x % 10; // least significant digit
            char c = d + '0'; // ASCII representation of d
            /* store the characters in an array so we can reverse them */
            digits[num_digits] = c;
            num_digits += 1;

            /* continue to the next digit of x */
            x = x / 10;
        }

        /* now print the digits in reverse order */
        while (num_digits > 0) {
            num_digits -= 1;
            putchar(digits[num_digits]);
        }
    }

Summary (printf_v2)

Printf has only one formal parameter (in our case, we call this parameter fmt). However, printf can have extra arguments. These arguments do not have names and can only be accessed using their address.

Calculating the address of a variable in memory requires that you have a diagram showing you the location of that variable relative to other variables. In our case, we used our detailed knowledge of Visual Studio's stack frame to draw a diagram illustrating the position of the unnamed extra argument 42 relative to the named variables k and fmt.

We chose to read the argument from the stack using a pointer (named p). By referring to our diagram we concluded that p = &k + 4 was the correct arithmetic. Note that since p is declared to be an int32_t* pointer, the +4 in our arithmetic is actually going to increase the address stored inside p by 16: the addition is scaled by the size of int32_t, i.e., multiplied by four.
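The scaling can be checked directly, away from the stack. This is a small sketch, assuming a hypothetical helper named bytes_between (not part of the case study); it measures the distance from p to p + n in bytes by viewing both addresses as char*.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: how many bytes apart are p and p + n?
   p and p + n must both lie within (or one past the end of) the same array. */
ptrdiff_t bytes_between(int32_t *p, int n) {
    int32_t *q = p + n;             /* pointer arithmetic, scaled by sizeof(int32_t) */
    return (char *) q - (char *) p; /* char* subtraction counts individual bytes */
}
```

With an array of five int32_t values named slots, bytes_between(slots, 4) comes out as 16 and bytes_between(slots, 1) as 4, which matches the "multiplied by four" rule in the summary.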
Printf version 3, a string argument and %s

Of course decimal is not the only format we want to use when producing output, and %d is far from the only escape option provided by printf. Let's consider the escape sequence %s, which will format and display a string argument. Consider:

    printf_v3("Hello %s\n", "Craig");

In this case we have two string arguments. The first string argument is bound to the formal parameter fmt. The second string argument, "Craig", will be an unnamed extra argument. To access this argument we will need to calculate its address (just like we did with the 42 in printf_v2).

Before jumping into the pointer arithmetic, it is worthwhile to remind ourselves exactly what the string argument "Craig" is. In the C programming language, strings are arrays (arrays of characters with a zero at the end). Furthermore, arrays, when used as arguments to functions, are passed using the address of the first character of that array. So, in this case, the unnamed argument is actually going to be the address of the ASCII 'C' in an array of six characters: 'C', 'r', 'a', 'i', 'g', 0. Like all addresses in 32-bit Windows, this address in Visual Studio will be four bytes long. The following diagram shows the stack frame.
Since we've not changed the number of arguments from the previous example, and all the arguments are coincidentally the same size, we can continue to use the same code to extract the extra argument from the stack. Naturally, we don't want to format this argument in decimal anymore, so we'll use the function displayString instead.

    void displayString(char str[]) {
        uint32_t k = 0;
        while (str[k] != 0) {
            putchar(str[k]);
            k += 1;
        }
    }
Other than changing the function we use to format the output, printf itself is not changed.

[Diagram: the format string "Hello %s" followed by 10 and 0, the stack locations labeled p, p+1, p+2, p+3 and p+4, and the string "Craig" followed by 0]

    void printf_v3(char* fmt, ...) {
        uint32_t k = 0;
        int32_t* p = &k;
        p = p + 4;

        while (fmt[k] != 0) {
            if (fmt[k] != '%') {
                putchar(fmt[k]);
                k += 1;
            } else { // fmt[k] is the beginning of an escape sequence, e.g., %s
                /* I'm just going to assume %s for now */
                int32_t x = *p;
                displayString(x);
                k = k + 2; // we add 2 to skip the % and the s, and then resume our loop.
            }
        }
    }
Conceptually, this version of printf does the right thing for printf_v3("Hello %s", "Craig"); However, the compiler balks at our invocation of displayString(x); The compiler is concerned that we declared x to be an int32_t (i.e., a number) and yet the function displayString needs an argument that is an address (i.e., a pointer). In other words, the compiler thinks we made a mistake.

Actually, it's the compiler that's mistaken here. We know our code is correct because the code matches precisely our diagram (and our diagram is correct). After p = p + 4, our pointer p points at the location on the stack where the second (extra) argument is stored. We know that this memory location contains the address of the letter C in our string "Craig". So, by reading from *p and storing the result in the variable x, we are storing the address of the letter C in the variable x. This address is precisely the address that displayString needs in order for displayString to print out "Craig".

So, we're right, the compiler is wrong. What do we do? The situation calls for a type cast expression. In this case, I'm going to declare an additional variable (q) and specify that q is type char*. Then I'll use a type cast to convert the value of x into an address and store that address in q.

    int32_t x = *p;
    char* q;
    q = (char*) x; // type cast expression
    displayString(q);
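The cast can be tried on its own, away from the stack. The sketch below is an illustration only: through_integer is a hypothetical helper, and it uses uintptr_t (from stdint.h) rather than int32_t, an assumption made so that the example also behaves on platforms where addresses are wider than 32 bits. It stores an address in an integer variable and then casts that integer back to a pointer.

```c
#include <stdint.h>

/* Hypothetical helper: push an address through an integer variable and back. */
const char *through_integer(const char *s) {
    uintptr_t x = (uintptr_t) (const void *) s; /* the address, held in an integer variable */
    const char *q = (const char *) x;           /* the same number, typed as an address again */
    return q;
}
```

Given const char *name = "Craig";, the pointer returned by through_integer(name) compares equal to name, and reading through it still finds the 'C'.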
Type casts in C/C++ allow you to explicitly convert from one type to another. In our case, we want to convert from an integer (x) to an address. We know that addresses really are numbers, after all, so this conversion isn't actually a conversion at all: the value in q is going to be precisely the same value that was in x. However, since x and q are different types, the language considers them to be different. The type cast is required in order to satisfy the language's type system, but that type cast doesn't do anything. q = (char*) x; means exactly what q = x; means: copy the number in x and store that number in the variable q.

IMPORTANT: Any type cast expression involving pointers in the C programming language will not do any actual conversion. In fact, if you want to understand what is happening, it's best to completely ignore the type cast when reviewing the code.

Now that we can display both %s and %d, we should add the case-selection code to our program so that it correctly selects between strings and decimals. While the switch keyword can be used, I actually prefer to stick with the more general if-then-else for most of my case selection. So, printf_v3 looks like this:

    void printf_v3(char* fmt, ...) {
        uint32_t k = 0;
        int32_t* p = &k;
        p = p + 4;

        while (fmt[k] != 0) {
            if (fmt[k] != '%') {
                putchar(fmt[k]);
                k += 1;
            } else { // fmt[k] is the beginning of an escape sequence, e.g., %s
                if (fmt[k+1] == 'd') { // %d case
                    int32_t x = *p;
                    displayDecimal(x);
                } else if (fmt[k+1] == 's') { // %s case
                    int32_t x = *p;
                    char* q = (char*) x;
                    displayString(q);
                } else { // default case (an error!)
                    /* do nothing */
                }

                k = k + 2; // we add 2 to skip the % and the d or s, and then resume our loop.
            }
        }
    }

As you can see, we have three cases currently in our code. The first case is for %d sequences, the second is for %s sequences. We can distinguish between these two cases by examining the value of fmt[k+1]. Since fmt[k] is the % character, fmt[k+1] will be either a 'd' or an 's'. Well, I suppose it's possible that fmt[k+1] is neither 'd' nor 's'. For now, that's an error, and since we don't know what to do, I'm going to structure the code so that it ignores that error.

Printf version 3 summary

A string argument is a pointer: the address of the first character in an array of characters. On our platform, addresses are the same size (and same binary encoding) as numbers. We can extract the string extra argument using the same code that we used to extract the integer extra argument in version 2.

The C programming language considers the types of our variables to be very important, and consults the type of each variable before determining if an expression is legal. Using an integer variable where an address (pointer) is expected is illegal in C, even if the number stored in the variable is the correct address.

To get around this problem, we can use type casts. A type cast will often not do anything other than tell the compiler that the operation should be legal and to compile it as written. In the case of type casts using pointer types (e.g., type casting to char*) this is always the case, and a type cast using a pointer will never actually do anything. The type cast essentially just becomes the manual override button that the programmer presses to tell the compiler to shut up and just generate the machine code.

Printf version 4, cleaning up the code

In our last version of printf, I want to accomplish two things. First, the code is incredibly ugly. Most importantly, by declaring the variable p as an int32_t*, the code is incredibly misleading. We don't know that p actually points to an integer.
It might point to an address (%s) or it might even point to a floating point number (%f). I want to correct this and declare p using a type that documents only what I know about that address (and at the same time, I'm going to give this variable a new name). The second thing I want to do is to improve the functionality of printf so that it will print multiple arguments. To make that happen, I'll need to add some pointer arithmetic to increment p each time we extract an argument.

As long as we're working on yet another version of printf, I might as well give the code a thorough cleaning and add in the additional cases for %c and %f. A heads up though: I'm not going to bother actually writing displayFloat as a function. Extracting the binary encoding for IEEE floating point and creating a sequence of ASCII characters to represent that number is way outside of the goals for this example.

First up on the docket is to replace the variable p with a new variable, next_arg. In our program next_arg will always be the address of the next extra argument (if there is one). So, we'll initialize next_arg to be the address of the first extra argument, and each time we see a valid % sequence, we'll increment next_arg so that it becomes the address of the next argument. I'd like to give next_arg the correct type, which for this case is quite clearly void*. In C/C++ the type void* is a generic pointer. We use that type when we have an address, but we don't know what type of information is stored at that address. That's perfect for this case, where I know that next_arg is the address of the next argument, but I don't yet know whether that argument is an integer, a float, a character or a string.

As part of my code cleaning, I'm going to initialize next_arg to be &fmt + 1 rather than &k + 4. As we can tell from our diagram, either bit of arithmetic calculates the correct address.
I prefer &fmt + 1 since this will still be the correct address even if I create additional local variables (&k + 4 is correct only as long as k is the first local variable: declare a local variable before k in the program and the whole thing breaks).

    void printf_v4(char* fmt, ...) {
        void* next_arg = &fmt + 1;
        uint32_t k = 0;
        /* the main loop goes here */
    }

The main loop for printf is slightly more complicated because I'm adding cases for %f and %c (more on that later). The biggest change to the main loop is caused by the fact that in C/C++ I cannot legally dereference a void* pointer. Specifically in this case, even though next_arg is the correct address, I can't read from that address using *next_arg. The reason I can't read from that location is that since next_arg is a generic pointer, the compiler has no idea how many bytes I want to read (or how to interpret the bits contained inside those bytes). For example, next_arg could be the address of a character, or next_arg could be the address of a float. We don't know (yet), which is why we declared the pointer to be void* in the first place. Well, the compiler doesn't know either, so it cannot possibly create machine code for an expression like *next_arg.

To get around this problem, I'm going to resurrect my variable p. Actually, I'm going to create a whole bunch of variables, each named p, and each with exactly the correct type to match the argument being read. Here's the final code.

    // named printf_v4 here so it does not clash with the printf declared in stdio.h
    void printf_v4(char* fmt, ...) {
        void* next_arg = &fmt + 1; // address of next "extra" argument
        uint32_t k = 0;

        while (fmt[k] != 0) {
            if (fmt[k] != '%') {
                putchar(fmt[k]);
                k += 1;
            } else { // fmt[k] is the beginning of an escape sequence, e.g., %d
                if (fmt[k + 1] == 'd') { // %d case
                    int32_t* p = (int32_t*) next_arg;
                    next_arg = p + 1;
                    displayDecimal(*p);
                } else if (fmt[k + 1] == 's') { // %s case
                    char** p = (char**) next_arg;
                    next_arg = p + 1;
                    displayString(*p);
                } else if (fmt[k + 1] == 'f') { // %f case
                    double* p = (double*) next_arg;
                    next_arg = p + 1;
                    displayFloat(*p); // displayFloat is not written in this case study
                } else if (fmt[k + 1] == 'c') { // %c case
                    int* p = (int*) next_arg;
                    next_arg = p + 1;
                    putchar(*p);
                } else { // either %% or error
                    putchar('%');
                }
                k += 2; // we add 2 to skip the % and the character after it, and then resume our loop.
            } // end of %? escape sequence
        }
    }
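The step that each case performs (copy next_arg into a correctly typed p, advance with p + 1, read *p) can be tried portably on an ordinary array instead of the stack. The following is a minimal sketch for the %s case only. The helper name take_string is hypothetical, and the const qualifiers are an addition for this sketch because the strings are only read.

```c
/* Hypothetical helper: *cursor is the address of a slot that holds a char*
   (the situation in the %s case). Return the string stored in that slot and
   advance *cursor to the next slot. */
const char *take_string(void **cursor) {
    const char **p = (const char **) *cursor; /* pointer to a pointer to a character */
    *cursor = (void *) (p + 1);               /* step over exactly one char* slot */
    return *p;                                /* the address of the first character */
}
```

Given const char *slots[2] = { "Craig", "EE312" }; and void *next_arg = slots;, two calls to take_string(&next_arg) return the two strings in order and leave next_arg pointing just past the array.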
The first escape sequence case in the code is for %d. In this case, next_arg will be the address of an integer. Accordingly, I declare a variable named p of type int32_t* and I copy the address from next_arg to p. The C/C++ programming language mandates that I use a type cast when I copy this address. However, the type cast doesn't do anything; it just tells the compiler to go ahead and copy the address into the new variable. Once I have p pointing at the right location (and declared with the correct type), I can do my pointer arithmetic to calculate the correct address for the next extra argument. The expression p + 1 is precisely the correct address because the 1 will be scaled by the size of the current argument (i.e., multiplied by 4 since the current argument is an int32_t). I can also read the extra argument using the expression *p and send that value directly to displayDecimal to handle the output.

The case for %s is almost verbatim a copy of the %d case. That's not surprising since our diagram illustrated how similar the two cases actually are. Again, I declare a pointer p and copy the address from next_arg into p (with a type cast). What's different this time is that p is declared to be char**. That type means "a pointer to a pointer to a character". That is, of course, precisely what next_arg is in this case. Consider this diagram from printf_v3.
[Diagram: the format string "Hello %s" followed by 10 and 0, and the string "Craig" followed by 0]

The address stored in next_arg is the address of the extra argument. That extra argument is itself an address, specifically the address of the 'C' in the string "Craig". In our diagram, next_arg is a pointer that points to a pointer that points to a character. In our code, as soon as we know we're processing the case for %s, we know we have a diagram like this one. Consequently we know that next_arg is really a char**. So, we create a new variable (p) of type char**, copy the address from next_arg into p and proceed as always. We assign next_arg the incremented address p + 1 and we send *p to our output function displayString. It is incredibly important to be able to recognize why char** is the correct type, and why all the code around p is correct (p + 1 and not *p + 1 or &p + 1, for example). It takes a little while to sink in, but the code is correct because the code precisely matches the diagram (and the diagram is correct).

Finally we have two cases, one for %c and one for %f. Both these cases match the case for %d with the obvious substitution of displayFloat instead of displayDecimal for %f and putchar instead of displayDecimal for %c. There is one odd thing going on, and that's that for %f I used a double* pointer (instead of float*) and for %c I used an int* pointer instead of char*. The reason I used these pointer types is an obscurity in the C standard. The C standard states that a float can never be passed as one of the extra arguments (the ones matched by the ellipsis). Instead, the compiler always converts a float extra argument to double. A similar thing happens with characters: char extra arguments are always promoted to int. Since the argument for %f is going to be a double, I have to use double* to read this argument (otherwise I'd only read half the bytes).

Since the argument for %c is going to be int, I have to use int* to read this argument. Note that I used int here instead of the more specific int32_t. The C standard doesn't say that char is promoted to 32-bit ints, only that it's promoted to int.
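For comparison, standard C offers a portable route to the extra arguments through stdarg.h. The sketch below is not the method of this case study. It is a hedged illustration of the same case analysis, with va_list playing the role of next_arg and va_arg doing the "read and advance" step. The names format_sketch and emit, and the choice to write into a buffer rather than call putchar, are assumptions made so the result can be inspected. %f is left out, just as displayFloat is left unwritten above.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper: append c to out[] if there is room (always leaving
   space for the terminating zero). Returns the new length. */
static size_t emit(char out[], size_t cap, size_t len, char c) {
    if (len + 1 < cap) {
        out[len] = c;
        len += 1;
    }
    return len;
}

/* Portable sketch handling %d, %s, %c and %%. Returns the characters stored. */
size_t format_sketch(char out[], size_t cap, const char *fmt, ...) {
    va_list next_arg; /* plays the role of the void* next_arg */
    size_t len = 0;
    size_t k = 0;

    if (cap == 0) {
        return 0;
    }
    va_start(next_arg, fmt); /* start at the first argument after fmt */
    while (fmt[k] != 0) {
        if (fmt[k] != '%') {
            len = emit(out, cap, len, fmt[k]);
            k += 1;
            continue;
        }
        if (fmt[k + 1] == 0) { /* a lone % at the very end of the string */
            len = emit(out, cap, len, '%');
            break;
        }
        if (fmt[k + 1] == 'd') { /* %d case */
            int x = va_arg(next_arg, int);
            /* unsigned magnitude, so the most negative int is handled too */
            unsigned int u = (x < 0) ? 0u - (unsigned int) x : (unsigned int) x;
            char digits[3 * sizeof(unsigned int) + 1]; /* at most 3 digits per byte */
            size_t n = 0;
            if (x < 0) {
                len = emit(out, cap, len, '-');
            }
            do { /* extract digits from least to most significant */
                digits[n] = (char) ('0' + u % 10);
                n += 1;
                u /= 10;
            } while (u != 0);
            while (n > 0) { /* store them in reverse order */
                n -= 1;
                len = emit(out, cap, len, digits[n]);
            }
        } else if (fmt[k + 1] == 's') { /* %s case */
            const char *s = va_arg(next_arg, char *);
            size_t i = 0;
            while (s[i] != 0) {
                len = emit(out, cap, len, s[i]);
                i += 1;
            }
        } else if (fmt[k + 1] == 'c') { /* %c case */
            int c = va_arg(next_arg, int); /* char arguments arrive promoted to int */
            len = emit(out, cap, len, (char) c);
        } else { /* either %% or an unknown sequence */
            len = emit(out, cap, len, '%');
        }
        k += 2; /* skip the % and the character after it */
    }
    va_end(next_arg);
    out[len] = 0; /* terminate the result */
    return len;
}
```

For example, format_sketch(buf, sizeof buf, "Hello %s, %d%c\n", "Craig", -42, '!') with a 64-byte buf stores the string "Hello Craig, -42!" followed by a new line. Note that the %c case asks va_arg for an int, which is the same promotion rule that led to the int* pointer in the hand-written version.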