C Programming
The notes on these pages are for the courses in C Programming I used to teach in the Experimental
College at the University of Washington in Seattle, WA. Normally these notes accompany fairly
traditional classroom lecture presentations, but they are intended to be reasonably complete (more so, for
that matter, than the lectures!) and should be usable as standalone tutorials.
I originally designed the first, Introductory course around The C Programming Language (2nd Edition)
by Kernighan and Ritchie, and the notes were designed to complement that text, highlighting important
points and explaining subtleties which might be lost on the general reader. Later, I rewrote the notes to
stand on their own (in part because, in spite of the first set of notes, too many of my students found K&R
a bit too technical for an informal, introductory course). Finally, I occasionally teach an Intermediate
course, which covers the topics which tend to be skipped or glossed over in introductory courses (bitwise
operators, structures, file I/O, etc.). The Intermediate course has its own set of notes.
All three sets of notes are available here. If you have a copy of K&R2 and would like a thorough
treatment of the language, read K&R and the ``Notes to Accompany K&R'' side by side. If you're just
getting your feet wet and would like a somewhat simpler introduction, read the ``Introductory Class
Notes.'' If you have had an introduction to C (either here or elsewhere) and are now looking to fill in
some of the missing pieces, read the ``Intermediate Class Notes.''
Of course, just reading a book or these notes won't really teach you C; you will also want to write and run
your own programs, for practice and so that the language concepts will make some kind of practical
sense. Most of my programming assignments (including review questions) are here as well, along with
their solution sets. (No peeking at the answers until you've given the problems your best shot!)
These notes are arranged for the web in the usual hierarchy by section and subsection. If you want to read
through all of them, without keeping track of your own stack to implement a depth-first tree traversal,
just follow the ``read sequentially'' links at the bottom of each page.
Depending on your background, you might want to read one or both of the two preliminary handouts:
one on programming in general, and one which reviews some math which is relevant to programming.
(And there are some other miscellaneous handouts, too.)
One note about the HTML: these pages were produced automatically from the base manuscripts for my
class notes, using a program of my own devising which is, all too typically, not (yet?) perfect. I apologize
in advance for any formatting glitches. In particular, when you see <sup>...</sup> or
<sub>...</sub> in the text, these do not represent bugs in your browser or accidental bugs in my
markup; instead, these are my interim compromise way of representing superscripts and subscripts to
you, since there's no way to do so in portable HTML.
http://www.eskimo.com/~scs/cclass/cclass.html (1 of 2) [22/07/2003 5:07:43 PM]

Finally, I realize that reading these notes on the net is not always as convenient as it might be,
particularly when the net is slow. Please realize, though, that the net is what it is, and that I have gone to
a certain amount of effort to place these notes here at all. Please do not ask me to send you a set of these
notes for browsing on your own machine, as I am currently unable to do so.

Handout: A Short Introduction to Programming


Handout: A Brief Refresher on Some Math Often Used in Computing
Readings: Notes to Accompany The C Programming Language, by Kernighan and Ritchie (``K&R'')
Readings: Introductory C Programming Class Notes (standalone)
Readings: Intermediate C Programming Class Notes
Assignments: (questions, exercises, and solutions)
introductory class
intermediate class
Other Handouts

This page by Steve Summit // Copyright 1996-9 // mail feedback

Experimental College
The Experimental College is (I think) Washington's oldest and largest alternative educational resource,
typically offering hundreds of classes serving thousands of students each quarter. Experimental College
classes are taught by local members of the community who love what they're doing and want to teach
you to love it, too. Most classes are conducted in the Seattle area. For much more information, visit the
official Experimental College home page.

C Programming Notes
Notes to Accompany The C Programming Language, by Kernighan and Ritchie (``K&R'')
Steve Summit

The C Programming Language, or K&R as it is affectionately known, is widely praised by experienced
C programmers as one of the best books on C there is. (It was also the first; it also happens to be a
bestseller.) The only real criticism K&R ever receives is that it may not be the best tutorial for beginners;
it seems to assume a certain amount of programming savvy and familiarity with computers. Actually, if
you read it carefully, you'll find that it is constantly dispensing wisdom about programming in general,
from basic concepts to deep insights to impeccable commentary on imponderable topics such as
programming style, at the same time as it teaches the specifics of the C language. Therefore, the
fundamental criticism may simply be that K&R is not suitable for those who read carelessly.
The authors are not out to save the world or to convert it to their philosophy of programming. When they
say something, they say it once, without theatrics or undue emphasis. If you read the book too quickly, or
skim it, or look only for specific answers to what you think you're trying to learn today, you will miss
much of the excellent advice which the authors have to offer.
These notes were prepared (beginning in Spring, 1995) for the University of Washington Experimental
College course in Introductory C Programming. They are meant to supplement K&R for the reader who
is new to C and perhaps to programming, and who wants a slightly more detailed, less pithy presentation.
I'll add insights from my own experience, in particular by pointing out those areas where people
traditionally misunderstand something about C or K&R's presentation of it. I'll also call out a few of the
very deep sentences, which you might overlook at first even if you're not skimming (perhaps because
their significance only becomes apparent once you've begun writing bigger or more complicated
programs), but which contain advice which is absolutely vital to successful real-world programming and
which, if you can take it to heart early, will save you from a lot of misery out in the school of hard
knocks later on.
Note that most of these notes merely amplify on the things K&R is saying; there isn't much to say that it
doesn't already say, usually better. In particular, many of the things that I'll comment on in the early
chapters are discussed in more detail in the later chapters; by barging in with my know-it-all comments,
I'm partially destroying the authors' careful progression from an initial, slightly superficial overview to a
more detailed, complete presentation. If these notes present more detail than you want to see at first, don't
worry (but please do let me know); just come back to them later to see if they clear up anything you're
still uncertain on. (Also, if you find the description in K&R adequately clear, you don't have to read all of
these notes, but do take note of the highlighted ``deep sentences.'')

Preface
Preface to the First Edition
Introduction
Chapter 1. A Tutorial Introduction
Chapter 2: Types, Operators, and Expressions
Chapter 3: Control Flow
Chapter 4: Functions and Program Structure
Chapter 5: Pointers and Arrays
Chapter 6: Structures
Chapter 7: Input and Output

This page by Steve Summit // Copyright 1995, 1996 // mail feedback

Preface
page ix
You'll get some hint here that C has become a bit more formal as it has ``grown up.'' That formality is
appropriate, and for the second edition of K&R to acknowledge it is appropriate, and for any modern
course in C programming to teach it is appropriate. Personally, I learned C before it had become quite so
formalized, and occasionally my traditional biases will leak through. I'll try to admit it when they do.
As the authors note, C is a relatively small language, but one which (to its admirers, anyway) wears well.
C's small, unambitious feature set is a real advantage: there's less to learn; there isn't excess baggage in
the way when you don't need it. It can also be a disadvantage: since it doesn't do everything for you,
there's a lot you have to do yourself. (Actually, this is viewed by many as an additional advantage:
anything the language doesn't do for you, it doesn't dictate to you, either, so you're free to do that
something however you want.)

Preface to the First Edition


page xi
This Preface, in a few spare paragraphs, sums up much of the philosophy of C and the authors'
philosophy about programming in general. Their comments on C's size and scope are fundamental, and
though no one may have fully recognized it at the time (or yet), this unassuming approach to the design
of the language is surely a significant factor behind C's success. I didn't have the first paragraph of the
Preface to the First Edition in front of me when I wrote my notes (just above) on the Preface to the
Second Edition, but it's not surprising that they're similar.
As the authors say, they assume some familiarity with basic programming concepts; other notes in this
series will give you a bit of help with those concepts if you need it. The authors also anticipate another
theme of theirs, which is that they will stress learning by doing. (I'll have more to say about this as the
learning begins.)
Deep sentence:
Besides showing how to make effective use of the language, we have tried where possible
to illustrate useful algorithms and principles of good style and sound design.
The authors' advice on style is good, and their design is sound. Pay attention to the things they say which
go beyond the nuts-and-bolts details of C: there's a lot to learn here about programming in general.

Introduction
page 2
Deep sentence:
...C deals with the same sort of objects that most computers do, namely characters,
numbers, and addresses.
C is sometimes referred to as a ``high-level assembly language.'' Some people think that's an insult, but
it's actually a deliberate and significant aspect of the language. If you have programmed in assembly
language, you'll probably find C very natural and comfortable (although if you continue to focus too
heavily on machine-level details, you'll probably end up with unnecessarily nonportable programs). If
you haven't programmed in assembly language, you may be frustrated by C's lack of certain higher-level
features. In either case, you should understand why C was designed this way: so that seemingly-simple
constructions expressed in C would not expand to arbitrarily expensive (in time or space) machine
language constructions when compiled. If you write a C program simply and succinctly, it is likely to
result in a succinct, efficient machine language executable. If you find that the executable resulting from
a C program is not efficient, it's probably because of something silly you did, not because of something
the compiler did behind your back which you have no control over. In any case, there's no point in
complaining about C's low-level flavor: C is what it is.
Next we see a more detailed list of the things that are not ``part of C.'' It's good to understand exactly
what we mean by this. When we say that the C language proper does not do things like memory
allocation or I/O, or even string manipulation, we obviously do not mean that there is no way to do these
things in C. In fact, the usual functions for doing these things are specified by the ANSI C Standard with
as much rigor as is the core language itself.
The fact that things like memory allocation and I/O are done through function calls has three
implications:
1. the function calls to do memory allocation, I/O, etc. are no different from any other function calls;
2. the functions which do memory allocation, I/O, etc. do not know any more about the data they're
acting on than ordinary functions do (we'll have more to say about this later); and
3. if you have specialized needs, you can do nonstandard memory allocation or I/O whenever you wish,
by using your own functions and ignoring the standard ones provided.
The sentence that says ``Most C implementations have included a reasonably standard collection of such
functions'' is historical; today, all implementations conforming to the ANSI C Standard have a very
standard collection.
page 3
Deep sentence:
...C retains the basic philosophy that programmers know what they are doing; it only
requires that they state their intentions explicitly.
This aspect of C is very widely criticized; it is also used (justifiably) to argue that C is not a good
teaching language. C aficionados love this aspect of C because it means that C does not try to protect
them from themselves: when they know what they're doing, even if it's risky or obscure, they can do it.
Students of C hate this aspect of C because it often seems as if the language is some kind of a conspiracy
specifically designed to lead them into booby traps and ``gotcha!''s.
This is another aspect of the language which it's fairly pointless to complain about. If you take care and
pay attention, you can avoid many of the pitfalls. These notes will point out many of the obvious (and not
so obvious) trouble spots.
page 4
The last sentence of the Introduction is misleading: as we'll see, it's risky to defer to any particular
compiler as a ``final authority on the language.'' A compiler is only a final authority on the language it
accepts, and the language that a particular compiler accepts is not necessarily exactly C, no matter what
the name of the compiler suggests. Most compilers accept extensions which are not part of standard C
and which are not supported by other compilers; some compilers are deficient and fail to accept certain
constructs which are in standard C. From time to time, you may have questions about what is truly
standard and which neither you nor anyone you've talked to is able to answer. If you don't have a copy of
the standard (or if you do, but you discover that the standardese in which it's written is impenetrable),
you may have to temporarily accept the jurisdiction of your particular compiler, in order to get some
program working today and under that particular compiler, but you'd do well to mark the code in
question as suspect and the question in your head as ``don't know; still unanswered.''

Chapter 1. A Tutorial Introduction


page 5
I completely agree with the authors that writing real programs, and soon, is the best way to learn
programming. This way, concepts which would otherwise seem abstract make sense, and the positive
feedback you get from getting even a small program to work gives you a great incentive to improve it or
write the next one.
Diving in with ``real'' programs right away has another advantage, if only pragmatic: if you're using a
conventional compiler, you can't run a fragment of a program and see what it does; nothing will run until
you have a complete (if tiny or trivial) program. You can't learn everything you'd need to write a
complete program all at once, so you'll have to take some things ``on faith'' and parrot them in your first
programs before you begin to understand them. (You can't learn to program just one expression or
statement at a time any more than you can learn to speak a foreign language one word at a time. If all you
know is a handful of words, you can't actually say anything: you also need to know something about the
language's word order and grammar and sentence structure and declension of articles and verbs.)
The authors list a few drawbacks of this ``dive in and program'' approach, and I must add one more. It's a
small step from learning-by-doing to learning-by-trial-and-error, and when you learn programming by
trial-and-error, you can very easily learn many errors. When you're not sure whether something will
work, or you're not even sure what you could use that might work, and you try something, and it does
work, you do not have any guarantee that what you tried worked for the right reason. You might just
have ``learned'' something that works only by accident or only on your compiler, and it may be very hard
to un-learn it later, when it stops working. (Also, if what you tried didn't work, it may have been due to a
bug in the compiler, such that it should have worked.)
Therefore, whenever you're not sure of something, be very careful before you go off and try it ``just to
see if it will work.'' Of course, you can never be absolutely sure that something is going to work before
you try it, otherwise we'd never have to try things. But you should have an expectation that something is
going to work before you try it, and if you can't predict how to do something or whether something
would work and find yourself having to determine it experimentally, make a note in your mind that
whatever you've just learned (based on the outcome of the experiment) is suspect.
section 1.1: Getting Started
section 1.2: Variables and Arithmetic Expressions
section 1.3: The For Statement
section 1.4: Symbolic Constants
section 1.5: Character Input and Output
section 1.5.1: File Copying
section 1.5.2: Character Counting
section 1.5.3: Line Counting
section 1.5.4: Word Counting
section 1.6: Arrays
section 1.7: Functions
section 1.8: Arguments--Call by Value
section 1.9: Character Arrays
section 1.10: External Variables and Scope

section 1.1: Getting Started


page 6
Deep sentence:
With these mechanical details mastered, everything else is comparatively easy.
The claim that a program as simple as ``hello, world'' is a big hurdle may seem outrageous, but it's really
quite true. It is a hurdle: on an unfamiliar computer, it can be arbitrarily difficult to figure out how to
enter a text file containing program source, or how to compile and link it, or how to invoke it, or what
happened after (if?) it ran. The most experienced C programmers immediately go back to this one, simple
program whenever they're trying out a new system or a new way of entering or building programs or a
new way of printing output from within programs. As they say, everything else is comparatively easy.
One hurdle which the authors don't mention but which many of you may find yourself facing is the
choice of an appropriate compiler. On many Unix machines, the cc command which the authors describe
is an older compiler which does not recognize modern, ANSI Standard C syntax. An old compiler will
accept the simple program on page 6, but it will not accept many of the other programs in the book. If
you find yourself getting baffling compilation errors on programs which you've typed in exactly as
they're shown in the book, it probably indicates that you're using an older compiler. On many machines,
another compiler called acc or gcc is available, and you'll want to use it, instead.
Deep sentence:
main will usually call other functions to help perform its job, some that you wrote, and
others from libraries that are provided for you.
We heard about this already in the Introduction, but here it is again: as far as the compiler and the
language definition are concerned, there's no difference between a function that you write and a function
someone else wrote for you, including a function like printf which seems to be part of the language.
There's nothing magic about printf; there's nothing that it can do that one of your functions couldn't.
(Well, actually, there are a few magic, or at least surprising, things about printf, but they're magic in
ways that your functions can be, too.)
There is one slight problem with the simple ``hello, world'' program in the book. The problem will
usually be ignored (that is, the program will usually work correctly), but if you receive any warning or
error messages or have any problems having to do with the ``value returned from main,'' jump forward
to page 26 to learn why main ought to end with the line
return 0;

section 1.2: Variables and Arithmetic Expressions


page 10
Deep sentence:
Although C compilers do not care about how a program looks, proper indentation and
spacing are critical in making programs easy for people to read. We recommend writing
only one statement per line, and using blanks around operators to clarify grouping. The
position of braces is less important, although people hold passionate beliefs. We have
chosen one of several popular styles. Pick a style that suits you, then use it consistently.
There are two things to note here. One is that (with one or two exceptions) the compiler really does not
care how a program looks; it doesn't matter how it's broken into lines. The fragments
    while(i < j)
        i = 2 * i;
and
    while(i < j) i = 2 * i;
and
    while(i<j)i=2*i;
and
    while(i < j)
    i = 2 * i;
and
    while (
    i <
    j )
    i =
    2 *
    i ;
are all treated exactly the same way by the compiler.

The second thing to note is that style issues (such as how a program is laid out) are important, but they're
not something to be too dogmatic about, and there are also other, deeper style issues besides mere layout
and typography.
There is some value in having a reasonably standard style (or a few standard styles) for code layout.
Please don't take the authors' advice to ``pick a style that suits you'' as an invitation to invent your own
brand-new style. If (perhaps after you've been programming in C for a while) you have specific
objections to specific facets of existing styles, you're welcome to modify them, but if you don't have any
particular leanings, you're probably best off copying an existing style at first. (If you want to place your
own stamp of originality on the programs that you write, there are better avenues for your creativity than
inventing a bizarre layout; you might instead try to make the logic easier to follow, or the user interface
easier to use, or the code freer of bugs.)
Deep sentence:
...in C, as in many other languages, integer division truncates: any fractional part is
discarded.
The authors say all there is to say here, but remember it: just when you've forgotten this sentence, you'll
wonder why something is coming out zero when you thought it was supposed to be the quotient of two
nonzero numbers.
page 12
Here is more discussion on the difference between integer and floating-point division. Nothing deep; just
something to remember.
page 13
Hidden here are descriptions of some more of printf's ``conversion specifiers.'' %o and %x print
integers in octal (base 8) and hexadecimal (base 16), respectively. Since a percent sign normally tells
printf to expect an additional argument and insert its value, you might wonder how to get printf to
just print a %. The answer is to double it: %%.
Also, note (as was mentioned on page 11) that you must match up the arguments to printf with the
conversion specification; the compiler can't (or won't) generally check them for you or fix things up if
you get them wrong. If fahr is a float, the code
printf("%d\n", fahr);
will not work. You might ask, ``Can't the compiler see that %d needs an integer and fahr is floating-point and do the conversion automatically, just like in the assignments and comparisons on page 12?''
And the answer is, no. As far as the compiler knows, you've just passed a character string and some other
arguments to printf; it doesn't know that there's a connection between the arguments and some special
characters inside the string. This is one of the implications of the fact, stated earlier, that functions like
printf are not special. (Actually, some compilers or other program checkers do know that a function
named printf is special, and will do some extra checking for you, but you can't count on it.)

section 1.3: The For Statement


pages 13-14
Deep sentence:
...in any context where it is permissible to use the value of a variable of some type, you can
use a more complicated expression of that type.
You may have used other languages which placed restrictions on where you could use expressions or
how complicated they could be. C has relatively few such restrictions. There's nothing magical about the
printf call above; this ability to perform a computation inside of an argument is not unique to
printf. In any function call, the arguments in the argument list are expressions, and it doesn't matter if
they are simple expressions which just fetch the value of one variable, like fahr, or more complicated
expressions, like 5.0/9.0 * (fahr - 32).

section 1.4: Symbolic Constants


pages 14-15
Deep sentence:
Notice that there is no semicolon at the end of a #define line.
Actually, all lines that begin with # are special; we'll learn more about them later.

section 1.5: Character Input and Output


page 15
Note that you do not need to worry about whether your computer uses a carriage return (CR) or linefeed
(LF) or CRLF combination or something else to terminate lines in text files; in a C program, the line
terminator will always appear to be the newline, \n.

section 1.5.1: File Copying


page 16
Pay particular attention to the discussion of why the variable to hold getchar's return value is declared
as an int rather than a char. The distinction may not seem terribly significant now, but it is important.
If you use a char, it may seem to work, but it may break down mysteriously later. Always remember to
use an int for anything you assign getchar's return value to.
page 17
The line
while ((c = getchar()) != EOF)
epitomizes the cryptic brevity which C is notorious for. You may find this terseness infuriating (and
you're not alone!), and it can certainly be carried too far, but bear with me for a moment while I defend it.
The simple example on pages 16 and 17 illustrates the tradeoffs well. We have four things to do:
1. call getchar,
2. assign its return value to a variable,
3. test the return value against EOF, and
4. process the character (in this case, print it again).

We can't eliminate any of these steps. We have to assign getchar's value to a variable (we can't just use
it directly) because we have to do two different things with it (test, and print). Therefore, compressing the
assignment and test into the same line (as on page 17) is the only good way of avoiding two distinct calls
to getchar (as on page 16). You may not agree that the compressed idiom is better for being more
compact or easier to read, but the fact that there is now only one call to getchar is a real virtue.
In a tiny program like this, the repeated call to getchar isn't much of a problem. But in a real program,
if the thing being read is at all complicated (not just a single character read with getchar), and if the
processing is at all complicated (such that the input call before the loop and the input call at the end of the
loop become widely separated), and if the way that input is done is ever changed some day, it's just too
likely that one of the input calls will get changed but not the other.
(Also, note that when an assignment like c = getchar() appears within a larger expression, the
surrounding expression receives the same value that is assigned. Using an assignment as a subexpression
in this way is perfectly legal and quite common in C.)

When you run the character copying program, and it begins copying its input (your typing) to its output
(your screen), you may find yourself wondering how to stop it. It stops when it receives end-of-file
(EOF), but how do you send EOF? The answer depends on what kind of computer you're using. On Unix
and Unix-related systems, it's almost always control-D. On MS-DOS machines, it's control-Z followed by
the RETURN key. Under Think C on the Macintosh, it's control-D, just like Unix. On other systems, you
may have to do some research to learn how to send EOF.
(Note, too, that the character you type to generate an end-of-file condition from the keyboard has nothing
to do with the EOF value returned by getchar. The EOF value returned by getchar is a code
indicating that the input system has detected an end-of-file condition, whether it's reading the keyboard or
a file or a magnetic tape or a network connection or anything else.)
Another excellent thing to know when doing any kind of programming is how to terminate a runaway
program. If a program is running forever waiting for input, you can usually stop it by sending it an end-of-file, as above, but if it's running forever not waiting for something (i.e. if it's in an infinite loop) you'll
have to take more drastic measures. Under Unix, control-C will terminate the current program, almost no
matter what. Under MS-DOS, control-C or control-BREAK will sometimes terminate the current
program, but by default MS-DOS only checks for control-C when it's looking for input, so an infinite loop
can be unkillable. There's a DOS command, I think it's
break on
which tells DOS to look for control-C more often, and I recommend using this command if you're doing
any programming. (If a program is in a really tight infinite loop under MS-DOS, there can be no way of
killing it short of rebooting.) On the Mac, try command-period or command-option-ESCAPE.
Finally, don't be disappointed (as I was) the first time you run the character copying program. You'll type
a character, and see it on the screen right away, and assume it's your program working, but it's only your
computer echoing every key you type, as it always does. When you hit RETURN, a full line of characters
is made available to your program, which it reads all at once, and then copies to the screen (again). In
other words, when you run this program, it will probably seem to echo the input a line at a time, rather
than a character at a time. You may wonder how a program can read a character right away, without
waiting for the user to hit RETURN. That's an excellent question, but unfortunately the answer is rather
complicated, and beyond the scope of this introduction. (Among other things, how to read a character
right away is one of the things that's not defined by the C language, and it's not defined by any of the
standard library functions, either. How to do it depends on which operating system you're using.)



This page by Steve Summit // Copyright 1995, 1996 // mail feedback

section 1.5.2: Character Counting


page 18
Ignore the mention of efficiency with respect to nc = nc+1 vs. ++nc. Once you've gotten used to ++
meaning ``increment by 1,'' you'll probably find yourself preferring ++nc simply because it is more
concise, and incrementing things by 1 is so common. (Personally, once I got used to it, I found ++ more
natural, too, because after all, expressions like nc = nc+1, though they're common enough in
programming, are very unnatural from an algebraic perspective.)
pages 18-19
You may find it odd to have a loop with no body, but such loops do crop up. Just make sure that the
explicit null statement (or, if you prefer, empty {}) marking the empty loop body is plainly visible.
The whole first paragraph of page 19 counts as ``deep.'' A clean, well-designed loop will work properly
for all of its ``boundary conditions'': zero trips through the loop, one trip, many trips, maximum trips (if
there is any maximum, and if so, also maximum minus one). If a loop for some reason doesn't work at a
particular boundary condition, it's tempting to claim that that condition is rare or impossible and that the
loop is therefore okay. But if the loop can't handle the boundary condition, why can't it? It's probably
awkwardly constructed, and straightening it out so that it naturally handles all boundary conditions will
usually make it clearer and easier to understand (and may also remove other lurking bugs).


section 1.5.3: Line Counting


page 19
Note the word of caution about = vs. == carefully. Typing one when you mean the other is,
unfortunately, a very easy mistake to make.
Note that the character constants discussed on page 19 are very different from the string constants
introduced on page 7.


section 1.5.4: Word Counting


page 21
Deep sentence:
In a program as tiny as this, it makes little difference, but in larger programs, the increase
in clarity is well worth the modest extra effort to write it this way from the beginning.
I agree with this. Some people complain that symbolic constants make a program harder to read, because
you always have to look them up to see what they mean. As long as you choose appropriate names for
symbolic constants and use them consistently (i.e. even if APPLE and ORANGE happen to have the
same value, don't use one when you mean the other), no one will have this complaint about your
programs.
Note that there's no direct way to simplify the condition
if (c == ' ' || c == '\n' || c == '\t')
In particular, something like
if (c == (' ' || '\n' || '\t'))
would not work. (What would it do?)


section 1.6: Arrays


page 22
Note carefully that arrays in C are 0-based, not 1-based as they are in some languages. (As we'll see, 0-based arrays turn out to be more convenient than 1-based arrays more often than not, but they may take a bit
of getting used to at first.)
When they say ``as reflected in the for loops that initialize and print the array,'' they're referring to the
fact that the vast majority of for loops in C look like this:
for(i = 0; i < 10; ++i)
and count from 0 to 9. The loop
for(i = 1; i <= 10; ++i)
would count from 1 to 10, but loops like this are comparatively rare. (In fact, whenever you see either ``=
1'' or ``<='' in a for loop, it's an indication that something unusual is going on which you'll want to be
aware of, and it may even be a bug.)
page 23
They've started going a little fast here, so slow down and reread if they're losing you. What's this magic expression c-'0' that they're using as an array subscript? Remember, as we saw first on page 19, that characters in C
are represented by small integers corresponding to their values in the machine's character set. In ASCII,
which most machines use, 'A' is character code 65, '0' (zero) is code 48, '9' is code 57, and all the
other characters have their own values which I won't bother to list. If we've just read the character '9'
from the file, it has value 57, so c-'0' is 57 - 48 which is 9, and we'll increment cell number 9 in the
array, just like we want to. Furthermore, even if we're not using a machine which uses ASCII, by
subtracting '0', we'll always subtract whatever the right value is to map the characters from '0' to
'9' down to the array cell range 0 to 9, because C guarantees that the codes for the digits '0' through '9'
are consecutive in every character set.


section 1.7: Functions


page 24
Deep sentence:
...you will often see a short function defined and called only once, just because it clarifies
some piece of code.
Ideally, this is true in any language. Breaking a program up into functions (or subroutines or procedures
or whatever a language calls them) is one of the first and one of the most important ways to keep control
of the proliferating complexity in a software project.
page 25
Note that the for loop at the top of the page runs from 1 to n rather than 0 to n-1, and may therefore
seem suspect by the above note for page 22. In this case, since all that matters is that the loop is traversed
n times, it doesn't matter which values i takes on.
Not only the names of the parameters and local variables, but also their values (as we'll see in section
1.8), are all local to a function. Rather than remembering a list of things that are local, it's easier to
remember that everything is local: the whole point of a function as an abstraction mechanism is that it's a
black box; you don't have to know or care about any of its implementation details, such as what it
chooses to name its parameters and local variables. You pass it some arguments, and it returns you a
value according to its specification.
The distinction between the terms argument and parameter may seem overly picky, but it's a good way
of reinforcing the notion that the parameters and other details of a function's implementation are almost
completely separated from (that is, of no concern to) the caller.
page 26
Note the discussion about return values from main. The first few sample programs in this chapter,
including the very first ``hello, world'' example on page 6, have omitted a return value, which is, strictly
speaking, incorrect. (C99 later made falling off the end of main equivalent to return 0, but the C89 language
that K&R2 describes has no such rule.) Do get in the habit of returning a value from main, both to be correct, and because
``programs should return status to their environment.''
By ``Parameter names need not agree'' they mean that it's not a problem that the prototype declaration of
power says that the first parameter is named m, while the actual function definition says that it's named
base.


pages 26-7
It's probably a good idea if you're aware of this ``old style'' function syntax, so that you won't be taken
aback when you come across it, perhaps in code written by reactionary old fogies (such as the author of
these notes) who still tend to use it out of habit when they're not paying attention.


section 1.8: Arguments -- Call by Value


page 27
If, on the other hand, you are not used to other languages such as Fortran, these call-by-value semantics
may not be surprising (any more than anything else in C which is new to you).
Even though you can modify a parameter in a function (i.e. treat it as a ``conveniently initialized local
variable''), you certainly don't have to, especially if (as is often the case) you'll need an unmodified copy
of the parameter later in the function.
page 28
Don't worry too much about the exception mentioned for arrays--there are a number of exceptions for
arrays, and we'll have much more to say about them later. But be aware that we are deliberately glossing
over a few details here, and they are details which will become important later on. (In particular, the
statement on page 27 that ``the called function cannot directly alter a variable in the calling function''
may not seem to be true for arrays, and this is what the authors mean when they say that ``The story is
different''. We'll be seeing several functions which return things--usually strings--to their callers by
writing into caller-supplied arrays. In chapter 5 we'll learn how this is possible. If this discrepancy
wouldn't have bothered you now, pretend I didn't mention it.)


section 1.9: Character Arrays


Pay attention to the way this program is developed first in ``pseudocode,'' and then refined into real C
code. A clear pseudocode statement not only makes it easier to think about the structure of the eventual
real code, but if you make the eventual real code mimic the pseudocode, the real code will be equally
straightforward and easy to read.
The function getline, introduced here, is extremely useful, and we'll have as much use for it in our
own programs as the authors do in theirs. (In other words, they have succeeded in their goal of making it
``useful in other contexts.'' In fact, I've been using a getline function much like this one ever since I
learned C from K&R, and I generally find it preferable to the standard library's line-reading function.)
Pages 28 through 30 introduce quite a lot of material all at once; you'll probably want to read it several
times, especially if arrays or character strings are new to you.
Earlier we said that C provided no particular built-in support for composite objects such as character
strings, and here we begin to see the significance of that omission. A string is just an array of characters,
and you can access the characters within a string exactly as easily (because you use exactly the same
syntax) as you access the elements within any other array.
If you've used BASIC, you will probably wonder where C's SUBSTR function is. C doesn't have one, for
two reasons. First of all, there's less of a need for one, because it's so easy to get at the individual
characters within a string in C. More importantly, a SUBSTR function implies that you take a string and
extract a substring as a new string. However, creating a new string (i.e. the extracted substring) involves
allocating arbitrary amounts of memory to hold the string, and C rarely if ever allocates memory
implicitly for you.
If anything, it's too easy to access the individual characters within strings in C. String handling illustrates
one of the potentially frustrating aspects of C we mentioned earlier: the language doesn't define any high-level string handling features for you, so you're free to do whatever low-level string processing you wish.
The downside is that constantly manipulating strings down at the character level, and always having to
remember to allocate memory for new strings, can get tedious after a while.
The preceding paragraph is not meant to discourage you, but just to point out a reality: any C program
which manipulates strings (and this includes most C programs) will find itself doing a certain amount of
character-level fiddling and a certain amount of memory allocation. It will also find that it can do just
about anything it wants to do (and that its programmer has the patience to do) with the strings it
manipulates.
Since string processing, and at this relatively low level, is so common in C, you'll want to pay careful
attention to the discussion on page 30 of how strings are stored in character arrays, and particularly to the
fact that a '\0' character is always present to mark the end of a string. (It's easy to forget to count the
'\0' character when allocating space for a string, for instance.) Notice the nice picture on page 30; this
is a good way of thinking about data structures (and not just simple character arrays, either).
page 29
Note that the program explicitly allocates space for the two strings it manipulates: the current line line,
and the longest line longest. (It only needs these two strings at any one time, even though the input
consists of arbitrarily many lines.) Note that it cannot simply assign one string to another (because C
provides no built-in support for composite objects such as character strings); the program calls the copy
function to do so. (The authors write their own copy function for explanatory purposes; the standard
library contains a string-copying function, strcpy, which would normally be used.) The only strings that aren't
explicitly allocated are the arrays in the getline and copy functions; as the discussion briefly
mentions, these do not need to be allocated because they're already allocated in the caller. (There are a
number of subtleties about array parameters to functions; we'll have more to say about them later.)
The code on page 29 contains a number of examples of compressed assignments and tests; evidently the
authors expect you to get used to this style in a hurry. The line
while ((len = getline(line, MAXLINE)) > 0)
is similar to the getchar loops earlier in this chapter; it calls getline, saves its return value in the
variable len, and tests it against 0.
The comparison
i<lim-1 && (c=getchar())!=EOF && c!='\n'
in the for loop in the getline function does several things: it makes sure there is room for another
character in the array; it calls, assigns, and tests getchar's return value against EOF, as before; and it
also tests the returned character against '\n', to detect end of line. The surrounding code is mildly
clumsy in that it has to check for \n a second time; later, when we learn more about loops, we may find
a way of writing it more cleanly. You may also notice that the code deals correctly with the possibility
that EOF is seen without a \n.
The line
while ((to[i] = from[i]) != '\0')
in the copy function does two things at once: it copies characters from the from array to the to array,
and at the same time it compares the copied character against '\0', so that it stops at the end of the
string. (If you think this is cryptic, wait 'til we get to page 106 in chapter 5!)
We've also just learned another printf conversion specifier: %s prints a string.
page 30
Deep sentence:
There is no way for a user of getline to know in advance how long an input line might
be, so getline checks for overflow.
Because dynamically allocating memory for arbitrary-length strings is mildly tedious in C, it's tempting
to use fixed-size arrays. (It's so tempting, in fact, that that's what most programs do, and since fixed-size
arrays are also considerably easier to discuss, all of our early example programs will use them.) Using
fixed-size arrays is fine, as long as some assurance is made that they don't overflow. Unfortunately, it's
also tempting (and easy) to forget to guard against array overflow, perhaps by deluding yourself into
thinking that too-long inputs ``can't happen.'' Murphy's law says that they do happen, and the various
corollaries to Murphy's law say that they happen in the most unpleasant way and at the least convenient
time. Don't be cavalier about arrays; do make sure that they're big enough and that you guard against
overflowing them. (In another mark of C's general insensitivity to beginning programmers, most
compilers do not check for array overflow; if you write more data to an array than it is declared to hold,
you quietly scribble on other parts of memory, usually with disastrous results.)


section 1.10: External Variables and Scope


page 31
There's a bit of jargon in this section. An external variable is what is sometimes called a global variable.
The authors introduce the term automatic to refer to the local variables we've seen so far; this is a good
word to remember, even if you never use it, because people will spring it on you when they're being
precise, and if you don't know this usage you'll think they're talking about transmissions or something.
(To be precise, ``local'' is a broader category than ``automatic''; there are both automatic and static
local variables.)
Deep sentence:
If [automatic variables] are not set, they will contain garbage.
Actually, if automatic variables always contained garbage, the situation wouldn't be quite so bad. In
practice, they often (though not always) do contain zero or some other predictable value, and this
happens just often enough to lull you into the occasional false sense of security, by making a program
with an inadvertently uninitialized variable seem to work.
Deep sentence:
An external variable must be defined, exactly once, outside of any function; this sets aside
storage for it. The variable must also be declared in each function that wants to access it;
this states the type of the variable.
The basic rule is ``define once; declare many times.'' As we'll see just below, it is not necessary for a
declaration of an external variable to appear in every single function; it is possible for one external
declaration to apply to many functions. (In the clause ``the variable must also be declared in each
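function'', the word ``declared'' works like an adjective, describing a state the variable must be in, rather
than a verb naming an action you must perform inside each function.)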
page 33
In fact, the ``common practice'' of placing ``definitions of all external variables at the beginning of the
source file'' is so common that it's rare to see external declarations within functions, as in the functions on
page 32. The authors are using the in-function extern declarations partly because it is an alternative
style, and partly because we haven't talked about separate compilation (that is, building a single program
from several separate source files) yet. Rather than jumping the gun and discussing those two topics now,
I'll just mention that the discussion in section 1.10 might be a bit misleading, and that you should
probably wait until we get to the complete description of the issue in section 4.4 before you commit any
of this to memory.

Deep sentence:
You should note that we are using the words definition and declaration carefully when we
refer to external variables in this section. ``Definition'' refers to the place where the
variable is created or assigned storage; ``declaration'' refers to places where the nature of
the variable is stated but no storage is allocated.
Do note the careful distinction; it's an important one and one which I'll be using, too.
page 34
The authors' criticism of the second (page 32) version of the longest-line program is accurate. The
revision of the longest-line program to use external variables was done only to demonstrate the use of
external variables, not to improve the program in any way (nor does it improve the program in any way).
As a general rule, external variables are acceptable for storing certain kinds of global state information
which never changes, which is needed in many functions, and which would be a nuisance to pass around.
I don't think of external variables as ``communicating between functions'' but rather as ``setting common
state for the entire program.'' When you start thinking of an external variable as being one of the ways
you communicate with a particular function, and in particular when you find yourself changing the value
of some external variable just before calling some function, to affect its operation in some way, you start
getting into the troublesome uses of external variables, which you should avoid.


Chapter 2: Types, Operators, and Expressions
page 35
Deep sentence:
The type of an object determines the set of values it can have and what operations can be
performed on it.
This is a fairly formal, mathematical definition of what a type is, but it is traditional (and meaningful).
There are several implications to remember:
1. The ``set of values'' is finite. C's int type cannot represent all of the integers; its float type
cannot represent all real numbers.
2. When you're using an object (that is, a variable) of some type, you may have to remember what
values it can take on and what operations you can perform on it. For example, there are several
operators which play with the binary (bit-level) representation of integers, but these operators are
not meaningful for and may not be applied to floating-point operands.
3. When declaring a new variable and picking a type for it, you have to keep in mind the values and
operations you'll be needing.
In other words, picking a type for a variable is not some abstract academic exercise; it's closely
connected to the way(s) you'll be using that variable.
You don't need to worry about the list of ``small changes and additions'' made by the ANSI standard,
unless you started learning C long ago or have a keen interest in its history. We'll be using these new
features indiscriminately, usually without comment.
section 2.1: Variable Names
section 2.2: Data Types and Sizes
section 2.3: Constants
section 2.4: Declarations
section 2.5: Arithmetic Operators
section 2.6: Relational and Logical Operators
section 2.7: Type Conversions
section 2.8: Increment and Decrement Operators
section 2.9: Bitwise Operators
section 2.10: Assignment Operators and Expressions
section 2.11: Conditional Expressions
section 2.12: Precedence and Order of Evaluation


section 2.1: Variable Names


Deep sentence:
Don't begin variable names with underscore, however, since library routines often use such
names.
If you happen to pick a name which ``collides'' with (is the same as) a name already chosen by a library
routine, either your code or the library routine (or both) won't work. Naming issues become very
significant in large projects, and problems can be avoided by setting guidelines for who may use which
names. One of these guidelines is simply that user code should not use names beginning with an
underscore, because these names are (for the most part) ``reserved to the implementation'' (that is,
reserved for use by the compiler and the standard library).
Note that case is significant; assuming that case is ignored (as it is with some other programming
languages and operating systems) can lead to real frustration.
The convention that all-upper-case names are used for symbolic constants (i.e. as created with the
#define directive, which we learned about in section 1.4) is arbitrary, but useful. Like the various
conventions for code layout (page 10), this convention is a good one to accept (i.e. not get too creative
about), until you have some very good reason for altering it.
Deep sentence:
Keywords like if, else, int, float, etc., are reserved; you can't use them as variable
names.
You can find the complete list of keywords in appendix A2.4 on page 192.


section 2.2: Data Types and Sizes


page 36
If you can look at this list of ``a few basic types in C'' and say to yourself, ``Oh, how simple, there are
only a few types, I won't have to worry much about choosing among them,'' you'll have an easy time with
declarations. (Some masochists wish that the type system were more complicated so that you could
specify more things about each variable, but those of us who would rather not have to specify these extra
things each time are glad that we don't have to.)
Note that the basic types are defined as having at least a certain size. There is no specification that a
short int will be exactly 16 bits, or that a long int will be exactly 32 bits. Some programmers
become obsessed with knowing exactly what sizes things will be in various situations, and write
programs which depend on things having certain sizes. Exact sizes are occasionally important, but most
of the time we can sidestep size issues and let the compiler do most of the worrying.
Most of the simple variables in most programs are of types int, long int, or double. Typically,
we'll use int and double for most purposes, and long int any time we need to hold values greater
than 32,767. We'll rarely use individual variables of type char, although we'll use plenty of arrays of
char. Types short int and float are important primarily when efficiency (speed or memory
usage) is a concern, and for us it usually won't be.
Note that even when we're manipulating individual characters, we'll usually use an int variable, for the
reason discussed in section 1.5.1 on page 16.


section 2.3: Constants


page 37
We write constants in decimal, octal, or hexadecimal for our convenience, not the compiler's. The
compiler doesn't care; it always converts everything into binary internally, anyway. (There is, however,
no good way to specify constants in source code in binary.)
pages 37-38
Read the descriptions of character and string constants carefully; most C programs work with these data
types a lot, and their proper use must be kept in mind. Note particularly these facts:
1. The character constant 'x' is quite different from the string constant "x".
2. The value of a character is simply ``the numeric value of the character in the machine's character
set.''
3. Strings are terminated by the null character, \0. (This applies to both string constants and to all
other strings we'll build and manipulate.) This means that the size of a string (the number of
char's worth of memory it occupies) is always one more than its apparent length (i.e. its length as reported by
strlen).
As we saw in section 1.6 on page 23, it's possible to switch rather freely between thinking of a character
as a character and thinking of it as its value. For example, the character '0' (that is, the character that
can print on your screen and looks like the number zero) has in the ASCII character set the internal value
48. Another way of saying this is to notice that the following expressions are all true:
'0' == 48
'0' == '\060'
'0' == '\x30'
We'll have a bit more to say about characters and their small integer representations in section 2.7.
Note also that the string "48" consists of the three characters '4', '8', and '\0'. Also in section 2.7
we'll meet the atoi function which computes a numeric value from a string of digits like this.
page 39
We won't be using enumerations, so you don't have to worry too much about the description of
enumeration constants.


section 2.4: Declarations


page 40
You may wonder why variables must be declared before use. There are two reasons:
1. It makes things somewhat easier on the compiler; it knows right away what kind of storage to
allocate and what code to emit to store and manipulate each variable; it doesn't have to try to intuit
the programmer's intentions.
2. It forces a bit of useful discipline on the programmer: you cannot introduce variables willy-nilly;
you must think about them enough to pick appropriate types for them. (The compiler's error
messages to you, telling you that you apparently forgot to declare a variable, are as often helpful
as they are a nuisance: they're helpful when they tell you that you misspelled a variable, or forgot
to think about exactly how you were going to use it.)
Although there are a few places where ``certain declarations can be made implicitly by context'', making
use of these removes the advantages of reason 2 above, so I recommend always declaring everything
explicitly.
Most of the time, I recommend writing one declaration per line (as in the ``latter form'' on page 40). For
the most part, the compiler doesn't care what order declarations are in. You can order the declarations
alphabetically, or in the order that they're used, or so that related declarations are next to each other.
Collecting all variables of the same type together on one line essentially orders declarations by type,
which isn't a very useful order (it's only slightly more useful than random order).
If you'd rather not remember the rules for default initialization (namely that ``external or static variables
are initialized to zero by default'' and ``automatic variables for which there is no initializer have...
garbage values''), you can get in the habit of initializing everything. It never hurts to explicitly initialize
something when it would have been implicitly initialized anyway, but forgetting to initialize something
that needs it can be the source of frustrating bugs.
Don't worry about the distinction between ``external or static variables''; we haven't seen it yet.
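One mild surprise is that const variables are not ``constant expressions'' as defined on page 38. You
can't say something like
const int maxline = 1000;
char line[maxline+1];           /* WRONG */
(A C99 or later compiler may accept this particular declaration inside a function, because there it
declares a variable-length array, but it is still an error at file scope, and maxline is still not a
constant expression.)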

section 2.5: Arithmetic Operators

page 41
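Keep in the back of your mind somewhere the fact that the behavior of the / and % operators is not
precisely defined for negative operands. This means that -7 / 4 might be -1 or -2, and -7 % 4 might
be -3 or +1. (That's the rule in the ANSI Standard that K&R2 describes; the later C99 Standard settled
the matter by requiring division to truncate toward zero, so that -7 / 4 is -1 and -7 % 4 is -3, but
older compilers may still go either way.) The difference won't matter for the simple programs we'll be
writing at first, but eventually you'll get bitten by it if you don't remember it.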
An additional arithmetic operation you might be wondering about is exponentiation. Some languages
have an exponentiation operator (typically ^ or **), but C doesn't.
The term ``precedence'' refers to how ``tightly'' operators bind to their operands (that is, to the things they
operate on). In mathematics, multiplication has higher precedence than addition, so 1 + 2 * 3 is 7,
not 9. In other words, 1 + 2 * 3 is equivalent to 1 + (2 * 3). C is the same way.
The term ``associativity'' refers to the grouping when two or more operators of the same precedence
appear next to each other in an expression. When an operator (like subtraction) associates ``left to
right,'' it means that 1 - 2 - 3 is equivalent to (1 - 2) - 3 and gives -4, not +2.
By the way, the word ``arithmetic'' as used in the title of this section is an adjective, not a noun, and it's
pronounced differently than the noun: the accent is on the third syllable.

section 2.6: Relational and Logical Operators

If it isn't obvious, >= is greater-than-or-equal-to, <= is less-than-or-equal-to, == is equal-to, and != is
not-equal-to. We use >=, <=, and != because the symbols ≥, ≤, and ≠ are not common on computer
keyboards, and we use == because equality testing and assignment are two completely different
operations, but = is already taken for assignment. (Obviously, typing = when you mean == is a very easy
mistake to make, so watch for it. Some compilers will warn you when you use one but seem to want the
other.)
The fact that evaluation of the logical operators && and || ``stops as soon as the truth or falsehood of the
result is known'' refers to the fact that
``false'' AND anything is false
or, in C,
(0 && anything) == 0
while, on the other hand,
``true'' OR anything is true
or, in C,
(1 || anything) == 1
Looking at these another way, if you want to do something if thing1 is true and thing2 is true, and you've
just noticed that thing1 is false, you don't even need to check thing2. Similarly, if you're supposed to do
something if thing3 is true or thing4 is true, and you notice that thing3 is true, you can go ahead and do
whatever it is you're supposed to do without checking thing4.
C works the same way, and if it's not true that ``most C programs rely on these properties,'' it's certainly
true that many do.
For another example of the usefulness of this ``short-circuiting'' behavior, suppose we're taking the
average of n numbers. If n is zero, that is, if we don't have any numbers to take the average of, we don't
want to divide by zero. Code like
if(n != 0 && sum / n > 1)
is common: it tests whether n is nonzero and the average is greater than 1, but it does not have to worry
about dividing by zero. (If, on the other hand, the compiler always evaluated both sides of the && before
checking to see whether they were both true, the code above could divide by zero.)
page 42
Note the extra parentheses in
(c = getchar()) != '\n'
Since this is a common idiom, you'll need to remember the parentheses. What would
c = getchar() != '\n'
do?
C's treatment of Boolean values (that is, those where we only care whether they're true or false) is
straightforward. We'll have more to say about it later, but for now, note that a value of zero is ``false,''
and any nonzero value is ``true.'' You might also note that there is no necessary connection between
statements like if() which expect a true/false value and operators like >= and && which generate
true/false values. You can use operators like >= and && in any expression, and you can use any
expression in an if() statement.
The authors make a good point about style: if valid is conceptually a Boolean variable (that is, it's an
integer, but we only care about whether it's zero or nonzero, in other words, ``false'' or ``true''), then
if(valid)
is a perfectly reasonable and readable condition. However, when values are not conceptually Boolean, I
encourage you to make explicit comparisons against 0. For example, we could have expressed our
average-taking code as
if(n && sum / n > 1)
but I think it's clearer to be explicit and say
if(n != 0 && sum / n > 1)
(However, many C programmers feel that expressions like
if(n && sum / n > 1)
are ``more concise,'' so you will see them all the time and you should be able to read them.)

section 2.7: Type Conversions

The conversion rules described here and on page 44 are straightforward, but they're quite important, so
you'll need to learn them well. Usually, conversions happen automatically and when you want them to, but
not always, so it's important to keep the rules in mind. (Recall the discussion of 5/9 on page 12.)
Deep sentence:
A char is just a small integer, so chars may be freely used in arithmetic expressions.
Whether you treat a ``small integer'' as a character or an integer is pretty much up to you. As we saw
earlier, in the ASCII character set, the character '0' has the value 48. Therefore, saying
int i = '0';
is the same as saying
int i = 48;
If you print i out as a character, using
putchar(i);
or
printf("%c", i);
(the %c format prints characters; see page 13), you'll see the character '0'. If you print it out as a number:
printf("%d", i);
you'll see the value 48.
Most of the time, you'll use whatever notation matches what you're trying to do. If you want the character
'0', you'll use '0'. If you want the value 48 (as the number of months in four years, or something), you'll
use 48. If you want to print characters, you'll use putchar or printf %c, and if you want to print
integers, you'll use printf %d. Occasionally, you'll cross over between thinking of characters as
characters and as values, such as in the character-counting program in section 1.6 on page 22, or in the
atoi function we'll look at next. (You should never have to know that '0' has the value 48, and you
should never have to write code which depends on it.)

page 43
To illustrate the ``schizophrenic'' nature of characters (are they characters, or are they small integer
values?), it's useful to look at an implementation of the standard library function atoi. (If you're getting
overwhelmed, though, you may skip this example for now, and come back to it later.) The atoi routine
converts a string like "123" into an integer having the corresponding value.
As you study the atoi code at the top of page 43, figure out why it does not seem to explicitly check for
the terminating '\0' character.
The expression
s[i] - '0'
is an example of the ``crossing over'' between thinking about a character and its value. Since the value of
the character '0' is not zero (and, similarly, the other numeric characters don't have their ``obvious''
values, either), we have to do a little conversion to get the value 0 from the character '0', the value 1 from
the character '1', etc. Since the character set values for the digit characters '0' to '9' are contiguous
(the C Standard guarantees this for every character set; in ASCII they are 48-57, if you must know), the conversion involves simply subtracting an offset, and the offset (if you think
about it) is simply the value of the character '0'. We could write
s[i] - 48
if we really wanted to, but that would require knowing what the value actually is. We shouldn't have to
know (and it might be different in some other character set), so we can let the compiler do the dirty work
by using '0' as the offset (since subtracting '0' is, by definition, the same as subtracting the value of the
character '0').
The functions from <ctype.h> are being introduced here without a lot of fanfare. Here is the main loop
of the atoi routine, rewritten to use isdigit:
for (i = 0; isdigit(s[i]); ++i)
    n = 10 * n + (s[i] - '0');
Don't worry too much about the discussion of signed vs. unsigned characters for now. (Don't forget about it
completely, though; eventually, you'll find yourself working with a program where the issue is significant.)
For now, just remember:
1. Use int as the type of any variable which receives the return value from getchar, as discussed in
section 1.5.1 on page 16.
2. If you're ever dealing with arbitrary ``bytes'' of binary data, you'll usually want to use unsigned
char.

page 44
As we saw in section 2.6 on page 42, relational and logical operators always ``return'' 1 for ``true'' and 0 for
``false.'' However, when C wants to know whether something is true or false, it just looks at whether it's
nonzero or zero, so any nonzero value is considered ``true.'' Finally, some functions which return true/false
values (the text mentions isdigit) may return ``true'' values of other than 1.
You don't have to worry about these distinctions too much, and you also don't have to worry about the
fragment
d = c >= '0' && c <= '9'
as long as you write conditionals in a sensible way. If you wanted to see whether two variables a and b
were equal, you'd never write
if((a == b) == 1)
(although it would work: the == operator ``returns'' 1 if they're equal). Similarly, you don't want to write
if(isdigit(c) == 1)
because it's equally silly-looking, and in this case it might not work. Just write things like
if(a == b)
and
if(isdigit(c))
and you'll steer clear of most problems. (Make sure, though, that you never try something like
if('0' <= c <= '9'), since this wouldn't do at all what it looks like it's supposed to.)
The set of implicit conversions on page 44, though informally stated, is exactly the set to remember for
now. They're easy to remember if you notice that, as the authors say, ``the `lower' type is promoted to the
`higher' type,'' where the ``order'' of the types is
char < short int < int < long int < float < double < long double
(We won't be using long double, so you don't need to worry about it.) We'll have more to say about
these rules on the next page.
Don't worry too much for now about the additional rules for unsigned values, because we won't be using
them at first.
Do notice that implicit (automatic) conversions do happen across assignments. It's perfectly acceptable to
assign a char to an int or vice versa, or assign an int to a float or vice versa (or any other
combination). Obviously, when you assign a value from a larger type to a smaller one, there's a chance that
it might not fit. Therefore, compilers will often warn you about such assignments.
page 45
Casts can be a bit confusing at first. A cast is the syntax used to request an explicit type conversion;
coercion is just a more formal word for ``conversion.'' A cast consists of a type name in parentheses and is
used as a unary operator. You may have used languages which had conversion operators which looked
more like function calls:
integer i = 2;
floating f = floating(i);       /* not C */
integer i2 = integer(f);        /* not C */

In C, you accomplish the same thing with casts:
int i = 2;
float f = (float)i;
int i2 = (int)f;
(Actually, in C, we wouldn't need casts in those initializations at all, because conversions between int and
float are some of the ones that C performs automatically.)
To further understand both how implicit conversions and explicit casts work, let's study how the implicit
conversions would look if we wrote them out explicitly. First we'll declare a few variables of various types:
char c1, c2;
int i1, i2;
long int L1, L2;
float f1, f2;
double d1, d2;
Next we'll look at the kinds of conversions which C automatically performs when performing arithmetic on
two dissimilar types, or when assigning a value to a dissimilar type. The rules are straightforward: when
performing arithmetic on two dissimilar types, C converts one or both sides to a common type; and when
assigning a value, C converts it to the type of the variable being assigned to.
If we add a char to an int:
i2 = c1 + i1;
the fourth rule on page 44 tells us to convert the char to an int, as if we'd written
i2 = (int)c1 + i1;
If we multiply a long int and a double:
d2 = L1 * d1;
the second rule tells us to convert the long int to a double, as if we'd written
d2 = (double)L1 * d1;
An assignment of a char to an int
i1 = c1;
is as if we'd written
i1 = (int)c1;
and an assignment of a float to an int
i1 = f1;
is as if we'd written
i1 = (int)f1;
Some programmers worry that implicit conversions are somehow unreliable and prefer to insert lots of
explicit conversions. I recommend that you get comfortable with implicit conversions (they're quite useful) and don't clutter your code with extra casts.
There are a few places where you do need casts, however. Consider the code
i1 = 200;
i2 = 400;
L1 = i1 * i2;
The product 200 x 400 is 80000, which is not guaranteed to fit into an int. (Remember that an int is only
guaranteed to hold values up to 32767.) Since 80000 will fit into a long int, you might think that you're
okay, but you're not: the two sides of the multiplication are of the same type, so the compiler doesn't see the
need to perform any automatic conversions (none of the rules on page 44 apply). The multiplication is
carried out as an int, which overflows with unpredictable results, and only after the damage has been
done is the unpredictable value converted to a long int for assignment to L1. To get a multiplication
like this to work, you have to explicitly convert at least one of the int's to long int:
L1 = (long int)i1 * i2;
Now, the two sides of the * are of different types, so they're both converted to long int (by the fifth rule
on page 44), and the multiplication is carried out as a long int. If it makes you feel safer, you can use
two casts:
L1 = (long int)i1 * (long int)i2;
but only one is strictly required.
A similar problem arises when two integers are being divided. The code
i1 = 1;
f1 = i1 / 2;
does not set f1 to 0.5; it sets it to 0. Again, the two operands of the / operator are already of the same type
(the rules on page 44 still don't apply), so an integer division is performed, which discards any fractional
part. (We saw a similar problem in section 1.2 on page 12.) Again, an explicit conversion saves the day:
f1 = (float)i1 / 2;
Alternately, in a case like this, you can use a floating-point constant:
f1 = i1 / 2.0;
In either case, as soon as one of the operands is floating point, the division is carried out in floating point,
and you get the result you expect.
Implicit conversions always happen during arithmetic and assignment to variables. The situation is a bit
more complicated when functions are being called, however.
The authors use the example of the sqrt function, which is as good an example as any. sqrt accepts an
argument of type double and returns a value of type double. If the compiler didn't know that sqrt
took a double, and if you called
sqrt(4);
or
int n = 4;
sqrt(n);
the compiler would pass an int to sqrt. Since sqrt expects a double, it will not work correctly if it
receives an int. Therefore, it was once always necessary to use explicit conversions in cases like this, by
calling
sqrt((double)4)
or
sqrt((double)n)
or
sqrt(4.0)
However, it is now possible, with a function prototype, to tell the compiler what types of arguments a
function expects. The prototype for sqrt is
double sqrt(double);
and as long as a prototype is in effect (``in scope,'' as the cognoscenti would say), you can call sqrt
without worrying about conversions. When a prototype is in effect, the compiler performs implicit
conversions during function calls (specifically, while passing the arguments) exactly as it does during
simple assignments.
Obviously, using prototypes makes for much safer programming, and it is recommended that you use them
whenever possible. For the standard library functions (the ones already written for you), you get prototypes
automatically when you include the header files which describe sets of library functions. For example, you
get prototypes for all of C's built-in math functions by putting the line
#include <math.h>
at the top of your program. For functions that you write, you can supply your own prototypes, which we'll
be learning more about later.
However, there are a few situations (we'll talk about them later) where prototypes do not apply, so it's
important to remember that function calls are a bit different and that explicit conversions (i.e. casts) may
occasionally be required. Don't imagine that prototypes are a panacea.
page 46
Don't worry about the rand example.

section 2.8: Increment and Decrement Operators

The distinction between the prefix and postfix forms of ++ and -- will probably seem strained at first,
but it will make more sense once we begin using these operators in more realistic situations.
The authors point out that an expression like (i+j)++ is illegal, and it's worth thinking for a moment
about why. The ++ operator doesn't just mean ``add one''; it means ``add one to a variable'' or ``make a
variable's value one more than it was before.'' But (i+j) is not a variable, it's an expression; so there's
no place for ++ to store the incremented result. If you were bound and determined to use ++ here, you'd
have to introduce another variable:
int k = i + j;
k++;
But really, when you want to add one to an expression, just use
i + j + 1
Another unfortunate (and utterly meaningless) example is
i = i++;
If you want to increment i (that is, add one to it, and store the result back in i), either use
i = i + 1;
or
i++;
Don't try to combine the two.
page 47
Deep sentence:
In a context where no value is wanted, just the incrementing effect, as in
if(c == '\n')
    nl++;
prefix and postfix are the same.


In other words, when you're just incrementing some variable, you can use either the nl++ or ++nl
form. But when you're immediately using the result, as in the examples we'll look at later, using one or
the other makes a big difference.
In that light, study one of the examples on this page (squeeze, the modified getline, or strcat) and convince yourself that it would not work if the wrong form of increment (++i or ++j) were used.
You may note that all three examples on pages 47-48 use the postfix form. Postfix increment is probably
more common, though prefix definitely has its uses, too.
You may notice the keyword void popping up in a few code examples. void is a type we haven't met
yet; it's a type with no values and no operations. When a function is declared as ``returning'' void, as in
the squeeze and strcat examples on pages 47 and 48, it means that the function does not return a
value. (This was briefly mentioned on page 30 in chapter 1.)

section 2.9: Bitwise Operators

page 48
The bitwise operators are definitely a bit (pardon the pun) more esoteric than the parts of C we've
covered so far (and, indeed, than probably most of C). We won't concentrate on them, but they do come
up all the time, so you should eventually learn enough about them to recognize what they do, even if you
don't use them in any of your own programs for a while. You may skip this section for now, though.
To see what the bitwise operators are doing, it may help to convert to binary for a moment and look at
what's happening to the individual bits. In the example on page 48, suppose that n is 052525, which is
21845 decimal, or 101010101010101 binary. Then n & 0177, in base 2 and base 8 (binary and octal)
looks like
    101010101010101          052525
  & 000000001111111        & 000177
    ---------------          ------
            1010101             125

In the second example, if SET_ON is 012 and x is 0, then x | SET_ON looks like

    000000000          000000
  | 000001010        | 000012
    ---------          ------
         1010              12

and if x starts out as 0402 (octal), it looks like

    100000010          000402
  | 000001010        | 000012
    ---------          ------
    100001010             412

Note that with &, anywhere we have a 0 we turn bits off, and anywhere we have a 1 we copy bits through
from the other side. With |, anywhere we have a 1 we turn bits on, and anywhere we have a 0 we leave
bits alone.
You'll frequently see the word mask used, both as a noun and a verb. You can imagine that we've cut a
mask or stencil out of cardboard, and are using spray paint to spray through the mask onto some other
piece of paper. For |, the holes in the mask are like 1's, and the spray paint is like 1's, and we paint more
1's onto the underlying paper. (If there was already paint under a hole, nothing really changes if we get
more paint on it; it's still a ``1''.)


The & operator is a bit harder to fit into this analogy: you can either imagine that the holes in the mask
are 1's and you're spraying some preservative which will fix some of the underlying bits after which the
others will get washed off, or you can imagine that the holes in the mask are 0's, and you're spraying
some erasing paint or some background color which obliterates anything (i.e. any 1's, any foreground
color) it reaches.
For a bit more information on ``bitwise'' operations, see the handout, ``A Brief Refresher on Some Math
Often Used in Computing.''
page 49
Work through the example at the top of the page, and convince yourself that 1 & 2 is 0 and that 1 &&
2 is 1.
The precedence of the bitwise operators is not what you might expect, and explicit parentheses are often
needed, as noted in this deep sentence from page 52:
Note that the precedence of the bitwise operators &, ^, and | falls below == and !=. This
implies that bit-testing expressions like
if ((x & MASK) == 0) ...
must be fully parenthesized to give proper results.

section 2.10: Assignment Operators and Expressions

page 50
You may wonder what it means to say that ``expr1 is computed only once'' since in an assignment
like
i = i + 2
we don't ``evaluate'' the i on the left-hand side of the = at all; we assign to it. The distinction becomes important,
however, when the left-hand side (expr1) is more complicated than a simple variable. For example,
we could add 2 to each of the n cells of an array a with code like
int i = 0;
while(i < n)
    a[i++] += 2;
If we tried to use the expanded form, we'd get
int i = 0;
while(i < n)
    a[i++] = a[i++] + 2;
and by trying to increment i twice within the same expression we'd get (as we'll see) undesired, unpredictable, and in
fact undefined results. (Of course, a more natural form of this loop would be
for(i = 0; i < n; i++)
    a[i] += 2;
and with the increment of i moved out of the array subscript, it wouldn't matter so much whether we used a[i] +=
2 or a[i] = a[i] + 2.)
page 51
To make the point more clear, the ``complicated expression'' without using += would look like
yyval[yypv[p3+p4] + yypv[p1+p2]] = yyval[yypv[p3+p4] + yypv[p1+p2]] + 2
(What's going on here is that the subexpression yypv[p3+p4] + yypv[p1+p2] is being used as a subscript to
determine which cell of the yyval array to increment by 2.)
The sentence on p. 51 that includes the words ``the assignment statement has a value'' is a bit misleading: an
assignment is really an expression in C. Like any expression, it has a value, and it can therefore participate as a
subexpression in a larger expression. (If the distinction between the terms ``statement'' and ``expression'' seems
vague, don't worry; we'll start talking about statements in the next chapter.)

section 2.11: Conditional Expressions

``Ternary'' is a ten-dollar word meaning ``having three operands.'' (It's analogous to the terms unary and
binary, which refer to operators having one and two operands, respectively.) The conditional operator is a
bit of a frill, and it's a bit obscure, so you may skip section 2.11 in the book on first reading, but please
read the comments in these notes just below (under the mention of ``annoying compulsion'').
page 52
To see what the ?: operator has bought us, here is what the array-printing loop might look like without
it:
for(i = 0; i < n; i++) {
    printf("%6d", a[i]);
    if(i%10==9 || i==n-1)
        printf("\n");
    else
        printf(" ");
}
You may be finding this compulsion to write ``compact'' or ``concise'' code using operators like ++ and
+= and ?: a bit annoying. There are three things to know:
1. In complicated code, these operators allow an economy of expression which is beneficial.
Mathematicians are constantly inventing new notations, in which one letter or symbol stands for a
complicated expression or operation, in order to solve complicated problems without drowning in
so much verbiage that it would be impossible to follow an argument or check for errors. Computer
programs are large and complex, so well-chosen abbreviations can make them easier to work
with, too.
2. Some C programmers, it's true, do take the urge to write succinct or concise code to excess, and
end up with cryptic, bewildering, obfuscated, impenetrable messes. (I'm not apologizing for them:
I hate overly abbreviated, impossible-to-read code, too!)
3. Since there is overly concise C code out there, it's occasionally necessary to dissect a piece of it
and figure out what it does, so you need to have enough familiarity with these operators, and with
some standard, idiomatic ways in which they're commonly combined, so that you won't be utterly
stymied.
However, there is nothing that says that you have to write concise code yourself. Don't be lured into
thinking that you're not a ``real C programmer'' until you routinely and easily write code which no one
else can read. Write in a style that's comfortable to you; don't be embarrassed if your code seems
``simple.'' (Actually, the very best code seems simple, too.) With time, you'll probably come to
appreciate at least some of the idioms, and to be comfortable enough with them that you may want to use
a few of them yourself, after all.

section 2.12: Precedence and Order of Evaluation

Note that precedence is not the same thing as order of evaluation. Precedence determines how an
expression is parsed, and it has an influence on the order in which parts of it are evaluated, but the
influence isn't as strong as you'd think. Precedence says that in the expression
1 + 2 * 3
the multiplication happens before the addition. But if we have several function calls, such as
f() + g() * h()
we have no idea which function will be called first; the compiler might arrange to call f() first even
though its value won't be needed until last. If we were to write an abomination like
i = 1;
a[i++] + a[i++] * a[i++]
we would have no way of knowing which order the three increments would happen in, and in fact the
compiler wouldn't have any idea either. We could not argue that since multiplication has higher
precedence than addition, and since multiplication associates from left to right, the second i++ would
have to happen first, then the third, then the first. (Actually, associativity never says anything about
which side of a single binary operator gets evaluated first; associativity says which of several adjacent
same-precedence operators happens first.)
In general, you should be wary of ever trying to second-guess the relative order in which the various
parts of an expression will be evaluated, with two exceptions:
1. You can obviously assume that precedence will dictate the order in which binary operators are
applied. This typically says not just what order things happen in, but also what the
expression actually means. (In other words, the precedence of * over + says more than that the
multiplication ``happens first'' in 1 + 2 * 3; it says that the answer is 7, not 9.)
2. You can assume that the && and || operators are evaluated left-to-right, and that the right-hand
side is not evaluated at all if the left-hand side determines the outcome.
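As a small illustration of the second exception, here is a sketch (the function name first_is_digit is hypothetical, not from the text) that relies on && evaluating left to right and stopping early: if p is a null pointer, *p is never evaluated.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if p is non-null and the string it
   points to begins with a decimal digit, 0 otherwise.  Because &&
   evaluates left to right and stops as soon as the outcome is known,
   *p is never evaluated when p is a null pointer. */
int first_is_digit(const char *p)
{
    return p != NULL && *p >= '0' && *p <= '9';
}
```

Reversing the operands (testing *p first) would not be a harmless stylistic change; it would dereference a null pointer whenever p is null.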
To look at one more example, it might seem that the code
int i = 7;
printf("%d\n", i++ * i++);
would have to print 56, because no matter which order the increments happen in, 7x8 is 8x7 is 56. But
++ just says that the increment happens later, not that it happens immediately, so this code could print 49
(if it chose to perform the multiplication first, and both increments later). And, it turns out that
ambiguous expressions like this are such a bad idea that the ANSI C Standard does not require compilers
to do anything reasonable with them at all, such that the above code might end up printing 42, or
8923409342, or 0, or crashing your computer.
Finally, note that parentheses don't dictate overall evaluation order any more than precedence does.
Parentheses override precedence and say which operands go with which operators, and they therefore
affect the overall meaning of an expression, but they don't say anything about the order of subexpressions
or side effects. We could not ``fix'' the evaluation order of any of the expressions we've been discussing
by adding parentheses. If we wrote
f() + (g() * h())
we still wouldn't know whether f(), g(), or h() would be called first. (The parentheses would force
the multiplication to happen before the addition, but precedence already would have forced that,
anyway.) If we wrote
(i++) * (i++)
the parentheses wouldn't force the increments to happen before the multiplication or in any well-defined
order; this parenthesized version would be just as undefined as i++ * i++ was.
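If you really do need a particular order, the reliable tool is separate statements: each full statement, side effects included, is finished before the next one begins. Here is a minimal sketch, assuming hypothetical functions f, g, and h that record the order in which they are called, plus a well-defined rewrite of the i++ * i++ example.

```c
static char order[4];   /* records the call order, e.g. "fgh" */
static int ncalls = 0;

static int f(void) { order[ncalls++] = 'f'; return 1; }
static int g(void) { order[ncalls++] = 'g'; return 2; }
static int h(void) { order[ncalls++] = 'h'; return 3; }

/* Computes f() + g() * h(), but forces the calls to happen in the
   order f, g, h by storing each result in a temporary first. */
int ordered_sum(void)
{
    int a, b, c;
    ncalls = 0;
    a = f();
    b = g();
    c = h();
    order[ncalls] = '\0';
    return a + b * c;   /* precedence still decides the arithmetic: 1 + 2*3 */
}

/* The i++ * i++ example rewritten so that each statement has only one
   side effect on i; the result is always 7 * 8 == 56. */
int product_of_increments(void)
{
    int i = 7;
    int first = i++;    /* first == 7, and now i == 8 */
    int second = i++;   /* second == 8, and now i == 9 */
    return first * second;
}
```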
page 53
Deep sentence:
Function calls, nested assignment statements, and increment and decrement operators
cause ``side effects''--some variable is changed as a by-product of the evaluation of an
expression.
(There's a slight inaccuracy in this sentence: any assignment expression counts as a side effect.)
It's these ``side effects'' that you want to keep in mind when you're making sure that your programs are
well-defined and don't suffer any of the undefined behavior we've been discussing. (When we informally
said that complex expressions had several things going on ``at once,'' we were actually referring to
expressions with multiple side effects.) As a general rule, you should make sure that each expression
only has one side effect, or if it has several, that different variables are changed by the several side
effects.
page 54
Deep sentence:
The moral is that writing code that depends on order of evaluation is a bad programming
practice in any language. Naturally, it is necessary to know what things to avoid, but if you
don't know how they are done on various machines, you won't be tempted to take
advantage of a particular implementation.
The first edition of K&R said
...if you don't know how they are done on various machines, that innocence may help to
protect you.
I actually prefer the first edition wording. Many textbooks encourage you to write small programs to find
out how your compiler implements some of these ambiguous expressions, but it's just one step from
writing a small program to find out, to writing a real program which makes use of what you've just
learned. And you don't want to write programs that work only under one particular compiler, that take
advantage of the way that compiler (but perhaps no other) happens to implement the undefined
expressions. It's fine to be curious about what goes on ``under the hood,'' and many of you will be curious
enough about what's going on with these ``forbidden'' expressions that you'll want to investigate them,
but please keep very firmly in mind that, for real programs, the very easiest way of dealing with
ambiguous, undefined expressions (which one compiler interprets one way and another interprets another
way and a third crashes on) is not to write them in the first place.

Chapter 3: Control Flow

section 3.1: Statements and Blocks
section 3.2: If-Else
section 3.3: Else-If
section 3.4: Switch
section 3.5: Loops--While and For
section 3.6: Loops--Do-while
section 3.7: Break and Continue
section 3.8: Goto and Labels

section 3.1: Statements and Blocks

page 55
Deep sentence:
There is no semicolon after the right brace that ends a block.
Nothing more to say here, but it's a frequent point of confusion.

section 3.2: If-Else

The syntax description here may seem to suggest that statement1 and
statement2 must be single, simple statements, but, as mentioned in section 3.1, a block
of statements enclosed in braces {} is equivalent to a single statement.
page 56
``Coding shortcuts'' like writing
if(expression)
instead of
if(expression != 0)
can indeed be cryptic, but they're also quite common, so you'll need to be able to recognize them even if
you don't choose to write them in your own code. Whenever you see code like
if (x)
or
if (f())
where x or f() do not have obvious ``Boolean'' names, just mentally add in != 0.
Don't worry too much if the multiple if/else ambiguity described on page 56 doesn't make perfect
sense; just note the deep sentence:
...it's a good idea to use braces when there are nested ifs.
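Here is a sketch of the ambiguity, using two hypothetical functions. An else always binds to the nearest preceding if that doesn't already have one, regardless of how the code is indented; braces make the intended pairing explicit.

```c
/* With braces, each else clearly pairs with the intended if:
     returns 1 if n > 0 and n is even,
             2 if n > 0 and n is odd,
             3 if n <= 0                                         */
int classify(int n)
{
    int code;
    if (n > 0) {
        if (n % 2 == 0)
            code = 1;
        else
            code = 2;
    } else {
        code = 3;       /* this else belongs to the outer if */
    }
    return code;
}

/* Without braces, code that looks similar means something else: the
   else pairs with the inner if, so nothing is assigned when n <= 0,
   and code keeps its initial value 0. */
int classify_unbraced(int n)
{
    int code = 0;
    if (n > 0)
        if (n % 2 == 0)
            code = 1;
    else                /* misleading indentation: binds to the inner if */
        code = 2;
    return code;
}
```

Many compilers (GCC with -Wall, for example) will warn about the second version, which is itself a good hint to add the braces.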

section 3.3: Else-If

pages 57-58
Binary search is an extremely important algorithm, but it turns out that it is subtle to get the
implementation just right. (It has been observed that although the first binary search was published in
1946, the first published binary search without bugs did not appear until 1962.) The basic idea is the
same as the algorithm we all tend to use when we're asked to guess a number between 1 and 100: ``Is it
between 1 and 50? Yes? Okay, is it between 25 and 50? No? Okay, is it between 1 and 12? ... '' (Don't
worry if you can't follow all of the details of the algorithm or the code on page 58, but do remember to be
extremely careful if you're ever asked to write a binary search routine.)
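For reference, here is a sketch along the lines of the binsearch on page 58. It searches a sorted array v of n ints for x, maintaining the invariant that if x is present at all, it lies somewhere in v[low..high]. One deliberate deviation from the book: the midpoint is computed as low + (high - low) / 2 rather than (low + high) / 2, a well-known fix for overflow on very large arrays.

```c
/* binsearch: find x in v[0] <= v[1] <= ... <= v[n-1].
   Returns the index of x, or -1 if x is not present.
   Invariant: if x is in v at all, it is somewhere in v[low..high]. */
int binsearch(int x, const int v[], int n)
{
    int low = 0;
    int high = n - 1;

    while (low <= high) {
        /* written this way so that low + high can't overflow */
        int mid = low + (high - low) / 2;
        if (x < v[mid])
            high = mid - 1;     /* x, if present, is in the lower half */
        else if (x > v[mid])
            low = mid + 1;      /* x, if present, is in the upper half */
        else
            return mid;         /* found a match */
    }
    return -1;                  /* range is empty: no match */
}
```

Notice the else-if chain in the loop body, which is the construct this section is about: exactly one of the three branches runs on each trip.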

section 3.4: Switch

pages 58-59
We won't be concentrating on switch statements much (they're a bit of a luxury; there's nothing you
can do with a switch that you can't do with an if/else chain, as in section 3.3 on page 57). But
they're quite handy, and good to know about.
The example on page 59 is about as contrived as the example in section 1.6 (page 22) which it replaces,
but studying both examples will give you an excellent feel for how a switch statement works, which
if/else chain a switch is equivalent to and how to translate between the two, and why a
switch statement can be convenient.
In the example in the text, note especially the way that ten case labels are attached to one set of
statements (ndigit[c-'0']++;). As the authors point out, this works because of the way switch
cases ``fall through,'' which is a mixed blessing.
The danger of fall-through is illustrated by:
switch(food) {
case APPLE:
    printf("apple\n");
case ORANGE:
    printf("orange\n");
    break;
default:
    printf("other\n");
}
When food is APPLE, this code erroneously prints
apple
orange
because the break statement after the APPLE case was omitted.
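The fix is to end each case with its own break. Here is a sketch assuming hypothetical enumeration constants APPLE, ORANGE, and PEAR; it returns a string instead of printing, so its behavior is easy to check. The second function shows the legitimate use of fall-through: several labels sharing one group of statements, as in the digit-counting example on page 59.

```c
enum fruit { APPLE, ORANGE, PEAR };     /* hypothetical values */

/* Each case ends in break, so no case falls into the next. */
const char *food_name(enum fruit food)
{
    const char *name;
    switch (food) {
    case APPLE:
        name = "apple";
        break;          /* the break that was missing in the buggy version */
    case ORANGE:
        name = "orange";
        break;
    default:
        name = "other";
        break;
    }
    return name;
}

/* Several labels attached to one statement group: the empty cases
   fall through to the shared code, which is fall-through on purpose. */
int is_vowel(int c)
{
    switch (c) {
    case 'a': case 'e': case 'i': case 'o': case 'u':
        return 1;
    default:
        return 0;
    }
}
```

The break after the last case isn't strictly necessary, but it protects you if someone later adds another case below it.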

section 3.5: Loops -- While and For

page 60
Remember that, as always, the statement can be a brace-enclosed block.
Make sure you understand how the for loop
for (expr1; expr2; expr3)
    statement
is equivalent to the while loop
expr1;
while (expr2) {
    statement
    expr3;
}
There is nothing magical about the three expressions at the top of a for loop; they can be arbitrary
expressions, and they're evaluated just as the expansion into the equivalent while loop would suggest.
(Actually, there are two tiny differences: the behavior of continue, which we'll get to in a bit, and the
fact that the test expression, expr2, is optional and defaults to ``true'' for a for loop, but
is required for a while loop.)
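To see the equivalence concretely, here is a sketch (the function names are hypothetical) that sums the integers 1 through n once with a for loop and once with the expanded while form; the comments mark which piece is expr1, expr2, and expr3.

```c
/* Sum 1..n with a for loop: expr1 is i = 1, expr2 is i <= n,
   expr3 is i++. */
int sum_for(int n)
{
    int i, total = 0;
    for (i = 1; i <= n; i++)
        total += i;
    return total;
}

/* The same loop rewritten as the equivalent while loop. */
int sum_while(int n)
{
    int i, total = 0;
    i = 1;                  /* expr1 */
    while (i <= n) {        /* expr2 */
        total += i;         /* statement */
        i++;                /* expr3 */
    }
    return total;
}
```

For n of 0, both loops execute zero times, because expr2 is tested before the first trip in either form.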
for(;;) is one way of writing an infinite loop in C; the other common one is while(1). Don't worry
about what a break would mean in a loop; we'll be seeing it in a few more pages.
pages 60-61
Deep sentences:
Whether to use while or for is largely a matter of personal preference...
Nonetheless, it is bad style to force unrelated computations into the initialization and
increment of a for, which are better reserved for loop control operations.
In general, the three expressions in a for loop should all manipulate (initialize, test, and increment) the
same variable or data structure. If they don't, they are ``unrelated computations,'' and a while loop
would probably be clearer. (The reason that one loop or the other can be clearer is simply that, when you
see a for loop, you expect to see an idiomatic initialize/test/increment of a single variable or data
structure, and if the for loop you're looking at doesn't end up matching that pattern, you've been
momentarily misled.)
page 61
When the authors say that ``the index and limit of a C for loop can be altered from within the loop,''
they mean that a loop like
int i, n = 10;
for(i = 0; i < n; i++) {
    if(i == 5)
        i++;
    printf("%d\n", i);
    if(i == 8)
        n++;
}
where i and n are modified within the loop, is legal. (Obviously, such a loop can be very confusing, so
you'll probably be better off not making use of this freedom too much.)
When they say that ``the index variable... retains its value when the loop terminates for any reason,'' you
may not find this too surprising, unless you've used other languages where it's not the case. The fact that
loop control variables retain their values after a loop can make some code much easier to write; for
example, the atoi function at the bottom of page 61 depends on having its i counter manipulated by
several loops as it steps over three different parts of the string (whitespace, sign, digits) with i's value
preserved from one step to the next.
Deep sentence:
Each step does its part, and leaves things in a clean state for the next.
This is an extremely important observation on how to write clean code. As you study the atoi code,
notice that it falls into three parts, each implementing one step of the pseudocode description: skip white
space, get sign, get integer part and convert it. At each step, i points at the next character which that
step is to inspect. (If a step is skipped, because there is no leading whitespace or no sign, the later steps
don't care.)
You may hear the term invariant used: this refers to some condition which exists at all stages of a
program or function. In this case, the invariant is that i always points to the next character to be
inspected. Having some well-chosen invariants can make code much easier to write and maintain. If there
aren't enough invariants--if i is sometimes the next character to look at and sometimes the character that
was just looked at--debugging and maintaining the code can be a nightmare.
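Here is a sketch of an atoi in the style of the one on page 61, with the three steps and the invariant marked in comments. It is named my_atoi so that it doesn't collide with the standard library's atoi, and it casts characters to unsigned char before passing them to isspace and isdigit (a defensive addition not in the book, since those functions are undefined for negative values other than EOF). Like the book's version, it does no overflow checking.

```c
#include <ctype.h>

/* my_atoi: convert s to an integer.
   Invariant: i always indexes the next character still to be inspected. */
int my_atoi(const char s[])
{
    int i, n, sign;

    for (i = 0; isspace((unsigned char)s[i]); i++)  /* step 1: skip white space */
        ;
    sign = (s[i] == '-') ? -1 : 1;                  /* step 2: get sign */
    if (s[i] == '+' || s[i] == '-')                 /* ...and step over it */
        i++;
    for (n = 0; isdigit((unsigned char)s[i]); i++)  /* step 3: convert digits */
        n = 10 * n + (s[i] - '0');
    return sign * n;
}
```

If a step has nothing to do (no leading space, no sign), its loop or test simply leaves i alone, and the next step starts from the same place.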

In the atoi example, the lines
for (i = 0; isspace(s[i]); i++)    /* skip white space */
    ;
are close to the brink of ``forcing unrelated computations into the initialization and increment,''
especially since so much has been forced into the loop header that there's nothing left in the body. It
would be equally clear to write this part as
i = 0;
while (isspace(s[i]))    /* skip white space */
    i++;
The line
sign = (s[i] == '-') ? -1 : 1;
may seem a bit cryptic at first, though it's a textbook example of the use of ?: . The line is equivalent to
sign = 1;
if(s[i] == '-')
    sign = -1;
pages 61-62
It's instructive to study this Shell or ``gap'' sort, but don't worry if you find it a bit bewildering.
Deep sentence:
Notice how the generality of for makes the outer loop fit the same form as the others,
even though it is not an arithmetic progression.
The point is that loops don't have to count 0, 1, 2... or 1, 2, 3... . (This one counts n/2, n/4, n/8... .
Later we'll see loops which don't step over numbers at all.)
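For reference, here is a sketch of the Shell sort from pages 61-62. Note how the outer for loop steps gap through n/2, n/4, ..., 1, not through an arithmetic progression, while still fitting the familiar initialize/test/increment shape.

```c
/* shellsort: sort v[0]...v[n-1] into increasing order */
void shellsort(int v[], int n)
{
    int gap, i, j, temp;

    for (gap = n / 2; gap > 0; gap /= 2)    /* gap: n/2, n/4, ..., 1 */
        for (i = gap; i < n; i++)
            /* move v[i] back through its gap-separated neighbors
               until it is no smaller than the one before it */
            for (j = i - gap; j >= 0 && v[j] > v[j + gap]; j -= gap) {
                temp = v[j];
                v[j] = v[j + gap];
                v[j + gap] = temp;
            }
}
```

The last pass, with gap equal to 1, is an ordinary insertion sort; the earlier passes just move far-out-of-place elements a long way quickly, so that the final pass has little left to do.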
page 63
Deep sentence:
The commas that separate function arguments, variables in declarations, etc. are not
comma operators...
This looks strange, but it's true. If you say


for (i = 0, j = strlen(s)-1; i < j; i++, j--)
the first comma says to do i = 0 then do j = strlen(s)-1, and the second comma says to do i++
then do j--. However, when you say
getline(line, MAXLINE);
the comma just separates the two arguments line and MAXLINE; they both have to be evaluated, but it
doesn't matter in which order, and they're both passed to getline. (If the comma in a function call
were interpreted as a comma operator, the function would only receive one argument, since the value of
the first operand of the comma operator is discarded.) Since the comma operator discards the value of its
first operand, its first operand had better have a side effect. The expression
++a,++b
increments a and increments b and (if anyone cares) returns b's value, but the expression
a+1,b+1
adds 1 to a, discards it, and returns b+1.
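Here is a small sketch (bump_both is a hypothetical name) showing that the comma operator evaluates its left operand first, discards its value, and yields the value of its right operand; both side effects still happen.

```c
/* Evaluates the comma expression ++*a, ++*b: both variables are
   incremented, the left one first, and the value of the whole
   expression is the new value of *b. */
int bump_both(int *a, int *b)
{
    return (++*a, ++*b);
}
```

The parentheses matter in some contexts: in a function-call argument list or a declaration, an unparenthesized comma would be taken as a separator, not as a comma operator.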
If the comma operator isn't making perfect sense, don't worry about it for now. You're most likely to see
it in the first or third expression of a for statement, where it has the obvious meaning of separating two
(or more) things to do during the initialization or increment step. Just be careful that you don't
accidentally write things like
for(i = 0; j = 0; i < n && j < j; i++; j++)

/* WRONG */

for(i = 0, j = 0, i < n && j < j, i++, j++)

/* WRONG */

or

The correct form of a multi-index loop is something like


for(i = 0, j = 0; i < n && j < j; i++, j++)
Semicolons always separate the initialization, test, and increment parts; commas may appear within the
initialization and increment parts.
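For example, the reverse function on page 62 uses a comma in both the initialization and the increment parts, with semicolons separating the three parts. This sketch adds one small defensive change: strlen's result is cast to int before subtracting 1, so that an empty string gives j = -1 cleanly instead of relying on an unsigned wraparound being converted back to int.

```c
#include <string.h>

/* reverse: reverse string s in place, walking i forward from the
   start and j backward from the end until they meet. */
void reverse(char s[])
{
    int c, i, j;

    for (i = 0, j = (int)strlen(s) - 1; i < j; i++, j--) {
        c = s[i];
        s[i] = s[j];
        s[j] = c;
    }
}
```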

section 3.6: Loops -- Do-while

page 63
Note the semicolon following the parenthesized expression in the do-while loop; it's a required part of
the syntax.
Make sure you understand the difference between a while loop and a do-while loop. A while loop
executes strictly according to its conditional expression: if the expression is never true, the loop executes
zero times. The do-while loop, on the other hand, makes an initial ``no peek'' foray through the loop
body no matter what.
To see the difference, let's imagine three different ways of writing the loop in the itoa function on page
64. Suppose we somehow forgot to use a termination condition at all, and wrote something like
for(;;) {
    s[i++] = n % 10 + '0';
    n /= 10;
}
Eventually, n becomes zero, but we keep going around the loop, and we convert a number like 123 into a
string like "0000000000123", except with an infinite number of leading zeroes. (Mathematically, this
is correct, but it's not what we want here, especially if we want our program to use a finite amount of
time and space.)
Our next attempt might be
while(n > 0) {
    s[i++] = n % 10 + '0';
    n /= 10;
}
so that we stop creating digits when n reaches 0. This works fine for positive numbers, but for 0, it stops
too soon: it would convert the number 0 to the empty string "". That's why the do-while loop is
appropriate here; the fact that it always makes at least one pass through the loop makes sure that we
always generate at least one digit, even if it's 0.
(It's also useful to look at the invariants in this loop: during each trip through the loop, n contains the rest
of the number we have to convert, s[] contains the digits we've already converted, and i points at the
next cell in s[] which is to receive a digit. Each trip through the loop converts one digit, increments i,
and divides n by 10.)
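Putting the pieces together, here is a sketch of the do-while version in the spirit of itoa on page 64. It is named my_itoa because some C libraries already supply a nonstandard itoa, and it carries its own copy of the page 62 reverse (renamed reverse_str) because the digits are generated in reverse order. Like the book's version, it fails for the most negative int, since negating that value overflows.

```c
#include <string.h>

/* reverse_str: reverse s in place (the reverse function from page 62). */
static void reverse_str(char s[])
{
    int c, i, j;
    for (i = 0, j = (int)strlen(s) - 1; i < j; i++, j--) {
        c = s[i];
        s[i] = s[j];
        s[j] = c;
    }
}

/* my_itoa: convert n to characters in s.  s must have room for all
   the digits, a possible sign, and the terminating '\0'. */
void my_itoa(int n, char s[])
{
    int i = 0;
    int sign = n;               /* remember the sign */

    if (sign < 0)
        n = -n;                 /* make n positive */
    do {                        /* at least one pass, so 0 becomes "0" */
        s[i++] = n % 10 + '0';  /* next digit, least significant first */
    } while ((n /= 10) > 0);
    if (sign < 0)
        s[i++] = '-';
    s[i] = '\0';
    reverse_str(s);             /* digits were generated backwards */
}
```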

section 3.7: Break and Continue

pages 64-65
Note that a break inside a switch inside a loop causes a break out of the switch, while a break
inside a loop inside a switch causes a break out of the loop.
Neither break nor continue has any effect on a brace-enclosed block of statements following an if.
break causes a break out of the innermost switch or loop, and continue forces the next iteration of
the innermost loop.
There is no way of forcing a break or continue to act on an outer loop.
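Here is a sketch (count_before_dot is a hypothetical function) of a break inside a switch inside a loop. The break only leaves the switch, so to end the loop from inside the switch, this version sets a flag that the loop test checks; a return would also work.

```c
/* Counts the non-blank characters in s that come before the first '.'. */
int count_before_dot(const char *s)
{
    int count = 0;
    int done = 0;
    int i;

    for (i = 0; s[i] != '\0' && !done; i++) {
        switch (s[i]) {
        case ' ':
            break;          /* leaves the switch only; the loop continues */
        case '.':
            done = 1;       /* a break alone could not end the for loop */
            break;
        default:
            count++;
            break;
        }
    }
    return count;
}
```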
Another example of where continue is useful is when processing data files. It's often useful to allow
comments in data files; one convention is that a line beginning with a # character is a comment, and
should be ignored by any program reading the file. This can be coded with something like
while(getline(line, MAXLINE) > 0) {
    if(line[0] == '#')
        continue;
    /* process data file line */
}
The alternative, without a continue, would be
while(getline(line, MAXLINE) > 0) {
    if(line[0] != '#') {
        /* process data file line */
    }
}
but now the processing of normal data file lines has been made subordinate to comment lines. (Also, as
the authors note, it pushes most of the body of the loop over by another tab stop.) Since comments are
exceptional, it's nice to test for them, get them out of the way, and go on about our business, which the
code using continue nicely expresses.
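To make the pattern testable without reading a file, here is a sketch in which a hypothetical array of strings stands in for the lines getline would return, and ``processing'' a data line just means counting it.

```c
/* Counts the data lines in lines[0..n-1], skipping comment lines that
   begin with '#'. */
int count_data_lines(const char *lines[], int n)
{
    int i, count = 0;
    for (i = 0; i < n; i++) {
        if (lines[i][0] == '#')
            continue;       /* comment: go straight on to the next line */
        /* process data file line (here, just count it) */
        count++;
    }
    return count;
}
```

Note that in a for loop, continue jumps to the increment expression, so i++ still happens for comment lines. This is one of the ``tiny differences'' from the equivalent while loop mentioned in section 3.5: a naive while translation with i++ at the bottom of the body would skip the increment on a comment line and loop forever.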

section 3.8: Goto and Labels

pages 65-66
A tremendous amount of impassioned debate surrounds the lowly goto statement, which exists in many
languages. Some people say that gotos are fine; others say that they must never be used. You should
definitely shy away from gotos, but don't be dogmatic about it; some day, you'll find yourself writing
some screwy piece of code where trying to avoid a goto (by introducing extra tests or Boolean control
variables) would only make things worse.
page 66
When you find yourself writing several nested loops in order to search for something, such that you
would need to use a goto to break out of all of them when you do find what you're looking for, it's often
an excellent idea to move the search code out into a separate function. Doing so can make both the
``found'' and ``not found'' cases easier to handle. Here's a slight variation on the example in the middle of
page 66, written as a function:
/* return i such that a[i] == b[j] for some j, or -1 if none */
int findequal(int a[], int na, int b[], int nb)
{
    int i, j;
    for(i = 0; i < na; i++) {
        for(j = 0; j < nb; j++) {
            if(a[i] == b[j])
                return i;
        }
    }
    return -1;
}
This function can then be called as
i = findequal(a, na, b, nb);
if(i == -1) {
    /* didn't find any common element */
} else {
    /* got one */
}
(The only disadvantage here is that it's trickier to return i and j if we need them both.)
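One way around that, sketched here with a hypothetical findequal2 and using pointers (which K&R covers in chapter 5), is to return a found/not-found flag and pass both indices back through pointer parameters.

```c
/* Hypothetical variant of findequal that reports both indices.
   Returns 1 and sets *pi and *pj so that a[*pi] == b[*pj] if the
   arrays have an element in common; returns 0 otherwise. */
int findequal2(const int a[], int na, const int b[], int nb, int *pi, int *pj)
{
    int i, j;
    for (i = 0; i < na; i++) {
        for (j = 0; j < nb; j++) {
            if (a[i] == b[j]) {
                *pi = i;
                *pj = j;
                return 1;   /* leaves both loops at once, no goto needed */
            }
        }
    }
    return 0;
}
```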

Chapter 4: Functions and Program Structure

page 67
Deep paragraph:
Functions break large computing tasks into smaller ones, and enable people to build on
what others have done instead of starting over from scratch. Appropriate functions hide
details of operation from parts of the program that don't need to know about them, thus
clarifying the whole, and easing the pain of making changes.
Functions are probably the most important weapon in our battle against software complexity. You'll want to
learn when it's appropriate to break processing out into functions (and also when it's not), and how to set
up function interfaces to best achieve the qualities mentioned above: reusability, information hiding,
clarity, and maintainability.
The quoted sentences above show that a function does more than just save typing: a well-defined
function can be re-used later, and eases the mental burden of thinking about a complex program by
freeing us from having to worry about all of it at once. For a well-designed function, at any one time, we
should either have to think about:
1. that function's internal implementation (when we're writing or maintaining it); or
2. a particular call to the function (when we're working with code which uses it).
But we should not have to think about the internals when we're calling it, or about the callers when we're
implementing the internals. (We should perhaps think about the callers just enough to ensure that the
function we're designing will be easy to call, and that we aren't accidentally setting things up so that callers will
have to think about any internal details.)
Sometimes, we'll write a function which we only call once, just because breaking it out into a function
makes things clearer and easier.
Deep sentence:
C has been designed to make functions efficient and easy to use; C programs generally
consist of many small functions rather than a few big ones.
Some people worry about ``function call overhead,'' that is, the work that a computer has to do to set up
and return from a function call, as opposed to simply doing the function's statements in-line. It's a risky
thing to worry about, though, because as soon as you start worrying about it, you have a bit of a
disincentive to use functions. If you're reluctant to use functions, your programs will probably be bigger
and more complicated and harder to maintain (and perhaps, for various reasons, actually less efficient).
The authors choose not to get involved with the system-specific aspects of separate compilation, but we'll
take a stab at it here. We'll cover two possibilities, depending on whether you're using a traditional
command-line compiler or a newer integrated development environment (IDE) or other graphical user
interface (GUI) compiler.
When using a command-line compiler, there are usually two main steps involved in building an
executable program from one or more source files. First, each source file is compiled, resulting in an
object file containing the machine instructions (generated by the compiler) corresponding to the code in
that source file. Second, the various object files are linked together, with each other and with libraries
containing code for functions which you did not write (such as printf), to produce a final, executable
program.
Under Unix, the cc command can perform one or both steps. So far, we've been using extremely simple
invocations of cc such as
cc hello.c
(section 1.1, page 6). This invocation compiles a single source file, links it, and places the executable
(somewhat inconveniently) in a file named a.out.
Suppose we have a program which we're trying to build from three separate source files, x.c, y.c, and
z.c. We could compile all three of them, and link them together, all at once, with the command
cc x.c y.c z.c
(see also page 70). Alternatively, we could compile them separately: the -c option to cc tells it to
compile only, but not to link. Instead of building an executable, it merely creates an object file, with a
name ending in .o, for each source file compiled. So the three commands
cc -c x.c
cc -c y.c
cc -c z.c
would compile x.c, y.c, and z.c and create object files x.o, y.o, and z.o. Then, the three object
files could be linked together using
cc x.o y.o z.o
When the cc command is given an .o file, it knows that it does not have to compile it (it's an object file,
already compiled); it just sends it through to the link process.
Here we begin to see one of the advantages of separate compilation: if we later make a change to y.c,
only it will need recompiling. (At some point you may want to learn about a program called make,
which keeps track of which parts need recompiling and issues the appropriate commands for you.)
Above we mentioned that the second, linking step also involves pulling in library functions. Normally,
the functions from the Standard C library are linked in automatically. Occasionally, you must request a
library manually; one common situation under Unix is that certain math routines are in a separate math
library, which is requested by using -lm on the command line. Since the libraries must typically be
searched after your program's own object files are linked (so that the linker knows which library
functions your program uses), any -l option must appear after the names of your files on the command
line. For example, to link the object file mymath.o (previously compiled with cc -c mymath.c)
together with the math library, you might use
cc mymath.o -lm
Two final notes on the Unix cc command: if you're tired of using the nonsense name a.out for all of
your programs, you can use -o to give another name to the output (executable) file:
cc -o hello hello.c
would create an executable file named hello, not a.out. Finally, everything we've said about cc also
applies to most other Unix C compilers. Many of you will be using acc (a semistandard name for a
version of cc which does accept ANSI Standard C) or gcc (the FSF's GNU C Compiler, which also
accepts ANSI C and is free).
There are command-line compilers for MS-DOS systems which work similarly. For example, the
Microsoft C compiler comes with a CL (``compile and link'') command, which works almost the same as
Unix cc. You can compile and link in one step:
cl hello.c
or you can compile only:
cl /c hello.c
creating an object file named hello.obj which you can link later.
The preceding has all been about command-line compilers. If you're using some kind of integrated
development environment, such as Turbo C or the Microsoft Programmer's Workbench or Think C, most
of the mechanical details are taken care of for you. (There's also less I can say here about these
environments, because they're all different.) Typically there's a way to specify the list of files (modules)
which make up your project, and a single ``build'' button which does whatever's required to build (and
perhaps even execute) your program.
section 4.1: Basics of Functions
section 4.2: Functions Returning Non-Integers
section 4.3: External Variables
section 4.4: Scope Rules
section 4.5: Header Files
section 4.6: Static Variables
section 4.7: Register Variables
section 4.8: Block Structure
section 4.9: Initialization
section 4.10: Recursion
section 4.11: The C Preprocessor
section 4.11.1: File Inclusion
section 4.11.2: Macro Substitution
section 4.11.3: Conditional Inclusion

section 4.1: Basics of Functions

page 68
Once again, notice how a clear, simple description of the problem we're trying to solve leads to an (almost)
equally clear program implementing it.
Here are some more nice statements about the virtues of a clean, modular design:
Although it's certainly possible to put the code for all of this in main, a better way is to use
the structure to advantage by making each part a separate function. Three small pieces are
easier to deal with than one big one, because irrelevant details can be buried in the functions,
and the chance of unwanted interactions is minimized. And the pieces may even be useful in
other programs.
Let's say a bit more about how and why functions can be useful. First, we can see that, having chosen to use
a separate function for each part of the print-matching-lines program, the top-level main routine on page
69 is particularly simple and straightforward; it's little more than a transcription into C of the pseudocode
on page 68. The authors don't tend to use too many comments in their code, anyway, but this code hardly
needs any: the names of the functions called speak for themselves. (The only thing that might not be
obvious at first is that strindex is being used not so much to find the index of a substring but just to
determine whether a substring is present at all.) Second, we may be pleased to notice that we're already
having a chance to re-use the getline function we first wrote in Chapter 1. Third, we note that the two
functions which we've chosen to use (getline and strindex) are themselves reasonably simple and
straightforward to write. Finally, note that sometimes what you re-use is not so much a function as a
function interface. The code on page 69 uses a new implementation of getline, but the interface (the
argument list, return value, and functionality) is the same as for the versions of getline in section 1.9 on
page 29. We could have used that version here, or this new version there. Later, if we think of some even
better way of reading lines, we can write yet another version of getline, and as long as it has the same
interface, these programs can call it without their having to be rewritten.
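Here is a sketch of strindex in the style of page 69, with const added to the parameters. main uses it only as a presence test, along the lines of if (strindex(line, pattern) >= 0), which is why a single, well-defined job (return the index, or -1) serves both purposes.

```c
/* strindex: return the index of t in s, or -1 if t does not occur.
   An empty t is treated as ``not found'' because of the k > 0 test. */
int strindex(const char s[], const char t[])
{
    int i, j, k;

    for (i = 0; s[i] != '\0'; i++) {
        /* see how much of t matches starting at s[i] */
        for (j = i, k = 0; t[k] != '\0' && s[j] == t[k]; j++, k++)
            ;
        if (k > 0 && t[k] == '\0')  /* matched all of t */
            return i;
    }
    return -1;
}
```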
The ease with which a program like this comes together may be mildly deceptive, because nowhere have
we discussed the motivations which led to the particular pseudocode description on page 68 or the
particular definitions of the functions which were chosen to break the problem down into. Choosing a
design for a program, and defining subfunctions (their interfaces and their behavior) are both arts, and of
course the tasks are not unrelated. A good design leads to the invention of functions which might well be
useful later, and an existing body of good, general-purpose functions (all crying out to be re-used) can help
to guide the design of the next program.
What makes a good building block, either an abstract one that we use in a pseudocode description, or a
concrete one in the form of a general-purpose function? The most important aspect of a good building
block is that it has a single, well-defined task to perform. Two of the three functions used in the
line-matching program fill this role very well: getline's job is to read one line, and strindex's job is to
find one string in another string. printf's specification is considerably broader: its job is to print stuff.
(It's not surprising, therefore, that printf can be the harder routine to call, and it is certainly much harder to
implement. Its saving virtue is that it is nonetheless broadly applicable and infinitely reusable.)
When you find that a program is hard to manage, it's often because it has not been designed and broken up
into functions cleanly. Two obvious reasons for moving code down into a function are these:
1. It appeared in the main program several times, such that by making it a function, it can be written just
once, and the several places where it used to appear can be replaced with calls to the new function.
2. The main program was getting too big, so it could be made (presumably) smaller and more manageable
by lopping part of it off and making it a function.
These two reasons are important, and they represent significant benefits of well-chosen functions, but they
are not sufficient to automatically identify a good function. A good function has at least these two
additional attributes:
3. It does just one well-defined task, and does it well.
4. Its interface to the rest of the program is clean and narrow.
Attribute 3 is just a restatement of something we said above. Attribute 4 says that you shouldn't have to
keep track of too many things when calling a function. If you know what a function is supposed to do, and
if its task is simple and well-defined, there should be just a few pieces of information you have to give it to
act upon, and one or just a few pieces of information which it returns to you when it's done. If you find
yourself having to pass lots and lots of information to a function, or remember details of its internal
implementation to make sure that it will work properly this time, it's often a sign that the function is not
sufficiently well-defined. (It may be an arbitrary chunk of code that was ripped out of a main program that
was getting too big, such that it essentially has to have access to all of that main function's variables.)
The whole point of breaking a program up into functions is so that you don't have to think about the entire
program at once; ideally, you can think about just one function at a time. A good function is a ``black box'':
when you call it, you only have to know what it does (not how it does it); and when you're writing it, you
only have to know what it's supposed to do (and you don't have to know why or under what circumstances
its caller will be calling it). Some functions may be hard to write (if they have a hard job to do, or if it's
hard to make them do it truly well), but that difficulty should be compartmentalized along with the function
itself. Once you've written a ``hard'' function, you should be able to sit back and relax and watch it do that
hard work on call from the rest of your program. If you find that difficulties pervade a program, that the
hard parts can't be buried inside black-box functions and then forgotten about, if you find that there are hard
parts which involve complicated interactions among multiple functions, then the program probably needs
redesigning.

For the purposes of explanation, we may have seemed so far to be talking only about ``main programs,'' the
functions they call, and the rationale behind moving some piece of code down out of a ``main program'' into
a function. But in reality, there's obviously no need to restrict ourselves to a two-tier scheme. The ``main
program,'' main(), is itself just a function, and any function we find ourselves writing will often be
appropriately written in terms of sub-functions, sub-sub-functions, etc.
That's probably enough for now about functions in general. Here are a few more notes about the line-matching program.
The authors mention that ``The standard library provides a function strstr that is similar to strindex,
except that it returns a pointer instead of an index.'' We haven't met pointers yet (they're in chapter 5), so
we aren't quite in a position to appreciate the difference between an index and a pointer. Generally, an
index is a small number referring to some element of an array. A pointer is more general: it can point to any
data object of a particular type, whether it's one element of an array, or some other object anywhere in
memory. (Don't worry too much about the distinction yet, but bear in mind that there is a distinction. Note,
too, that the distinction is not absolute; in fact, the word ``index'' seems to derive from the concept of
pointing, as you can see if you think about what you use your index finger for, or if you notice that the
entries in a book's index point at the referenced parts of the book. We frequently speak casually of an index
variable ``pointing at'' some cell of an array, even though it's not a true pointer variable.)
One facet of the getline function's interface might bear mentioning: its first argument, the character
array s, is being used to return the line that it reads. This may seem to contradict the rule that a function
can never modify the value of a variable in its caller. As was briefly mentioned on page 28, there's an
exception for arrays, which we'll be learning about in chapter 5; for now, we'll gloss over the point.
(Actually, we're glossing over two points: not only is getline able to return a value via an argument, but
the argument isn't really an array, although it's declared as and looks like one. Please forgive these gentle
fictions; explaining them completely would really be premature at this point. Perhaps they weren't worth
mentioning yet, after all.)
For comparison, here is yet another version of getline:
int getline(char s[], int lim)
{
    int c, i = 0;

    while (--lim > 0 && (c = getchar()) != EOF) {
        s[i++] = c;
        if (c == '\n')      /* keep the newline, then stop */
            break;
    }
    s[i] = '\0';            /* terminate the string */
    return i;               /* length of the line; 0 at end of file */
}
Note that by using break, we avoid having to test for '\n' in two different places.
If you're having trouble seeing how the strindex function works, its algorithm is
for (each position i in s)
    if (t occurs at position i in s)
        return i;
(else) return -1;
Filling in the details of ``if (t occurs at position i in s)'', we have:
for (each position i in s)
    for (each character in t)
        if (it matches the corresponding character in s)
            if (it's '\0')
                return i;
            else
                keep going
        else
            no match at position i
(else) return -1;
A slightly less compressed implementation than the one on page 69 would be:
int strindex(char s[], char t[])
{
    int i, j, k;

    for (i = 0; s[i] != '\0'; i++) {
        for (j = i, k = 0; t[k] != '\0'; j++, k++)
            if (s[j] != t[k])
                break;
        if (t[k] == '\0')
            return i;
    }
    return -1;
}
Note that we have to check for the end of the string t twice: once to see if we're at the end of it in the
innermost loop, and again to see why we terminated the innermost loop. (If we terminated the innermost
loop because we reached the end of t, we found a match; otherwise, we didn't.) We could rearrange things
to remove the duplicated test:
int strindex(char s[], char t[])
{
    int i, j, k;

    for (i = 0; s[i] != '\0'; i++) {
        j = i;
        k = 0;
        do {
            if (t[k] == '\0')
                return i;
        } while (s[j++] == t[k++]);
    }
    return -1;
}
It's a matter of style which implementation of strindex is preferable; it's impossible to say which is
``best.'' (Can you see a slight difference in the behavior of the version on page 69 versus the two here?
Under what circumstance(s) would this difference be significant? How would the version on page 69
behave under those circumstances, and how would the two routines here behave?)
page 70
Deep sentence:
A program is just a set of definitions of variables and functions.
This sentence may or may not seem deep, and it may or may not be deep, but it's a fundamental definition
of what a C program is.

Note that a function's return value is automatically converted to the return type of the function, if necessary,
just as in assignments like
f = i;
where f is float and i is int.
Most programmers do use parentheses around the expression in a return statement, because that way it
looks more like while(), for(), etc. The reason the parentheses are optional is that the formal syntax is
return expression ;
and, as we know, any expression surrounded by parentheses is another expression.
It's debatable whether it's ``not illegal'' for a function to have return statements with and without values.
It's a ``sign of trouble'' at best, and undefined at worst. Another clear sign of trouble (which is equally
undefined) is when a function returns no value, or is declared as void, but a caller attempts to use the
return value.
The main program on page 69 returns the number of matching lines found. This is probably better than
returning nothing, but the convention is usually that a C program returns 0 when it succeeds and a positive
number when it fails.



This page by Steve Summit // Copyright 1995, 1996 // mail feedback

section 4.2: Functions Returning Non-Integers


page 71
Actually, we may have seen at least one function returning a non-integer, in the Fahrenheit-Celsius
conversion program in exercise 1-15 on page 27 in section 1.7.
The type name which precedes the name of a function (and which sets its return type) looks just like (i.e.
is syntactically the same as) the void keyword we've been using to identify functions which don't return
a value.
Note that the version of atof on page 71 does not handle exponential notation like 1.23e45; handling
exponents is left for exercise 4-2 on page 73.

``The standard library includes an atof'' means that we're reimplementing something which would
otherwise be provided for us anyway (i.e. just like printf). In general, it's a bad idea to rewrite
standard library routines, because by doing so you negate the advantage of having someone else write
them for you, and also because the compiler or linker is allowed to complain if you redefine a standard
routine. (On the other hand, seeing how the standard library routines are implemented can be a good
learning experience.)
page 72
In the ``primitive calculator'' code at the top of page 72, note that the call to atof is buried in the
argument list of the call to printf.
Deep sentences:
The function atof must be declared and defined consistently. If atof itself and the call
to it in main have inconsistent types in the same source file, the error will be detected by
the compiler. But if (as is more likely) atof were compiled separately, the mismatch
would not be detected, atof would return a double that main would treat as an int,
and meaningless answers would result.
The problems of mismatched function declarations are somewhat reduced today by the widespread use of
ANSI function prototypes, but they're still important to be aware of.
The implicit function declarations mentioned at the bottom of page 72 are an older feature of the
language. They were handy back in the days when most functions returned int and function prototypes
hadn't been invented yet, but today, if you want to use prototypes, you won't want to rely on implicit
declarations. (The 1999 revision of the C Standard went further and removed implicit function
declarations from the language.) If you don't like depending on defaults and implicit declarations, or if
you do want to use function prototypes religiously, you're under no obligation to make use of (or even
learn about) implicit function declarations, and you'll want to configure your compiler so that it will warn
you if you call a function which does not have an explicit, prototyped declaration in scope.
You may wonder why the compiler is able to get some things right (such as implicit conversions between
integers and floating-point within expressions) whether or not you're explicit about your intentions, while
in other circumstances (such as while calling functions returning non-integers) you must be explicit. The
question of when to be explicit and when to rely on the compiler hinges on several questions:
1. How much information does the compiler have available to it?
2. How likely is it that the compiler will infer the right action?
3. How likely is it that a mistake which you the programmer might make will be caught by the
compiler, or silently compiled into incorrect code?
It's fine to depend on things like implicit conversions as long as the compiler has all the information it
needs to get them right, unambiguously. (Relying on implicit conversions can make code cleaner, clearer,
and easier to maintain.) Relying on implicit declarations, however, is discouraged, for several reasons.
First, there are generally fewer declarations than expressions in a program, so the impact (i.e. work) of
making them all explicit is less. Second, thinking about declarations is good discipline, and requiring that
everything normally be declared explicitly can let the compiler catch a number of errors for you (such as
misspelled functions or variables). Finally, since the compiler only compiles one source file at a time, it
is never able to detect inconsistencies between files (such as a function or variable declared one way in
one source file and some other way in another), so it's important that cross-file declarations be explicit
and consistent. (Various strategies, such as placing common declarations in header files so that they can
be #included wherever they're needed, and requesting that the compiler warn about function calls without
prototypes in scope, can help to reduce the number of errors having to do with improper declarations.)
For the most part, you can also ignore the ``old style'' function syntax, which hardly anyone is using any
more. The only thing to watch out for is that an empty set of parentheses in a function declaration is an
old-style declaration and means ``unspecified arguments,'' not ``no arguments.'' To declare a new-style
function taking no arguments, you must include the keyword void between the parentheses, which
makes the lack of arguments explicit. (A declaration like
int f(void);
does not declare a function accepting one argument of type void, which would be meaningless, since
the definition of type void is that it is a type with no values. Instead, as a special case, a single,
unnamed parameter of type void indicates that a function takes no arguments.) For example, the
definition of the getchar function might look like
int getchar(void)
{
    int c;
    read next character into c somehow
    if (no next character)
        return EOF;
    return c;
}
page 73
Note that this version of atoi, written in terms of atof, has very slightly different behavior: it reads
past a '.' (and, assuming a fully-functional version of atof, an 'e').
The use of an explicit cast when returning a floating-point expression from a routine declared as
returning int represents another point on the spectrum of what you should worry about explicitly versus
what you should feel comfortable making use of implicitly. This is a case where the compiler can do the
``right thing'' safely and unambiguously, as long as what you said (in this case, to return a floating-point
expression from a routine declared as returning int) is in fact what you meant. But since the real
possibility exists that discarding the fractional part is not what you meant, some compilers will warn you
about it. Typically, compilers which warn about such things can be quieted by using an explicit cast; the
explicit cast (even though it appears to ask for the same conversion that would have happened implicitly)
serves to silence the warning. (In general, it's best to silence spurious warnings rather than just ignoring
them. If you get in the habit of ignoring them, sooner or later you'll overlook a significant one that you
would have cared about.)



This page by Steve Summit // Copyright 1995, 1996 // mail feedback

section 4.3: External Variables


The word ``external'' is, roughly speaking, equivalent to ``global.''
page 74
A program with ``too many data connections between functions'' hasn't managed to achieve the desirable
attributes we were talking about earlier, in particular that a function's ``interface to the rest of the
program is clean and narrow.'' Another bit of jargon you may hear is the word ``coupling,'' which refers
to how much one piece of a program has to know about another.
As we have mentioned, the connections between functions should generally be few and well-defined, in which case they will be amenable to regular old function arguments, and you won't be
tempted to pass lots of data around in global variables. (On the other hand, global variables are fine for
some things, such as configuration information which the whole program cares about and which is set
just once at program startup and then doesn't change.)
The word ``lifetime'' refers to how long a variable and its value stick around. (The jargon term is
``duration.'') So far, we've seen that global variables persist for the life of the program, while local
variables last only as long as the functions defining them are active. However, lifetime (duration) is a
separate and orthogonal concept from scope; we'll soon be meeting local variables which persist for the
life of the program.
Deep sentence:
Thus if two functions must share some data, yet neither calls the other, it is often most
convenient if the shared data is kept in external variables rather than passed in and out via
arguments.
(Later, though, we'll learn about data structures which can make it more convenient to pass certain data
around via function arguments, so we'll have less reason for using external variables for these sorts of
purposes.)
``Reverse Polish'' is used by some (earlier, all) Hewlett-Packard calculators. (The name is based on the
nationality of the mathematician who studied and formalized this notation.) It may seem strange at first,
but it's natural if you observe that you need both numbers (operands) before you can carry out an
operation on them. (This fact is one of the reasons that reverse Polish notation is ``easier to implement.'')
The calculator example is a bit long and a bit involved, but I urge you to work through and understand it.
A calculator is something that everyone's likely to be familiar with; it's interesting to see how one might
work inside; and the techniques used here are generally useful in all sorts of programs.
A ``stack'' is simply a last-in, first-out list. You ``push'' data items onto a stack, and whenever you ``pop''
an item from the stack, you get the one most recently pushed.
pages 76-79
The code for the calculator may seem daunting at first, but it's much easier to follow if you look at each
part in isolation (as good functions are meant to be looked at), and notice that the routines fall into three
levels. At the top level is the calculator itself, which resides in the function main. The main function
calls three lower-level functions: push, pop, and getop. getop, in turn, is written in terms of the still
lower-level functions getch and ungetch.
A few details of the communication among these functions deserve mention. The getop routine actually
returns two values. Its formal return value is a character representing the next operation to be performed.
Usually, that character is just the character the user typed, that is, +, -, *, or /. In the case of a number
typed by the user, the special code NUMBER is returned (which happens to be #defined to be the
character '0', but that's arbitrary). A return value of NUMBER indicates that an entire string of digits has
been typed, and the string itself is copied into the array s passed to getop. In this case, therefore, the
array s is the second return value.
In some printings, the second line on page 76 reads
    #include <math.h>      /* for atof() */
which is incorrect; it should be
    #include <stdlib.h>    /* for atof() */

page 77
Make sure you understand why the code
    push(pop() - pop());     /* WRONG */
might not work correctly.


``The representation can be hidden'' means that the declarations of these variables can follow main in
the file, such that main can't ``see'' them (that is, can't attempt to refer to them). Furthermore, as we'll see,
the declarations might be moved to a separate source file, and main won't care.
pages 77-78

Note that getop does not incorporate the functionality of atoi or atof--it collects and returns the
digits as a string, and main calls atof to convert the string to a floating-point number (prior to pushing
it on the stack). (There's nothing profound about this arrangement; there's no particular reason why
getop couldn't have been set up to do the conversion itself.)
The reasons for using a routine like ungetch are good and sufficient, but they may not be obvious at
first. The essential motivation, as the authors explain, is that when we're reading a string of digits, we
don't know when we've reached the end of the string of digits until we've read a non-digit, and that non-digit is not part of the string of digits, so we really shouldn't have read it yet, after all. The rest of the
program is set up based on the assumption that one call to getop will return the string of digits, and the
next call will return whatever operator followed the string of digits.
To understand why the surprising and perhaps kludgey-sounding getch/ungetch approach is in fact a
good one, let's consider the alternatives. getop could keep track of the one-too-far character somehow,
and remember to use it next time instead of reading a new character. (Exercise 4-11 asks you to
implement exactly this.) But this arrangement of getop is considerably less clean from the standpoint of
the ``invariants'' we were discussing earlier. getop can be written relatively cleanly if one of its
invariants is that the operator it's getting is always formed by reading the next character(s) from the input
stream. getop would be considerably messier if it always had to remember to use an old character if it
had one, or read a new character otherwise. If getop were modified later to read new kinds of operators,
and if reading them involved reading more characters, it would be easy to forget to take into account the
possibility of an old character each time a new character was needed. In other words, everywhere that
getop wanted to do the operation
    read the next character
it would instead have to do
    if (there's an old character)
        use it
    else
        read the next character
It's much cleaner to push the checking for an old character down into the getch routine.
Devising a pair of routines like getch and ungetch is an excellent example of the process of
abstraction. We had a problem: while reading a string of digits, we always read one character too far.
The obvious solution--remembering the one-too-far character and using it later--would have been clumsy
if we'd implemented it directly within getop. So we invented some new functions to centralize and
encapsulate the functionality of remembering accidentally-read characters, so that getop could be
written cleanly in terms of a simple ``get next character'' operation. By centralizing the functionality, we
make it easy for getop to use it consistently, and by encapsulating it, we hide the (potentially ugly)
details from the rest of the program. getch and ungetch may be tricky to write, but once we've
written them, we can seal up the little black boxes they're in and not worry about them any more, and the
rest of the program (especially getop) is cleaner.
page 79
If you're not used to the conditional operator ?: yet, here's how getch would look without it:
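#define BUFSIZE 100

char buf[BUFSIZE];      /* buffer for ungetch */
int bufp = 0;           /* next free position in buf */

int getch(void)         /* get a (possibly pushed-back) character */
{
    if (bufp > 0)
        return buf[--bufp];
    else
        return getchar();
}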
Also, the extra generality of these two routines (namely, that they can push back and remember several
characters, a feature which the calculator program doesn't even use) makes them a bit harder to follow.
Exercise 4-8 asks you to write simpler versions which allow only one character of pushback. (Also, as
the text notes, we don't really have to be writing ungetch at all, because the standard library already
provides an ungetc which can provide one character of pushback for getchar.)
When we defined a stack, we said that it was ``last-in, first-out.'' Are the versions of getch and
ungetch on page 79 last-in, first-out or first-in, first out? Do you agree with this choice?
One last note: the name of the variable bufp suggests that it is a pointer, but it's actually an index into
the buf array.



This page by Steve Summit // Copyright 1995, 1996 // mail feedback

section 4.4: Scope Rules


page 80
With respect to the ``practical matter'' of splitting the calculator program up into multiple source files,
though it's certainly small enough to fit comfortably into a single source file, it's not so small that there's
anything wrong with splitting it up into multiple source files, especially if we start adding functionality to
it.
The scope of a name is what we have been calling its ``visibility.'' When we say things like ``calling a
function with a prototype in scope'' we mean that a prototype is visible, that a declaration is in effect.
The variables sp and val can be used by the push and pop routines because they're defined in the
same file (and the definitions appear before push and pop). They can't be used in main because no
declaration for them appears in main.c (nor in calc.h, which main.c #includes). If main
attempted to refer to sp or val, they'd be flagged as undefined. (Don't worry about the visibility of
``push and pop themselves.'')
The paragraph beginning ``On the other hand'' is explaining how global (``external'') variables like sp
and val could be accessed in a file other than the file where they are defined. In the examples we've
been looking at, as we've said, sp and val can be used in push and pop because the variables are
defined above the functions. If the variables were defined elsewhere (i.e. in some other file), we'd need a
declaration above--and that's exactly what extern is for. (See page 81 for an example.)
page 81
A definition creates a variable, and for any given global variable, you only want to do that once.
Anywhere else, you want to refer to an existing variable, created elsewhere, without creating a new,
conflicting one. Referring to an existing variable or function is exactly what a declaration is for.
Note also that the definition may optionally initialize the variable. (Don't worry about why a declaration
may optionally include an array dimension.)
``This same organization would also be needed if the definitions of sp and val followed their use in one
file'' means that we could conceivably have, in one file,
    extern int sp;
    extern double val[];

    void push(double f) { ... }
    double pop(void) { ... }

    int sp = 0;
    double val[MAXVAL];
So ``extern'' just means ``somewhere else''; it doesn't have to mean ``in a different file,'' though usually
it does.



This page by Steve Summit // Copyright 1995, 1996 // mail feedback

section 4.5: Header Files


page 82
By the way, the ``.h'' traditionally used in header file names simply stands for ``header.''
We can imagine several strategies for using header files. At one extreme would be to use zero header
files, and to repeat declarations in each file which needed them. This would clearly be a poor strategy,
because whenever a declaration changed, we would have to remember to change it in several places, and
it would be easy to miss one of them, leading to stubborn bugs. At the other extreme would be to use one
header file for each source file (declaring just the things defined in that source file, to be #included by
files using those things), but such a proliferation of header files would usually be unwieldy. For small
projects (such as the calculator example), it's a reasonable strategy to use one header file for the entire
project. For larger projects, you'll usually have several header files for sets of related declarations.



This page by Steve Summit // Copyright 1995, 1996 // mail feedback

section 4.6: Static Variables


page 83
Deep sentence:
The static declaration, applied to an external variable or function, limits the scope of
that object to the rest of the source file being compiled. External static thus provides a
way to hide names like buf and bufp in the getch-ungetch combination, which must
be external so they can be shared, yet which should not be visible to users of getch and
ungetch.
So we can have three kinds of declarations: local to one function, restricted to one source file, or global
across potentially many source files. We can imagine other possibilities, but these three cover most
needs.
Notice that the static keyword does two completely different things. Applied to a local variable (one
inside of a function), it modifies the lifetime (``duration'') of the variable so that it persists for as long as
the program does, and does not disappear between invocations of the function. Applied to a variable
outside of a function (or to a function), static limits the scope to the current file.
To summarize the scope of external and static functions and variables: when a function or global
variable is defined without static, its scope is potentially the entire program, although any file which
wishes to use it will generally need an extern declaration. A definition with static limits the scope
by prohibiting other files from accessing a variable or function; even if they try to use an extern
declaration, they'll get errors about ``undefined externals.''
The rules for declaring and defining functions and global variables, and using the extern and static
keywords, are admittedly complicated and somewhat confusing. You don't need to memorize all of the
rules right away: just use simple declarations and definitions at first, and as you find yourself needing
some of the more complicated possibilities such as static variables, the rules will begin to make more
sense.


section 4.7: Register Variables


page 83
The register keyword is only a hint. The compiler might not put something in a register even though
you ask it to, and it might put something in a register even though you don't ask it to. Most modern
compilers do a good job of deciding when to put things in registers, so most of the time, you don't need
to worry about it, and you don't have to use the register keyword at all.
(A note to assembly language programmers: there's no way to specify which register a register
variable gets assigned to. Also, when you specify a function parameter as register, it just means that
the local copy of the parameter should be copied to a register if possible; it does not necessarily indicate
that the parameter is going to be passed in a register.)


section 4.8: Block Structure


pages 84-85
You've probably heard that global variables are ``bad'' because they exist everywhere and it can be hard
to keep track of who's using them. In the same way, it can be useful to limit the scope of a local variable
to just the bit of the function that uses it, which is exactly what happens if we declare a variable in an
inner block.


section 4.9: Initialization


page 85
These are some of the rules on initialization; we'll learn a few more later as we learn about a few more
data types.
If you don't feel like memorizing the rules for default initialization, just go ahead and explicitly initialize
everything you care about.
Earlier we said that C is quite general in its treatment of expressions: anywhere you can use an
expression, you can use any expression. Here's an exception to that rule: in an initialization of an external
or static variable (strictly speaking, any variable of static duration; generally speaking, any global
variable or local static variable), the initializer must be a constant expression, with value
determinable at compile time, without calling any functions. (This rule is easy to understand: since these
initializations happen conceptually at compile time, before the program starts running, there's no way for
a function call--that is, some run-time action--to be involved.)
page 86
It probably won't concern you right away, but it turns out that there's another exception about the
allowable expressions in initializers: in the brace-enclosed list of initializers for an array, all of the
expressions must be constant expressions (even for local arrays). (This is the ANSI C rule; the later C99
standard relaxed it for local, automatic arrays.)
There is an error in some printings: if there are fewer explicit initializers than required for an array, the
others will be initialized to zero, for external, static, and automatic (local) arrays. (When an automatic
array has no initializers at all, then it contains garbage, just as simple automatic variables do.)
If the initialization
char pattern[] = "ould";
makes sense to you, you're fine. But if the statement that
char pattern[] = "ould";
is equivalent to
char pattern[] = { 'o', 'u', 'l', 'd', '\0' };
bothers you at all, study it until it makes sense. Also, note that a character array which seems to contain
(for example) four characters actually contains five, because of the terminating '\0'.


section 4.10: Recursion


page 86
Recursion is a simple but deep concept which is occasionally presented somewhat bewilderingly. Please
don't be put off by it. If this section stops making sense, don't worry about it; we'll revisit recursion in
chapter 6.
Earlier we said that a function is (or ought to be) a ``black box'' which does some job and does it well.
Whenever you need to get that job done, you're supposed to be able to call that function. You're not
supposed to have to worry about any reasons why the function might not be able to do that job for you
just now.
It turns out that some functions are naturally written in such a way that they can do their job by calling
themselves to do part of their job. This seems like a crazy idea at first, but based on a strict interpretation
of our observation about functions--that we ought to be able to call them whenever we need their job
done--calling a function from within itself ought not to be illegal, and in fact in C it is legal. Such a call is
known as a recursive call, and it works because it's possible to have several instances of a function active
simultaneously. They don't interfere with each other, because each instance has its own copies of its
parameters and local variables. (However, if a function accesses any static or global data, it must be
written carefully if it is to be called recursively, because then different instances of it could interfere with
each other.)
Let's consider the printd example rather carefully. First, remind yourself about the reverse-order
problem from the itoa example on page 64 in section 3.6. The ``obvious'' algorithm for determining the
digits in a number, which involves successively dividing it by 10 and looking at the remainders,
generates digits in right-to-left order, but we'd usually like them in left-to-right order, especially if we're
printing them out as we go. Let's see if we can figure out another way to do it.
It's easy to find the lowest (rightmost) digit; that's n % 10. It's easy to compute all but the lowest digit;
that's n / 10. So we could print a number left-to-right, directly, without any explicit reversal step, if we
had a routine to print all but the last digit. We could call that routine, then print the last digit ourselves.
But--here's the surprise--the routine to ``print all but the last digit'' is printd, the routine we're writing,
if we call it with an argument of n / 10.
Recursion seems like cheating. Suppose you're writing a routine to do something (in this case, to print
digits). Instead of writing code to print digits, you just punt and call a routine for printing digits, and
that routine is in fact the very one you're supposed to be writing. It seems as if you haven't done the job
you came to do. A recursive function seems like circular reasoning; it seems to beg the question of how it
does its job.


But if you're writing a recursive function, as long as you do a little bit of work yourself, and only pass on
a portion of the job to another instance of yourself, you haven't completely reneged on your
responsibilities. Furthermore, if you're ever called with such a small job to do that the little bit you're
willing to do encompasses the whole job, you don't have to call yourself again (there's no remaining
portion that you can't do). Finally, since each recursive call does some work, passing on smaller and
smaller portions to succeeding recursive calls, and since the last call (where the remaining portion is
empty) doesn't generate any more recursive calls, the recursion is broken and doesn't constitute an
infinite loop.
Don't worry about the quicksort example if it seems impenetrable--quicksort is an important algorithm,
but it is not easy to understand completely at first.
Note that the qsort routine described here is very different from the standard library qsort (in fact, it
probably shouldn't even have the same name).


section 4.11: The C Preprocessor


page 88
We've been using #include and #define already, but now we'll describe them more completely.


section 4.11.1: File Inclusion


The two syntaxes for #include lines can be used in various ways, but very simply speaking, "" is for
header files you've written, and <> is for headers which are provided for you (which someone else has
written).
page 89
Deep sentences:
#include is the preferred way to tie the declarations together for a large program. It
guarantees that all the source files will be supplied with the same definitions and variable
declarations, and thus eliminates a particularly nasty kind of bug. Naturally, when an
included file is changed, all files that depend on it must be recompiled.
That's the story on #include, in a nutshell.


section 4.11.2: Macro Substitution


A #define lasts from the point where it appears to the end of the source file; you can't have local ones the
way you can have local variables.
``Substitutions are made only for tokens'' means that a substitutable macro name is only recognized when
it stands alone. Also, substitution never happens in quoted strings, because it turns out that you usually
don't want it to. Strings are generally used for communication with the user, while you want substitutions
to happen where you're talking to the compiler.
The point of the ``forever'' example is to demonstrate that the replacement text doesn't have to be a
simple number or string constant. You'd use the forever macro like this:
forever {
    ...
}
which the preprocessor would expand to
for (;;) {
    ...
}
which, as we learned in section 3.5 on page 60, is an infinite loop. (Presumably there's a break; see
section 3.7 p. 64.)
Another popular trick is
#define ever ;;
so that you can say
for(ever) {
    ...
}
But ``preprocessor tricks'' like these tend to get out of hand very quickly; if you use too many of them
you're not writing in C any more but rather in your own peculiar dialect, and no one will be able to read
your code without understanding all of your ``silly little macros.'' It is best if simple macros expand to
simple constants (or expressions).
Macros with arguments are also called ``function-like macros'' because they act almost like miniature
functions. There are some important differences, however:
- no call-by-value copying semantics (arguments are substituted textually, and may be evaluated more than once)
- no space saving (the replacement text is expanded in full at every invocation)
- hard to have local variables or block structure
- have to parenthesize carefully (see below)

page 90
The correct way to write the square() macro is
#define square(x) ((x) * (x))
There are three rules to remember when defining function-like macros:
1. The macro expansion must always be parenthesized so that any low-precedence operators it
contains will still be evaluated first. If we didn't write the square() macro carefully, the
invocation
1 / square(n)
might expand to
1 / n * n
while it should expand to
1 / (n * n)
2. Within the macro definition, all occurrences of the parameters must be parenthesized so that any
low-precedence operators the actual arguments contain will be evaluated first. If we didn't write
the square() macro carefully, the invocation
square(n + 1)
might expand to
n + 1 * n + 1
while it should expand to
(n + 1) * (n + 1)
3. If a parameter appears several times in the expansion, the macro may not work properly if the
actual argument is an expression with side effects. No matter how we parenthesize the
square() macro, the invocation
square(i++)
would result in
i++ * i++
(perhaps with some parentheses), but this expression is undefined, because we don't know when
the two increments will happen with respect to each other or the multiplication.
Since the square() macro can't be written perfectly safely (arguments with side effects will always be
troublesome), its callers will always have to be careful (i.e. not to call it with arguments with side
effects). One convention is to capitalize the names of macros which can't be treated exactly as if they
were functions:
#define Square(x) ((x) * (x))
page 90 continued
#undef can be used when you want to give a macro restricted scope, if you can remember to undefine it
when you want it to go out of scope. Don't worry about ``[ensuring] that a routine is really a function, not
a macro'' or the getchar example.
Also, don't worry about the # and ## operators. These are new ANSI features which aren't needed except
in relatively special circumstances.


section 4.11.3: Conditional Inclusion


page 91
The #if !defined(HDR) trick is a bit esoteric to start out with. Let's look at a simpler example: in
ANSI C, the remove function deletes a file. On some older Unix systems, however, the function to
delete a file is instead named unlink. Therefore, when deleting a file, we might use code like this:
#if defined(unix)
    unlink(filename);
#else
    remove(filename);
#endif
We would arrange to have the macro unix defined when we were compiling our program on a Unix
machine, and not otherwise.
You may wonder what the difference is between the if() statement we've been using all along, and this
new #if preprocessing directive. if() acts at run time; it selects whether or not a statement or group of
statements is executed, based on a run-time condition. #if, on the other hand, acts at compile time; it
determines whether certain parts of your program are even seen by the compiler or not. If for some
reason you want to have two slightly different versions of your program, you can use #if to separate the
different parts, leaving the bulk of the code common, such that you don't have to maintain two totally
separate versions.
#if can be used to conditionally compile anything: not just statements and expressions, but also
declarations and entire functions.
Back to the HDR example (though this is somewhat of a tangent, and it's not vital for you to follow it): it's
possible for the same header file to be #included twice during one compilation, either because the
same #include line appears twice within the same source file, or because a source file contains
something like
#include "a.h"
#include "b.h"
but b.h also #includes a.h. Since some declarations which you might put in header files would
cause errors if they were acted on twice, the #if !defined(HDR) trick arranges that the contents of
a header file are only processed once.
Note that two different macros, both named HDR, are being used on page 91, for two entirely different
purposes. At the top of the page, HDR is a simple on-off switch; it is #defined (with no replacement
text) when hdr.h is #included for the first time, and any subsequent #inclusion merely tests whether
HDR is #defined. (Note that it is in fact quite possible to define a macro with no replacement text; a
macro so defined is distinguishable from a macro which has not been #defined at all. One common
use of a macro with no replacement text is precisely as a simple #if switch like this.)
At the bottom of the page, HDR ends up containing the name of a header file to be #included; the
name depends on the #if and #elif directives. The line
#include HDR
#includes one of them, depending on the final value of HDR.


Chapter 5: Pointers and Arrays


page 93
Pointers are often thought to be the most difficult aspect of C. It's true that many people have various
problems with pointers, and that many programs founder on pointer-related bugs. Actually, though, many
of the problems are not so much with the pointers per se but rather with the memory they point to, and
more specifically, when there isn't any valid memory which they point to. As long as you're careful to
ensure that the pointers in your programs always point to valid memory, pointers can be useful, powerful,
and relatively trouble-free tools. (In these notes, we'll be emphasizing techniques for ensuring that
pointers always point where they should.)
If you haven't worked with pointers before, they're bound to be a bit baffling at first. Rather than
attempting a complete definition (which probably wouldn't mean anything, either) up front, I'll ask you to
read along for a few pages, withholding judgment, and after we've seen a few of the things that pointers
can do, we'll be in a better position to appreciate what they are.
section 5.1: Pointers and Addresses
section 5.2: Pointers and Function Arguments
section 5.3: Pointers and Arrays
section 5.4: Address Arithmetic
section 5.5: Character Pointers and Functions
section 5.6: Pointer Arrays; Pointers to Pointers
section 5.7: Multi-dimensional Arrays
section 5.8: Initialization of Pointer Arrays
section 5.9: Pointers vs. Multi-dimensional Arrays
section 5.10: Command-line Arguments


section 5.1: Pointers and Addresses


If you like to use concrete examples and to think about exactly what's going on at the machine level,
you'll want to know how many bytes are occupied by shorts, longs, pointers, etc. It's equally possible,
though, to understand pointers at a more abstract level, thinking about them only in terms of boxes and
arrows, as in the figures on pages 96, 98, 104, 107, and 114-5. (Not worrying about the exact size in
bytes basically means not worrying about how big the boxes are.) The figure at the bottom of page 93 is
probably the least pretty pointer picture in the whole book; don't worry if it doesn't mean much to you.
When we say that a pointer holds an ``address,'' and that unary & is the ``address of'' operator, our
language is of course influenced by the fact that the underlying hardware assigns addresses to memory
locations, but again, it is not necessary (nor necessarily desirable) to think about actual machine
addresses when working with pointers. Thinking about the machine addresses can make certain aspects
of pointers easier to understand, but doing so can also make certain mistakes and misunderstandings
easier. In particular, a pointer in C is more than just an address; as we'll see on the next page, a pointer
also carries the notion of what type of data it points to.
page 94
The presentation on this page is going to seem very artificial at first. At best, you're going to say, ``This
makes sense, but what's it for?'' In fact, it is artificial, and no real program would ever do meaningless
little pointer operations such as are embodied in the example on this page. However, this is the traditional
way to introduce pointers from scratch, and once we've moved past it, we'll be able to talk about some
more meaningful uses of pointers, and to forget about these artificial ones. (Once we're done talking
about the traditional, artificial introduction on page 94, we'll also attempt a slightly more elaborate,
slightly less traditional, slightly more meaningful parallel introduction, so stay tuned.)
Deep sentence:
The declaration of the pointer ip,
int *ip;
is intended as a mnemonic; it says that the expression *ip is an int.
We'll have more to say about this sentence in a bit.
As an even more traditional, even less meaningful, even simpler example, we could say
int i = 1;             /* an integer */
int *ip;               /* a pointer-to-int */
ip = &i;               /* ip points to i */
printf("%d\n", *ip);   /* prints i, which is 1 */
*ip = 5;               /* sets i to 5 */

(The obvious questions are, ``if you want to print i, or set it to 5, why not just do it? Why mess around
with this `pointer' thing?'' More on that in a minute.)
The unary & and * operators are complementary. Given an object (i.e. a variable), & generates a pointer
to it; given a pointer, * ``returns'' the value of the pointed-to object. ``Returns'' is in quotes because, as
you may have noticed in the examples, you're not restricted to fetching values via pointers: you can also
store values via pointers. In an assignment like
*ip = 0;
the subexpression *ip is conceptually ``replaced'' by the object which ip points to, and since *ip
appears on the left-hand side of the assignment operator, what happens to the pointed-to object is that it
gets assigned to.
One of the things that's hard about pointers is simply talking about what's going on. We've been using the
words ``return'' and ``replace'' in quotes, because they don't quite reflect what's actually going on, and
we've been using clumsy locutions like ``fetch via pointers'' and ``store via pointers.'' There is some
jargon for referring to pointer use; one word you'll often see is dereference, a term which, though its
derivation is suspect, is used to mean ``follow a pointer to get at, and use, the object it points to.'' Thus,
we sometimes call unary * the ``pointer dereferencing operator,'' and we may say that the expressions
printf("%d\n", *ip);
and
*ip = 5;
both ``dereference the pointer ip.'' We may also talk about indirecting on a pointer: to indirect on a
pointer is again to follow it to see what it points to; and * may also be called the ``pointer indirection
operator.''
Our examples of pointers so far have been, admittedly, artificial and rather meaningless. Let's try a
slightly more realistic example. In the previous chapter, we used the routines atoi and atof to convert
strings representing numbers to the actual numbers represented. Often the strings were typed by the user,
and read with getline. As you may have noticed, neither atoi nor atof does any validity or error
checking: both simply stop reading when they reach a character that can't be part of the number they're
converting, and if there aren't any numeric characters in the string, they simply return 0. (For example,
atoi("49er") is 49, and atoi("three") is 0, and atof("1.2.3") is 1.2.) These attributes
make atoi and atof easy to write and easy (for the programmer) to use, but they are not the most
user-friendly routines possible. A good user interface would warn the user and prompt again in case of
invalid, non-numeric input.
Suppose we were writing a simple inventory-control system. For each part stored in our warehouse, we
might record the part number, location, and number of parts on hand. For simplicity, we'll assume that
the location is always a simple bin number.
Somewhere in the inventory-control program, we might find the variables
int part_number;
int location;
int number_on_hand;
and there might be a routine that lets the user enter any of these numbers. Suppose that there is another
variable,
int which_entry;
which indicates which of the three numbers is being entered (1 for part_number, 2 for location, or
3 for number_on_hand). We might have code like this:
char instring[30];
switch (which_entry) {
case 1:
    printf("enter part number:\n");
    getline(instring, 30);
    part_number = atoi(instring);
    break;
case 2:
    printf("enter location:\n");
    getline(instring, 30);
    location = atoi(instring);
    break;
case 3:
    printf("enter number on hand:\n");
    getline(instring, 30);
    number_on_hand = atoi(instring);
    break;
}
Suppose that we now begin to add a bit of rudimentary verification to the input routines. The first case
might look like
case 1:
    part_number = 0;    /* "continue" jumps straight to the while test, so start from 0 */
    do {
        printf("enter part number:\n");
        getline(instring, 30);
        if(!isdigit(instring[0]))
            continue;
        part_number = atoi(instring);
    } while (part_number == 0);
    break;
If the first character is not a digit, or if atoi returns 0, the code goes around the loop another time, and
prompts the user again, in hopes that the user will type some proper numeric input this time. (The tests
for numeric input are not sufficient, nor even wise if 0 is a possible input value, as it presumably is for
number on hand. In fact, the two tests really do the same thing! But please overlook these faults. If you're
curious, you can learn about a new ANSI function, strtol, which is like atoi but gives you a bit
more control, and would be a better routine to use here.)
The code fragment above is for just one of the three input cases. The obvious way to perform the same
checking for the other two cases would be to repeat the same code two more times, changing the prompt
string and the name of the variable assigned to (location or number_on_hand instead of
part_number). Duplicating the code is a nuisance, though, especially if we later come up with a better
way to do input verification (perhaps one not suffering from the imperfections mentioned above). Is there
a better way?
One way would be to use a temporary variable in the input loop, and then set one of the three real
variables to the value of the temporary variable, depending on which_entry:
int temp = 0;    /* "continue" jumps straight to the while test, so temp must start at 0 */
do {
    printf("enter the number:\n");
    getline(instring, 30);
    if(!isdigit(instring[0]))
        continue;
    temp = atoi(instring);
} while (temp == 0);

switch (which_entry) {
case 1:
    part_number = temp;
    break;
case 2:
    location = temp;
    break;
case 3:
    number_on_hand = temp;
    break;
}
Another way, however, would be to use a pointer to keep track of which variable we're setting. (In this
example, we'll also get the prompt right.)
char instring[30];
int *numpointer;
char *prompt;
switch (which_entry) {
case 1:
    numpointer = &part_number;
    prompt = "part number";
    break;
case 2:
    numpointer = &location;
    prompt = "location";
    break;
case 3:
    numpointer = &number_on_hand;
    prompt = "number on hand";
    break;
}
*numpointer = 0;    /* "continue" jumps straight to the while test, so start from 0 */
do {
    printf("enter %s:\n", prompt);
    getline(instring, 30);
    if(!isdigit(instring[0]))
        continue;
    *numpointer = atoi(instring);
} while (*numpointer == 0);
The idea here is that prompt is the prompt string and numpointer points to the particular numeric
value we're entering. That way, a single input verification loop can print any of the three prompts and set
any of the three numeric variables, depending on where numpointer points. (We won't officially see
character pointers and strings until section 5.5, so don't worry if the use of the prompt pointer seems
new or inexplicable.)
This example is, in its own ways, quite artificial. (In a real inventory-control program, we'd obviously
need to keep track of many parts; we couldn't use single variables for the part number, location, and
quantity. We probably wouldn't really have a which_entry variable telling us which number to
prompt for, and we'd do the numeric validation quite differently. We might well do numeric entry and
validation in a separate function, removing this need for the pointers.) However, the pointer aspect of this
example--using a pointer to refer to one of several different things, so that one generic piece of code can
access any of the things--is a very typical (i.e. realistic) use of pointers.
There's one nuance of pointer declarations which deserves mention. We've seen that
int *ip;
declares the variable ip as a pointer to an int. We might look at that declaration and imagine that int
* is the type and ip is the name of the variable being declared. (Actually, so far, these assumptions are
both true.) We might therefore imagine that a more ``obvious'' way of writing the declaration would be
int* ip;
This would work, but it is misleading, as we'll see if we try to declare two int pointers at once. How
shall we do it? If we try
int* ip1, ip2;        /* WRONG */
we don't succeed; this would declare ip1 as a pointer-to-int, but ip2 as an int (not a pointer). The
correct declaration for two pointers is
int *ip1, *ip2;


As the authors said in the middle of page 94, the intent of pointer (and in fact all) declarations is that they
give little miniature expressions indicating what type a certain use of the variables will have. The
declaration
int *ip1;
doesn't so much say that ip1 is a pointer-to-int; it says that *ip1 is an int. (To be sure, ip1 is a
pointer-to-int.) In the declaration
int *ip1, *ip2;
both *ip1 and *ip2 are ints; so ip1 and ip2 are both pointers-to-int. You'll hear this aspect of C
declarations referred to as ``declaration mimics use.'' If it bothers you, or if you think you might
accidentally write things like
int *ip1, ip2;
then to stay on the safe side you might want to get in the habit of writing declarations on separate lines:
int *ip1;
int *ip2;
I promised to point out the safe techniques for ensuring that pointers always point where they should.
The examples in this section, which have all involved pointers pointing to single variables, are relatively
safe; a single variable is not a very risky thing to point to, so code like the examples in this section is
relatively unlikely to go awry and result in invalid pointers. (One potential problem, though, which we'll
talk more about later, is that since local, ``automatic'' variables are automatically deallocated when the
function containing them returns, any pointer to a local variable also becomes invalid. Therefore, a
function which returns a pointer must never return a pointer to one of its own local variables, and it
would also be invalid to take a pointer to a local variable and assign it to a global pointer variable.)


section 5.2: Pointers and Function Arguments


page 95
This section discusses a very common use of pointers: setting things up so that a function can modify
values in its caller, or return values, via its arguments. Remember that, normally, C passes arguments by
value, and that if a function modifies one of its arguments, it modifies only its local copy, not the value in
the caller. (Normally, this is a good thing; having a function which inadvertently assigns to its arguments
and hence inadvertently modifies a value in its caller can be a source of obscure bugs in languages which
don't use call-by-value.) However, what happens if a function wants to modify a value in its caller, and
its caller wants to let it? How can a function return two values? (A function's formal return value is
always a single value.)
The answer to both questions is that a function can declare a parameter which is a pointer. The caller then
passes a pointer to (that is, the ``address of'') a variable in the caller which is to be modified or which is
to receive the second return value. In fact, we've seen examples of this already: getline returns the
length of the line it reads as well as the line itself, and the getop routine in the calculator example in
section 4.3 returned both a code for an operator and a string representing the full text of the operator.
(We needed that string when the operator was '0' indicating numeric input, so that the string could
return the full numeric input.) Though we didn't say so at the time, we were actually using pointers in
these examples. (We'll explore the relationship between arrays and pointers, which made this possible, in
section 5.3.)
With all of this in mind, make sure that you understand why the swap example on page 95 would not
work, and how and why the swap example on page 96 does work, and what the figure on page 96 shows.
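A sketch along the lines of the two swap functions on pages 95 and 96, for comparing them side by side. The name swap_copies is invented here for the version that does not work.

```c
/* Does not work: x and y are copies, so only the copies are exchanged. */
void swap_copies(int x, int y)
{
    int temp = x;
    x = y;
    y = temp;
}

/* Works: px and py point to the caller's variables, so the
   assignments through them change those variables. */
void swap(int *px, int *py)
{
    int temp = *px;
    *px = *py;
    *py = temp;
}
```

Calling swap_copies(a, b) leaves a and b unchanged; calling swap(&a, &b) exchanges them.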
The swap example demonstrated a function which modified some variables (a and b) in its caller. The
getint example demonstrates how to return two values from a function by returning one value as the
normal function return value and the other one by writing to a pointer. (There is no fundamental
difference, though, between ``modifying a variable in the caller'' and ``returning a value by writing to a
pointer''; these are just two applications of pointer parameters.)
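As a small illustration of returning two values, here is a hypothetical function, divide, that returns the quotient normally and hands back the remainder through a pointer parameter. It assumes den is nonzero.

```c
/* Hypothetical helper: returns the quotient as its normal return value
   and "returns" the remainder by writing through the pointer remp.
   Assumes den != 0. */
int divide(int num, int den, int *remp)
{
    *remp = num % den;
    return num / den;
}
```

A caller would write something like int r; int q = divide(17, 5, &r); after which q is 3 and r is 2.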
The version of getint on page 97 is somewhat complicated because it allows free-form input, that is,
the values need only be separated by whitespace or punctuation; they are not restricted to being one per
line or anything. (C source code is also free-form in this regard; see page 4 of chapter 1 of these notes.)
To see more clearly the essence of what getint is supposed to do, imagine for a moment that the input
is restricted to one value per line, as in the ``primitive calculator'' example on page 72 of section 4.2. In
that case, we might use the following simpler (i.e. more primitive) code:
int getint(int *pn)
{
    char line[20];
    if (getline(line, 20) <= 0)   /* the book's getline, as in section 1.9 */
        return EOF;               /* EOF comes from <stdio.h> */
    *pn = atoi(line);             /* atoi comes from <stdlib.h> */
    return 1;
}
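If you want to try the primitive version without the book's getline, here is one way to sketch it, assuming a hypothetical name getint_from and reading from a FILE * with the standard fgets instead. (Some modern C libraries already declare a different function called getline, which is one more reason to avoid the name in a test program.)

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical variant of the primitive getint: reads one value per
   line from fp with fgets, converts it with atoi, and stores it
   through pn.  Like the original, it does no validity checking. */
int getint_from(FILE *fp, int *pn)
{
    char line[20];

    if (fgets(line, sizeof line, fp) == NULL)
        return EOF;          /* end of input: nothing stored */
    *pn = atoi(line);
    return 1;
}
```

Calling getint_from(stdin, &n) behaves much like the getint sketch above.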
The getint function on page 97 is documented as returning nonzero for a valid number and 0 for
something other than a number. Our stripped-down version does not, and as it happens, the example code
at the bottom of page 96 does not make use of the valid/invalid distinction. Can you see a way to rewrite
the code at the bottom of page 96 to fill in the cells of the array with only valid numbers?
You might also notice, again from the code at the bottom of page 96, that & need not be restricted to
single, simple variables; it can take the address of any data object, in this case, one cell of the array.
As with C's other operators, & can be applied to expressions other than simple variable names, but only to expressions which designate addressable objects. Expressions like &1 or &(2+3) are meaningless and illegal.
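A short sketch of & applied to one cell of an array, using two hypothetical functions. fill_squares passes &a[i] to store much as the loop at the bottom of page 96 passes the address of one array cell to getint.

```c
/* Hypothetical helper: stores v wherever p points. */
void store(int *p, int v)
{
    *p = v;
}

/* Fills a[0]..a[n-1] with squares by passing the address of each cell. */
void fill_squares(int a[], int n)
{
    int i;
    for (i = 0; i < n; i++)
        store(&a[i], i * i);   /* &a[i]: the address of one cell */
    /* store(&1, 0);  would not compile: 1 is not an addressable object */
}
```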
You may remember a discussion from section 1.5.1 on page 16 of how C's getchar routine is able to
return all possible characters, plus an end-of-file indication, in its single return value. Why does getint
need two return values? Why can't it use the same trick that getchar does?
The examples in this section are again relatively safe. The pointers have all been parameters, and the
callers have passed pointers to (that is, the ``addresses'' of) their own, properly allocated variables. That is,
code like
int a = 1, b = 2;
swap(&a, &b);
and
int a;
getint(&a);
is correct and quite safe.
Something to beware of, though, is the temptation to inadvertently pass an uninitialized pointer variable
(rather than the ``address'' of some other variable) to a routine which expects a pointer. We know that the
getint routine expects as its argument a pointer to an int in which it is to store the integer it gets.
Suppose we took that description literally, and wrote
int *ip;    /* a pointer to an int */
getint(ip);
Here we have in fact passed a pointer-to-int to getint, but the pointer we passed doesn't point
anywhere! When getint writes to (``dereferences'') the pointer, in an expression like *pn = 0, it will
scribble on some random part of memory, and the program may crash. When people get caught in this
trap, they often think that to fix it they need to use the & operator:
getint(&ip);    /* WRONG */
or maybe the * operator:
getint(*ip);    /* WRONG */

but these go from bad to worse. (If you think about them carefully, &ip is a pointer-to-pointer-to-int,
and *ip is an int, and neither of these types matches the pointer-to-int which getint expects.) The
correct usage for now, as we showed already, is something like
int a;
getint(&a);
In this case, a is an honest-to-goodness, allocated int, so when we generate a pointer to it (with &a) and
call getint, getint receives a pointer that does point somewhere.
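A pointer variable is perfectly acceptable as an argument, once it actually points somewhere. To keep the sketch free of input, it assumes a hypothetical stand-in, fake_getint, that simply stores 42 through its argument.

```c
/* Hypothetical stand-in for getint that needs no input:
   it stores 42 through its pointer argument. */
int fake_getint(int *pn)
{
    *pn = 42;
    return 1;
}

/* Passing a pointer variable is fine, provided it has first been
   set to point at an honest-to-goodness int. */
int use_pointer_variable(void)
{
    int a;
    int *ip = &a;      /* ip now points somewhere real */
    fake_getint(ip);   /* same effect as fake_getint(&a) */
    return a;
}
```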

section 5.3: Pointers and Arrays


page 97
For many people, section 5.3 seems to be the hardest section in this book; and even for people who haven't
read this book, the relationship between pointers and arrays is often the most confusing aspect of the language.
C introduces a novel and, it can be said, elegant integration of pointers and arrays, but there are a distressing
number of ways of misunderstanding arrays, or pointers, or both. Take this section very slowly, learn the things
it does say, and don't learn anything it doesn't say (i.e. don't make any false assumptions).
It's not necessarily true that ``the pointer version will in general be faster''; efficiency is (or ought to be) a
secondary concern when considering the use of pointers.
page 98
On the top half of this page, we aren't seeing anything we haven't seen before. We already knew (or
should have known) that the declaration int a[10]; declares an array of ten contiguous ints
numbered from 0 to 9. We saw on page 94 and again on page 96 that & can be used to take the address of
one cell of an array.
What's new on this page are, first, the nice pictures (and they are nice pictures; I think they're the right
way of thinking about arrays and pointers in C) and, second, the definition of pointer arithmetic. If the phrase
``then by definition pa+1 points to the next element'' alarms you because you hadn't known that pa+1 points
to the next element, don't worry. You hadn't known this, and you aren't expected even to have suspected
it: the reason that pa+1 points to the next element is simply that it's defined that way, as the sentence
says. Furthermore, subtraction works in an exactly analogous way: If we were to say
pa = &a[5];
then *(pa-1) would refer to the contents of a[4], and *(pa-i) would refer to the contents of the
location i elements before cell 5 (as long as i <= 5).
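A minimal sketch of subtraction on a pointer into an array, using a hypothetical helper, element_before. The comment restates the condition from the text: the result is only meaningful while p-i stays inside the array.

```c
/* Returns the element i places before the one p points to.
   Valid only if p-i stays within the same array (for p == &a[5],
   that means i <= 5). */
int element_before(int *p, int i)
{
    return *(p - i);
}
```

With int a[10] holding 0, 10, 20, ..., 90 and pa = &a[5], *(pa+1) is 60, element_before(pa, 1) is 40, and element_before(pa, 5) is a[0], which is 0.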
Note furthermore that we do not have to worry about the size of the objects pointed to. Adding 1 to a
pointer (or subtracting 1) always means to move over one object of the type pointed to, to get to the next
element. (If you're too worried about machine addresses, or the actual address values stored in pointers,
or the actual sizes of things, it's easy to mistakenly assume that adding or subtracting 1 adds or subtracts
1 from the machine address, but as we mentioned, you don't have to think at this low level. We'll see in
section 5.4 how pointer arithmetic is actually scaled, automatically, by the size of the object pointed to,
but we don't have to worry about it if we don't want to.)
Deep sentence:
The meaning of ``adding 1 to a pointer,'' and by extension, all pointer arithmetic, is that
pa+1 points to the next object, and pa+i points to the i-th object beyond pa.
This aspect of pointers--that arithmetic works on them, and in this way--is one of several vital facts about
pointers in C. On the next page, we'll see the others.
page 99
Deep sentences:
The correspondence between indexing and pointer arithmetic is very close. By definition,
the value of a variable or expression of type array is the address of element zero of the
array.
This is a fundamental definition, which we'll now spend several pages discussing.
Don't worry too much yet about the assertion that ``pa and a have identical values.'' We're not surprised
about the value of pa after the assignment pa = &a[0]; we've been taking the address of array
elements for several pages now. What we don't know--we're not yet in a position to be surprised about it
or not--is what the ``value'' of the array a is. What is the value of the array a?
In some languages, the value of an array is the entire array. If an array appears on the right-hand side of
an assignment, the entire array is assigned, and the left-hand side had better be an array, too. C does not
work this way; C never lets you manipulate entire arrays.
In C, by definition, the value of an array, when it appears in an expression, is a pointer to its first
element. In other words, the value of the array a simply is &a[0]. If this statement makes any kind of
intuitive sense to you at this point, that's great, but if it doesn't, please just take it on faith for a while.
This statement is a fundamental (in fact the fundamental) definition about arrays and pointers in C, and if
you don't remember it, or don't believe it, then pointers and arrays will never make proper sense. (You
will also need to know another bit of jargon: we often say that, when an array appears in an expression, it
decays into a pointer to its first element.)
Given the above definition, let's explore some of the consequences. First of all, though we've been saying
pa = &a[0];
we could also say
pa = a;
because by definition the value of a in an expression (i.e. as it sits there all alone on the right-hand side)
is &a[0]. Secondly, anywhere we've been using square brackets [] to subscript an array, we could also
have used the pointer dereferencing operator *. That is, instead of writing
i = a[5];
we could, if we wanted to, instead write
i = *(a+5);
Why would this possibly work? How could this possibly work? Let's look at the expression *(a+5) step
by step. It contains a reference to the array a, which is by definition a pointer to its first element. So
*(a+5) is equivalent to *(&a[0]+5). To make things clear, let's pretend that we'd assigned the
pointer to the first element to an actual pointer variable:
int *pa = &a[0];
Now we have *(a+5) is equivalent to *(&a[0]+5) is equivalent to *(pa+5). But we learned on
page 98 that *(pa+5) is simply the contents of the location 5 cells past where pa points to. Since pa
points to a[0], *(pa+5) is a[5]. Thus, for whatever it's worth, any time you have an array subscript
a[i], you could write it as *(a+i).
The idea of the previous paragraph isn't worth much, because if you've got an array a, indexing it using
the notation a[i] is considerably more natural and convenient than the alternate *(a+i). The
significant fact is that this little correspondence between the expressions a[i] and *(a+i) holds for
more than just arrays. If pa is a pointer, we can get at locations near it by using *(pa+i), as we learned
on page 98, but we can also use pa[i]. This time, using the ``other'' notation (array instead of pointer,
when we thought we had a pointer) can be more convenient.
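A small self-checking sketch (the function name all_forms_agree is made up) that exercises the equivalences discussed so far: pa = a is the same as pa = &a[0], a[i] is the same as *(a+i), and pa[i] is the same as *(pa+i). It uses a local array rather than a parameter so that a really is an array.

```c
/* Returns 1 if every notation reaches the same elements, else 0. */
int all_forms_agree(void)
{
    int a[5] = {10, 20, 30, 40, 50};
    int *pa = a;                   /* same as pa = &a[0] */
    int i;

    if (pa != &a[0])
        return 0;
    for (i = 0; i < 5; i++) {
        if (a[i] != *(a + i))      /* subscripting an array */
            return 0;
        if (pa[i] != *(pa + i))    /* subscripting a pointer */
            return 0;
        if (pa[i] != a[i])
            return 0;
    }
    return 1;
}
```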
At this point, you may be asking why you can write pa[i] instead of *(pa+i). You may be
wondering how you're going to remember that you can do this, or remember what it means if you see it
in someone else's code, when it's such a surprising fact in the first place. There are several ways to
remember it; pick whichever one suits you:
1. It's an arbitrary fact, true by definition; just memorize it.
2. If, for an array a, instead of writing a[i], you can also write *(a+i) (as we proved a few
paragraphs back); then it's only fair that for a pointer pa, instead of writing *(pa+i), you can
also write pa[i].
3. Deep sentence: ``In evaluating a[i], C converts it to *(a+i) immediately; the two forms are
equivalent.''
4. An array is a contiguous block of elements of a particular type. A pointer often points to a
contiguous block of elements of a particular type. Therefore, it's very handy to treat a pointer to a
contiguous block of elements as if it were an array, by saying things like pa[i].
5. [This is the most radical explanation, though it's also the most true; but if it offends your
sensibilities or only seems to make things more confusing, please ignore it.] When you said
a[i], you weren't really subscripting an array at all, because an array like a in an expression
always turns into a pointer to its first element. So the array subscripting operator [] always finds
itself working on pointers, and it's a simple identity (another definition) that pa[i] is *(pa+i).
(But do pick at least one reason to remember this fact, as it's a fact you'll need to remember; expressions
like pa[i] are quite common.)
The authors point out that ``There is one difference between an array name and a pointer that must be
kept in mind,'' and this is quite true, but note very carefully that there is in fact every difference between
an array and a pointer. When an array name appears in most expressions, it turns into a pointer (to the
array's first element), but that does not mean that the array is a pointer. You may hear it stated that ``an
array is just a constant pointer,'' and this is a convenient explanation, but it is a simplified and potentially
misleading explanation.
With that said, do make sure you understand why a=pa and a++ (where a is an array) cannot mean
anything.
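One way to see that an array and a pointer are different kinds of object is sketched below, in a hypothetical function. The illegal operations appear only in comments, and sizeof (which comes up again in section 5.4) shows that a names ten ints of storage while pa is a single pointer.

```c
/* Hypothetical demo: an array and a pointer to its first element can
   be subscripted the same way, but they are different kinds of object. */
int array_vs_pointer(void)
{
    int a[10] = {0};
    int *pa = a;         /* pa holds the address of a[0] */

    pa++;                /* fine: pa is a variable; it now points to a[1] */
    /* a++;                 would not compile: an array name can't be incremented */
    /* a = pa;              would not compile: arrays can't be assigned to */

    if (pa != &a[1])
        return 0;
    /* sizeof sees the whole array for a, but only one pointer for pa */
    return sizeof a == 10 * sizeof(int) && sizeof pa == sizeof(int *);
}
```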
Deep sentence:
When an array name is passed to a function, what is passed is the location of the initial
element.
Though perhaps surprising, this sentence doesn't say anything new. A function call, and more
importantly, each of its arguments, is an expression, and in an expression, a reference to an array is
always replaced by a pointer to its first element. So given
int a[10];
f(a);
it is not the entire array a that is passed to f but rather just a pointer to its first element. For an example
closer to the text on page 99, given
char string[] = "Hello, world!";
int len = strlen(string);
it is not the entire array string that is passed to strlen (recall that C never lets you do anything with
a string or an array all at once), but rather just a pointer to its first element.
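A sketch with a hypothetical function, count_char, whose parameter is plainly declared as a pointer. Calling it with the array string works because what actually gets passed is &string[0].

```c
/* Hypothetical example: s receives a pointer to the first character
   of whatever array the caller names, and walks along it. */
int count_char(char *s, int c)
{
    int n = 0;
    for (; *s != '\0'; s++)
        if (*s == c)
            n++;
    return n;
}
```

Given char string[] = "Hello, world!";, the call count_char(string, 'l') returns 3.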
We now realize that we've been operating under a gentle fiction during the first four chapters of the book.
Whenever we wrote a function like getline or getop which seemed to accept an array of characters,
and whenever we thought we were passing arrays of characters to these routines, we were actually
passing pointers. This explains, among other things, how getline and getop were able to modify the
arrays in the caller, even though we said that call-by-value meant that functions can't modify variables in
their callers since they receive copies of the parameters. When a function receives a pointer, it cannot
modify the original pointer in the caller, but it can definitely modify what the pointer points to.
If that doesn't make sense, make sure you appreciate the full difference between a pointer and what it
points to! It is entirely possible to modify one without modifying the other. Let's illustrate this with an
example. If we say
char a[] = "hello";
char b[] = "world";
we've declared two character arrays, a and b, each containing a string. If we say
char *p = a;
we've declared p as a pointer-to-char, and initialized it to point to the first character of the array a. If
we then say
*p = 'H';
we've modified what p points to. We have not modified p itself. After saying *p = 'H'; the string in
the array a has been modified to contain "Hello".
If we say
p = b;
on the other hand, we have modified the pointer p itself. We have not really modified what p points to.
In a sense, ``what p points to'' has changed--it used to be the string in the array a, and now it's the string
in the array b. But saying p = b didn't modify either of the strings.
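The hello/world example above, packaged as a hypothetical self-checking function so the two kinds of assignment can be compared directly.

```c
/* Returns 1 if *p = 'H' changed the array a, and p = b changed only p. */
int pointer_vs_pointee(void)
{
    char a[] = "hello";
    char b[] = "world";
    char *p = a;

    *p = 'H';          /* modifies what p points to: a is now "Hello" */
    p = b;             /* modifies p itself: it now points to b[0] */

    /* a was changed by *p = 'H'; b was not changed by p = b */
    return a[0] == 'H' && b[0] == 'w' && p == b && *p == 'w';
}
```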
page 100
Since, as we've just seen, functions never receive arrays as parameters, but instead always receive
pointers, how have we been able to get away with defining functions (like getline and getop) which
seemed to accept arrays? The answer is that whenever you declare an array parameter to a function, the
compiler pretends that you actually declared a pointer. (It does this mostly so that we can get away with
the ``gentle fiction'' of pretending that we can pass arrays to functions.)
When you see a statement like ``char s[]; and char *s; are equivalent'' (as in fact you see at the
top of page 100), you can be sure that (and you must remember that) it is only function formal parameters
that are being talked about. Anywhere else, arrays and pointers are quite different, as we've discussed.
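A sketch (length_of is a hypothetical name) showing that a parameter written with array brackets really is a pointer: the s++ inside the function compiles, which it would not for a real array variable.

```c
/* The parameter is written as char s[], but the compiler treats it
   as char *s, which is why s++ below is legal. */
int length_of(char s[])
{
    int n = 0;
    while (*s != '\0') {
        s++;
        n++;
    }
    return n;
}
```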
Expressions like p[-1] (at the end of section 5.3) may be easier to understand if we convert them back
to the pointer form *(p + -1) and thence to *(p-1) which, as we've seen, is the object one before
what p points to.
With the examples in this section, we begin to see how pointer manipulations can go awry. In sections
5.1 and 5.2, most of our pointers were to simple variables. When we use pointers into arrays, and when
we begin using pointer arithmetic to access nearby cells of the array, we must be careful never to go off
the end of the array, in either direction. A pointer is only valid if it points to one of the allocated cells of
an array. (There is also an exception for a pointer just past the end of an array, which we'll talk about
later.) Given the declarations
int a[10];
int *pa;
the statements
pa = a;
*pa = 0;
*(pa+1) = 1;
pa[2] = 2;
pa = &a[5];
*pa = 5;
*(pa-1) = 4;
pa[1] = 6;
pa = &a[9];
*pa = 9;
pa[-1] = 8;
are all valid. These statements set the pointer pa pointing to various cells of the array a, and modify
some of those cells by indirecting on the pointer pa. (As an exercise, verify that each cell of a that
receives a value receives the value of its own index. For example, a[6] is set to 6.)
However, the statements
pa = a;
*(pa+10) = 0;    /* WRONG */
*(pa-1) = 0;     /* WRONG */
pa = &a[5];
*(pa+10) = 0;    /* WRONG */
pa = &a[10];
*pa = 0;         /* WRONG */
and
int *pa2;
pa = &a[5];
pa2 = pa + 10;   /* WRONG */
pa2 = pa - 10;   /* WRONG */
are all invalid. The first examples set pa to point into the array a but then use out-of-range offsets (+10, -1)
which end up trying to store a value outside of the array a. The statements in the last set of examples
set pa2 to point outside of the array a. Even though no attempt is made to access the nonexistent cells,
these statements are illegal, too. Finally, the code
int a[10];
int *pa, *pa2;
pa = &a[5];
pa2 = pa + 10;   /* WRONG */
*pa2 = 0;        /* WRONG */
would be very wrong, because it not only computes a pointer to the nonexistent cell a[15] of a 10-element
array, but also tries to store something there.
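One defensive habit is to check an index before ever forming the pointer. The sketch below uses a hypothetical function, get_checked, that refuses any index outside a[0]..a[n-1] rather than computing or using an invalid pointer like the WRONG examples above.

```c
/* Hypothetical bounds-checked read: returns 1 and stores a[i] in *out
   if 0 <= i < n; otherwise returns 0 and touches nothing. */
int get_checked(int *a, int n, int i, int *out)
{
    if (i < 0 || i >= n)
        return 0;        /* out of range: no pointer is formed */
    *out = a[i];         /* same as *(a + i) */
    return 1;
}
```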

section 5.4: Address Arithmetic


This section is going to get pretty hairy. Some of it talks about things we've already seen (adding integers
to pointers); some of it talks about things we need to learn (comparing and subtracting pointers); and
some of it talks about a rather sophisticated example (a storage allocator). Don't worry if you can't follow
all the details of the storage allocator, but do read along so that you can pick up the other new points. (In
other words, make sure you read from ``Zero is the sole exception'' in the middle of page 102 to ``that is,
the string length'' on page 103, and also the last paragraph on page 103.)
What is a storage allocator for? So far, we've used pointers to point to existing variables and arrays,
which the compiler allocated for us. But eventually, we may want to allocate data structures (arrays, and
others we haven't seen yet) of a size which we don't know at compile time. Earlier, we spoke briefly
about a hypothetical inventory-management system, which recorded information about each part stored
in a warehouse. How many different parts could there be? If we used fixed-size arrays, there would be a
fixed upper limit on the number of parts we could enter into the system, and we'd be annoyed if that limit
were reached. A better solution is not to allocate a fixed array at compile time, but rather to use a runtime storage allocator to allocate memory for the data structures used to describe each part. That way, the
number of parts which the system can hold is limited only by available memory, not by any static limit
built into the program. Using a storage allocator to allocate memory at run time in this way is called
dynamic allocation.
However, dynamic memory allocation is where C programming can really get tricky, because you the
programmer are responsible for most aspects of it, and there are plenty of things you can do wrong (e.g.
not allocating quite enough memory, accidentally continuing to use memory after you deallocate it, having random
invalid pointers pointing everywhere, etc.). Therefore, we won't be talking about dynamic allocation for a while,
which is why you can skim over the storage allocator in this section for now.
page 102
The first new piece of information in this section (which you'll need to remember even if you're not
following the details of the storage allocator example) is the introduction of the ``null pointer.''
So far, all of our pointers have pointed somewhere, and we've cautioned about pointers which don't. To
help us distinguish between pointers which point somewhere and pointers which don't, there is a single,
special pointer value we can use, which is guaranteed not to point anywhere. When a pointer doesn't
point anywhere, we can set it to this value, to make explicit the fact that it doesn't point anywhere.
This special pointer value is called the null pointer. The way to set a pointer to this value is to use a
constant 0:
int *ip = 0;
The 0 is just a shorthand; it does not necessarily mean machine address 0. To make it clear that we're
talking about the null pointer and not the integer 0, we often use a macro definition like
#define NULL 0
so that we can say things like
int *ip = NULL;
(If you've used Pascal or LISP, the nil pointer in those languages is analogous.)
In fact, a definition of NULL (either 0 as above, or something equivalent for our purposes) has been placed
in the standard header file <stdio.h> for us (and in several other standard header files as well), so we don't
even need to #define it. I agree
completely with the authors that using NULL instead of 0 makes it more clear that we're talking about a
null pointer, so I'll always be using NULL, too.
Just as we can set a pointer to NULL, we can also test a pointer to see if it's NULL. The code
if(p != NULL)
*p = 0;
else
printf("p doesn't point anywhere\n");
tests p to see if it's non-NULL. If it's not NULL, it assumes that it points somewhere valid, and writes a 0
there. Otherwise (i.e. if p is the null pointer) the code complains.
Though we can use null pointers as markers to remind ourselves of which of our pointers don't point
anywhere, it's up to us to do so. It is not guaranteed that every pointer variable we neglect to initialize
(and which therefore doesn't point anywhere valid) starts out as NULL, so if we want to use the null pointer
convention to remind ourselves, we'd best explicitly initialize all unused pointers to NULL. Furthermore, there is no
general mechanism that automatically checks whether a pointer is non-null before we use it. If we think
that a pointer might not point anywhere, and if we're using the convention that pointers that don't point
anywhere are set to NULL, it's up to us to compare the pointer to NULL to decide whether it's safe to use
it.
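A common use of the null pointer is as a "not found" answer. The sketch below assumes a hypothetical search function, find_int, which returns a pointer to the first matching element or NULL; the caller then compares against NULL before using the result.

```c
#include <stdio.h>   /* for NULL */

/* Hypothetical search: returns a pointer to the first element of
   a[0]..a[n-1] equal to target, or NULL if there is none. */
int *find_int(int *a, int n, int target)
{
    int i;
    for (i = 0; i < n; i++)
        if (a[i] == target)
            return &a[i];
    return NULL;           /* "doesn't point anywhere" */
}
```

A caller would write something like int *p = find_int(a, 4, 1); if (p != NULL) *p = 0; so that the pointer is only dereferenced when it points somewhere.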
The next new piece of information in this section (which we've already alluded to) is pointer comparison.
You can compare two pointers for equality or inequality (== or !=): they're equal if they point to the
same place or are both null pointers; they're unequal if they point to different places, or if one points
somewhere and one is a null pointer. If two pointers point into the same array, the relational comparisons
<, <=, >, and >= can also be used.
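Relational comparison of two pointers into the same array makes a natural loop condition. This sketch (sum_ints is a hypothetical name) walks p from a up to end = a + n; it relies on the exception mentioned in section 5.3, that a pointer just past the end of an array may be computed and compared, though never dereferenced.

```c
/* Sums a[0]..a[n-1] by walking a pointer forward and comparing it
   with a pointer just past the last element. */
int sum_ints(int *a, int n)
{
    int *p;
    int *end = a + n;    /* may be compared with p, but never used as *end */
    int sum = 0;

    for (p = a; p < end; p++)
        sum += *p;
    return sum;
}
```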
page 103
The sentences
...n is scaled according to the size of the objects p points to, which is determined by the
declaration of p. If an int is four bytes, for example, the int will be scaled by four.
say something we've seen already, but may only confuse the issue. We've said informally that in the code
int a[10];
int *pa = &a[0];
*(pa+1) = 1;
pa contains the ``address'' of the int object a[0], but we've discouraged thinking about this address as
an actual machine memory address. We've said that the expression pa+1 moves to the next int in the
array (in this case, a[1]). Thinking at this abstract level, we don't even need to worry about any
``scaling by the size of the objects pointed to.''
If we do look at a lower, machine level of addressing, we may learn that an int occupies some number
of bytes (usually two or four), such that when we add 1 to a pointer-to-int, the machine address is
actually increased by 2 or 4. If you like to consider the situation from this angle, you're welcome to, but
if you don't, you certainly don't have to. If you do start thinking about machine addresses and sizes, make
extra sure that you remember that C does do the necessary scaling for you. Don't write something like
int a[10];
int *pa = &a[0];
*(pa+sizeof(int)) = 1;
where sizeof(int) is the size of an int in bytes, and expect it to access a[1].
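If you do want to watch the scaling happen, one low-level trick is to convert both pointers to char * (a char is one byte) before subtracting. The sketch below assumes a hypothetical helper, byte_distance; it only observes the scaling, and ordinary code should never need it. By the same reasoning, pa+sizeof(int) would skip sizeof(int) whole ints, landing on a[4] when an int is four bytes.

```c
#include <stddef.h>

/* Distance in bytes between two pointers into the same int array,
   found by converting both to char pointers. */
ptrdiff_t byte_distance(int *p, int *q)
{
    return (char *)q - (char *)p;
}
```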
Since adding an int to a pointer gives us another pointer:
int a[10];
int *pa1 = &a[0];
int *pa2 = pa1 + 5;
we might wonder if we can rearrange the expression
pa2 = pa1 + 5
to get
pa2 - pa1
(where this is no longer a C assignment, we're just wondering if we can subtract pa1 from pa2, and
what the result might be). The answer is yes: just as you can compare two pointers which point into the
same array, you can subtract them, and the result is, naturally enough, the distance between them, in cells
or elements.
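A sketch of pointer subtraction in the spirit of the strlen on page 103, renamed string_length here to avoid clashing with the library's strlen: p walks to the terminating '\0', and p - s is the number of characters in between.

```c
/* Length of a string by pointer subtraction: p - s counts
   elements (chars), not anything lower-level. */
int string_length(char *s)
{
    char *p = s;

    while (*p != '\0')
        p++;
    return p - s;
}
```

The same rule gives 5 for pa2 - pa1 in the int example above, regardless of how many bytes an int occupies.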
(In the large parenthetical statement in the middle of the page, don't worry too much about ptrdiff_t,
size_t, and sizeof.)

section 5.5: Character Pointers and Functions


page 104
Since text strings are represented in C by arrays of characters, and since arrays are very often
manipulated via pointers, character pointers are probably the most common pointers in C.
Deep sentence:
C does not provide any operators for processing an entire string of characters as a unit.
We've said this sort of thing before, and it's a general statement which is true of all arrays. Make sure you
understand that in the lines
char *pmessage;
pmessage = "now is the time";
pmessage = "hello, world";
all we're doing is assigning two pointers, not copying two entire strings.
At the bottom of the page is a very important picture. We've said that pointers and arrays are different,
and here's another illustration. Make sure you appreciate the significance of this picture: it's probably the
most basic illustration of how arrays and pointers are implemented in C.
We also need to understand the two different ways that string literals like "now is the time" are
used in C. In the definition
char amessage[] = "now is the time";
the string literal is used as the initializer for the array amessage. amessage is here an array of 16
characters, which we may later overwrite with other characters if we wish. The string literal merely sets
the initial contents of the array. In the definition
char *pmessage = "now is the time";
on the other hand, the string literal is used to create a little block of characters somewhere in memory
which the pointer pmessage is initialized to point to. We may reassign pmessage to point somewhere
else, but as long as it points to the string literal, we can't modify the characters it points to.
As an example of what we can and can't do, given the lines
char amessage[] = "now is the time";
char *pmessage = "now is the time";
we could say
amessage[0] = 'N';
to make amessage say "Now is the time". But if we tried to do
pmessage[0] = 'N';
(which, as you may recall, is equivalent to *pmessage = 'N'), it would not necessarily work; we're
not allowed to modify that string. (One reason is that the compiler might have placed the ``little block of
characters'' in read-only memory. Another reason is that if we had written
char *pmessage = "now is the time";
char *qmessage = "now is the time";
the compiler might have used the same little block of memory to initialize both pointers, and we wouldn't
want a change to one to alter the other.)
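A sketch of what we can and can't do, in a hypothetical function, capitalize_message. The illegal store through pmessage appears only as a comment. The array is made static so that the pointer the function returns is still valid after it returns.

```c
/* amessage is an array initialized from a string literal, so its
   characters are ours to change.  pmessage points at a literal,
   whose characters must not be modified; reassigning pmessage
   itself is fine. */
char *capitalize_message(void)
{
    static char amessage[] = "now is the time";
    char *pmessage = "now is the time";

    amessage[0] = 'N';     /* fine: modifies our own array */
    /* pmessage[0] = 'N';     not allowed: would modify the literal */
    pmessage = amessage;   /* fine: the pointer now points at the array */
    return pmessage;       /* points at "Now is the time" */
}
```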
Deep sentence:
The first function is strcpy(s,t), which copies the string t to the string s. It would be
nice just to say s=t but this copies the pointer, not the characters.
This is a restatement of what we said above, and a reminder of why we'll need a function, strcpy, to
copy whole strings.
page 105
Once again, these code fragments are being written in a rather compressed way. To make it easier to see
what's going on, here are alternate versions of strcpy, which don't bury the assignment in the loop test.
First we'll use array notation:
void strcpy(char s[], char t[])
{
    int i;
    for(i = 0; t[i] != '\0'; i++)
        s[i] = t[i];
    s[i] = '\0';
}
Note that we have to manually append the '\0' to s after the loop. In doing so, we depend upon i
retaining its final value after the loop, which C guarantees, as we learned in Chapter 3.
Here is a similar function, using pointer notation:
void strcpy(char *s, char *t)
{
    while(*t != '\0')
        *s++ = *t++;
    *s = '\0';
}
Again, we have to manually append the '\0'. Yet another option might be to use a do/while loop.
All of these versions of strcpy are quite similar to the copy function we saw on page 29 in section
1.9.
page 106
The version of strcpy at the top of this page is my least favorite example in the whole book. Yes,
many experienced C programmers would write strcpy this way, and yes, you'll eventually need to be
able to read and decipher code like this, but my own recommendation against this kind of cryptic code is
strong enough that I'd rather not show this example yet, if at all.
We need strcmp for about the same reason we need strcpy. Just as we cannot assign one string to
another using =, we cannot compare two strings using ==. (If we try to use ==, all we'll compare is the
two pointers. If the pointers are equal, they point to the same place, so they certainly point to the same
string, but if we have two strings in two different parts of memory, pointers to them will always compare
different even if the strings pointed to contain identical sequences of characters.)
Note that strcmp returns a positive number if s is greater than t, a negative number if s is less than t,
and zero if s compares equal to t. ``Greater than'' and ``less than'' are interpreted based on the relative
values of the characters in the machine's character set. This means that 'a' < 'b', but (in the ASCII
character set, at least) it also means that 'B' < 'a'. (In other words, capital letters will sort before
lower-case letters.) The positive or negative number which strcmp returns is, in this implementation at
least, actually the difference between the values of the first two characters that differ.
Note that strcmp returns 0 when the strings are equal. Therefore, the condition
if(strcmp(a, b))
    do something...
doesn't do what you probably think it does. Remember that C considers zero to be ``false'' and nonzero to
be ``true,'' so this code does something if the strings a and b are unequal. If you want to do something if
two strings are equal, use code like
if(strcmp(a, b) == 0)
    do something...
(There's nothing fancy going on here: strcmp returns 0 when the two strings are equal, so that's what
we explicitly test for.)
To continue our ongoing discussion of which pointer manipulations are safe and which are risky or must
be done with care, let's consider character pointers. As we've mentioned, one thing to beware of is that a
pointer derived from a string literal, as in
char *pmessage = "now is the time";
is usable but not writable (that is, the characters pointed to are not writable). Another thing to be careful
of is that any time you copy strings, using strcpy or some other method, you must be sure that the
destination string is a writable array with enough space for the string you're writing. Remember, too, that
the space you need is the number of characters in the string you're copying, plus one for the terminating
'\0'.
For the above reasons, all three of these examples are incorrect:
char *p1 = "Hello, world!";
char *p2;
strcpy(p2, p1);             /* WRONG */

char *p = "Hello, world!";
char a[13];
strcpy(a, p);               /* WRONG */

char *p3 = "Hello, world!";
char *p4 = "A string to overwrite";
strcpy(p4, p3);             /* WRONG */

In the first example, p2 doesn't point anywhere. In the second example, a is a writable array, but it
doesn't have room for the terminating '\0'. In the third example, p4 points to memory which we're not
allowed to overwrite. A correct example would be
char *p = "Hello, world!";
char a[14];
strcpy(a, p);
(Another option would be to obtain some memory for the string copy, i.e. the destination for strcpy,
using dynamic memory allocation, but we're not talking about that yet.)
page 106 continued (bottom)
Expressions like *p++ and *--p may seem cryptic at first sight, but they're actually analogous to array
subscript expressions like a[i++] and a[--i], some of which we were using back on page 47 in
section 2.8.

This page by Steve Summit // Copyright 1995, 1996 // mail feedback

section 5.6: Pointer Arrays; Pointers to Pointers


page 107
Deep sentence:
Since pointers are variables themselves, they can be stored in arrays just as other variables
can.
This is just one aspect of the generality of C's data types, which we'll be seeing in the next few sections.
We've used a recursive definition of ``expression'': a constant or variable is an expression, an expression
in parentheses is an expression, an expression plus an expression is an expression, etc. There are
obviously an infinite number of expressions, of arbitrary complexity. In exactly the same way, there are
an infinite number of data types in C. We've already seen the basic data types: int, char, double, etc.
But then we have the derived data types such as array-of-char and pointer-to-int and function-returning-double. So we can say that for any type, array-of-type is another type, and pointer-to-type is
another type, and function-returning-type is another type. Once we've said that, we can see that there is
also the possibility of arrays of pointers, and arrays of arrays, and functions returning pointers, and even
(in section 5.11, though this is a deeper topic) pointers to functions. (The only possibilities that C doesn't
support are functions returning arrays, and arrays of functions, and functions returning functions.)
Make sure you understand why an integer is something that can be ``compared or moved in a single
operation,'' but that a string (that is, an array of char) is not. Then, realize that a pointer is also
something that can be ``compared or moved in a single operation.'' (Actually, though, the string
comparisons we'll be doing are not single operations.) From time to time you'll hear me caution you not
to worry too much about certain aspects of efficiency. Here, it's true that the overhead of copying entire
strings from one place to another, a character at a time (which is the overhead we'll be getting around by
manipulating pointers instead) can be significant, but that's not the only concern: once we're comfortable
with the idea, manipulating pointers will be somewhat easier on us, too. (Copying lots of characters
around is a nuisance, and it can also be dangerous, if the destination isn't big enough or isn't in the right
place.)
Don't worry about the ``one long character array'' that the ``lines to be sorted are stored end-to-end in.''
Instead, look at the picture at the bottom of page 107, which shows the pointers that might be set up after
reading the lines
defghi
jklmnopqrst
abc

On the left are the pointers before sorting, and on the right are the pointers after sorting. The three strings
have not been moved, but by reshuffling the pointers, the three pointers in order now point to the lines
abc
defghi
jklmnopqrst
page 108
Once again, we see a nice simple decomposition of the problem, which might seem deceptively simple
except that when problems are decomposed in simple ways like this, and then implemented faithfully,
they really can be this simple. Deferring the sorting step is an excellent idea, especially if we didn't quite
follow the details of the sorting functions in the previous chapter. (Actually, in practice, we can usually
defer the sorting step forever, since there's often a general-purpose sort routine provided for us
somewhere. C is no exception: a qsort function is a required part of its standard library. For the most
part, the only people who have to write sort routines are programming students and the few people who
get stuck implementing system functions.)
The main program at the bottom of page 108 looks a bit more elaborate than the pseudocode at the top
of the page, but the essence of the program is the three calls to readlines, qsort, and
writelines. Everything else is declarations, plus an error message which is printed if readlines is
for some reason not able to read the input. Eventually, you should be able to understand why all of the
various declarations are required, but you can skim over them at first.
page 109
The readlines function first calls our old friend getline to read each line into a local array, line.
On page 29 in section 1.9, we saw a program for finding the longest line in the input: it read each line
into a local array line, and kept a copy of the longest line in a second array longest. In that program,
it didn't matter that the input array line was continually overwritten with each new input line, and that
most lines (except the longest one) were lost and forgotten. Here, however, we do need to save all of the
input lines somewhere, so that we can sort them and print them later.
The lines are saved by calling alloc, a function which we wrote in section 5.4 but may have skimmed
over. alloc allocates n bytes of new memory for something which we need to save. Each time we read
another line, we call alloc to allocate some new memory to store it, then call strcpy to copy the line
from the line array to the newly allocated memory. This way, it's okay that the next line is read into the
same line array; we save each line, as it's read, into its own little alloc'ed piece of memory.
Note that memory allocated with a routine such as alloc persists, just as global and static variables
do; it does not disappear when the function that allocated it returns.

Hopefully you're getting used to reading compressed condition statements by now, because here's
another doozy:
if (nlines >= maxlines || (p = alloc(len)) == NULL)
This line checks to make sure we have enough room to store the new line we just read. We need two
things: (1) a slot in the lineptr array to store the pointer, and (2) space allocated by alloc to store
the line itself. If either of these is unavailable, we return -1, indicating that we ran out of memory.
We don't have a slot in the lineptr array if we've already read maxlines lines, and we don't have
room to store the line itself if alloc returns NULL. The subexpression (p = alloc(len)) ==
NULL is equivalent in form to other assign-and-test combinations we've been using involving
getchar and getline: it assigns alloc's return value to p, then compares it to NULL.
Normally, we might be suspicious of the call alloc(len). Why? Remember that strings are always
terminated by '\0', so the space required to store a string is always one more than the number of
characters in it. Normally, we'll call things like alloc(len + 1), and accidentally calling
alloc(len) is usually a bug. Here, it happens to be okay, because before we copy the line to the
newly-allocated memory, we strip the newline '\n' from the end of it, by overwriting it with '\0',
hence making the string one shorter than len. (Why is the last character in line, namely the '\n', at
line[len-1], and not line[len]?)
The fragments
if (nlines >= maxlines ...
and
lineptr[nlines++] = p;
deserve some attention. These represent a common way of filling in an array in C. nlines always holds
the number of lines we've read so far (it's another invariant). It starts out as 0 (we haven't read any lines
yet) and it ends up as the total number of lines we've read. Each time we read a new line, we store the
line (more precisely, a pointer to it) in lineptr[nlines++]. By using postfix ++, we store the
pointer in the slot indexed by the previous value of nlines, which is what we want, because arrays are
0-based in C. The first time through the loop, nlines is 0, so we store a pointer to the first line in
lineptr[0], and then increment nlines to 1. If nlines ever becomes equal to maxlines, we've
filled in all the slots of the array, and we can't use any more (even though, at that point, the highest-filled
cell in the array is lineptr[maxlines-1], which is the last cell in the array, again because arrays
are 0-based). We test for this condition by checking nlines >= maxlines, as a little measure of
paranoia. The test nlines == maxlines would also work, but if we ever accidentally introduce a
bug into the program such that we fill past the last slot without noticing it, we wouldn't want to keep on
filling farther and farther past the end.
Deep sentences:
...lineptr is an array of MAXLINES elements, each element of which is a pointer to a
char. That is, lineptr[i] is a character pointer...
We can see that lineptr[i] has to be a character pointer, by looking at two things: in the function
readlines, the line
lineptr[nlines++] = p;
has a character pointer on the right-hand side, and the only thing we can assign a character pointer to is
another character pointer. Also, in the function writelines, in the line
printf("%s\n", lineptr[i]);
printf's %s format expects a pointer to a character, so that's what lineptr[i] had better be.
Note that writelines prints a newline after each line, since newlines were stripped out of the input
lines by readlines.
Don't worry too much about the discussion at the bottom of page 109. We saw in section 5.3 that due to
the ``strong relationship'' between pointers and arrays, it is always possible to manipulate an array using
pointer-like notation, and to manipulate a pointer using array-like notation. Since lineptr is an array,
it is possible to manipulate it using pointer-like notation, but since what it's an array of is other pointers,
it can start to get a bit confusing. Though many programmers do write things like
printf("%s\n", *lineptr++);
and though this is correct code, and though one should probably understand it to have a 100% complete
understanding of C, I've decided that code like that is just a bit too hard to follow, and I'd always write
(perhaps more pedestrian and mundane) things like
printf("%s\n", lineptr[i]);
or
printf("%s\n", lineptr[i++]);
page 110
Since I didn't ask you to follow the qsort example in section 4.10 in complete detail, I won't ask you to
work through this one completely, either. But if you compare the code here to the code on pages 87-88,
you will see that the only significant differences are that the variables and arrays containing the things
being sorted have been changed from int to char * (pointer-to-char), and the comparison
if (v[i] < v[left])
has been changed to
if (strcmp(v[i], v[left]) < 0)

section 5.7: Multi-dimensional Arrays


page 111
The month_day function is another example of a function which simulates having multiple return values by
using pointer parameters. month_day is declared as void, so it has no formal return value, but two of its
parameters, pmonth and pday, are pointers, and it fills in the locations pointed to by these two pointers with
the two values it wants to ``return.'' One line of the definition of month_day on page 111 is cut off in all
printings I have seen: it should read
void month_day(int year, int yearday, int *pmonth, int *pday)
As we've said, although any nonzero value is considered ``true'' in C, the built-in relational and Boolean
operators always ``return'' 0 or 1. Therefore, the line
int leap = year%4 == 0 && year%100 != 0 || year%400 == 0;
sets leap to 1 or 0 (``true'' or ``false'') depending on the condition
year%4 == 0 && year%100 != 0 || year%400 == 0
which is the condition for leap years in the Gregorian calendar. (It's a little-known fact that century years are not
leap years unless they are also divisible by 400. Thus, 2000 was a leap year, but 1900 was not.) The 1/0 value that leap
receives is what the authors are referring to when they say that ``the arithmetic value of a logical expression...
can be used as a subscript of the array daytab.'' This line could also have been written
int leap;
if (year%4 == 0 && year%100 != 0 || year%400 == 0)
    leap = 1;
else
    leap = 0;
or
int leap = (year%4 == 0 && year%100 != 0 || year%400 == 0) ? 1 : 0;
page 112
The daytab array holds small integers (in the range 0-31), so it can legally be made an array of char, though
whether this is a legitimate use is a question of style.
Deep sentence:
In C, a two-dimensional array is really a one-dimensional array, each of whose elements is an
array.
Earlier we said that ``array-of-type is another type,'' and here we must believe it: since array-of-type is a type,
array-of-(array-of-type) is yet another type.
The statement that ``Elements are stored by rows, so the rightmost subscript, or column, varies fastest as
elements are accessed in storage order'' probably won't make much sense unless you've done a lot of work with
other languages, such as FORTRAN, which do have true multi-dimensional arrays. It's pretty arbitrary what you
call a ``row'' and what you call a ``column''; the most important thing to know is which subscript goes with
which dimension. If you have
int a[10][20];
then in the reference a[i][j], i can range from 0 to 9 and j can range from 0 to 19. In other words, you
might write
for (i = 0; i < 10; i++)
    for (j = 0; j < 20; j++)
        do something with a[i][j]
We also want to know what a actually is. Is it an array of 10 arrays, each of size 20, or is it an array of 20
arrays, each of size 10? There are other ways of convincing ourselves of the answer, but for now let's just say
that the ``closer'' dimensions are closer to what a is. Therefore, a is first an array of size 10, and what it's an
array of is arrays of 20 ints. This also tells us that if we ever refer to a[i] (without a second subscript), then
we're referring to just one of those 10 arrays (of size 20) in its entirety.
When we look back at the initialization of the daytab array on page 111, everything lines up. daytab is
defined as
char daytab[2][13]
and we can see from the initializer that there are two (sub)arrays, each of size 13. (We can also see that there is
some justification for saying that the first subscript refers to ``rows'' and the second to ``columns.'')
The authors illustrate one way of dealing with C's 0-based arrays when you have an algorithm that really wants
to treat an array as if it were 1-based. Here, rather than remembering to subtract one from the 1-based month
number each time, they chose to waste a ``column'' of the array, and declare it one larger than necessary, so that
they could refer to subscripts from [1] to [12].
One last note about the initialization of daytab: you may have seen code in other programming books that
kept an array of the cumulative days of all the months:
{0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365}

Precomputing an array like that might make things a tiny bit easier on the computer (it wouldn't have to loop
through the entire array each time, as it does in the day_of_year function), but it makes it considerably
harder to see what the numbers mean, and to verify that they are correct. The simple table of individual month
lengths is much clearer, and if the computer has to do a bit more grunge work, well, that's what computers are
for. As explained in another book co-authored by Brian Kernighan:
A cumulative table of days must be calculated by someone and checked by someone else. Since
few people are familiar with the number of days up to the end of a particular month, neither
writing nor checking is easy. But if instead we use a table of days per month, we can let the
computer count them for us. (``Let the machine do the dirty work.'')
The bottom of page 112 begins to get confusing. The ``number of rows'' of an array like daytab ``is
irrelevant'' when passed to a function such as the hypothetical f because the compiler doesn't need to know the
number of rows when calculating subscripts. It does need to know the number of columns or ``width,'' because
that's how it knows that the second element on the second row of a 10-column array is actually 11 cells past the
beginning of the array (one row of 10, plus 1), which is essentially what it needs to know when it goes off and actually accesses the
array in memory. But it doesn't need to know how long the overall array is, as long as we promise not to run off
the end of it, and that's always up to us. (This is why we haven't specified the array sizes in the definitions of
functions such as getline on pages 29 and 69, or atoi on pages 43, 61, and 73, or readlines on page
109, although we did carry the array size as a separate argument to getline and readlines, to assist us in
our promise not to run off the end.)
The third version of f on page 112 comes about because of the ``gentle fiction'' involving array parameters. We
learned on page 99 that functions don't really receive arrays as parameters; they receive pointers (since any array
passed by the caller decays immediately to a pointer). On page 39 we wrote a strlen function as
int strlen(char s[])
but on page 99 we rewrote it as
int strlen(char *s)
which is closer to the way the compiler sees the situation. (In fact, when we write int strlen(char
s[]), the compiler essentially rewrites it as int strlen(char *s) for us.) In the same way, a function
declared as
f(int daytab[][13])
can be rewritten by us (or if not, is rewritten by the compiler) to
f(int (*daytab)[13])
which declares the daytab parameter as a pointer-to-array-of-13-ints. Here we see two things: (1) the rewrite
which changes an array parameter to a pointer parameter happens only once (we end up with a pointer to an
array, not a pointer to a pointer), and (2) the syntax for pointers to arrays is a bit messy, because of some
required extra parentheses, as explained in the text.
If this seems obscure, don't worry about it too much; just declare functions with array parameters matching the
arrays you call them with, like
f(int daytab[2][13])
and let the compiler worry about the rewriting.
Deep sentence:
More generally, only the first dimension (subscript) of an array is free; all the others have to be
specified.
This just says what we said already: when declaring an array as a function parameter, you can leave off the first
dimension because it is the overall length and not knowing it causes no immediate problems (unless you
accidentally go off the end). But the compiler always needs to know the other dimensions, so that it knows how
the rows and columns line up.

section 5.8: Initialization of Pointer Arrays


page 113
This section is short and sweet, and there are only two things I feel the need to comment on. The
sentence ``The characters of the i-th string are placed somewhere'' simply refers to the fact that string
literals always work that way (except when they're used as array initializers, as explained on page 104).
We don't really care where the characters are, as long as we can keep hold of a pointer to them.
The other thing to notice is that the month_name function does verify that its argument is valid. If it
didn't check n against the boundary values 1 and 12, what would happen if we called
month_name(123)?

section 5.9: Pointers vs. Multi-dimensional Arrays


Actually, some people (and not just newcomers) are sometimes confused about the difference between a
one-dimensional array and a single pointer, too; moving to two-dimensional arrays, arrays of pointers,
and pointers to pointers only makes things worse. (But don't lose heart: if you pay attention and keep
your head screwed on straight, you should be able to keep the differences clearly in mind.)
The adjective ``syntactically'' in the paragraph at the bottom of the page is significant: after saying
int *b[10];
an immediate reference to b[3][4] would not be completely legal. It wouldn't be a syntax error or
anything, but when the program tried to fetch the pointer b[3] and then the integer at index 4 of wherever it points, it
would go off into deep space, because b[3] hasn't yet been set to point anywhere.
You might want to draw a picture of the data structures that would result ``[a]ssuming that each element
of b does point to a twenty-element array,'' and verify that there are ``200 ints set aside, plus ten cells
for the pointers.'' (The picture will be similar to the one on the next page.)
Actually, I'm not sure if having rows of different lengths is the only important advantage of using a
pointer array. Another is that the size of the arrays (as we'll see later) can be decided at run-time; another
is that the pointers make certain manipulations easier (such as the sorting example we worked through in
section 5.6).
page 114
Do study the pictures on this page carefully, and make sure you understand the representations of the
name and aname arrays and how they differ. (You might want to refer back to the similar discussion of
pmessage and amessage on page 104 in section 5.5.)

section 5.10: Command-line Arguments


page 115
The picture at the top of page 115 doesn't quite match the declaration
char *argv[]
it's actually a picture of the situation declared by
char **argv
which is what main actually receives. (The array parameter declaration char *argv[] is rewritten by
the compiler to char **argv, in accordance with the discussion in sections 5.3 and 5.7.) Also, the ``0''
at the bottom of the array is just a representation of the null pointer which conventionally terminates the
argv array. (Normally, you'll never encounter the terminating null pointer, because if you think of
argv as an array of size argc, you'll never access beyond argv[argc-1].)
The loop
for (i = 1; i < argc; i++)
looks different from most loops we see in C (which either start at 0 and use <, or start at 1 and use <=).
The reason is that we're skipping argv[0], which contains the name of the program.
The expression
printf("%s%s", argv[i], (i < argc-1) ? " " : "");
is a little nicety to print a space after each word (to separate it from the next word) but not after the last
word. (The nicety is just that the code doesn't print an extra space at the end of the line.) It would also be
possible to fold in the following printf of the newline:
printf("%s%s", argv[i], (i < argc-1) ? " " : "\n");
As I mentioned in my comment on the bottom of page 109, it's not necessary to write pointer-incrementing
code like
while(--argc > 0)
    printf("%s%s", *++argv, (argc > 1) ? " " : "");
if you don't feel comfortable with it. I used to try to write code like this, because it seemed to be what
everybody else did, but it never sat well, and it was always just a bit too hard to write and to prove
correct. I've reverted to simple, obvious loops like
int argi;
char *sep = "";
for (argi = 1; argi < argc; argi++) {
    printf("%s%s", sep, argv[argi]);
    sep = " ";
}
printf("\n");
Often, it's handy to have the original argc and argv around later, anyway. (This loop also shows
another way of handling space separators.)
page 116
Page 116 shows a simple improvement on the matching-lines program first presented on page 69; page
117 adds a few more improvements. The differences between page 69 and page 116 are that the pattern is
read from the command line, and strstr is used instead of strindex. The difference between page
116 and page 117 is the handling of the -n and -x options. (The next obvious improvement, which we're
not quite in a position to make yet, is to allow a file name to be specified on the command line, rather
than always reading from the standard input.)
page 117
Several aspects of this code deserve note.
The line
while (c = *++argv[0])
is not in error. (In isolation, it might look like an example of the classic error of accidentally writing =
instead of == in a comparison.) What it's actually doing is another version of a combined set-and-test: it
assigns the next character pointed to by argv[0] to c, and compares it against '\0'. You can't see the
comparison against '\0', because it's implicit in the usual interpretation of a nonzero expression as
``true.'' An explicit test would look like this:
while ((c = *++argv[0]) != '\0')

argv[0] is a pointer to a character in a string; ++argv[0] increments that pointer to point to the next
character in the string; and *++argv[0] increments the pointer while returning the next character
pointed to. argv[0] is not the first string on the command line, but rather whichever one we're looking
at now, since elsewhere in the loop we increment argv itself.
Some of the extra complexity in this loop is to make sure that it can handle both
-x -n
and
-xn
In pseudocode, the option-parsing loop is
for ( each word on the command line )
    if ( it begins with '-' )
        for ( each character c in that word )
            switch ( c )
                ...
For comparison, here is another way of writing effectively the same loop:
int argi;
char *p;
for (argi = 1; argi < argc && argv[argi][0] == '-'; argi++)
    for (p = &argv[argi][1]; *p != '\0'; p++)
        switch (*p) {
        case 'x':
            ...
This uses array notation to access the words on the command line, but pointer notation to access the
characters within a word (more specifically, a word that begins with '-'). We could also use array
notation for both:
int argi, chari;
for (argi = 1; argi < argc && argv[argi][0] == '-'; argi++)
    for (chari = 1; argv[argi][chari] != '\0'; chari++)
        switch (argv[argi][chari]) {
        case 'x':
            ...
In either case, the inner, character loop starts at the second character (index [1]), not the first, because
the first character (index [0]) is the '-'.
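The array-notation form of the loop can be sketched as a function too. Here first_arg is a hypothetical name; it returns the index of the first word that isn't an option (or -1 for an unknown option). Because it uses subscripts instead of incrementing pointers, it leaves argv exactly as it found it.

```c
/* first_arg: a hypothetical array-notation version of the option loop.
   Returns the index of the first argument that is not an option, or -1
   on an unknown option.  argv is only read, never modified. */
int first_arg(int argc, char *argv[], int *except, int *number)
{
    int argi, chari;

    *except = *number = 0;
    for (argi = 1; argi < argc && argv[argi][0] == '-'; argi++)
        for (chari = 1; argv[argi][chari] != '\0'; chari++)  /* [0] is the '-' */
            switch (argv[argi][chari]) {
            case 'x':
                *except = 1;
                break;
            case 'n':
                *number = 1;
                break;
            default:
                return -1;
            }
    return argi;
}
```

If the result is less than argc, argv[result] is the search pattern.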
It's easy to see how the -n option is implemented. If -n is seen, the number flag is set to 1 (a.k.a.
``true''), and later, in the line-matching loop, each time a line is printed, if the number flag is true, the
line number is printed first. It's harder to see how -x works. An except flag is set to 1 if -x is present,
but how is except used? It's buried down there in the line
if ((strstr(line, *argv) != NULL) != except)
What does that mean? The subexpression
(strstr(line, *argv) != NULL)
is 1 if the line contains the pattern, and 0 if it does not. except is 0 if we should print matching lines,
and 1 if we should print non-matching lines. What we've actually implemented here is an ``exclusive
OR,'' which is ``if A or B but not both.'' Other ways of writing this would be
int matched = (strstr(line, *argv) != NULL);
if ((matched && !except) || (!matched && except)) {
    if (number)
        printf("%ld:", lineno);
    printf("%s", line);
    found++;
}
or
int matched = (strstr(line, *argv) != NULL);
if (except ? !matched : matched) {
    if (number)
        printf("%ld:", lineno);
    printf("%s", line);
    found++;
}
or

int matched = (strstr(line, *argv) != NULL);

if (!except) {
    if (matched) {
        if (number)
            printf("%ld:", lineno);
        printf("%s", line);
        found++;
    }
}
else {
    if (!matched) {
        if (number)
            printf("%ld:", lineno);
        printf("%s", line);
        found++;
    }
}
There's clearly a tradeoff: the last version is in some sense the most clear (and the most verbose), but it
ends up repeating the line-number printing and any other processing which must be done for found lines.
Therefore, the compressed, perhaps slightly more cryptic forms are better: some day, it's a virtual
certainty that more processing will be added for printed lines (for example, if we're searching multiple
files, we'll want to print the filename for matching lines, too), and if the printing is duplicated in two
places, it's far too likely that we'll overlook that fact and add the new code in only one place.
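To check that the alternatives really are equivalent, here is a small sketch with each test written as a function (the names are hypothetical) so that all four combinations of matched and except can be tried. The compact matched != except form relies on both values being exactly 0 or 1, which is why the text computes matched as (strstr(...) != NULL) rather than using strstr's pointer result directly.

```c
/* Three equivalent tests for "should this line be printed?", given
   matched and except, each of which must be exactly 0 or 1. */
int print_andor(int matched, int except)
{
    return (matched && !except) || (!matched && except);
}

int print_cond(int matched, int except)
{
    return except ? !matched : matched;
}

int print_ne(int matched, int except)   /* the form used in the text */
{
    return matched != except;           /* exclusive OR of two 0/1 values */
}
```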
One last point on the pattern-matching program: it's probably clearer to declare a pointer variable
char *pat;
and set it to the word from argv to be used as the search pattern (argv[1] or *argv, depending on
whether we're looking at page 116 or 117), and then use that in the call to strstr:
if (strstr(line, pat) != NULL ...
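Putting the pat suggestion together with the exclusive-OR test, the matching logic might look like the following sketch. count_printed is a hypothetical helper that counts the lines find would print (the found variable in the text), working on an array of lines instead of reading input.

```c
#include <string.h>

/* count_printed: count how many of the n lines would be printed for
   pattern pat.  matched is always 0 or 1 here, so "matched != except"
   selects matching lines when except is 0 and the others when it's 1. */
int count_printed(char *lines[], int n, const char *pat, int except)
{
    int i, found = 0;

    for (i = 0; i < n; i++) {
        int matched = (strstr(lines[i], pat) != NULL);
        if (matched != except)
            found++;
    }
    return found;
}
```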

This page by Steve Summit // Copyright 1995, 1996 // mail feedback

Chapter 6: Structures
page 127
There's one other piece of motivation behind structures that it's useful to discuss. Suppose we didn't have
structures (or didn't know what they were or how to use them). Suppose we wanted to implement payroll
records. We might set up a bunch of parallel arrays, holding the names, mailing addresses, social security
numbers, and salaries of all of our employees:
char *name[100];
char *address[100];
long ssn[100];
float salary[100];
The idea here is that name[0], address[0], ssn[0], and salary[0] would describe one
employee, array slots with subscript [1] would describe the second employee, etc. There are at least two
problems with this scheme: first, if we someday want to handle more than 100 employees, we have to
remember to change the size of several arrays. (Using a symbolic constant like
#define MAXEMPLOYEES 100
would certainly help.)
More importantly, there would be no easy way to pass around all the information associated with a single
employee. Suppose we wanted to write the function print_employee, which will print all the
information associated with a particular employee. What arguments would this function take? We could
pass it the index to use to retrieve the information from the arrays, but that would mean that all of the
arrays would have to be global. We could pass the function an individual name, address, SSN, and salary,
but that would mean that whenever we added a new piece of information to the database (perhaps next
week we'll want to keep track of employees' shoe sizes), we would have to add another argument to the
print_employee function, and change all of the calls. (Pretty soon, the number of arguments to the
print_employee function would become unwieldy.) What we'd really like is a way to encapsulate all
of the data about a single employee into a single data structure, so we could just pass that data structure
around.
The right solution to this problem, in languages such as C which support the idea, is to define a structure
describing an employee. We can make one array of these structures to describe all the employees, and we
can pass around single instances of the structure where they're needed.
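A minimal sketch of the structure-based alternative, assuming a struct employee with the same four fields as the parallel arrays (the tag and member names are illustrative, not from the text). A function that works on employees now takes the array and its length, and adding a field later (shoe size, say) doesn't change its parameter list.

```c
/* One record per employee, instead of four parallel arrays. */
struct employee {
    char *name;
    char *address;
    long ssn;
    float salary;
};

/* total_salary: hypothetical example of a function that works on whole
   employee records; it never needs to know about the other fields. */
float total_salary(const struct employee emps[], int n)
{
    float total = 0;
    int i;

    for (i = 0; i < n; i++)
        total += emps[i].salary;
    return total;
}
```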
section 6.1: Basics of Structures
section 6.2: Structures and Functions
section 6.3: Arrays of Structures
The sizeof operator
section 6.4: Pointers to Structures
section 6.5: Self-referential Structures

section 6.1: Basics of Structures


Don't get too excited about the prospect of doing graphics in C--there's no one standard or portable way
of doing it, so the points and rectangles we're going to be discussing must remain abstract for now (we
won't be able to plot them out).
page 128
To summarize the syntax of structure declarations: A structure declaration has about four parts, most of
them optional: the keyword struct, a structure tag (optional), a brace-enclosed list of declarations for
the members (also called ``fields'' or ``components'') of the structure (optional), and a list of variables of
the new structure type (optional). The arrangement looks like this:
struct tag {
    member declarations
} declared variables ;
Normally, a structure declaration defines either a tag and the members, or some variables based on an
existing tag, or sometimes all three at once. That is, we might first declare a structure:
struct point {
    int x;
    int y;
};                          /* 1 */
and then some variables of that type:
struct point here, there;   /* 2 */
Or, we could combine the two:
struct point {
    int x;
    int y;
} here, there;              /* 3 */

The list of members (if present) describes what the new structure ``looks like inside.'' The list of
variables (if present) is (obviously) the list of variables of this new type which we're defining and which
the rest of the program will use. The tag (if present) is just an arbitrary name for the structure type itself
(not for any variable we're defining). The tag is used to associate a structure definition (as in fragment 1)
with a later declaration of variables of that same type (as in fragment 2).
One thing to beware of: when you declare the members of a structure without defining any variables,
always remember the trailing semicolon, as shown in fragment 1 above. (If you forget it, the compiler
will wait until the next thing it finds in your source file, and try to define that as a variable or function of
the structure type.)
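The declaration forms can be tried out together in one file, as in the following sketch. Since fragments 1 and 3 both define struct point and can't both appear in the same scope, the combined form here uses a second tag, struct rect (which K&R also use on page 130). The tagless structure at the end shows what happens when the tag is omitted: the variables are declared, but there is no name by which to declare more variables of that type later.

```c
/* Fragment 1: tag and members, no variables (note the semicolon). */
struct point {
    int x;
    int y;
};

/* Fragment 2: variables of an existing structure type. */
struct point here, there;

/* Tag, members, and a variable all at once. */
struct rect {
    struct point pt1;
    struct point pt2;
} screen;

/* No tag: only the variables declared right here have this type. */
struct {
    int width;
    int height;
} window = { 640, 480 };
```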

section 6.2: Structures and Functions


In this section, we'll begin playing with structures more or less as if they were ordinary variables such as
we've been using all along (which they more or less are). As we'll see, we can declare variables of
structure type, declare functions which accept structures as parameters and return them, declare pointers
to structures, take the address of a structure (creating a pointer-to-structure) with &, and assign structures.
Notice that when we declare something as ``a structure type,'' we always have to say which structure
type, usually by using the struct tag. If we've set up a ``point'' structure as above, then to declare a
variable of this type, we say
struct point thepoint;
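Both
    struct thepoint;        /* WRONG */
and
    point thepoint;         /* WRONG */
would be errors. (Strictly speaking, the first is legal C, but it merely declares a structure tag named thepoint and defines no variable at all; the second is rejected by the compiler, because point by itself is not a type name.)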
The above list of things the language lets us do with structures lets us keep them and move them around,
but there isn't really anything defined by the language that we can do with structures. It's up to us to
define any operations on structures, usually by writing functions. (The addpoint function on page 130
is a good example. It will make a bit more sense if you think of it as adding not isolated points, but rather
vectors. [We can't add Seattle plus Los Angeles, but we could add (two miles south, one mile east) plus
(one mile east, two miles north).])
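Here is a sketch of makepoint and addpoint along the lines of the ones on page 130, with the mileage example encoded by assuming that x counts miles east and y counts miles north (negative for south). Because structures are passed by value, addpoint can modify its parameter p1 and return it without disturbing the caller's variable.

```c
struct point {
    int x;
    int y;
};

/* makepoint: make a point from x and y components */
struct point makepoint(int x, int y)
{
    struct point temp;

    temp.x = x;
    temp.y = y;
    return temp;
}

/* addpoint: add two points, treating them as displacement vectors */
struct point addpoint(struct point p1, struct point p2)
{
    p1.x += p2.x;   /* p1 is a copy, so the caller's point is unchanged */
    p1.y += p2.y;
    return p1;
}
```

Adding (one mile east, two miles south) to (one mile east, two miles north) gives (two miles east, zero miles north).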
page 131
As an aside, how safe are the min() and max() macros defined at the top of page 131, with respect to
the criteria discussed on pages 15 and 16 of the notes on section 4.11.2 (page 90 in the text)?
The precise meaning of the ``shorthand'' -> operator is that sp->m is, by definition, equivalent to
(*sp).m, for any structure pointer sp and member m of the pointed-to structure.
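A small sketch showing the two notations side by side; the function name translate is hypothetical. The parentheses in (*sp).y are necessary because . binds more tightly than *, so *sp.y would mean *(sp.y), which is an error when sp is a pointer. The -> operator exists so that we don't have to write them.

```c
struct point {
    int x;
    int y;
};

/* translate: move the point that sp points to by (dx, dy).  Both
   notations are used only to show that they are interchangeable. */
void translate(struct point *sp, int dx, int dy)
{
    sp->x += dx;        /* same as (*sp).x += dx */
    (*sp).y += dy;      /* same as sp->y += dy */
}
```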


section 6.3: Arrays of Structures


page 132
In the previous section we introduced pointers to structures and functions returning structures without
fanfare. But now let's pay attention to the fact that structures fit the pattern of the other types: a structure
is a type, so we can have pointer-to-struct, array-of-struct, and function-returning-struct. (We can also
say, following our ongoing pattern of recursive definitions, that for any list of types t1, t2, t3, ..., we can
make a new type
struct tag {
    t1 m1;
    t2 m2;
    t3 m3;
    ...
};
which is a structure composed of members of those types.)
page 134
We glossed over the binary search routine on page 58 in section 3.3, so we can skip the details of this
one, too. This illustrates another benefit of breaking functionality out into functions, though: as long as
you know what a function does, you can understand a program that it's in without necessarily
understanding all of it. In this case, binsearch searches an array tab, containing n cells of type
struct key, looking for one whose word field matches the parameter word. If it finds a matching
cell, it returns its index in the array; otherwise, it returns -1.
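Even if we skip the details, it may help to see the routine whole. This sketch is modeled on the binsearch on page 134 (with const added to the parameters). The array tab must be sorted by word, because each comparison with strcmp decides which half of the remaining range can be discarded.

```c
#include <string.h>

struct key {
    char *word;
    int count;
};

/* binsearch: find word in tab[0]...tab[n-1], which must be sorted by
   word.  Returns the index of the matching cell, or -1 if none. */
int binsearch(const char *word, const struct key tab[], int n)
{
    int cond;
    int low = 0, high = n - 1, mid;

    while (low <= high) {
        mid = (low + high) / 2;
        if ((cond = strcmp(word, tab[mid].word)) < 0)
            high = mid - 1;     /* word is in the lower half, if anywhere */
        else if (cond > 0)
            low = mid + 1;      /* word is in the upper half, if anywhere */
        else
            return mid;         /* found it */
    }
    return -1;
}
```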

The sizeof operator


page 135
This may seem like an excessively roundabout or low-level way of finding the number of elements in an
array, but it is the way it's done in C, and it's perfectly safe and straightforward once you get used to it. (I
would, however, be hard-pressed to defend against the accusation that it's a bit too low-level.)
Note that sizeof works on both type names (things like int, char *, struct key, etc.) and
variables (strictly speaking, any expression). Parentheses are required when you're using sizeof with a
type name and optional when you're using it with a variable or expression (just like return), but it's
safe to just always use parentheses.
sizeof returns the size counted in bytes, where the C definition of ``byte'' is ``the size of a char.'' In
other words, sizeof(char) is always 1. (It turns out that it's not necessarily the case, though, that a
byte or a char is 8 bits.) When we start doing our own dynamic memory allocation (which will be pretty
soon), we'll always be needing to know the size of things so that we can allocate space for them, so it's
just as well that we're meeting and getting used to the sizeof operator now.
The sentence ``But the expression in the #define is not evaluated by the preprocessor'' means that, as
far as the preprocessor is concerned, the ``value'' of the macro NKEYS (like the value of any macro) is
just a string of characters like
(sizeof(keytab) / sizeof keytab[0])
which it replaces wherever NKEYS is used, and which will then be evaluated by the compiler as usual, so
it doesn't matter that the preprocessor wouldn't have known how to deal with the sizeof operator, or
how big the keytab array or a struct key were.
A third way of defining NKEYS would be
#define NKEYS (sizeof(keytab) / sizeof *keytab)
Note that the definition of NKEYS depends on the definition of the keytab array (which appears on
page 133), and both of them will have to precede the use of NKEYS in main on page 134. (Also, all three
will have to be in the same source file, unless other steps are taken.)
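Here is a sketch with a shortened keytab (four entries, for illustration) and both definitions of the element count. NKEYS2 is a hypothetical name for the ``third way.'' Either macro must appear after keytab and must be applied to the array itself: applied to a pointer, such as a function parameter declared as struct key tab[], sizeof would give the size of the pointer instead.

```c
struct key {
    char *word;
    int count;
} keytab[] = {
    { "auto", 0 },
    { "break", 0 },
    { "case", 0 },
    { "char", 0 },
};

/* The preprocessor just substitutes this text; the compiler evaluates it. */
#define NKEYS  (sizeof(keytab) / sizeof keytab[0])
#define NKEYS2 (sizeof(keytab) / sizeof *keytab)   /* the ``third way'' */
```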
page 136
Notice that getword has a lot in common with the getop function of the calculator example (section
4.3, page 80).

section 6.4: Pointers to Structures


The bulk of this section illustrates how to rewrite the binsearch function (which we've already been
glossing over) in terms of pointers instead of arrays (an exercise which we've been downplaying). There
are a few important points towards the end of the section, however.
page 138
When we began talking about pointers and arrays, we said that it was important never to access outside
of the defined and allocated bounds of an array, either with an out-of-range index or an out-of-bounds
pointer. There is one exception: you may compute (but not access, or ``dereference'') a pointer to the
imaginary element which sits one past the end of an array. Therefore, a common idiom for accessing an
array using a pointer looks like
int a[10];
int *ip;
for (ip = &a[0]; ip < &a[10]; ip++)
    ...
or
int a[10];
int *endp = &a[10];
int *ip;
for (ip = a; ip < endp; ip++)
    ...
The element a[10] does not exist (the allocated elements run from [0] to [9]), but we may compute
the pointer &a[10], and use it in expressions like ip < &a[10] and endp = &a[10].
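A minimal sketch of the end-pointer idiom inside a function; the name sum is hypothetical. Here endp is computed as a + n, which is the same pointer as &a[n], and the loop stops as soon as ip reaches it, so the nonexistent element is never read.

```c
/* sum: add up a[0]...a[n-1] using a pointer and an end pointer. */
int sum(int a[], int n)
{
    int *endp = a + n;      /* one past the end: compared, never dereferenced */
    int *ip;
    int total = 0;

    for (ip = a; ip < endp; ip++)
        total += *ip;
    return total;
}
```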
Deep sentence:
Don't assume, however, that the size of a structure is the sum of the sizes of its members.
If this isn't the sort of thing you'd be likely to assume, you don't have to remember the reason, which is
mildly esoteric (having to do with memory alignment requirements).
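A sketch to check the claim on your own compiler. struct mixed is a hypothetical example; the only portable guarantee the tests rely on is that a structure is at least as big as its members combined. On many common systems with 4-byte ints, sizeof(struct mixed) comes out as 8 rather than 5, but don't rely on any particular value: use sizeof on the structure itself whenever you need its size.

```c
#include <stddef.h>

/* A hypothetical structure whose members have different sizes. */
struct mixed {
    char c;     /* sizeof(char) is 1 by definition */
    int i;      /* may need to start at a suitably aligned offset */
};

/* padding_bytes: bytes the compiler added beyond the members' own sizes */
size_t padding_bytes(void)
{
    return sizeof(struct mixed) - (sizeof(char) + sizeof(int));
}
```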


section 6.5: Self-referential Structures


page 139
In section 4.10, we met recursive functions. Now, we're going to meet recursively-defined data
structures. Don't throw up your hands: the two should be easier to understand in combination.
The mention of ``quadratic running time'' is tangential, but it's a useful-enough concept that it might be
worth a bit of explanation. If we were keeping a simple list (``linear array'') in order, each time we had a
new word to install, we'd have to scan over the old list. On average, we'd have to scan over half the old
list. (Even if we used binary search to find the position, we'd still have to move some part of the list to
insert it.) Therefore, the more words that were in the list, the longer it would take to install each new
word. It turns out that the total running time of this linear insertion algorithm (for installing all of the words) would grow as the square of the
number of items in the list (that's what ``quadratically'' means). If you doubled the size of the list, the
running time would be roughly four times as long. An algorithm like this may seem to work fine when you run it
on small test inputs, but then when you run it on a real problem consisting of a thousand or ten thousand
or a million words, it bogs down hopelessly.
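The growth can be made concrete by counting work instead of timing it. This sketch (the function names are hypothetical) keeps a sorted array by shifting elements up to make room, and counts the moves. It deliberately uses the worst case, where every new value belongs at the front; the average case moves about half as many elements, which is still quadratic.

```c
/* insert_sorted: insert v into a[0]..a[n-1], which is sorted and has room
   for one more element, by shifting larger elements up one place.
   Returns the number of elements moved. */
long insert_sorted(int a[], int n, int v)
{
    int i = n;
    long moves = 0;

    while (i > 0 && a[i-1] > v) {
        a[i] = a[i-1];
        i--;
        moves++;
    }
    a[i] = v;
    return moves;
}

/* worst_case_moves: insert n, n-1, ..., 1 in that order, so each new value
   goes at the front and every element already present must move.
   Total moves = 0 + 1 + ... + (n-1) = n(n-1)/2.  a needs room for n ints. */
long worst_case_moves(int a[], int n)
{
    long total = 0;
    int k;

    for (k = 0; k < n; k++)
        total += insert_sorted(a, k, n - k);
    return total;
}
```

With 1000 values the worst case takes 499,500 moves; with 2000 it takes 1,999,000, about four times as many.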
A binary tree is a great way to keep a set of words (or other values) in sorted order. The definition of a
binary tree is simply that, at each node, all items in the left subtree are less than the item at that node, and
all items in the right subtree are greater. (Note that the top item in the left subtree is not necessarily
immediately less than the item at that node or anything; the immediately-preceding item is merely down
in the left subtree somewhere, along with all the rest of the preceding items. In the ``now is the time''
example, the word ``now'' is neither the first, last, nor middle word in the sorted list; it's merely the word
that happened to be installed first. The word preceding it is ``men''; the word following it is ``of.'' The
first word in the sorted list is ``aid,'' and the last word is ``to.'')
The binary tree may not immediately seem like much of an improvement over the linear array--we still
have to scan over part of the existing tree in order to insert each new word, and the time to add each new
word will get longer as there are more words in the tree. But, if you do the math, it turns out that on
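average you have to scan over a much smaller part of the tree, and it's not a simple fraction like half or
one quarter, but rather something proportional to the log (base two) of the number of items already in the tree (provided the words arrive in a reasonably random order; if they arrive already sorted, the tree degenerates into what is effectively a linear list). Furthermore,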
inserting a new node doesn't involve reshuffling any old data. For these reasons, the running time of
binary tree insertion doesn't slow down nearly as badly as linear insertion does.
By the way, the word ``binary'' comes up so often simply because it means ``two.'' The
binary number system has two digits (0 and 1); a binary operator has two operands; binary search
eliminates half (one over two) of the possibilities at each step; a binary tree has two subtrees at each
node.
One other bit of nomenclature: the word ``node'' simply refers to one of the structures in a set of
structures that is linked together in some way, and as we're about to see, we're going to use a set of linked
structures to implement a binary tree. Just as we talk about a ``cell'' or ``element'' of an array, we talk
about a ``node'' in a tree or linked list.
When we look at the description of the algorithm for finding out whether a word is already in the tree, we
may begin to see why the binary tree is more efficient than the linear list. When searching through a
linear list, each time we discard a value that's not the one we're looking for, we've only discarded that one
value; we still have the entire rest of the list to search. In a binary tree, however, whenever we move
down the tree, we've just eliminated half of the tree. (We might say that a binary tree is a data structure
which makes binary search automatic.) Consider guessing a number between one and a hundred by
asking ``Is it 1? Is it 2? Is it 3?'' etc., versus asking ``Is it less than 50? Is it greater than 25? Is it less than
12?''
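The guessing game can be sketched directly (the function names are hypothetical): one function asks ``Is it 1? Is it 2?'' in turn, and the other halves the remaining range each time, counting one step per comparison against the middle value. For any number from 1 to 100, the halving strategy needs at most 7 steps, while the one-at-a-time strategy can need 100.

```c
/* halving_guesses: steps needed to find target in lo..hi by comparing
   against the middle of the remaining range and discarding half of it. */
int halving_guesses(int target, int lo, int hi)
{
    int count = 0;

    while (lo <= hi) {
        int mid = (lo + hi) / 2;
        count++;
        if (target == mid)
            return count;
        else if (target < mid)
            hi = mid - 1;       /* discard the upper half */
        else
            lo = mid + 1;       /* discard the lower half */
    }
    return count;               /* only reached if target is out of range */
}

/* linear_guesses: "Is it lo? Is it lo+1? ..." until we hit target */
int linear_guesses(int target, int lo, int hi)
{
    int guess, count = 0;

    for (guess = lo; guess <= hi; guess++) {
        count++;
        if (guess == target)
            break;
    }
    return count;
}
```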
page 140
Make sure you're comfortable with the idea of a structure which contains pointers to other instances of
itself. If you draw some little box-and-arrow diagrams for a binary tree, the idea should fall into place
easily. (As the authors point out, what would be impossible would be for a structure to contain not a
pointer but rather another entire instance of itself, because that instance would contain another, and
another, and the structure would be infinitely big.)
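Here is the tnode declaration from page 140, followed by a tiny tree built by hand as static variables (n_now and the other names are hypothetical) so the box-and-arrow picture can be seen in code. Null pointers mark the missing children, and the commented-out struct bad shows the illegal, infinitely big alternative.

```c
#include <stddef.h>

struct tnode {              /* the tree node: */
    char *word;             /* points to the text */
    int count;              /* number of occurrences */
    struct tnode *left;     /* left child */
    struct tnode *right;    /* right child */
};

/* Illegal: a structure can't contain a whole instance of itself.
   struct bad { struct bad inner; };                                */

/* A three-node tree; "now" was installed first, so it is the root. */
static struct tnode n_is  = { "is",  1, NULL, NULL };
static struct tnode n_the = { "the", 1, NULL, NULL };
static struct tnode n_now = { "now", 1, &n_is, &n_the };
```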
page 141
Note that addtree accepts as an argument the tree to be added to, and returns a pointer to a tree,
because it may have to modify the tree in the process of adding a new node to it. If it doesn't have to
modify the tree (more precisely, if it doesn't have to modify the top or root of the tree) it returns the same
pointer it was handed.
Another thing to note is the technique used to mark the edges or ``leaves'' of the tree. We said that a null
pointer was a special pointer value guaranteed not to point anywhere, and it is therefore an excellent
marker to use when a left or right subtree does not exist. Whenever a new node is built, addtree
initializes both subtree pointers (``children'') to null pointers. Later, another chain of calls to addtree
may replace one or the other of these with a new subtree. (Eventually, both might be replaced.)
If you don't completely see how addtree works, leave it for a moment and look at treeprint on the
next page first.
The bottom of page 141 discusses a tremendously important issue: memory allocation. Although we only
have one copy of the addtree function (which may call itself recursively many times), by the time
we're done, we'll have many instances of the tnode structure (one for each unique word in the input).
Therefore, we have to arrange somehow that memory for these multiple instances is properly allocated.
We can't use a local variable of type struct tnode in addtree, because local variables disappear
when their containing function returns. We can't use a static variable of type struct tnode in
addtree, or a global variable of type struct tnode, because then we'd have only one node in the
whole program, and we need many.
What we need is some brand-new memory. Furthermore, we have to arrange it so that each time
addtree builds a brand-new node, it does so in another new piece of brand-new memory. Since each
node contains a pointer (char *) to a string, the memory for that string has to be dynamically allocated,
too. (If we didn't allocate memory for each new string, all the strings would end up being stored in the
word array in main on page 140, and they'd step all over each other, and we'd only be able to see the
last word we read.)
For the moment, we defer the questions of exactly where this brand-new memory is to come from by
defining two functions to do it. talloc is going to return a (pointer to a) brand-new piece of memory
suitable for holding a struct tnode, and strdup is going to return a (pointer to a) brand-new piece
of memory containing a copy of a string.
page 142
treeprint is probably the cleanest, simplest recursive function there is. If you've been putting off
getting comfortable with recursive functions, now is the time.
Suppose it's our job to print a binary tree: we've just been handed a pointer to the base (root) of the tree.
What do we do? The only node we've got easy access to is the root node, but as we saw, that's not the
first or the last element to print or anything; it's generally a random node somewhere in the middle of the
eventual sorted list (distinguished only by the fact that it happened to be inserted first). The node that
needs to be printed first is buried somewhere down in the left subtree, and the node to print just before
the node we've got easy access to is buried somewhere else down in the left subtree, and the node to print
next (after the one we've got) is buried somewhere down in the right subtree. In fact, everything down in
the left subtree is to be printed before the node we've got, and everything down in the right subtree is to
be printed after. A pseudocode description of our task, therefore, might be
print the left subtree (in order)
print the node we're at
print the right subtree (in order)
How can we print the left subtree, in order? The left subtree is, in general, another tree, so printing it out
sounds about as hard as printing an entire tree, which is what we were supposed to do. In fact, it's exactly
as hard: it's the same problem. Are we going in circles? Are we getting anywhere? Yes, we are: the left
subtree, even though it is still a tree, is at least smaller than the full tree we started with. The same is true
of the right subtree. Therefore, we can use a recursive call to do the hard work of printing the subtrees,
and all we have to do is the easy part: print the node we're at. The fact that the subtrees are smaller gives
us the leverage we need to make a recursive algorithm work.
In any recursive function, it is (obviously) important to terminate the recursion, that is, to make sure that
the function doesn't recursively call itself forever. In the case of binary trees, when you reach a ``leaf'' of
the tree (more precisely, when the left or right subtree is a null pointer), there's nothing more to visit, so
the recursion can stop. We can test for this in two different ways, either before or after we make the
``last'' recursive call:
void treeprint(struct tnode *p)
{
    if (p->left != NULL)
        treeprint(p->left);
    printf("%4d %s\n", p->count, p->word);
    if (p->right != NULL)
        treeprint(p->right);
}
or
void treeprint(struct tnode *p)
{
    if (p == NULL)
        return;
    treeprint(p->left);
    printf("%4d %s\n", p->count, p->word);
    treeprint(p->right);
}
Sometimes, there's little difference between one approach and the other. Here, though, the second
approach (which is equivalent to the code on page 142) has a distinct advantage: it will work even if the
very first call is on an empty tree (in this case, if there were no words in the input). As we mentioned
earlier, it's extremely nice if programs work well at their boundary conditions, even if we don't think
those conditions are likely to occur.
(One more thing to notice is that it's quite possible for a node to have a left subtree but not a right, or vice
versa; one example is the node labeled ``of'' in the tree on page 139.)
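To see the in-order rule without relying on printed output, this sketch uses a hypothetical variant of the second treeprint, treecollect, which stores the words in an array instead of printing them. The tree is built by hand to match what installing ``now is the time for all'' would produce. Like the second treeprint, it checks for a null pointer on entry, so calling it on an empty tree is harmless.

```c
#include <stddef.h>

struct tnode {
    char *word;
    int count;
    struct tnode *left;
    struct tnode *right;
};

/* The tree built by installing: now is the time for all */
static struct tnode n_all  = { "all",  1, NULL,   NULL };
static struct tnode n_for  = { "for",  1, &n_all, NULL };
static struct tnode n_is   = { "is",   1, &n_for, NULL };
static struct tnode n_time = { "time", 1, NULL,   NULL };
static struct tnode n_the  = { "the",  1, NULL,   &n_time };
static struct tnode n_now  = { "now",  1, &n_is,  &n_the };

/* treecollect: store the words of tree p, in order, in out[n], out[n+1],
   and so on.  Returns the new number of words stored. */
int treecollect(struct tnode *p, char *out[], int n)
{
    if (p == NULL)                          /* empty (sub)tree */
        return n;
    n = treecollect(p->left, out, n);       /* everything smaller first */
    out[n++] = p->word;                     /* then this node */
    return treecollect(p->right, out, n);   /* then everything larger */
}
```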
Another impressive thing about a recursive treeprint function is that it's not just a way of writing it,
or a nifty way of writing it; it's really the only way of writing it. You might try to figure out how to write
a nonrecursive version. Once you've printed something down in the left subtree, how do you know where
to go back up to? Our struct tnode only has pointers down the tree; there aren't any pointers back to
the ``parent'' of each node. If you write a nonrecursive version, you have to keep track of how you got to
where you are, and it's not enough to keep track of the parent of the node you're at; you have to keep a
stack of all the nodes you've passed down through. When you write a recursive version, on the other
hand, the normal function-call stack essentially keeps track of all this for you.
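For comparison, a nonrecursive version has to do that bookkeeping itself. In this sketch, the name treecollect_iter and the fixed MAXDEPTH limit are assumptions; an explicit array serves as the stack of nodes passed on the way down. Each time we can go no further left, we pop the nearest unvisited ancestor, visit it, and then move into its right subtree.

```c
#include <stddef.h>

#define MAXDEPTH 100    /* assumption: the tree is never deeper than this */

struct tnode {
    char *word;
    int count;
    struct tnode *left;
    struct tnode *right;
};

/* The tree built by installing: now is the time for all */
static struct tnode n_all  = { "all",  1, NULL,   NULL };
static struct tnode n_for  = { "for",  1, &n_all, NULL };
static struct tnode n_is   = { "is",   1, &n_for, NULL };
static struct tnode n_time = { "time", 1, NULL,   NULL };
static struct tnode n_the  = { "the",  1, NULL,   &n_time };
static struct tnode n_now  = { "now",  1, &n_is,  &n_the };

/* treecollect_iter: store the words of tree p in order in out[], without
   recursion.  Returns the number stored, or -1 if the tree is too deep. */
int treecollect_iter(struct tnode *p, char *out[])
{
    struct tnode *stack[MAXDEPTH];
    int sp = 0, n = 0;

    while (p != NULL || sp > 0) {
        while (p != NULL) {             /* go as far left as possible, */
            if (sp >= MAXDEPTH)
                return -1;
            stack[sp++] = p;            /* remembering the path down */
            p = p->left;
        }
        p = stack[--sp];                /* back up to the nearest ancestor */
        out[n++] = p->word;             /* visit it */
        p = p->right;                   /* then do its right subtree */
    }
    return n;
}
```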
We now return to the problem of dynamic memory allocation. The basic approach builds on something
we've been seeing glimpses of for a few chapters now: we use a general-purpose function which returns a
pointer to a block of n bytes of memory. (The authors presented a primitive version of such a function in
section 5.4, and we used it in the sorting program in section 5.6.) Our problem is then reduced to (1)
remembering to call this allocation function when we need to, and (2) figuring out how many bytes we
need. Problem 1 is stubborn, but problem 2 is solved by the sizeof operator we met in section 6.3.
You don't need to worry about all the details of the ``digression on a problem related to storage
allocators.'' The vast majority of the time, this problem is taken care of for you, because you use the
system library function malloc.
The problem of malloc's return type is not quite as bad as the authors make it out to be. In ANSI C, the
void * type is a ``generic'' pointer type, specifically intended to be used where you need a pointer
which can be a pointer to any data type. Since void * is never a pointer to anything by itself, but is
always intended to be converted (``coerced'') into some other type, it turns out that a cast is not strictly
required: in code like
struct tnode *tp = malloc(sizeof(struct tnode));
or
return malloc(sizeof(struct tnode));
the compiler is willing to convert the pointer types implicitly, without warning you and without requiring
you to insert explicit casts. (If you feel more comfortable with the casts, though, you're welcome to leave
them in.)
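A minimal sketch of talloc without the cast, plus a hypothetical intalloc showing the same pattern for an array. Writing sizeof *ip instead of sizeof(int) ties the size to whatever ip points to. Both functions pass malloc's result straight back, so the caller still has to check for a null pointer.

```c
#include <stdlib.h>

struct tnode {
    char *word;
    int count;
    struct tnode *left;
    struct tnode *right;
};

/* talloc: make a tnode.  No cast is needed: malloc returns void *,
   which converts implicitly to struct tnode *.  May return NULL. */
struct tnode *talloc(void)
{
    return malloc(sizeof(struct tnode));
}

/* intalloc: make an array of n ints.  May return NULL. */
int *intalloc(size_t n)
{
    int *ip = malloc(n * sizeof *ip);
    return ip;
}
```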
page 143
strdup is a handy little function that does two things: it allocates enough memory for one of your
strings, and it copies your string to the new memory, returning a pointer to it. (It encapsulates a pattern
which we first saw in the readlines function on page 109 in section 5.6.) Note the +1 in the call to
malloc! Accidentally calling malloc(strlen(s)) is an easy but serious mistake.
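Here is a sketch of the function. It's called dupstr here (a hypothetical name) because POSIX and C23 already declare a strdup in <string.h>, and defining your own function with that name can conflict with the library's declaration. Note the +1 for the terminating '\0', and that it returns a null pointer, instead of copying, if malloc fails.

```c
#include <stdlib.h>
#include <string.h>

/* dupstr: make a duplicate of s in newly allocated memory */
char *dupstr(const char *s)
{
    char *p = malloc(strlen(s) + 1);    /* +1 for the '\0' */

    if (p != NULL)
        strcpy(p, s);
    return p;
}
```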
As we mentioned at the beginning of chapter 5, memory allocation can be hard to get right, and is at the
root of many difficulties and bugs in many C programs. Here are some rules and other things to
remember:
1. Make sure you know where things are allocated, either by the compiler or by you. Watch out for
things like the local line array we've been tending to use with getline, and the local word
http://www.eskimo.com/~scs/cclass/krnotes/sx9f.html (5 of 9) [22/07/2003 5:09:51 PM]

section 6.5: Self-referential Structures

array on page 140. When a function writes to an array or a pointer supplied by the caller, it
depends on the caller to have allocated storage correctly. When you're the caller, make sure you
pass a valid pointer! Make sure you understand why
char *ptr;
getline(ptr, 100);
is wrong and can't work. (For one thing: what does that 100 mean? If it tells getline the most
characters it may write, where have we allocated those 100 characters for it to write into? Nowhere:
ptr is uninitialized, so it doesn't point to any storage at all.)
2. Be aware of any situations where a single array or data structure is used to store multiple different
things, in succession. Think again about the local line array we've been tending to use with
getline, and the local word array on page 140. These arrays are overwritten with each new
line, word, etc., so if you need to keep all of the lines or words around, you must copy them
immediately to allocated memory (as the line-sorting program on pages 108-9 in section 5.6 did,
but as the longest line program on page 29 in section 1.9 and the pattern-matching programs on
page 69 in section 4.1 and pages 116-7 in section 5.10 did not have to do).
3. Make sure you allocate enough memory! If you allocate memory for an array of 10 things, don't
accidentally store 11 things in it. If you have a string that's 10 characters long, make sure you
always allocate 11 characters for it (including one for the terminating '\0').
4. When you free (deallocate) memory, make sure that you don't have any pointers lying around
which still point to it (or if you do, make sure not to use them any more).
5. Always check the return value from memory-allocation functions. Memory is never infinite:
sooner or later, you will run out of memory, and allocation functions generally return a null
pointer when this happens.
6. When you're not using dynamically-allocated memory any more, do try to free it, if it's convenient
to do so and the program's not just about to exit. Otherwise, you may eventually have so much
memory allocated to stuff you're not using any more that there's no more memory left for new
stuff you need to allocate. (However, on all but a few broken systems, all memory is
automatically and definitively returned to the operating system when your program exits, so if one
of your programs doesn't free some memory, you shouldn't have to worry that it's wasted forever.)
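As a minimal sketch of the advice above about copying a reused buffer into allocated memory, leaving room for the '\0', and checking malloc's return value, here is a hypothetical helper, save_string (the name is an assumption for illustration; it behaves much like the strdup used in addtree below):

```c
#include <stdlib.h>
#include <string.h>

/* Copy a string (for example, a line just read into a reused local
   array) into freshly allocated memory.  strlen(s) + 1 leaves room
   for the terminating '\0'.  Returns NULL if malloc fails, so the
   caller must check the result before using it. */
char *save_string(const char *s)
{
    char *p = malloc(strlen(s) + 1);

    if (p == NULL)
        return NULL;      /* out of memory: let the caller decide */
    strcpy(p, s);
    return p;
}
```

A caller that reads lines into a local array would call save_string on each line it needs to keep, check for NULL, and eventually free each copy.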

Unfortunately, checking the return values from memory allocation functions (point 5 above) requires a
few more lines of code, so it is often left out of sample code in textbooks, including this one. Here are
versions of main and addtree for the word-counting program (pages 140-1 in the text) which do
check for out-of-memory conditions:
/* word frequency count */
int main(void)
{
    struct tnode *root;
    char word[MAXWORD];

    root = NULL;
    while (getword(word, MAXWORD) != EOF) {
        if (isalpha(word[0])) {
            root = addtree(root, word);
            if (root == NULL) {
                printf("out of memory\n");
                return 1;
            }
        }
    }
    treeprint(root);
    return 0;
}

struct tnode *addtree(struct tnode *p, char *w)
{
    int cond;

    if (p == NULL) {               /* a new word has arrived */
        p = talloc();              /* make a new node */
        if (p == NULL)
            return NULL;
        p->word = strdup(w);
        if (p->word == NULL) {
            free(p);
            return NULL;
        }
        p->count = 1;
        p->left = p->right = NULL;
    } else if ((cond = strcmp(w, p->word)) == 0)
        p->count++;                /* repeated word */
    else if (cond < 0) {           /* less than: into left subtree */
        p->left = addtree(p->left, w);
        if (p->left == NULL)
            return NULL;
    } else {                       /* greater than: into right subtree */
        p->right = addtree(p->right, w);
        if (p->right == NULL)
            return NULL;
    }
    return p;
}
In practice, many programmers would collapse the calls and tests:
struct tnode *addtree(struct tnode *p, char *w)
{
    int cond;

    if (p == NULL) {               /* a new word has arrived */
        if ((p = talloc()) == NULL)
            return NULL;
        if ((p->word = strdup(w)) == NULL) {
            free(p);
            return NULL;
        }
        p->count = 1;
        p->left = p->right = NULL;
    } else if ((cond = strcmp(w, p->word)) == 0)
        p->count++;                /* repeated word */
    else if (cond < 0) {           /* less than: into left subtree */
        if ((p->left = addtree(p->left, w)) == NULL)
            return NULL;
    } else {                       /* greater than: into right subtree */
        if ((p->right = addtree(p->right, w)) == NULL)
            return NULL;
    }
    return p;
}

This page by Steve Summit // Copyright 1995, 1996 // mail feedback


Chapter 7: Input and Output


page 151
By ``Input and output facilities are not part of the C language itself,'' we mean that things like printf
are just function calls like any other. C has no built-in input or output statements. For our purposes, the
main implication of this fact (that I/O is not built in) is that the compiler may not do as much
checking as we might like it to. If we accidentally write
double d = 1.23;
printf("%d\n", d);
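the compiler says, ``Hmm, a function named printf is being called with a string and a double. Okay
by me.'' The language does not require the compiler to notice that the %d format requires an int (and, in
general, a compiler could not, since a format string need not be known until run time). Some compilers
do check format strings written as literals and can warn about a call like this one (GCC with -Wall, for
example), but you shouldn't count on it.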
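Here is a sketch of two correct alternatives to the mismatched call above. The helper names format_as_double and format_as_int are hypothetical, and they use snprintf (a C99 function) to format into a buffer exactly as printf would print, which makes the results easy to inspect:

```c
#include <stdio.h>

/* %f expects a double (a float argument is promoted to double
   anyway), so the argument now matches the format. */
void format_as_double(char *buf, size_t size, double d)
{
    snprintf(buf, size, "%f", d);
}

/* If an int was really wanted, convert explicitly: (int)d truncates
   toward zero, and the value passed for %d really is an int. */
void format_as_int(char *buf, size_t size, double d)
{
    snprintf(buf, size, "%d", (int)d);
}
```

With d = 1.23, the first produces "1.230000" (the default precision for %f is 6) and the second produces "1".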
Although the title of this chapter is ``Input and Output,'' it appears that we'll also be meeting a few other
routines from the standard library.
If you start to do any serious programming on a particular system, you'll undoubtedly discover that it has
a number of more specialized input/output (and other system-related) routines available, which promise
better performance or nicer functionality than the pedestrian routines of C's standard library. You should
resist the temptation to use these nonstandard routines. Because the standard library routines are defined
precisely and ``exist in compatible form on any system where C exists,'' there are some real advantages
to using them. (On the other hand, when you need to do something which C's standard library routines
don't provide, you'll generally turn to your machine's system-specific routines right away, as they may be
your only choice. One common example is when you'd like to read one character immediately, without
waiting for the RETURN key. How you do that depends on what system you're using; it is not defined by
C.)
section 7.1: Standard Input and Output
section 7.2: Formatted Output--Printf
section 7.3: Variable-length Argument Lists
section 7.4: Formatted Input--Scanf
section 7.5: File Access
section 7.6: Error Handling--Stderr and Exit
section 7.7: Line Input and Output
section 7.8.1: String Operations
section 7.8.2: Character Class Testing and Conversion
section 7.8.3: Ungetc
section 7.8.4: Command Execution
section 7.8.5: Storage Management
section 7.8.6: Mathematical Functions
section 7.8.7: Random Number Generation


section 7.1: Standard Input and Output

Note that ``a text stream'' might refer to input (to the program) from the keyboard or output to the screen,
or input and output from files on disk. (For that matter, it can also refer to input and output from other
peripheral devices, or the network.)
Note that the stdio library generally does newline translation for you, at least for streams opened in
text mode (the default). Even though lines are terminated by a linefeed on Unix, a carriage return on the
Macintosh, and a carriage-return/linefeed combination on MS-DOS, you don't have to worry about these
differences in C, because the line termination will always appear to a C program to be a single '\n'.
(That is, when reading, a single '\n' represents the end of the line being read, and when writing,
writing a '\n' causes the underlying system's actual end-of-line representation to be written.)
pages 152-153
The ``lower'' program is an example of a filter: it reads its standard input, ``filters'' (that is, processes) it
in some way, and writes the result to its standard output. Filters are designed for (and are only really
useful under) a command-line interface such as the Unix shells or the MS-DOS command.com interface.
Obviously, you would rarely invoke a program like lower by itself, because you would have to type the
input text at it and you could only see the output ephemerally on your screen. To do any real work, you
would always redirect the input:
lower < inputfile
and perhaps the output:
lower < inputfile > outputfile
(notice that spaces may precede and follow the < and > characters). Or, a filter program like lower
might appear in a longer pipeline:
oneprogram | lower | anotherprogram
or
anotherprogram < inputfile | lower | thirdprogram > outputfile
Filters like these are not terribly useful, though, under a Graphical User Interface such as the Macintosh
or Microsoft Windows.
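The heart of a filter like lower can be sketched as a loop that reads one stream, transforms each character, and writes another. This version is an assumption, not K&R's code: it takes FILE * arguments (getc and putc are covered in section 7.5) so that the same function can run on stdin and stdout or on opened files, and the name lower_filter is hypothetical:

```c
#include <ctype.h>
#include <stdio.h>

/* Copy in to out, converting each character to lower case.
   getc returns an unsigned char value or EOF, both of which are
   valid arguments for tolower. */
void lower_filter(FILE *in, FILE *out)
{
    int c;

    while ((c = getc(in)) != EOF)
        putc(tolower(c), out);
}
```

A main that just calls lower_filter(stdin, stdout) gives a program that can be used with <, >, and | exactly as shown above.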


section 7.2: Formatted Output -- Printf

pages 153-155
To summarize the important points of this section:

printf's output goes to the standard output, just like putchar.
Everything in printf's format string is either a plain character to be printed as-is, or a %-specifier
which generally causes one argument to be consumed, formatted, and printed. (Occasionally, a single
%-specifier consumes two or three arguments if the width or precision is *, or zero arguments if the
specifier is %%.)
There's a fairly long list of conversion specifiers; see the table on page 154.
Always be careful that the conversions you request (in the format string) match the arguments you
supply.
You can ``print'' to a string (instead of the standard output) with sprintf. (This is the usual way
of converting numbers to strings in C; the itoa function we were playing with in section 3.6 on
page 64 is nonstandard, and unnecessary.)
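A minimal sketch of sprintf as the standard replacement for itoa, along with a %*d width taken from the argument list and a %% that consumes no argument. The function names are hypothetical, and the buffer size remark assumes a 32-bit int:

```c
#include <stdio.h>

/* Convert n to a decimal string.  buf must have room for every digit,
   a possible '-', and the '\0' (12 characters suffice for a 32-bit
   int).  sprintf returns the number of characters stored, not
   counting the '\0'. */
int int_to_string(int n, char buf[])
{
    return sprintf(buf, "%d", n);
}

/* "%*d" consumes two arguments (the width, then the value);
   "%%" consumes none and prints a single '%'. */
int percent_in_field(int width, int pct, char buf[])
{
    return sprintf(buf, "%*d%%", width, pct);
}
```

For example, int_to_string(-42, buf) stores "-42", and percent_in_field(5, 42, buf) stores "   42%".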


section 7.3: Variable-length Argument Lists

This is an advanced section which you don't need to read.


section 7.4: Formatted Input -- Scanf

page 157
Somehow we've managed to make it through six chapters without meeting scanf, which it turns out is
just as well.
In the examples in this book so far, all input (from the user, or otherwise) has been done with getchar
or getline. If we needed to input a number, we did things like
char line[MAXLINE];
int number;
getline(line, MAXLINE);
number = atoi(line);
Using scanf, we could ``simplify'' this to
int number;
scanf("%d", &number);
This simplification is convenient and superficially attractive, and it works, as far as it goes. The problem
is that scanf does not work well in more complicated situations. In section 7.1, we said that calls to
putchar and printf could be interleaved. The same is not always true of scanf: you can have
baffling problems if you try to intermix calls to scanf with calls to getchar or getline. Worse, it
turns out that scanf's error handling is inadequate for many purposes. It tells you whether a conversion
succeeded or not (more precisely, it tells you how many conversions succeeded), but it doesn't tell you
anything more than that (unless you ask very carefully). Like atoi and atof, scanf stops reading
characters when it's processing a %d or %f input and it finds a non-numeric character. Suppose you've
prompted the user to enter a number, and the user accidentally types the letter `x'. scanf might return 0,
indicating that it couldn't convert a number, but the unconvertible text (the `x') remains on the input
stream unless you figure out some other way to remove it.
For these reasons (and several others, which I won't bother to mention) it's generally recommended that
scanf not be used for unstructured input such as user prompts. It's much better to read entire lines with
something like getline (as we've been doing all along) and then process the line somehow. If the line
is supposed to be a single number, you can use atoi or atof to convert it. If the line has more
complicated structure, you can use sscanf (which we'll meet in a minute) to parse it. (It's better to use
sscanf than scanf because when sscanf fails, you have complete control over what you do next.
When scanf fails, on the other hand, you're at the mercy of where in the input stream it has left you.)
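The read-a-line-then-parse style can be sketched as follows. The line is assumed to have been read already (with getline, or fgets), so sscanf examines only that string and nothing is left stranded in the input stream. The name parse_int_line is hypothetical, and like scanf itself it does not reliably handle out-of-range numbers:

```c
#include <stdio.h>

/* Returns 1 and sets *result if line holds exactly one int
   (surrounding whitespace, including a trailing newline, is fine);
   returns 0 otherwise. */
int parse_int_line(const char *line, int *result)
{
    int n;
    char extra;

    /* After %d, " %c" skips whitespace and then tries to grab one
       more character.  sscanf returns 1 for a clean number, 2 if junk
       follows it, 0 if there is no number, and EOF if the line is
       empty or all whitespace. */
    if (sscanf(line, "%d %c", &n, &extra) != 1)
        return 0;
    *result = n;
    return 1;
}
```

When parse_int_line fails, the caller still has the whole line, so it can print it in an error message and simply read another.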
With that little diatribe against scanf out of the way, here are a few comments on individual points
made in section 7.4.


We've met a few functions (e.g. getline, month_day in section 5.7 on page 111) which return more
than one value; the way they do so is to accept a pointer argument that tells them where (in the caller) to
write the returned value. scanf is the epitome of such functions: it returns potentially many values (one
for each %-specifier in its format string), and for each value converted and returned, it needs a pointer
argument.
The statement on page 157 that ``blanks or tabs'' in the format string ``are ignored'' (which is repeated on
page 159) is a simplification: in actuality, a blank or tab (or newline; actually any whitespace) in the
format string causes scanf to skip whitespace (blanks, tabs, etc.) in the input stream.
A * character in a scanf conversion specifier means something completely different than it does for
printf: for scanf, it means to suppress assignment (i.e. for that conversion specifier, there isn't a
pointer in the argument list to receive the converted value, so the converted value is discarded). With
scanf, there is no direct way of taking a field width from the argument list, as * does for printf.
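A small sketch of assignment suppression, using a hypothetical second_number helper: "%*d" reads and converts an int but stores it nowhere, so there is no matching pointer argument. Suppressed conversions are not counted in the return value, which is why success here is a return of 1, not 2:

```c
#include <stdio.h>

/* Skip the first number in s and store the second in *result.
   Returns 1 on success, 0 otherwise. */
int second_number(const char *s, int *result)
{
    return sscanf(s, "%*d %d", result) == 1;
}
```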
Conversion specifiers like %d and %f automatically skip leading whitespace while looking for something
to convert. This means that the format strings "%d %d" and "%d%d" act exactly the same--the
whitespace in the first format string causes whitespace to be skipped before the second %d, but the second
%d would have skipped that whitespace anyway. (Yet another scanf foible is that the innocuous-looking format string "%d\n" converts a number and then skips whitespace, which means that it will
gobble up not only a newline following the number it converts, but any number of newlines or
whitespace, and in fact it will keep reading until it finds a non-whitespace character, which it then won't
read. This sounds confusing, but so is scanf's behavior when given a format string like "%d\n". The
moral is simple: don't use trailing \n's in scanf format strings.)
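The claim that "%d %d" and "%d%d" act the same can be checked with sscanf. This is only a demonstration, and the name formats_agree is hypothetical:

```c
#include <stdio.h>

/* Returns 1 if both formats succeed on s and extract the same two
   numbers.  Each %d skips leading whitespace on its own, so the
   blank in the first format adds nothing. */
int formats_agree(const char *s)
{
    int a1, b1, a2, b2;

    if (sscanf(s, "%d %d", &a1, &b1) != 2)
        return 0;
    if (sscanf(s, "%d%d", &a2, &b2) != 2)
        return 0;
    return a1 == a2 && b1 == b2;
}
```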
page 158
Notice that, for scanf, the %e, %f, and %g formats are all the same, and signify conversion of a float
value (they accept a pointer argument of type float *). To convert a double, you need to use %le,
%lf, or %lg. (This is quite different from the printf family, which uses %e, %f, and %g for both floats
and doubles, though the three request different output formats. Furthermore, %le, %lf, and %lg were
technically incorrect for printf under the original ANSI C standard, though most compilers accepted them;
C99 later made the l harmless there.)
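A sketch of the float/double distinction for the scanf family, with hypothetical helper names: %f stores through a float *, while %lf stores through a double *. Passing the wrong kind of pointer may not be caught by the compiler and gives garbage:

```c
#include <stdio.h>

/* %f: the argument must be a float *. */
int read_float(const char *s, float *f)
{
    return sscanf(s, "%f", f) == 1;
}

/* %lf: the argument must be a double *. */
int read_double(const char *s, double *d)
{
    return sscanf(s, "%lf", d) == 1;
}
```

On output, by contrast, printf("%f", x) works whether x is a float or a double, because a float argument is promoted to double.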
page 159
More precisely, the reason that you don't need to use a & with monthname is that an array, when it
appears in an expression like this, is automatically converted to a pointer.
The dual-format date conversion example in the middle of page 159 is a nice example of the advantages
of calling getline and then sscanf. At the beginning of this section, I said that ``when sscanf fails,
you have complete control over what you do next.'' Here, ``what you do next'' is to try calling sscanf
again, on the very same input string (thus effectively backing up to the very beginning of it), using a
different format string, to try parsing the input a different way.
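The page-159 technique can be sketched as a function that tries one format and, if it fails, tries another on the very same string. The name parse_date and its parameters are hypothetical, and the %19s width is an addition (not in the book) that keeps a long word from overflowing a 20-character monthname array:

```c
#include <stdio.h>

/* Returns 1 for the "25 Dec 1988" form, 2 for the "12/25/1988" form,
   and 0 if the line matches neither. */
int parse_date(const char *line, int *day, int *month,
               char monthname[20], int *year)
{
    if (sscanf(line, "%d %19s %d", day, monthname, year) == 3)
        return 1;                 /* 25 Dec 1988 form */
    if (sscanf(line, "%d/%d/%d", month, day, year) == 3)
        return 2;                 /* mm/dd/yy form */
    return 0;                     /* invalid form */
}
```

Note that "12/25/1988" makes the first attempt fail partway through (its %s swallows "/25/1988" and the final %d finds nothing), which is harmless: the second attempt starts over from the beginning of the string.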


section 7.5: File Access

page 160
We've come an amazingly long way without ever having to open a file (we've been relying exclusively
on those predefined standard input and output streams) but now it's time to take the plunge.
The concept of a file pointer is an important one. It would theoretically be possible to mention the name
of a file each time it was desired to read from or write to it. But such an approach would have a number
of drawbacks. Instead, the usual approach (and the one taken in C's stdio library) is that you mention the
name of the file once, at the time you open it. Thereafter, you use some little token--in this case, the file
pointer--which keeps track (both for your sake and the library's) of which file you're talking about.
Whenever you want to read from or write to one of the files you're working with, you identify that file by
using the file pointer you obtained from fopen when you opened the file. (It is possible to have several
files open, as long as you use distinct variables to store the file pointers.)
Not only do you not need to know the details of a FILE structure, you don't even need to know what the
``buffer'' (whose location the structure records) is.
In general, the only declaration you need is that of the file pointer itself:
FILE *fp;
You should never need to type the line
FILE *fopen(char *name, char *mode);
because it's provided for you in <stdio.h>.
If you skipped section 6.7, you don't know about typedef, but don't worry. Just assume that FILE is a
type, like int, except one that is defined by <stdio.h> instead of being built into the language.
Furthermore, note that you will never be using variables of type FILE; you will always be using pointers
to this type, or FILE *.
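A sketch of the usual file-pointer life cycle: mention the file's name once, in fopen, use the returned FILE * for all reading, then fclose it. The function name count_lines is hypothetical, and it uses getc, which is described on the next page:

```c
#include <stdio.h>

/* Returns the number of newline characters in the named file,
   or -1 if the file could not be opened. */
long count_lines(const char *name)
{
    FILE *fp;
    int c;
    long n = 0;

    if ((fp = fopen(name, "r")) == NULL)   /* use "rb" for a binary file */
        return -1;
    while ((c = getc(fp)) != EOF)
        if (c == '\n')
            n++;
    fclose(fp);
    return n;
}
```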
A ``binary file'' is one which is treated as an arbitrary series of byte values, as opposed to a text file. We
won't be working with binary files, but if you ever do, remember to use fopen modes like "rb" and
"wb" when opening them.
page 161
We won't worry too much about error handling for now, but if you start writing production programs, it's
something you'll want to learn about. It's extremely annoying for a program to say ``can't open file''
without saying why. (Some particularly unhelpful programs don't even tell you which file they couldn't
open.)
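One way to say both which file and why is strerror(errno), sketched below with a hypothetical open_or_explain helper that builds the message for the caller to print (for instance, on stderr, as section 7.6 describes). Note the assumption: the C standard does not promise that fopen sets errno, though POSIX systems do, so the code falls back to a generic reason:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Open name with the given mode.  On failure, store a message naming
   the file and the reason in msg, and return NULL. */
FILE *open_or_explain(const char *name, const char *mode,
                      char *msg, size_t msgsize)
{
    FILE *fp;

    errno = 0;
    fp = fopen(name, mode);
    if (fp == NULL)
        snprintf(msg, msgsize, "can't open %s: %s", name,
                 errno != 0 ? strerror(errno) : "unknown error");
    return fp;
}
```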
On this page we learn about four new functions, getc, putc, fprintf, and fscanf, which are just
like functions that we've already been using except that they let you specify a file pointer to tell them
which file (or other I/O stream) to read from or write to. (Note that for putc, the extra FILE *
argument comes last, while for fprintf and fscanf, it comes first.)
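A sketch that uses all four functions on one stream, to show where the FILE * goes in each. The function write_then_read is hypothetical; it assumes fp is open for both writing and reading (as a stream from tmpfile() is) and that ch is not a digit, since fscanf would otherwise read it as part of the number:

```c
#include <stdio.h>

/* Write n and ch to fp, then read them back.  Returns 1 if both
   round-trip correctly. */
int write_then_read(FILE *fp, int n, int ch)
{
    int n2, ch2;

    fprintf(fp, "%d", n);        /* stream first */
    putc(ch, fp);                /* stream last */
    rewind(fp);                  /* switching from writing to reading needs a reposition */
    if (fscanf(fp, "%d", &n2) != 1)   /* stream first */
        return 0;
    ch2 = getc(fp);              /* stream is the only argument */
    return n2 == n && ch2 == ch;
}
```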
page 162
cat is about the most basic and important file-handling program there is (even if its name is a bit
obscure). The cat program on page 162 is a bit like the ``hello, world'' program on page 6--it may seem
trivial, but if you can get it to work, you're over the biggest first hurdle when it comes to handling files at
all.
Compare the cat program (and especially its filecopy function) to the file copying program on page
16 of section 1.5.1--cat is essentially the same program, except that it accepts filenames on the
command line.
Since the authors advise calling fclose in part to ``flush the buffer in which putc is collecting
output,'' you may wonder why the program at the top of the page does not call fclose on its output
stream. The reason can be found in the next sentence: an implicit fclose happens automatically for any
streams which remain open when the program exits normally.
In general, it's a good idea to close any streams you open, but not to close the preopened streams such as
stdin and stdout. (Since ``the system'' opened them for you as your program was starting up, it's
appropriate to let it close them for you as your program exits.)
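The outer loop of a cat-like program can be sketched as follows, following the advice above: every stream the function opens, it also closes, but it never closes out, which may be stdout. The name cat_files and its signature are assumptions for illustration:

```c
#include <stdio.h>

/* Copy each named file to out.  Returns the number of files that
   could not be opened. */
int cat_files(int nfiles, char *names[], FILE *out)
{
    int i, c, failures = 0;
    FILE *fp;

    for (i = 0; i < nfiles; i++) {
        if ((fp = fopen(names[i], "r")) == NULL) {
            fprintf(stderr, "cat: can't open %s\n", names[i]);
            failures++;
            continue;
        }
        while ((c = getc(fp)) != EOF)   /* the filecopy loop */
            putc(c, out);
        fclose(fp);                     /* we opened it, so we close it */
    }
    return failures;
}
```

A main could call cat_files(argc - 1, argv + 1, stdout); the book's version also reads the standard input when no file names are given.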


section 7.6: Error Handling -- Stderr and Exit

page 163
stdout and stderr are both predefined output streams; for our purposes, the only difference between
them is that stderr is not likely to be redirected by the user, so the error messages printed to stderr
will always appear on the screen, where they can be seen.
page 164
The cryptic note about ``a pattern-matching program'' simply means that if you want to search the source
code of a program for all the exit status values it can return, ``exit'' might be an easier string to search for
than ``return.'' (Every call to exit represents an exit from the program, but not every return statement
does.)
The feof and ferror functions can be used to check for error conditions more carefully. In general,
input routines (such as getchar and getline) return some special value to tell you that they couldn't
read any more. Often, this value is EOF, reinforcing the notion that the only possible reason they couldn't
read any more was because end-of-file had been reached. However, it's also possible that there was a
read error, and you can call feof or ferror to determine whether this was the case. On the output
side, though the output routines generally do return an error indication, few programs bother to check the
return values from every call to functions such as putchar and printf. One way to check for output
errors, without having to check the return value of every function, is to call ferror on the output
stream (which might be stdout) at key points.
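A sketch of checking why a copy loop stopped, and whether any output failed, without testing every individual putc. The name copy_and_check and its return codes are hypothetical; a main might pass the result to exit as a status:

```c
#include <stdio.h>

/* Copy in to out.  Returns 0 on a clean end-of-file, 1 on a read
   error, and 2 if writing failed. */
int copy_and_check(FILE *in, FILE *out)
{
    int c;

    while ((c = getc(in)) != EOF)
        putc(c, out);
    if (ferror(in))
        return 1;            /* getc hit a read error, not end-of-file */
    if (fflush(out) == EOF || ferror(out))
        return 2;            /* some putc, or the final flush, failed */
    return 0;                /* the loop ended because feof(in) is true */
}
```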


section 7.7: Line Input and Output

pages 164-165
To summarize, puts is like fputs except that the stream is assumed to be the standard output
(stdout), and a newline ('\n') is automatically appended. gets is like fgets except that the stream
is assumed to be stdin, and the newline ('\n') is deleted, and there's no way to specify the maximum
line length. This last fact means that you almost never want to use gets at all: since you can't tell it how
big the array it's to read into is, there's no way to guarantee that some unexpectedly-long input line won't
overflow the array, with dire results. (When discussing the drawbacks of gets, it's customary to point
out that the ``Internet worm,'' a program that wreaked havoc in 1988 by breaking into computers all over
the net, was able to do so in part because a key network utility on many Unix systems (the finger
daemon, fingerd) used gets, and the worm was able to overflow its buffer in a particularly low, cunning
way, with the result that the worm achieved superuser access to the attacked machine.)
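A sketch of a gets replacement built on fgets, under a hypothetical name: like gets, it deletes the newline, but unlike gets, it never writes more than size characters (including the '\0') into buf. If a line is too long, the rest of it stays in the stream and is returned by the next call:

```c
#include <stdio.h>
#include <string.h>

/* Returns buf, or NULL at end-of-file or on error. */
char *read_line(char *buf, int size, FILE *fp)
{
    char *nl;

    if (fgets(buf, size, fp) == NULL)
        return NULL;
    nl = strchr(buf, '\n');
    if (nl != NULL)
        *nl = '\0';          /* delete the newline, as gets would */
    return buf;
}
```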


section 7.8.1: String Operations

page 166
One thing to beware of is that strcpy's arguments--more precisely, the strings pointed to by its
arguments--must not overlap.
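When source and destination do overlap, memmove (also in <string.h>) is specified to handle it. A sketch, using a hypothetical delete_prefix function that removes the first n characters of a string in place, where strcpy(s, s + n) would be undefined:

```c
#include <string.h>

/* Delete the first n characters of s, in place. */
void delete_prefix(char *s, size_t n)
{
    size_t len = strlen(s);

    if (n > len)
        n = len;
    memmove(s, s + n, len - n + 1);   /* + 1 copies the '\0' too */
}
```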
Another string function we've seen is strstr:
strstr(s,t) return pointer to first t in s, or NULL if not present
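As a sketch of strstr in use, here is a hypothetical function that counts non-overlapping occurrences of t in s, stepping past each match by strlen(t) characters. An empty t is treated as having no occurrences, since strstr would "find" it at every position and the loop would never advance:

```c
#include <string.h>

int count_occurrences(const char *s, const char *t)
{
    size_t len = strlen(t);
    int n = 0;

    if (len == 0)
        return 0;
    while ((s = strstr(s, t)) != NULL) {
        n++;
        s += len;            /* continue searching after this match */
    }
    return n;
}
```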


section 7.8.2: Character Class Testing and Conversion

One quirk of these functions, which the authors mention briefly, is that although they accept arguments
of type int, it is not legal to pass just any int value to them. If you were to attempt to call
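isupper(12345), it might do something bizarre. You should only call these functions with
arguments which represent valid character values, that is, values representable as an unsigned char,
or the value EOF, which they are guaranteed to accept gracefully. (On machines where plain char is
signed, this means converting a char to unsigned char before passing it, since characters outside
the basic character set may have negative values.)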
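A sketch of calling a <ctype.h> function safely on the characters of a string, using a hypothetical count_upper function. The conversion to unsigned char ensures that the argument is a value the function accepts, even for characters whose plain char value is negative:

```c
#include <ctype.h>

int count_upper(const char *s)
{
    int n = 0;

    for (; *s != '\0'; s++)
        if (isupper((unsigned char)*s))
            n++;
    return n;
}
```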


section 7.8.3: Ungetc

There's not much more to say about ungetc, but two more stdio functions which might deserve mention
are fread and fwrite.
getc and putc (and getchar and putchar) allow you to read and write a character at a time, while
fgets and fputs read and write a line at a time. The printf family of routines does formatted
output, and the scanf family does formatted input. But what if you want to read or write a big block of
unformatted characters, not necessarily one line long? You could use getc or putc in a loop, but
another solution is to use the fread and fwrite functions, which are (briefly) described in appendix
B1.5 on page 247.
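A sketch of copying a stream a block at a time with fread and fwrite instead of a getc/putc loop. The name block_copy and the 4096-byte buffer size are arbitrary choices for illustration:

```c
#include <stdio.h>

/* Copy in to out.  Returns the number of bytes copied, or -1 if a
   write came up short. */
long block_copy(FILE *in, FILE *out)
{
    char buf[4096];
    size_t n;
    long total = 0;

    /* fread returns how many items (here, bytes) it actually read */
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n)
            return -1;
        total += (long)n;
    }
    return total;
}
```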


section 7.8.4: Command Execution

page 167
The only thing to add to this brief description of system concerns the disposition of the executed
command's output. (Similar arguments apply to its input.) The output generally goes wherever the calling
program's output goes, though if the calling program has done anything with stdout (such as closing it,
or redirecting it within the program with freopen), those changes will probably not affect the output of
system. One way to achieve redirection of the command executed by system, if the operating system
permits it, is to use redirection notation within the command line passed to system:
system("date > outfile");
Note also that the exit status returned by the program (and hence perhaps by system) does not
necessarily have anything to do with anything printed by the program. One way to capture the output
printed by the program is to use redirection, as above, then open and read the output file.
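The redirect-then-read approach can be sketched as follows. This assumes a command processor that understands "> file" redirection (as Unix shells and MS-DOS command.com do), and the function first_line_of_output is hypothetical. The fflush(stdout) before system is a precaution so that any output this program has buffered appears before the command's:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Run "cmd > outfile", then read the first line of outfile into buf.
   Returns 0 on success, -1 on failure.  The status returned by system
   is ignored, since it says nothing about what the command printed. */
int first_line_of_output(const char *cmd, const char *outfile,
                         char *buf, int size)
{
    char command[512];
    FILE *fp;

    if (strlen(cmd) + strlen(outfile) + 4 > sizeof command)
        return -1;                       /* command line too long */
    sprintf(command, "%s > %s", cmd, outfile);
    fflush(stdout);
    system(command);
    if ((fp = fopen(outfile, "r")) == NULL)
        return -1;
    if (fgets(buf, size, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return 0;
}
```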


section 7.8.5: Storage Management

The important thing to know about malloc and free and friends is to be careful when calling them. It
is easy to abuse them, either by using more space than you ask for (that is, writing beyond the ends of an
allocated block) or by continuing to use a pointer after the memory it points to has been freed (perhaps
because you had several pointers to the same block of memory, and you forgot that when you freed one
pointer they all became invalid). malloc-related bugs can be difficult and frustrating to track down, so
it's good to use programming practices which help to ensure that the bugs don't happen in the first place.
(One such practice is to make sure that pointer variables are set to NULL when they don't point anywhere,
and to occasionally check pointer values--for instance at entry to an important pointer-using function--to
make sure that they're not NULL.)
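A sketch of both practices. FREE_AND_NULL is a hypothetical macro, not a standard one: it frees a block and then clears the variable, so that a later accidental use is a detectable null pointer rather than a dangling one. It clears only the pointer it is given; other pointers to the same block still dangle. checked_length is a hypothetical pointer-using function that checks its argument at entry:

```c
#include <stdlib.h>
#include <string.h>

#define FREE_AND_NULL(p) (free(p), (p) = NULL)

/* Refuse a null pointer at entry instead of trusting the caller. */
size_t checked_length(const char *s)
{
    if (s == NULL)
        return 0;            /* or report the error, as the program requires */
    return strlen(s);
}
```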
As we mentioned on page 142 in section 6.5, it is no longer necessary (that is, in ANSI C) to cast
malloc's value to the appropriate type, though it doesn't hurt to do so.


section 7.8.6: Mathematical Functions

page 168
Note that the pow function is how you do exponentiation in C--C does not have a built-in exponentiation
operator (such as ** or ^ in some other languages).
Before calling these functions, remember to #include <math.h>. (It's always a good idea to
#include the appropriate header(s) before using any library functions, but the math functions are
particularly unlikely to work correctly if you forget.) Also, under Unix, you may have to explicitly
request the math library by adding the -lm option at the end of the command line when
compiling/linking.


section 7.8.7: Random Number Generation

There is a typo in some printings; the code for returning a floating-point random number in the interval
[0,1) should be
#define frand() ((double) rand() / (RAND_MAX+1.0))
If you want to get random integers from M to N, you can use something like
M + (int)(frand() * (N-M+1))
``[Setting] the seed for rand'' refers to the fact that, by default, the sequence of pseudo-random numbers
returned by rand is the same each time your program runs. To randomize it, you can call srand at the
beginning of the program, handing it some truly random number, such as a value having to do with the
time of day. (One way is with code like
#include <stdlib.h>
#include <time.h>
srand((unsigned int)time((time_t *)NULL));
which uses the time function mentioned on page 256 in appendix B10.)
One other caveat about rand: don't try to generate random 0/1 values (to simulate a coin flip, perhaps)
with code like
rand() % 2
This looks like it ought to work, but it turns out that on some systems rand isn't always perfectly
random, and returns values which consistently alternate even, odd, even, odd, etc. (In fact, for similar
reasons, you shouldn't usually use rand() % N for any value of N.) A good way to get random 0/1
values would be
(int)(frand() * 2)
based on the other frand() examples above.
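Collected in one place, the random-number idioms above look like this. frand is the corrected macro from the text, while rand_between and coin_flip are hypothetical helper names. Call srand once at program start-up (for example, with the time-based seed shown above) to get a different sequence on each run:

```c
#include <stdlib.h>

#define frand() ((double) rand() / (RAND_MAX+1.0))

/* A random integer from m to n inclusive, scaling frand() rather
   than taking rand() % N. */
int rand_between(int m, int n)
{
    return m + (int)(frand() * (n - m + 1));
}

/* 0 or 1, avoiding rand() % 2, which may alternate on poor generators. */
int coin_flip(void)
{
    return (int)(frand() * 2);
}
```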


Steve Summit's home page

I maintain the Usenet comp.lang.c FAQ list.
It's also available as a book.
I used to teach a pair of C programming classes, and their notes are all on line.
(More information about C.)
Other stuff I've written (mostly C-related Usenet posts).
Here are my collections of links, home pages, and other assorted references.
The ISP hosting this page and where I receive my e-mail, eskimo.com, is a rare and special one, perfect
for my needs. If you're looking for an ISP that gives you the access you need (including Unix shells)
without getting in your way or charging too much, check it out.
Steve Summit.


comp.lang.c Frequently Asked Questions

This collection of hypertext pages is Copyright 1995 by Steve Summit. Content from the book ``C
Programming FAQs: Frequently Asked Questions'' (Addison-Wesley, 1995, ISBN 0-201-84519-9) is
made available here by permission of the author and the publisher as a service to the community. It is
intended to complement the use of the published text and is protected by international copyright laws.
The content is made available here and may be accessed freely for personal use but may not be published
or retransmitted without written permission.
This page is the top of an HTML version of the Usenet comp.lang.c Frequently Asked Questions (FAQ)
list. An FAQ list is a collection of questions commonly asked on Usenet, together with presumably
definitive answers, provided in an attempt to keep repeated questions on the newsgroup down to a low
background drone so that discussion can move on to more interesting matters. Since they distill
knowledge gleaned from many sources and answer questions which are demonstrably Frequent, FAQ
lists serve as useful references outside of their originating Usenet newsgroups. This list is, I dare to
claim, no exception, and the HTML version you're looking at now, as well as other versions referenced
just below, are intended to be useful to C programmers everywhere.
Several other versions of this FAQ list are available, including a book-length version published by
Addison-Wesley. (The book, though longer, also has a few more errors; I've prepared an errata list.) See
also question 20.40.
Like so many web pages, this is very much a ``work in progress.'' I would, of course, like it if it were
perfect, but it's been two years or so since I first started talking about putting this thing on the web, and if
I were to wait until all the glitches were worked out, you might never see it. Each page includes a ``mail
feedback'' button, so you can help me debug it. (At first, you don't have to worry about reporting minor
formatting hiccups; many of these result from lingering imperfections in the programs that generate these
pages, or from the fact that I have not exhaustively researched how various browsers implement the
HTML tags I'm using, or from the fact that I haven't gone the last yard in trying to rig up HTML that
looks good in spite of the fact that HTML doesn't have everything you need to make things look good.)
These pages are synchronized with the posted Usenet version and the Addison-Wesley book version.
Since not all questions appear in all versions, the question numbers are not always contiguous.
[Note to web authors, catalogers, and bookmarkers: the URL <http://www.eskimo.com/~scs/C-faq/top.html> is the right way to link to these pages. All other URL's implementing this collection are
subject to change.]
You can browse these pages in at least three ways. The table of contents below is of the list's major
sections; these links lead to sub-lists of the questions for those sections. The ``all questions'' link leads to
a list of all the questions; each question is (obviously) linked to its answer. Finally, the ``read
sequentially'' link leads to the first question; you can then follow the ``next'' link at the bottom of each
question's page to read through all of the questions and answers sequentially.
Steve Summit
scs@eskimo.com

1. Declarations and Initializations
2. Structures, Unions, and Enumerations
3. Expressions
4. Pointers
5. Null Pointers
6. Arrays and Pointers
7. Memory Allocation
8. Characters and Strings
9. Boolean Expressions and Variables
10. C Preprocessor
11. ANSI/ISO Standard C
12. Stdio
13. Library Functions
14. Floating Point
15. Variable-Length Argument Lists
16. Strange Problems
17. Style
18. Tools and Resources
19. System Dependencies
20. Miscellaneous
Bibliography
Acknowledgements

All Questions

Read Sequentially

versions of comp.lang.c FAQ list

comp.lang.c FAQ list(s)


You probably just came from there, but there is a browsable, web-based HTML version. (Beware: as of
1999, the web-based version is somewhat out-of-date with respect to the plain-text versions below.)
(Please don't ask me for a downloadable archive of the HTML version, as I'm currently unable to provide
one. Just browse it here, or download one of the versions below.)
An expanded, book-length version, with even longer answers to even more questions, has been published
by Addison-Wesley (ISBN 0-201-84519-9). Printed books, alas, tend to have a few errors; I've prepared
an errata list for this one.
Here is a recent, compressed copy of the ASCII FAQ list, as posted to Usenet (~100k compressed, ~260k
when uncompressed). This is currently the most up-to-date version. [This and the other compressed files
ending in .Z referenced from this page are compressed with the Unix "compress" utility and can be
uncompressed with "uncompress" or "gunzip", versions of which are, I believe, available for all popular
operating systems.]
Here is the abridged version (~26k compressed, ~55k when uncompressed).
Here are the differences from the previous version (compressed, sometimes quite large; or maybe
uncompressed, if they were minimal). Here is a collection of incremental differences with respect to even
older versions. NOTE: All of these diff lists pertain to the versions posted to Usenet, which are not
always synchronized with the web/html version.
Here is a (considerably older) compressed, PostScript rendition (152k compressed). BEWARE: the
question numbers don't match current versions. (Rather than printing it out, you could -- hint, hint -- get
the book.)
There are several translations into other languages:

to German, by Jochen Schoof et al. (If that link doesn't work, try this one.)
to Japanese, by Kinichi Kitano. (I don't know of a URL, but it is or was posted regularly to
fj.comp.lang.c, and has been published by Toppan, ISBN 4-8101-8097-2.)
Seong-Kook Cin has completed a Korean translation, which is at
http://pcrc.hongik.ac.kr/~cinsk/cfaqs/.
A French C FAQ list (not a direct translation of this one) is at http://www.istyinfo.uvsq.fr/~rumeau/fclc/.

Here is an, um, er, ``alternate version'' by Peter Seebach.


If you're interested in C++, Marshall Cline maintains a C++ FAQ list.


For web access to other Usenet FAQ lists, visit faqs.org.
scs

C Programming FAQs Errata

Errata list for "C Programming FAQs: Frequently Asked Questions",
by Steve Summit, Addison-Wesley, 1996, ISBN 0-201-84519-9
(first printing).
A possibly more up-to-date copy of this errata list may be
obtained at any time by anonymous ftp from ftp.eskimo.com
in the file ~scs/C-faq/book/Errata, or on the web at
http://www.eskimo.com/~scs/C-faq/book/Errata.html .
(If you read this years from now and those addresses don't
work, try ftp://ftp.aw.com/cseng/authors/summit/cfaq/ or
http://www.awl.com/cseng/titles/0-201-84519-9 .)
scs 2002-Oct-26
page        question
----        --------

front cover             The ladder has no rungs.

xxix                    "woundn't" should be "wouldn't"

            1.1         The fourth bulleted guarantee (about the sizes
                        following the "obvious progression") is
                        improperly stated. What the C Standard actually
                        talks about, as in the rest of this answer, is
                        just the ranges of the standard types, not their
                        sizes in bits. So the real guarantees (as
                        summarized below) are that
                            sizeof(char)  is at least  8 bits
                            sizeof(short) is at least 16 bits
                            sizeof(int)   is at least 16 bits
                            sizeof(long)  is at least 32 bits
                        and, in C99,
                            sizeof(long long) is at least 64 bits

3-4         1.3         In C99, the new <inttypes.h> header provides
                        Standard names for exact-size types: int16_t,
                        uint32_t, etc.

            1.4         In C99, long long is defined as an integer type
                        with, in effect, at least 64 bits.

            1.7         There may be zero definitions of an external
                        function or variable that is not referenced
                        in any expression.
                        [Thanks and $1 to James Stern]

7           1.7         "use include to bring" should be
                        "use #include to bring"

11          1.14        In the second fix, at the bottom of the page,
                        it could conceivably be necessary to precede
                        the line
                            typedef struct node *NODEPTR;
                        with the line
                            struct node;
                        for the reason mentioned on page 13, although
                        in that case one of the two other fixes would
                        clearly be preferable.
                        [Thanks to James Stern]

13          1.15        In the alternate fix, at the bottom of the page,
                        it could conceivably be necessary to precede
                        the typedef declarations with the lines
                            struct a;
                            struct b;
                        although again, putting those typedefs after the
                        complete structure definitions would clearly be
                        preferable in that case.
                        [Thanks to James Stern]

18          1.22        The odd "return 0;" line is not really necessary.

20          1.24        Another possible arrangement is
                            /* file1.h */
                            #define ARRAYSZ 3
                            extern int array[ARRAYSZ];

                            /* file1.c */
                            #include "file1.h"
                            int array[ARRAYSZ];

                            /* file2.c */
                            #include "file1.h"
                        [Thanks to Jon Jagger]

23          1.29        [2nd bullet] "everything else termed" should be
                        "everything else, termed"

24          1.29        [Rule 3] "if the header" should be "if any header".
                        [Thanks and $1 to James Stern]

24          1.29        [Rule 4] "(i.e., function names)" should be
                        "(e.g., function names)".
                        [Thanks and $1 to James Stern]

24          1.29        The text at the bottom of the page suggests that
                        "future directions" name patterns such as str[a-z]*
                        are reserved only if their corresponding headers
                        (e.g. <stdlib.h>) are included. The reserved
                        function names are unconditionally reserved;
                        it is only the macro names that are reserved only
                        if the header is included.
                        [Thanks and $1 to Mark Brader]

25          1.29        "if you don't include the header files" should be
                        "if you don't include any header files".

32          2.4         Besides -> and sizeof, the . operator, as well as
                        declarations of actual structures, also require
                        the compiler to know more about the structure and
                        so preclude incomplete or hidden definitions.
                        [Thanks to James Stern]

33-36       2.6         In C99, a structure can contain a "flexible array
                        member" (an array of unspecified size) as its last
                        member, providing a well-defined, Standard-compliant
                        alternative.

38          2.10        C99 *does* have a way of generating anonymous
                        structure values: "compound literals".

40          2.12        When trying to minimize wasted space in structures,
                        array members should be ordered based on the size
                        of their primitive types, not their overall size.
                        [Thanks and $1 to James Stern]

43          2.20        "ANSI/SIO" should be "ANSI/ISO"
                        In C99, the "designated initializer" mechanism
                        allows any member of a union to be initialized.

50          3.3         Of course, another way to increment i is i += 1.
                        [Thanks to James Stern]

51          3.4         "higher precedence than *):" should be
                        "higher precedence than *:"

52          3.6         Delete the close parenthesis at the end of the answer.

57          3.12        In C++, the prefix form ++i is preferred.
                        [Thanks to James Stern]

68          4.5         The reference to ANSI Sec. 3.3.4 should say
                        "esp. footnote 44".
                        [Thanks to Willis Gooch]

72-3        4.10        In C99, it is possible to use a "compound
                        literal" to generate a pointer to an (unnamed)
                        constant value.

73          4.11        The reference to K&R2 sec. 5.2 should be pp. 95-7.
                        [Thanks and $1 to Nikos Triantafillis]

75          4.13        "can interconverted" should be "can be interconverted".
                        [Thanks and $1 to Howard Ham]

84          5.8         Either the comma or the parentheses in the answer
                        should be changed.

95          6.2         The typography in the following line is inconsistent
                        for the "x" of "x[3]".

104-5       6.15        C99 introduces variable-length arrays (VLA's) which,
                        among other things, *do* allow declaration of a
                        local array of size matching a passed-in array.

105-7       6.16        In C99, another solution is to use a
                        variable-length array.

110         6.19        C99's variable-length arrays are also a nice
                        solution to this problem.

115         7.1         The close parenthesis and period ")." at the bottom
                        of the page are not part of the #define line.

121         7.9         There is an extra semicolon at the end of the first
                        line of mymalloc's definition.
                        [Thanks and $1 to Todd Burruss]

126         7.10        Missing "it"; should be "even if it is not
                        dereferenced".
                        [Thanks and $1 to Clinton Sheppard]

132         7.30        It would be even safer to add a second test on
                        nchmax:
                            if(nchread >= nchmax) {
                                    nchmax += 20;
                                    if(nchread >= nchmax) {   /* nchmax wrapped around */
                                            free(retbuf);
                                            return NULL;
                                    }
                                    newbuf = realloc(retbuf, nchmax + 1);
                        The concern is that, while reading a *very* long line,
                        nchmax might overflow, wrapping back around to 0.
                        [Thanks to Mark Brader]

134         7.32        C99's variable-length arrays (VLA's) can be used
                        to more cleanly accomplish most of the tasks
                        which alloca used to be put to.

136         8.1         "Although string literal" should be
                        "Although a string literal"

136         8.2         C can be tricked into seeming to assign an array
                        as a whole if you hide the array inside a
                        structure or union.
                        [Thanks and $1 to James Stern]

143         9.2         The example variable isvegetable should perhaps
                        be named is_vegetable to avoid naming conflicts
                        (see question 1.29).
                        [Thanks and $1 to Jon Jagger]

151         10.4        Extra space in "/* (no trailing ; ) */".

152         10.6        [paragraph below bullets] "bring the header wherever"
                        should be "bring the header in wherever"

158         10.15       If you have to, you can obviously #define a companion
                        macro name for each typedef, and use #ifdef with that.
                        [Thanks to James Stern]

161         10.21       The suggested replacement macro should
                        parenthesize c:
                            #define CTRL(c) ((c) & 037)
                        [Thanks and $1 to James Stern]

163-4       10.29       C99 introduces formal support for macros with
                        variable numbers of arguments.

164-5       10.27       The file parameter of the dbginfo() function and
                        the fmt parameter of the debug() function could
                        be of type const char *.
                        [Thanks to James Stern]

168         11.1        The story has gotten longer: A new revision of
                        the C Standard, "C99", has been ratified,
                        superseding the original ANSI C Standard.
                        This Errata list has been updated to note those
                        answers in the book which have become dated due
                        to C99 changes.

169-70      11.2        C99 *is* available in electronic form, for $18
                        from www.ansi.org .

174         11.10       As written, the "complicated series of assignments"
                        of course includes some declarations and initializations.
                        [Thanks to James Stern]

175         11.10       "e.g., (const char) ** in this case" should be
                        "e.g., (const char **) in this case"
                        "when the pointers which" should either be
                        "when the pointers" or "with pointers which"

180         11.19       "questions 20.20" should be "question 20.20"

182         11.25       "The function offers" should be
                        "The memmove function offers".
                        [Thanks and $1 to Gordon Burditt]

183-4       11.27       In C99, external identifiers are required
                        to be unique in the first 31 characters;
                        C90's extremely Spartan limitation to six
                        characters has been relaxed.

186         11.29       You may also need to rework calls to realloc
                        that use NULL or 0 as first or second arguments
                        (see question 7.30).

186         11.29       You may also need to rework conditional compilation
                        involving #elif.
                        See also the Rationale's list of "Quiet Changes"
                        (see question 11.2).
                        [Thanks to James Stern]

189         11.33       A fourth class of behavior is locale-specific.
                        [Thanks and $1 to James Stern]

198         12.11       A semicolon is missing after "int i = 0".
                        The } just before the line "*p = '\0'" is
                        indented one tab too few.
                        Two instances of "*--p" have the minus signs merged
                        so as to appear as one.

201         12.16       [case 2] The variable line is not declared;
                        it should probably be a char [], suitably
                        initialized, e.g.:
                            char line[] = "1 2.3 4.5e6 789e10";
                        [Thanks and $1 to James Stern]

205         12.19       There's an extraneous double quote in what
                        should be "intervening whitespace:".

207-8       12.21       The technique of writing to a file may give the
                        wrong answer if the disk fills up.
                        [Thanks and $1 to Mark Brader]
                        The "hope that a future revision of the ANSI/ISO
                        C Standard will include" the snprintf function
                        has been fulfilled: C99 does specify it.
                        As a bonus, the C99 snprintf can be used to predict
                        the size required for an arbitrary sprintf call,
                        too -- it can be called with a null pointer
                        instead of a destination buffer (and 0 as the
                        size of that nonexistent buffer) and it returns
                        the number of characters it would have written.

212         12.28       The answer is in the wrong font.

213         12.30       Updating (overwriting) a text file in-place is
                        not fully portable; the C Standard leaves it
                        implementation-defined whether a write to a
                        text file truncates it at that point.
                        [Thanks and $1 to Tanmoy Bhattacharya]

224         13.4        "upper- or lowercase" should probably be
                        "upper or lower case".

225         13.6        Since the fragment calls printf, it must
                        #include <stdio.h>.
                        [Thanks and $1 to James Stern]

226         13.6        [last code fragment] A declaration and initialization
                            char string[] = "this\thas\t\tmissing\tfield";
                        similar to the one on p. 225 should appear.
                        [Thanks and $1 to Doug Liu]

227         13.6        Also, since the input string is modified,
                        it must be writable; see question 1.32.

234         13.14       "time_ts" should perhaps be "time_t's"

240         13.17       The code
                            srand((unsigned int)time((time_t *)NULL));
                        though popular and generally effective, is, alas,
                        not strictly conforming. It's theoretically
                        possible that time_t could be defined as a
                        floating-point type under some implementation,
                        and that the time_t values returned by time()
                        could therefore exceed the range of an unsigned
                        int in such a way that a well-defined cast to
                        (unsigned int) is not possible.

242-3       13.20       The attributions listed for methods 2 and 3 are
                        scrambled. Method 2 is the one described in
                        the 1958 Box and Muller paper (as well as by
                        Abramowitz and Stegun, apparently). Method 3
                        is originally due to Marsaglia.

244         13.21       If you're not familiar with the notation [0, 1),
                        it means that drand48() returns a number x
                        such that 0 <= x and x < 1.

250         14.5        The suggested expression should read
                            fabs(a - b) <= epsilon * fabs(a)
                        It performs poorly if a == 0.0 (which is another
                        argument in favor of "mak[ing] the threshold
                        a function of b, or of both a and b").

253         14.8        Of course, you can always compute pi using
                        4*atan(1.0) or acos(-1.0).
                        [Thanks to James Stern and Clinton Sheppard]

253         14.9        C99 specifies isnan() and several other
                        classification routines.

254-5       14.11       C99 supports complex as a standard type.

260-1       15.4        The first argument to vstrcat() could be const char *,
                        as could the fmt argument to miniprintf().
                        [Thanks to James Stern]

264         15.5        The fmt argument to error() could be const char *.

269-71      15.12       The fmt arguments to faterror(), verror(), and
                        error() could all be const char *.

274         16.4        [point 2] The problem could be caused by a setbuf
                        or setvbuf buffer local to any function.
                        [Thanks and $1 to James Stern]

276         16.7        Variable "s" isn't declared. It's pretty obvious
                        what it should be, but to make it explicit, change
                        the struct declaration to
                            struct mystruct { ... } s;
                        [Thanks to Peter Hryczanek]

287         18.1        The URL in the list of metrics tools is really
                        "http://www.qucis.queensu.ca:1999/SoftwareEngineering/Cmetrics.html".

294         18.13       The conventional spelling is "NetBSD".
                        [Thanks and $1 to Peter Seebach]

294         18.14       Extra space in site which should be "sunsite.unc.edu".

296         18.16       Extra space in address which should be
                        "archie@archie.cs.mcgill.ca".

308         19.11       Note that a test using fopen() *is* approximate;
                        failure does not necessarily indicate nonexistence.

310         19.14       Updating (overwriting) a text file in-place is
                        not fully portable; the C Standard leaves it
                        implementation-defined whether a write to a
                        text file truncates it at that point.
                        [Thanks and $1 to Tanmoy Bhattacharya]

314         19.23       In C99, the guarantee on the possible size of a
                        single object has been raised to 64K.

315         19.25       Use of the `volatile' qualifier is often
                        appropriate when performing memory-mapped I/O.
                        [Thanks to Lee Crawford]

317         19.27       The return value of system() is not guaranteed
                        to be the command's exit status.
                        [Thanks and $1 to Peter Seebach]

318         19.30       If you forget to call pclose, it's probably at
                        least as likely that you'll run out of file
                        descriptors as processes.
                        [Thanks and $1 to Jens Schweikhardt]

319         19.31       argv[0] may also be a null pointer.
                        [Thanks and $1 to Tanmoy Bhattacharya]

324         19.42       "control characters, such as" should be
                        "control characters such as"

339-40                  The page break makes the code very hard to follow.

342-44      20.13       The tone of this question's answer can be read as
                        suggesting that efficiency isn't important at all.
                        That's not the case, of course -- efficiency can
                        be very important, and poorly-written programs can
                        run abysmally inefficiently.
                        The point is that there are good ways and bad
                        ways of achieving an appropriate level of
                        performance for a given program, and that (for
                        example) picking a good algorithm tends to make a
                        much bigger difference than does microoptimizing
                        the coding details of a lesser algorithm.

346         20.17       Missing tab in line which should be
                            #define CODE_NONE

350         20.21       The overbars are misaligned.

355         20.29       "and computes that number" should either be
                        "computed" or "and is computed".

363         [aggregate] Unions are not aggregates.
                        [Thanks and $1 to Kinichi Kitano]

368         [parameter] Extraneous semicolon at end of
                        line which should be
                            f(int i)

370-1                   The glossary entry for "undefined" is misplaced.
                        [Thanks and $1 to James Stern]

376                     The two minus signs in the index entry for
                        "-- operator" overlap and appear to be one.

379                     The pairs of underscores in the index entry for
                        "__FILE__ macro" overlap and might appear to be one.

382                     The pairs of underscores in the index entry for
                        "__LINE__ macro" overlap and might appear to be one.

back cover              "on the Usenet/Internet on the C FAQ" is muddled
                        and should say something else.
                        "com.lang.c" should be "comp.lang.c".
                        The ftp address for source code should be
                        ftp://ftp.aw.com/cseng/authors/summit/cfaq .
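
Several C99 features mentioned in the errata (the flexible array member for 2.6, compound literals for 2.10, designated initializers for 2.20, variable-length array parameters for 6.19, and snprintf sizing for 12.21) can be shown together in one short sketch. This is not code from the book. The names struct name, name_new, struct point, manhattan, union number, half, sum_matrix, and format_value are all hypothetical, and the sketch assumes a C99 (or later) compiler.

```c
/* Illustrates some C99 features noted in the errata; needs C99 or later. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* 2.6: a flexible array member (empty brackets) must be the last member. */
struct name {
    size_t len;
    char str[];              /* storage is allocated along with the struct */
};

/* Allocate a struct name holding a copy of s; NULL if malloc fails. */
struct name *name_new(const char *s)
{
    size_t len = strlen(s);
    struct name *p = malloc(sizeof *p + len + 1);
    if (p == NULL)
        return NULL;
    p->len = len;
    memcpy(p->str, s, len + 1);      /* copies the terminating '\0' too */
    return p;
}

/* 2.10: callers can pass an anonymous value such as (struct point){3, -4}. */
struct point { int x, y; };

int manhattan(struct point p)
{
    return abs(p.x) + abs(p.y);
}

/* 2.20: a designated initializer can select any member of a union. */
union number { int i; double d; };
const union number half = { .d = 0.5 };

/* 6.19: a variable-length array parameter whose width is known only at run time. */
double sum_matrix(size_t rows, size_t cols, double m[rows][cols])
{
    double total = 0.0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r][c];
    return total;
}

/* 12.21: snprintf(NULL, 0, ...) returns the length the output would have,
   so the buffer can be allocated at exactly the right size. */
char *format_value(int n)
{
    int len = snprintf(NULL, 0, "value=%d", n);
    if (len < 0)
        return NULL;
    char *buf = malloc((size_t)len + 1);
    if (buf != NULL)
        snprintf(buf, (size_t)len + 1, "value=%d", n);
    return buf;
}
```

For example, manhattan((struct point){3, -4}) passes an unnamed structure value without declaring a variable for it. In the same way (entry 4.10), &(int){42} yields a pointer to an unnamed int.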

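The corrected fragment for 7.30 is easier to judge in context. The following is a minimal sketch, not the book's code, of a complete line-reading function built around that fragment. The name read_line, the handling of end of file, and the choice to strip the newline are assumptions. Only the variable names nchread, nchmax, retbuf, and newbuf, and the growth step of 20, come from the errata entry.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read one line of any length from fp into a malloc'ed string, without the
   newline. Returns NULL at end of file (if nothing was read), if memory runs
   out, or if nchmax would overflow. The caller must free the result. */
char *read_line(FILE *fp)
{
    char *retbuf = NULL;
    size_t nchmax = 0;      /* capacity, not counting room for the '\0' */
    size_t nchread = 0;     /* characters stored so far */
    int c;

    while ((c = getc(fp)) != EOF) {
        if (nchread >= nchmax) {
            char *newbuf;
            nchmax += 20;
            if (nchread >= nchmax) {    /* nchmax wrapped around to a small value */
                free(retbuf);
                return NULL;
            }
            newbuf = realloc(retbuf, nchmax + 1);
            if (newbuf == NULL) {
                free(retbuf);
                return NULL;
            }
            retbuf = newbuf;
        }
        if (c == '\n')
            break;
        retbuf[nchread++] = (char)c;
    }

    if (retbuf == NULL)     /* end of file before any character was read */
        return NULL;

    retbuf[nchread] = '\0'; /* fits: capacity is always nchmax + 1 */
    return retbuf;
}
```

The second test on nchmax costs almost nothing. Together with the check on realloc's return value, it ensures that the function never writes past the end of a buffer that turned out smaller than intended.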

Question 20.40

Where can I get extra copies of this list? What about back issues?

An up-to-date copy may be obtained from aw.com in directory xxx or ftp.eskimo.com in directory
u/s/scs/C-faq/. You can also just pull it off the net; it is normally posted to comp.lang.c on the first of
each month, with an Expires: line which should keep it around all month. A parallel, abridged version is
available (and posted), as is a list of changes accompanying each significantly updated version.
The various versions of this list are also posted to the newsgroups comp.answers and news.answers .
Several sites archive news.answers postings and other FAQ lists, including this one; two sites are
rtfm.mit.edu (directories pub/usenet/news.answers/C-faq/ and pub/usenet/comp.lang.c/) and ftp.uu.net
(directory usenet/news.answers/C-faq/). An archie server (see question 18.16) should help you find
others; ask it to ``find C-faq''. If you don't have ftp access, a mailserver at rtfm.mit.edu can mail you FAQ
lists: send a message containing the single word help to mail-server@rtfm.mit.edu . See the meta-FAQ
list in news.answers for more information.
An extended version of this FAQ list is being published by Addison-Wesley as C Programming FAQs:
Frequently Asked Questions (ISBN 0-201-84519-9). It should be available in November 1995.
This list is an evolving document of questions which have been Frequent since before the Great
Renaming, not just a collection of this month's interesting questions. Older copies are obsolete and don't
contain much, except the occasional typo, that the current list doesn't.

This page by Steve Summit // Copyright 1995 // mail feedback

Question 18.16

Where and how can I get copies of all these freely distributable programs?

As the number of available programs, the number of publicly accessible archive sites, and the number of
people trying to access them all grow, this question becomes both easier and more difficult to answer.
There are a number of large, public-spirited archive sites out there, such as ftp.uu.net, archive.umich.edu,
oak.oakland.edu, sumex-aim.stanford.edu, and wuarchive.wustl.edu, which have huge amounts of
software and other information all freely available. For the FSF's GNU project, the central distribution
site is prep.ai.mit.edu . These well-known sites tend to be extremely busy and hard to reach, but there are
also numerous ``mirror'' sites which try to spread the load around.
On the connected Internet, the traditional way to retrieve files from an archive site is with anonymous
ftp. For those without ftp access, there are also several ftp-by-mail servers in operation. More and more,
the world-wide web (WWW) is being used to announce, index, and even transfer large data files. There
are probably yet newer access methods, too.
Those are some of the easy parts of the question to answer. The hard part is in the details--this article
cannot begin to track or list all of the available archive sites or all of the various ways of accessing them.
If you have access to the net at all, you probably have access to more up-to-date information about active
sites and useful access methods than this FAQ list does.
The other easy-and-hard aspect of the question, of course, is simply finding which site has what you're
looking for. There is a tremendous amount of work going on in this area, and there are probably new
indexing services springing up every day. One of the first was ``archie'': for any program or resource
available on the net, if you know its name, an archie server can usually tell you which anonymous ftp
sites have it. Your system may have an archie command, or you can send the mail message ``help'' to
archie@archie.cs.mcgill.ca for information.
If you have access to Usenet, see the regular postings in the comp.sources.unix and comp.sources.misc
newsgroups, which describe the archiving policies for those groups and how to access their archives. The
comp.archives newsgroup contains numerous announcements of anonymous ftp availability of various
items. Finally, the newsgroup comp.sources.wanted is generally a more appropriate place to post queries
for source availability, but check its FAQ list, ``How to find sources,'' before posting there.
See also question 14.12.

Question 14.12

I'm looking for some code to do:
Fast Fourier Transforms (FFT's)
matrix arithmetic (multiplication, inversion, etc.)
complex arithmetic

Ajay Shah maintains an index of free numerical software; it is posted periodically, and available where
this FAQ list is archived (see question 20.40). See also question 18.16.

Question 14.11

What's a good way to implement complex numbers in C?

It is straightforward to define a simple structure and some arithmetic functions to manipulate them. See
also questions 2.7, 2.10, and 14.12.
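
Here is a minimal sketch of the structure-based approach. The type name struct cpx and the functions cpx_make, cpx_add, and cpx_mul are hypothetical. The sketch relies on the structure passing and returning discussed in question 2.7.

```c
/* A complex number as a plain structure of two doubles. */
struct cpx {
    double re;    /* real part */
    double im;    /* imaginary part */
};

struct cpx cpx_make(double re, double im)
{
    struct cpx z;
    z.re = re;
    z.im = im;
    return z;
}

/* (a + bi) + (c + di) = (a + c) + (b + d)i */
struct cpx cpx_add(struct cpx x, struct cpx y)
{
    return cpx_make(x.re + y.re, x.im + y.im);
}

/* (a + bi)(c + di) = (ac - bd) + (ad + bc)i */
struct cpx cpx_mul(struct cpx x, struct cpx y)
{
    return cpx_make(x.re * y.re - x.im * y.im,
                    x.re * y.im + x.im * y.re);
}
```

For example, cpx_mul(cpx_make(1, 2), cpx_make(3, -1)) yields 5 + 5i. Compilers supporting C99 also provide a built-in complex type through <complex.h>, which makes hand-written functions like these unnecessary there.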

Question 2.7

I heard that structures could be assigned to variables and passed to and from functions, but K&R1 says
not.

What K&R1 said was that the restrictions on structure operations would be lifted in a forthcoming
version of the compiler, and in fact structure assignment and passing were fully functional in Ritchie's
compiler even as K&R1 was being published. Although a few early C compilers lacked these operations,
all modern compilers support them, and they are part of the ANSI C standard, so there should be no
reluctance to use them. [footnote]
(Note that when a structure is assigned, passed, or returned, the copying is done monolithically; anything
pointed to by any pointer fields is not copied.)
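A small sketch of that monolithic copy, assuming a hypothetical struct record with one array member and one pointer member. The array is copied along with the structure, but only the pointer value is copied, so both copies end up sharing whatever it points to.

```c
#include <string.h>

/* Hypothetical structure: name[] is stored inside the structure,
   while note points at storage outside it. */
struct record {
    char name[16];
    char *note;
};

/* Returns a structure by value, which ANSI C permits. */
struct record make_record(const char *name, char *note)
{
    struct record r;

    strncpy(r.name, name, sizeof r.name - 1);
    r.name[sizeof r.name - 1] = '\0';   /* strncpy may not terminate */
    r.note = note;
    return r;
}
```

After struct record b = a;, changing a.name leaves b.name alone, but a.note and b.note still point to the same buffer; a deep copy would need code of your own to allocate and copy the pointed-to data.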
References: K&R1 Sec. 6.2 p. 121
K&R2 Sec. 6.2 p. 129
ANSI Sec. 3.1.2.5, Sec. 3.2.2.1, Sec. 3.3.16
ISO Sec. 6.1.2.5, Sec. 6.2.2.1, Sec. 6.3.16
H&S Sec. 5.6.2 p. 133

Footnote 1

However, passing large structures to and from functions can be expensive (see question 2.9), so you may
want to consider using pointers, instead (as long as you don't need pass-by-value semantics, of course).

Question 2.9
How are structure passing and returning implemented?

When structures are passed as arguments to functions, the entire structure is typically pushed on the
stack, using as many words as are required. (Programmers often choose to use pointers to structures
instead, precisely to avoid this overhead.) Some compilers merely pass a pointer to the structure, though
they may have to make a local copy to preserve pass-by-value semantics.
Structures are often returned from functions in a location pointed to by an extra, compiler-supplied
``hidden'' argument to the function. Some older compilers used a special, static location for structure
returns, although this made structure-valued functions non-reentrant, which ANSI C disallows.
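The difference in semantics (not just in cost) between passing a structure and passing a pointer to it can be sketched as follows. The names struct big, bump_copy, bump_in_place, and first_value are hypothetical, and exactly how the copy is made is up to the compiler, as described above.

```c
/* Hypothetical structure, large enough that copying it is not free. */
struct big {
    int values[256];
};

/* Receives a copy: changes here are invisible to the caller. */
int bump_copy(struct big b)
{
    b.values[0]++;
    return b.values[0];
}

/* Receives a pointer: no copy is made, and changes reach the caller. */
void bump_in_place(struct big *bp)
{
    bp->values[0]++;
}

/* A const pointer avoids the copy without permitting changes. */
int first_value(const struct big *bp)
{
    return bp->values[0];
}
```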
References: ANSI Sec. 2.2.3
ISO Sec. 5.2.3

Question 2.8
Why can't you compare structures?

There is no single, good way for a compiler to implement structure comparison which is consistent with
C's low-level flavor. A simple byte-by-byte comparison could founder on random bits present in unused
``holes'' in the structure (such padding is used to keep the alignment of later fields correct; see question
2.12). A field-by-field comparison might require unacceptable amounts of repetitive code for large
structures.
If you need to compare two structures, you'll have to write your own function to do so, field by field.
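A sketch of such a function for a hypothetical struct item. Comparing name with strcmp (rather than the whole array) also ignores whatever follows the terminating \0, which a byte-by-byte memcmp of the structure would not, and padding bytes are never examined.

```c
#include <string.h>

/* Hypothetical structure; padding may sit between 'tag' and 'count'. */
struct item {
    char tag;
    long count;
    char name[20];
};

/* Returns 1 if the structures are equal field by field, 0 otherwise.
   'name' is compared as a string, so bytes after its '\0' are ignored. */
int item_equal(const struct item *a, const struct item *b)
{
    return a->tag == b->tag
        && a->count == b->count
        && strcmp(a->name, b->name) == 0;
}
```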
References: K&R2 Sec. 6.2 p. 129
ANSI Sec. 4.11.4.1 footnote 136
Rationale Sec. 3.3.9
H&S Sec. 5.6.2 p. 133

Question 2.12
My compiler is leaving holes in structures, which is wasting space and preventing ``binary'' I/O to
external data files. Can I turn off the padding, or otherwise control the alignment of structure fields?

Your compiler may provide an extension to give you this control (perhaps a #pragma; see question
11.20), but there is no standard method.
See also question 20.5.
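Although padding cannot be turned off portably, sizeof and the standard offsetof macro from <stddef.h> can at least show how much of it a particular compiler inserts. A sketch, using a hypothetical struct padded; the actual numbers are implementation-defined and will vary between compilers and machines.

```c
#include <stddef.h>

/* Hypothetical structure; many compilers pad after 'c' so that 'd' is aligned. */
struct padded {
    char c;
    double d;
    short s;
};

/* Total bytes in the structure not occupied by members:
   interior holes plus any trailing padding. */
size_t padding_bytes(void)
{
    return sizeof(struct padded)
        - (sizeof(char) + sizeof(double) + sizeof(short));
}

/* Size of the hole, if any, between 'c' and 'd'. */
size_t hole_after_c(void)
{
    return offsetof(struct padded, d)
        - (offsetof(struct padded, c) + sizeof(char));
}
```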
References: K&R2 Sec. 6.4 p. 138
H&S Sec. 5.6.4 p. 135

Question 11.20
What are #pragmas and what are they good for?

The #pragma directive provides a single, well-defined ``escape hatch'' which can be used for all sorts of
implementation-specific controls and extensions: source listing control, structure packing, warning
suppression (like lint's old /* NOTREACHED */ comments), etc.
References: ANSI Sec. 3.8.6
ISO Sec. 6.8.6
H&S Sec. 3.7 p. 61

Question 11.19
I'm getting strange syntax errors inside lines I've #ifdeffed out.

Under ANSI C, the text inside a ``turned off'' #if, #ifdef, or #ifndef must still consist of ``valid
preprocessing tokens.'' This means that there must be no newlines inside quotes, and no unterminated
comments or quotes (note particularly that an apostrophe within a contracted word looks like the
beginning of a character constant). Therefore, natural-language comments and pseudocode should always
be written between the ``official'' comment delimiters /* and */. (But see question 20.20, and also
10.25.)
References: ANSI Sec. 2.1.1.2, Sec. 3.1
ISO Sec. 5.1.1.2, Sec. 6.1
H&S Sec. 3.2 p. 40

Question 20.20
Why don't C comments nest? How am I supposed to comment out code containing comments? Are
comments legal inside quoted strings?

C comments don't nest mostly because PL/I's comments, which C's are borrowed from, don't either.
Therefore, it is usually better to ``comment out'' large sections of code, which might contain comments,
with #ifdef or #if 0 (but see question 11.19).
The character sequences /* and */ are not special within double-quoted strings, and do not therefore
introduce comments, because a program (particularly one which is generating C code as output) might
want to print them.
Note also that // comments, as in C++, are not currently legal in C, so it's not a good idea to use them in
C programs (even if your compiler supports them as an extension).
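A sketch of both points, with hypothetical function names: code (with its own comment) disabled by #if 0, and comment delimiters inside a string literal, where they are ordinary characters.

```c
#include <string.h>

/* Inside double quotes, these characters are ordinary string contents. */
const char *delimiters(void)
{
    return "/* not a comment */";
}

int enabled_path(void)
{
#if 0
    /* This region, including this comment, is discarded by the
       preprocessor. Wrapping it in another comment instead would fail,
       because the closing delimiter of this inner comment would end the
       outer one early. */
    return -1;
#endif
    return 1;
}
```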
References: K&R1 Sec. A2.1 p. 179
K&R2 Sec. A2.2 p. 192
ANSI Sec. 3.1.9 (esp. footnote 26), Appendix E
ISO Sec. 6.1.9, Annex F
Rationale Sec. 3.1.9
H&S Sec. 2.2 pp. 18-9
PCS Sec. 10 p. 130

Question 20.19
Are the outer parentheses in return statements really optional?

Yes.
Long ago, in the early days of C, they were required, and just enough people learned C then, and wrote
code which is still in circulation, that the notion that they might still be required is widespread.
(As it happens, parentheses are optional with the sizeof operator, too, as long as its operand is a
variable or a unary expression.)
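A short sketch with hypothetical function names: the two return statements below are equivalent, and sizeof needs parentheses only when its operand is a type name.

```c
#include <stddef.h>

/* Both functions do the same thing; the parentheses are optional. */
int twice_paren(int x)
{
    return (x * 2);
}

int twice_bare(int x)
{
    return x * 2;
}

/* sizeof applied to an expression needs no parentheses. */
size_t array_bytes(void)
{
    static double arr[10];

    return sizeof arr;          /* same as sizeof(arr) */
}
```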
References: K&R1 Sec. A18.3 p. 218
ANSI Sec. 3.3.3, Sec. 3.6.6
ISO Sec. 6.3.3, Sec. 6.6.6
H&S Sec. 8.9 p. 254

Question 20.18
Is there a way to have non-constant case labels (i.e. ranges or arbitrary expressions)?

No. The switch statement was originally designed to be quite simple for the compiler to translate;
therefore case labels are limited to single, constant, integral expressions. You can attach several case
labels to the same statement, which will let you cover a small range if you don't mind listing all cases
explicitly.
If you want to select on arbitrary ranges or non-constant expressions, you'll have to use an if/else chain.
See also question 20.17.
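A sketch of both techniques, with hypothetical function names: stacked case labels for a short list of values, and an if/else chain for ranges too large to enumerate.

```c
/* Stacked case labels cover a small set of values. */
const char *char_kind(int c)
{
    switch(c) {
    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9':
        return "digit";
    case ' ': case '\t': case '\n':
        return "space";
    default:
        return "other";
    }
}

/* Arbitrary ranges need an if/else chain instead. */
char grade(int score)
{
    if(score >= 90)
        return 'A';
    else if(score >= 80)
        return 'B';
    else if(score >= 70)
        return 'C';
    else
        return 'F';
}
```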
References: K&R1 Sec. 3.4 p. 55
K&R2 Sec. 3.4 p. 58
ANSI Sec. 3.6.4.2
ISO Sec. 6.6.4.2
Rationale Sec. 3.6.4.2
H&S Sec. 8.7 p. 248

Question 20.17
Is there a way to switch on strings?

Not directly. Sometimes, it's appropriate to use a separate function to map strings to integer codes, and
then switch on those. Otherwise, of course, you can fall back on strcmp and a conventional if/else
chain. See also questions 10.12, 20.18, and 20.29.
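A sketch of the mapping approach: a small table searched with strcmp turns each string into an integer code, which an ordinary switch can then select on. The enum command values and the functions lookup_command and run_command are hypothetical.

```c
#include <string.h>

/* Hypothetical command codes. */
enum command { CMD_UNKNOWN, CMD_START, CMD_STOP, CMD_STATUS };

/* Maps a string to an integer code using a table and strcmp. */
enum command lookup_command(const char *s)
{
    static const struct {
        const char *name;
        enum command code;
    } table[] = {
        { "start",  CMD_START },
        { "stop",   CMD_STOP },
        { "status", CMD_STATUS },
    };
    size_t i;

    for(i = 0; i < sizeof table / sizeof table[0]; i++)
        if(strcmp(s, table[i].name) == 0)
            return table[i].code;
    return CMD_UNKNOWN;
}

/* The resulting code can be used in an ordinary switch. */
int run_command(const char *s)
{
    switch(lookup_command(s)) {
    case CMD_START:  return 1;
    case CMD_STOP:   return 2;
    case CMD_STATUS: return 3;
    default:         return 0;
    }
}
```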
References: K&R1 Sec. 3.4 p. 55
K&R2 Sec. 3.4 p. 58
ANSI Sec. 3.6.4.2
ISO Sec. 6.6.4.2
H&S Sec. 8.7 p. 248

Question 10.12
How can I construct preprocessor #if expressions which compare strings?

You can't do it directly; preprocessor #if arithmetic uses only integers. You can #define several
manifest constants, however, and implement conditionals on those.
See also question 20.17.
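A sketch of the manifest-constant approach; the names SYS_UNIX, SYS_MSDOS, TARGET_SYS, PATH_SEP, and path_separator are hypothetical. On many compilers the selection could instead be supplied from the command line with a -D option (for example, -DTARGET_SYS=2).

```c
/* Give each choice an integer code instead of a string. */
#define SYS_UNIX   1
#define SYS_MSDOS  2

/* Hypothetical build setting, defaulting to SYS_UNIX. */
#ifndef TARGET_SYS
#define TARGET_SYS SYS_UNIX
#endif

/* #if can now compare integers. */
#if TARGET_SYS == SYS_UNIX
#define PATH_SEP '/'
#elif TARGET_SYS == SYS_MSDOS
#define PATH_SEP '\\'
#else
#error "unknown TARGET_SYS value"
#endif

char path_separator(void)
{
    return PATH_SEP;
}
```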
References: K&R2 Sec. 4.11.3 p. 91
ANSI Sec. 3.8.1
ISO Sec. 6.8.1
H&S Sec. 7.11.1 p. 225

Question 10.11
I seem to be missing the system header file <sgtty.h>. Can someone send me a copy?

Standard headers exist in part so that definitions appropriate to your compiler, operating system, and
processor can be supplied. You cannot just pick up a copy of someone else's header file and expect it to
work, unless that person is using exactly the same environment. Ask your compiler vendor why the file
was not provided (or to send a replacement copy).

Question 10.9
I'm getting strange syntax errors on the very first declaration in a file, but it looks fine.

Perhaps there's a missing semicolon at the end of the last declaration in the last header file you're
#including. See also questions 2.18 and 11.29.

Question 2.18
This program works correctly, but it dumps core after it finishes. Why?
struct list {
char *item;
struct list *next;
}
/* Here is the main program. */
main(argc, argv)
{ ... }

A missing semicolon causes main to be declared as returning a structure. (The connection is hard to see
because of the intervening comment.) Since structure-valued functions are usually implemented by
adding a hidden return pointer (see question 2.9), the generated code for main() tries to accept three
arguments, although only two are passed (in this case, by the C start-up code). See also questions 10.9
and 16.4.
References: CT&P Sec. 2.3 pp. 21-2
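The fix is the terminating semicolon after the structure declaration (and, in modern style, declaring main explicitly as int main(int argc, char *argv[])). The sketch below shows the corrected declaration followed by an ordinary function; list_length is hypothetical.

```c
#include <stddef.h>

struct list {
    char *item;
    struct list *next;
};  /* this semicolon ends the declaration; without it, the function
       below would be declared as returning a struct list */

/* Counts the nodes in a linked list. */
int list_length(const struct list *lp)
{
    int n = 0;

    for(; lp != NULL; lp = lp->next)
        n++;
    return n;
}
```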

Question 11.21
What does ``#pragma once'' mean? I found it in some header files.

It is an extension implemented by some preprocessors to help make header files idempotent; it is
essentially equivalent to the #ifndef trick mentioned in question 10.7.

Question 10.7
Is it acceptable for one header file to #include another?

It's a question of style, and thus receives considerable debate. Many people believe that ``nested
#include files'' are to be avoided: the prestigious Indian Hill Style Guide (see question 17.9)
disparages them; they can make it harder to find relevant definitions; they can lead to multiple-definition
errors if a file is #included twice; and they make manual Makefile maintenance very difficult. On the
other hand, they make it possible to use header files in a modular way (a header file can #include
what it needs itself, rather than requiring each #includer to do so); a tool like grep (or a tags file)
makes it easy to find definitions no matter where they are; a popular trick along the lines of:
#ifndef HFILENAME_USED
#define HFILENAME_USED
...header file contents...
#endif
(where a different bracketing macro name is used for each header file) makes a header file ``idempotent''
so that it can safely be #included multiple times; and automated Makefile maintenance tools (which
are a virtual necessity in large projects anyway; see question 18.1) handle dependency generation in the
face of nested #include files easily. See also question 17.10.
References: Rationale Sec. 4.1.2

Question 17.9
Where can I get the ``Indian Hill Style Guide'' and other coding standards?

Various documents are available for anonymous ftp from:


Site:                  File or directory:
cs.washington.edu      pub/cstyle.tar.Z
                       (the updated Indian Hill guide)
ftp.cs.toronto.edu     doc/programming
                       (including Henry Spencer's
                       ``10 Commandments for C Programmers'')
ftp.cs.umd.edu         pub/style-guide

You may also be interested in the books The Elements of Programming Style, Plum Hall Programming
Guidelines, and C Style: Standards and Guidelines; see the Bibliography. (The Standards and Guidelines
book is not in fact a style guide, but a set of guidelines on selecting and creating style guides.)
See also question 18.9.

Question 18.9
Are there any C tutorials or other resources on the net?

There are several of them:


``Notes for C programmers,'' by Christopher Sawtell, are available from ftp.funet.fi in
pub/languages/C/tutorials/sawtell_C.tar.gz.
Tim Love's ``C for Programmers'' is at
http://www.eng.cam.ac.uk/help/tpl/languages/C/teaching_C/teaching_C.html .
The Coronado Enterprises C tutorials are available on Simtel mirrors in pub/msdos/c/ or on the web at
http://www.swcp.com/~dodrill/controlled/cdoc/cmain.html.
Rick Rowe has a tutorial which is available from ftp.netcom.com as pub/rowe/tutorde.zip or
ftp.wustl.edu as pub/MSDOS_UPLOADS/programming/c_language/ctutorde.zip .
There is evidently a web-based course at http://www.strath.ac.uk/IT/Docs/Ccourse/ccourse.html .
Finally, on some Unix machines you can try typing learn c at the shell prompt.
[Disclaimer: I have not reviewed these tutorials; I have heard that at least one of them contains a number
of errors. Also, this sort of information rapidly becomes out-of-date; these addresses may not work by the
time you read this and try them.]
Several of these tutorials, plus a great deal of other information about C, are accessible via the web at
http://www.lysator.liu.se/c/index.html .
Vinit Carpenter maintains a list of resources for learning C and C++; it is posted to comp.lang.c and
comp.lang.c++, and archived where this FAQ list is (see question 20.40), or on the web at
http://www.cyberdiem.com/vin/learn.html .
See also question 18.10.

Question 18.10
What's a good book for learning C?

There are far too many books on C to list here; it's impossible to rate them all. Many people believe that
the best one was also the first: The C Programming Language, by Kernighan and Ritchie (``K&R,'' now
in its second edition). Opinions vary on K&R's suitability as an initial programming text: many of us did
learn C from it, and learned it well; some, however, feel that it is a bit too clinical as a first tutorial for
those without much programming background.
An excellent reference manual is C: A Reference Manual, by Samuel P. Harbison and Guy L. Steele,
now in its fourth edition.
Though not suitable for learning C from scratch, this FAQ list has been published in book form; see the
Bibliography.
Mitch Wright maintains an annotated bibliography of C and Unix books; it is available for anonymous
ftp from ftp.rahul.net in directory pub/mitch/YABL/.
This FAQ list's editor maintains a collection of previous answers to this question, which is available upon
request. See also question 18.9.

Question 18.13
Where can I find the sources of the standard C libraries?

One source (though not public domain) is The Standard C Library, by P.J. Plauger (see the
Bibliography). Implementations of all or part of the C library have been written and are readily available
as part of the netBSD and GNU (also Linux) projects. See also question 18.16.

Question 18.14
I need code to parse and evaluate expressions.

Two available packages are ``defunc,'' posted to comp.sources.misc in December, 1993 (V41 i32,33), to
alt.sources in January, 1994, and available from sunsite.unc.edu in
pub/packages/development/libraries/defunc-1.3.tar.Z, and ``parse,'' at lamont.ldgo.columbia.edu. Other
options include the S-Lang interpreter, available via anonymous ftp from amy.tch.harvard.edu in
pub/slang, and the shareware Cmm (``C-minus-minus'' or ``C minus the hard stuff''). See also question
18.16.
There is also some parsing/evaluation code in Software Solutions in C (chapter 12, pp. 235-55).

Question 18.15
Where can I get a BNF or YACC grammar for C?

The definitive grammar is of course the one in the ANSI standard; see question 11.2. Another grammar
(along with one for C++) by Jim Roskind is in pub/c++grammar1.1.tar.Z at ics.uci.edu . A fleshed-out,
working instance of the ANSI grammar (due to Jeff Lee) is on ftp.uu.net (see question 18.16) in
usenet/net.sources/ansi.c.grammar.Z (including a companion lexer). The FSF's GNU C compiler contains
a grammar, as does the appendix to K&R2.
The comp.compilers archives contain more information about grammars; see question 18.3.
References: K&R1 Sec. A18 pp. 214-219
K&R2 Sec. A13 pp. 234-239
ANSI Sec. A.2
ISO Sec. B.2
H&S pp. 423-435 Appendix B

Question 11.2
How can I get a copy of the Standard?

[Late-breaking news: I've been told that copies of the new C99 can be obtained directly from
www.ansi.org; the price for an electronic document is only US $18.00.]
Copies are available in the United States from
American National Standards Institute
11 W. 42nd St., 13th floor
New York, NY 10036 USA
(+1) 212 642 4900

and
Global Engineering Documents
15 Inverness Way E
Englewood, CO 80112 USA
(+1) 303 397 2715
(800) 854 7179 (U.S. & Canada)
In other countries, contact the appropriate national standards body, or ISO in Geneva at:
ISO Sales
Case Postale 56
CH-1211 Geneve 20
Switzerland
(or see URL http://www.iso.ch or check the comp.std.internat FAQ list, Standards.Faq).
At the time of this writing, the cost is $130.00 from ANSI or $410.00 from Global. Copies of the original
X3.159 (including the Rationale) may still be available at $205.00 from ANSI or $162.50 from Global.
Note that ANSI derives revenues to support its operations from the sale of printed standards, so
electronic copies are not available.
In the U.S., it may be possible to get a copy of the original ANSI X3.159 (including the Rationale) as
``FIPS PUB 160'' from
National Technical Information Service (NTIS)
U.S. Department of Commerce
Springfield, VA 22161
703 487 4650
The mistitled Annotated ANSI C Standard, with annotations by Herbert Schildt, contains most of the text
of ISO 9899; it is published by Osborne/McGraw-Hill, ISBN 0-07-881952-0, and sells in the U.S. for
approximately $40. It has been suggested that the price differential between this work and the official
standard reflects the value of the annotations: they are plagued by numerous errors and omissions, and a
few pages of the Standard itself are missing. Many people on the net recommend ignoring the
annotations entirely. A review of the annotations (``annotated annotations'') by Clive Feather can be
found on the web at http://www.lysator.liu.se/c/schildt.html .
The text of the Rationale (not the full Standard) can be obtained by anonymous ftp from ftp.uu.net (see
question 18.16) in directory doc/standards/ansi/X3.159-1989, and is also available on the web at
http://www.lysator.liu.se/c/rat/title.html . The Rationale has also been printed by Silicon Press, ISBN 0-929306-07-4.

Question 11.1
What is the ``ANSI C Standard?''

In 1983, the American National Standards Institute (ANSI) commissioned a committee, X3J11, to
standardize the C language. After a long, arduous process, including several widespread public reviews,
the committee's work was finally ratified as ANS X3.159-1989 on December 14, 1989, and published in
the spring of 1990. For the most part, ANSI C standardizes existing practice, with a few additions from
C++ (most notably function prototypes) and support for multinational character sets (including the
controversial trigraph sequences). The ANSI C standard also formalizes the C run-time library support
routines.
More recently, the Standard has been adopted as an international standard, ISO/IEC 9899:1990, and this
ISO Standard replaces the earlier X3.159 even within the United States. Its sections are numbered
differently (briefly, ISO sections 5 through 7 correspond roughly to the old ANSI sections 2 through 4).
As an ISO Standard, it is subject to ongoing revision through the release of Technical Corrigenda and
Normative Addenda.
In 1994, Technical Corrigendum 1 amended the Standard in about 40 places, most of them minor
corrections or clarifications. More recently, Normative Addendum 1 added about 50 pages of new
material, mostly specifying new library functions for internationalization. The production of Technical
Corrigenda is an ongoing process, and a second one is expected in late 1995. In addition, both ANSI and
ISO require periodic review of their standards. This process is beginning in 1995, and will likely result in
a completely revised standard (nicknamed ``C9X'' on the assumption of completion by 1999).
The original ANSI Standard included a ``Rationale,'' explaining many of its decisions, and discussing a
number of subtle points, including several of those covered here. (The Rationale was ``not part of ANSI
Standard X3.159-1989, but... included for information only,'' and is not included with the ISO Standard.)

Question 10.26
How can I write a macro which takes a variable number of arguments?

One popular trick is to define and invoke the macro with a single, parenthesized ``argument'' which in the
macro expansion becomes the entire argument list, parentheses and all, for a function such as printf:
#define DEBUG(args) (printf("DEBUG: "), printf args)
if(n != 0) DEBUG(("n is %d\n", n));
The obvious disadvantage is that the caller must always remember to use the extra parentheses.
gcc has an extension which allows a function-like macro to accept a variable number of arguments, but
it's not standard. Other possible solutions are to use different macros (DEBUG1, DEBUG2, etc.)
depending on the number of arguments, or to play games with commas:
#define DEBUG(args) (printf("DEBUG: "), printf(args))
#define _ ,
DEBUG("i = %d" _ i)
It is often better to use a bona-fide function, which can take a variable number of arguments in a well-defined way. See questions 15.4 and 15.5.
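This answer predates C99, which later standardized variable-argument macros: a macro declared with ... in its parameter list refers to the extra arguments as __VA_ARGS__. A sketch, assuming a C99 or later compiler, of the DEBUG macro rewritten that way:

```c
#include <stdio.h>

/* C99 and later: "..." collects the arguments, and __VA_ARGS__
   expands to them, commas included. The comma expression's value
   is the second printf's return value (characters written). */
#define DEBUG(...) (printf("DEBUG: "), printf(__VA_ARGS__))
```

With it, the call is simply DEBUG("n is %d\n", n); with no extra parentheses.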

Question 15.4
How can I write a function that takes a variable number of arguments?

Use the facilities of the <stdarg.h> header.


Here is a function which concatenates an arbitrary number of strings into malloc'ed memory:
#include <stdlib.h>     /* for malloc, NULL, size_t */
#include <stdarg.h>     /* for va_ stuff */
#include <string.h>     /* for strcat et al. */

char *vstrcat(char *first, ...)
{
    size_t len;
    char *retbuf;
    va_list argp;
    char *p;

    if(first == NULL)
        return NULL;

    len = strlen(first);

    va_start(argp, first);
    while((p = va_arg(argp, char *)) != NULL)
        len += strlen(p);
    va_end(argp);

    retbuf = malloc(len + 1);       /* +1 for trailing \0 */
    if(retbuf == NULL)
        return NULL;                /* error */

    (void)strcpy(retbuf, first);

    va_start(argp, first);          /* restart for second scan */
    while((p = va_arg(argp, char *)) != NULL)
        (void)strcat(retbuf, p);
    va_end(argp);

    return retbuf;
}
Usage is something like
char *str = vstrcat("Hello, ", "world!", (char *)NULL);
Note the cast on the last argument; see questions 5.2 and 15.3. (Also note that the caller must free the
returned, malloc'ed storage.)
See also question 15.7.
References: K&R2 Sec. 7.3 p. 155, Sec. B7 p. 254
ANSI Sec. 4.8
ISO Sec. 7.8
Rationale Sec. 4.8
H&S Sec. 11.4 pp. 296-9
CT&P Sec. A.3 pp. 139-141
PCS Sec. 11 pp. 184-5, Sec. 13 p. 242

Question 5.2
How do I get a null pointer in my programs?

According to the language definition, a constant 0 in a pointer context is converted into a null pointer at
compile time. That is, in an initialization, assignment, or comparison when one side is a variable or
expression of pointer type, the compiler can tell that a constant 0 on the other side requests a null pointer,
and generate the correctly-typed null pointer value. Therefore, the following fragments are perfectly
legal:
char *p = 0;
if(p != 0)
(See also question 5.3.)
However, an argument being passed to a function is not necessarily recognizable as a pointer context,
and the compiler may not be able to tell that an unadorned 0 ``means'' a null pointer. To generate a null
pointer in a function call context, an explicit cast may be required, to force the 0 to be recognized as a
pointer. For example, the Unix system call execl takes a variable-length, null-pointer-terminated list of
character pointer arguments, and is correctly called like this:
execl("/bin/sh", "sh", "-c", "date", (char *)0);
If the (char *) cast on the last argument were omitted, the compiler would not know to pass a null
pointer, and would pass an integer 0 instead. (Note that many Unix manuals get this example wrong.)
When function prototypes are in scope, argument passing becomes an ``assignment context,'' and most
casts may safely be omitted, since the prototype tells the compiler that a pointer is required, and of which
type, enabling it to correctly convert an unadorned 0. Function prototypes cannot provide the types for
variable arguments in variable-length argument lists however, so explicit casts are still required for those
arguments. (See also question 15.3.) It is safest to properly cast all null pointer constants in function
calls: to guard against varargs functions or those without prototypes, to allow interim use of non-ANSI
compilers, and to demonstrate that you know what you are doing. (Incidentally, it's also a simpler rule to
remember.)
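A sketch of why the cast matters, using a hypothetical function count_strings that, like execl, stops at a null pointer. Its first parameter is covered by the prototype, so a plain 0 converts correctly there; the trailing arguments receive no such conversion, so the terminator must be written (char *)0.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical varargs function: counts its string arguments up to a
   terminating null pointer, the same convention execl uses. */
int count_strings(const char *first, ...)
{
    va_list ap;
    char *p;
    int n;

    if(first == NULL)   /* fixed argument: an unadorned 0 converts fine */
        return 0;
    n = 1;
    va_start(ap, first);
    while((p = va_arg(ap, char *)) != NULL)   /* needs a real char * null */
        n++;
    va_end(ap);
    return n;
}
```

For example, count_strings("sh", "-c", "date", (char *)0) returns 3.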
Summary:
Unadorned 0 okay:
    initialization
    assignment
    comparison
    function call, prototype in scope, fixed argument
Explicit cast required:
    function call, no prototype in scope
    variable argument in varargs function call
References: K&R1 Sec. A7.7 p. 190, Sec. A7.14 p. 192
K&R2 Sec. A7.10 p. 207, Sec. A7.17 p. 209
ANSI Sec. 3.2.2.3
ISO Sec. 6.2.2.3
H&S Sec. 4.6.3 p. 95, Sec. 6.2.7 p. 171

Question 5.3
Is the abbreviated pointer comparison ``if(p)'' to test for non-null pointers valid? What if the internal
representation for null pointers is nonzero?

When C requires the Boolean value of an expression (in the if, while, for, and do statements, and
with the &&, ||, !, and ?: operators), a false value is inferred when the expression compares equal to
zero, and a true value otherwise. That is, whenever one writes
if(expr)
where ``expr'' is any expression at all, the compiler essentially acts as if it had been written as
if((expr) != 0)
Substituting the trivial pointer expression ``p'' for ``expr,'' we have
if(p)    is equivalent to    if(p != 0)

and this is a comparison context, so the compiler can tell that the (implicit) 0 is actually a null pointer
constant, and use the correct null pointer value. There is no trickery involved here; compilers do work
this way, and generate identical code for both constructs. The internal representation of a null pointer
does not matter.
The boolean negation operator, !, can be described as follows:
!expr    is essentially equivalent to    ((expr) == 0)
         or to                           (expr)?0:1
which leads to the conclusion that
if(!p)    is equivalent to    if(p == 0)

``Abbreviations'' such as if(p), though perfectly legal, are considered by some to be bad style (and by
others to be good style; see question 17.10).
See also question 9.2.
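A sketch of the abbreviated forms in use, with hypothetical function names: if(arr[i]) is compiled exactly as if(arr[i] != 0), and !p exactly as p == 0, whatever the machine's internal null pointer representation.

```c
#include <stddef.h>

/* Counts the non-null entries among the first n pointers in arr. */
size_t count_non_null(char *const arr[], size_t n)
{
    size_t i, count = 0;

    for(i = 0; i < n; i++)
        if(arr[i])          /* same as: if(arr[i] != 0) */
            count++;
    return count;
}

/* !p is the same as p == 0, so this yields 1 for a null pointer, else 0. */
int is_null(const void *p)
{
    return !p;
}
```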
References: K&R2 Sec. A7.4.7 p. 204
ANSI Sec. 3.3.3.3, Sec. 3.3.9, Sec. 3.3.13, Sec. 3.3.14, Sec. 3.3.15, Sec. 3.6.4.1, Sec. 3.6.5
ISO Sec. 6.3.3.3, Sec. 6.3.9, Sec. 6.3.13, Sec. 6.3.14, Sec. 6.3.15, Sec. 6.6.4.1, Sec. 6.6.5
H&S Sec. 5.3.2 p. 122

Question 17.10
Some people say that goto's are evil and that I should never use them. Isn't that a bit extreme?

Programming style, like writing style, is somewhat of an art and cannot be codified by inflexible rules,
although discussions about style often seem to center exclusively around such rules.
In the case of the goto statement, it has long been observed that unfettered use of goto's quickly leads
to unmaintainable spaghetti code. However, a simple, unthinking ban on the goto statement does not
necessarily lead immediately to beautiful programming: an unstructured programmer is just as capable of
constructing a Byzantine tangle without using any goto's (perhaps substituting oddly-nested loops and
Boolean control variables, instead).
Most observations or ``rules'' about programming style usually work better as guidelines than rules, and
work much better if programmers understand what the guidelines are trying to accomplish. Blindly
avoiding certain constructs or following rules without understanding them can lead to just as many
problems as the rules were supposed to avert.
Furthermore, many opinions on programming style are just that: opinions. It's usually futile to get
dragged into ``style wars,'' because on certain issues (such as those referred to in questions 9.2, 5.3, 5.9,
and 10.7), opponents can never seem to agree, or agree to disagree, or stop arguing.

Question 9.2
Isn't #defining TRUE to be 1 dangerous, since any nonzero value is considered ``true'' in C? What if a
built-in logical or relational operator ``returns'' something other than 1?

It is true (sic) that any nonzero value is considered true in C, but this applies only ``on input'', i.e. where a
Boolean value is expected. When a Boolean value is generated by a built-in operator, it is guaranteed to
be 1 or 0. Therefore, the test
if((a == b) == TRUE)
would work as expected (as long as TRUE is 1), but it is obviously silly. In general, explicit tests against
TRUE and FALSE are inappropriate, because some library functions (notably isupper, isalpha, etc.)
return, on success, a nonzero value which is not necessarily 1. (Besides, if you believe that ``if((a ==
b) == TRUE)'' is an improvement over ``if(a == b)'', why stop there? Why not use ``if(((a ==
b) == TRUE) == TRUE)''?) A good rule of thumb is to use TRUE and FALSE (or the like) only for
assignment to a Boolean variable or function parameter, or as the return value from a Boolean function,
but never in a comparison.
The preprocessor macros TRUE and FALSE (and, of course, NULL) are used for code readability, not
because the underlying values might ever change. (See also questions 5.3 and 5.10.)
On the other hand, Boolean values and definitions can evidently be confusing, and some programmers
feel that TRUE and FALSE macros only compound the confusion. (See also question 5.9.)
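A minimal sketch of the rule of thumb above, assuming the conventional definitions TRUE = 1 and FALSE = 0: a built-in operator yields exactly 1 or 0, while isupper only promises ``nonzero,'' so its result is normalized with != 0 (or simply tested directly) rather than compared against TRUE. The function names equal, is_upper_flag and count_upper are hypothetical.

```c
#include <ctype.h>

#define TRUE  1
#define FALSE 0

/* A built-in relational operator always yields exactly 1 or 0. */
int equal(int a, int b)
{
    return a == b;
}

/* isupper() returns some nonzero value on success, not necessarily 1,
   so convert it with != 0 before treating it as TRUE. */
int is_upper_flag(int c)
{
    return isupper((unsigned char)c) != 0;
}

/* TRUE and FALSE appear only in assignments, never in a comparison. */
int count_upper(const char *s)
{
    int n = 0;

    for (; *s != '\0'; s++) {
        int flag = FALSE;
        if (isupper((unsigned char)*s))   /* test the value directly */
            flag = TRUE;
        n += flag;
    }
    return n;
}
```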
References: K&R1 Sec. 2.6 p. 39, Sec. 2.7 p. 41
K&R2 Sec. 2.6 p. 42, Sec. 2.7 p. 44, Sec. A7.4.7 p. 204, Sec. A7.9 p. 206
ANSI Sec. 3.3.3.3, Sec. 3.3.8, Sec. 3.3.9, Sec. 3.3.13, Sec. 3.3.14, Sec. 3.3.15, Sec. 3.6.4.1, Sec. 3.6.5
ISO Sec. 6.3.3.3, Sec. 6.3.8, Sec. 6.3.9, Sec. 6.3.13, Sec. 6.3.14, Sec. 6.3.15, Sec. 6.6.4.1, Sec. 6.6.5
H&S Sec. 7.5.4 pp. 196-7, Sec. 7.6.4 pp. 207-8, Sec. 7.6.5 pp. 208-9, Sec. 7.7 pp. 217-8, Sec. 7.8 pp. 218-9, Sec. 8.5 pp. 238-9, Sec. 8.6 pp. 241-4
``What the Tortoise Said to Achilles''

Question 5.10
But wouldn't it be better to use NULL (rather than 0), in case the value of NULL changes, perhaps on a
machine with nonzero internal null pointers?

No. (Using NULL may be preferable, but not for this reason.) Although symbolic constants are often used
in place of numbers because the numbers might change, this is not the reason that NULL is used in place
of 0. Once again, the language guarantees that source-code 0's (in pointer contexts) generate null
pointers. NULL is used only as a stylistic convention. See questions 5.5 and 9.2.

Question 5.5
How should NULL be defined on a machine which uses a nonzero bit pattern as the internal
representation of a null pointer?

The same as on any other machine: as 0 (or ((void *)0)).


Whenever a programmer requests a null pointer, either by writing ``0'' or ``NULL,'' it is the compiler's
responsibility to generate whatever bit pattern the machine uses for that null pointer. Therefore, #defining
NULL as 0 on a machine for which internal null pointers are nonzero is as valid as on any other: the
compiler must always be able to generate the machine's correct null pointers in response to unadorned 0's
seen in pointer contexts. See also questions 5.2, 5.10, and 5.17.
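The guarantee described above can be seen in a small sketch: a literal 0, NULL, and ((void *)0) each produce a null pointer when they appear in a pointer context, whatever bit pattern the machine uses internally. The function name null_forms_agree is hypothetical.

```c
#include <stddef.h>

/* Each initializer below is a null pointer constant; in pointer context
   the compiler converts every one of them to the machine's null pointer. */
int null_forms_agree(void)
{
    char *a = 0;
    char *b = NULL;
    char *c = (void *)0;

    /* Comparing a pointer against 0 is also a pointer context. */
    return a == b && b == c && a == 0 && !c;
}
```

Note that the sketch never inspects the bit pattern; filling a pointer's storage with zero bytes (for example with memset) is not a portable way to obtain a null pointer.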
References: ANSI Sec. 4.1.5
ISO Sec. 7.1.6
Rationale Sec. 4.1.5

Question 15.3
I had a frustrating problem which turned out to be caused by the line
printf("%d", n);
where n was actually a long int. I thought that ANSI function prototypes were supposed to guard
against argument type mismatches like this.

When a function accepts a variable number of arguments, its prototype does not (and cannot) provide any
information about the number and types of those variable arguments. Therefore, the usual protections do
not apply in the variable-length part of variable-length argument lists: the compiler cannot perform
implicit conversions or (in general) warn about mismatches.
See also questions 5.2, 11.3, 12.9, and 15.2.
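A minimal sketch of the two usual fixes for the mismatch, written with snprintf (standard since C99) so the result can be inspected: either use the %ld conversion for a long, or convert the argument to the type the format expects. The function names are hypothetical.

```c
#include <stdio.h>

/* %ld tells printf-family functions that the variadic argument is a long. */
int format_long(char *buf, size_t size, long n)
{
    return snprintf(buf, size, "%ld", n);
}

/* If the format must stay %d, convert explicitly; no prototype will do it
   for the variable part of the argument list. The value must fit in an int. */
int format_long_as_int(char *buf, size_t size, long n)
{
    return snprintf(buf, size, "%d", (int)n);
}
```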

Question 11.3
My ANSI compiler complains about a mismatch when it sees
extern int func(float);
int func(x)
float x;
{ ...

You have mixed the new-style prototype declaration ``extern int func(float);'' with the old-style definition ``int func(x) float x;''. It is usually safe to mix the two styles (see question
11.4), but not in this case.
Old C (and ANSI C, in the absence of prototypes, and in variable-length argument lists; see question
15.2) ``widens'' certain arguments when they are passed to functions. floats are promoted to double,
and characters and short integers are promoted to int. (For old-style function definitions, the values are
automatically converted back to the corresponding narrower types within the body of the called function,
if they are declared that way there.)
This problem can be fixed either by using new-style syntax consistently in the definition:
int func(float x) { ... }
or by changing the new-style prototype declaration to match the old-style definition:
extern int func(double);
(In this case, it would be clearest to change the old-style definition to use double as well, as long as the
address of that parameter is not taken.)
It may also be safer to avoid ``narrow'' (char, short int, and float) function arguments and return
types altogether.
See also question 1.25.
References: K&R1 Sec. A7.1 p. 186
K&R2 Sec. A7.3.2 p. 202
ANSI Sec. 3.3.2.2, Sec. 3.5.4.3
ISO Sec. 6.3.2.2, Sec. 6.5.4.3
Rationale Sec. 3.3.2.2, Sec. 3.5.4.3
H&S Sec. 9.2 pp. 265-7, Sec. 9.4 pp. 272-3

Question 11.4
Can you mix old-style and new-style function syntax?

Doing so is perfectly legal, as long as you're careful (see especially question 11.3). Note however that old-style syntax is marked as obsolescent, so official support for it may be removed some day.
References: ANSI Sec. 3.7.1, Sec. 3.9.5
ISO Sec. 6.7.1, Sec. 6.9.5
H&S Sec. 9.2.2 pp. 265-7, Sec. 9.2.5 pp. 269-70

Question 11.5
Why does the declaration
extern f(struct x *p);
give me an obscure warning message about ``struct x introduced in prototype scope''?

In a quirk of C's normal block scoping rules, a structure declared (or even mentioned) for the first time
within a prototype cannot be compatible with other structures declared in the same source file (it goes out
of scope at the end of the prototype).
To resolve the problem, precede the prototype with the vacuous-looking declaration
struct x;
which places an (incomplete) declaration of struct x at file scope, so that all following declarations
involving struct x can at least be sure they're referring to the same struct x.
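A compilable sketch of the fix: the file-scope declaration struct x; comes first, so the prototype, the later structure definition, and the function definition all refer to the same type. The member value and the body of f are hypothetical, and the prototype spells out the int return type that the question's declaration left implicit.

```c
struct x;                      /* incomplete declaration at file scope */

extern int f(struct x *p);     /* now refers to the file-scope struct x */

struct x {
    int value;                 /* hypothetical member */
};

int f(struct x *p)             /* compatible with the prototype above */
{
    return p->value;
}
```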
References: ANSI Sec. 3.1.2.1, Sec. 3.1.2.6, Sec. 3.5.2.3
ISO Sec. 6.1.2.1, Sec. 6.1.2.6, Sec. 6.5.2.3

Question 11.8
I don't understand why I can't use const values in initializers and array dimensions, as in
const int n = 5;
int a[n];

The const qualifier really means ``read-only;'' an object so qualified is a run-time object which cannot
(normally) be assigned to. The value of a const-qualified object is therefore not a constant expression
in the full sense of the term. (C is unlike C++ in this regard.) When you need a true compile-time
constant, use a preprocessor #define.
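A sketch of the alternatives: a #define, as recommended above, gives a true constant expression, and an enumeration constant (also an integer constant expression in C) is another common choice. The names NELEMS, NSLOTS, count_a and count_b are hypothetical. (C99 later added variable-length arrays, so some compilers accept int a[n] at block scope, but n is still not a constant expression there.)

```c
#include <stddef.h>

#define NELEMS 5                /* preprocessor constant */
enum { NSLOTS = 5 };            /* enumeration constant */

static int a[NELEMS];           /* OK: NELEMS is a constant expression */
static int b[NSLOTS];           /* OK: so is NSLOTS */

size_t count_a(void) { return sizeof a / sizeof a[0]; }
size_t count_b(void) { return sizeof b / sizeof b[0]; }
```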
References: ANSI Sec. 3.4
ISO Sec. 6.4
H&S Secs. 7.11.2, 7.11.3 pp. 226-7

Question 11.9
What's the difference between const char *p and char * const p?

const char *p declares a pointer to a constant character (you can't change the character); char * const p
declares a constant pointer to a (variable) character (i.e. you can't change the pointer).
Read these ``inside out'' to understand them; see also question 1.21.
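A short sketch showing which assignments each declaration permits; the commented-out lines are the ones a compiler would reject. The buffers first and second and the function const_demo are hypothetical.

```c
static char first[] = "abc";
static char second[] = "xyz";

char const_demo(void)
{
    const char *p = first;   /* pointer to constant char */
    p = second;              /* OK: the pointer itself may change */
    /* *p = 'q'; */          /* error: can't change the char through p */

    char *const q = first;   /* constant pointer to (variable) char */
    *q = 'Q';                /* OK: the char may change */
    /* q = second; */        /* error: can't change the pointer */

    return (char)(p[0] == 'x' ? q[0] : '?');
}
```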
References: ANSI Sec. 3.5.4.1 examples
ISO Sec. 6.5.4.1
Rationale Sec. 3.5.4.1
H&S Sec. 4.4.4 p. 81

Question 1.21
How do I declare an array of N pointers to functions returning pointers to functions returning pointers to
characters?

The first part of this question can be answered in at least three ways:
1. char *(*(*a[N])())();
2. Build the declaration up incrementally, using typedefs:
typedef char *pc;        /* pointer to char */
typedef pc fpc();        /* function returning pointer to char */
typedef fpc *pfpc;       /* pointer to above */
typedef pfpc fpfpc();    /* function returning... */
typedef fpfpc *pfpfpc;   /* pointer to... */
pfpfpc a[N];             /* array of... */

3. Use the cdecl program, which turns English into C and vice versa:
cdecl> declare a as array of pointer to function returning
pointer to function returning pointer to char
char *(*(*a[])())()
cdecl can also explain complicated declarations, help with casts, and indicate which set of parentheses
the arguments go in (for complicated function definitions, like the one above). Versions of cdecl are in
volume 14 of comp.sources.unix (see question 18.16) and K&R2.
Any good book on C should explain how to read these complicated C declarations ``inside out'' to understand them
(``declaration mimics use'').
The pointer-to-function declarations in the examples above have not included parameter type information. When
the parameters have complicated types, declarations can really get messy. (Modern versions of cdecl can help
here, too.)
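A compilable sketch confirming that forms 1 and 2 declare the same type, here with (void) parameter lists written in. The value N = 3 and the functions greet and get_greet are hypothetical; they exist only to give the array something to point to.

```c
#define N 3

/* Typedef chain as in form 2, with (void) parameter lists added. */
typedef char *pc;            /* pointer to char */
typedef pc fpc(void);        /* function returning pointer to char */
typedef fpc *pfpc;           /* pointer to above */
typedef pfpc fpfpc(void);    /* function returning... */
typedef fpfpc *pfpfpc;       /* pointer to... */

static char *greet(void) { return "hello"; }   /* has type fpc */
static pfpc get_greet(void) { return greet; }  /* has type fpfpc */

/* Form 1 and form 2: both are arrays of N pointers to functions
   returning pointers to functions returning pointers to char. */
static char *(*(*a[N])(void))(void) = { get_greet, get_greet, get_greet };
static pfpfpc b[N] = { get_greet, get_greet, get_greet };
```

The expression a[1]()()[0] reads the declaration ``from the inside out'' in use: call the function pointed to, call the function it returns, then subscript the resulting char pointer.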
References: K&R2 Sec. 5.12 p. 122
ANSI Sec. 3.5ff (esp. Sec. 3.5.4)
ISO Sec. 6.5ff (esp. Sec. 6.5.4)
H&S Sec. 4.5 pp. 85-92, Sec. 5.10.1 pp. 149-50

Question 1.14
I can't seem to define a linked list successfully. I tried
typedef struct {
char *item;
NODEPTR next;
} *NODEPTR;
but the compiler gave me error messages. Can't a structure in C contain a pointer to itself?

Structures in C can certainly contain pointers to themselves; the discussion and example in section 6.5 of
K&R make this clear. The problem with the NODEPTR example is that the typedef has not been defined
at the point where the next field is declared. To fix this code, first give the structure a tag (``struct
node''). Then, declare the next field as a simple struct node *, or disentangle the typedef
declaration from the structure definition, or both. One corrected version would be
struct node {
char *item;
struct node *next;
};
typedef struct node *NODEPTR;
and there are at least three other equivalently correct ways of arranging it.
A similar problem, with a similar solution, can arise when attempting to declare a pair of typedef'ed
mutually referential structures.
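One of the other correct arrangements mentioned above declares the typedef first, using the structure tag, so that NODEPTR is already known when the next field is declared. The list_length function is a hypothetical usage example.

```c
#include <stddef.h>

typedef struct node *NODEPTR;   /* legal even before struct node is complete */

struct node {
    char *item;
    NODEPTR next;               /* NODEPTR is already defined here */
};

/* Count the nodes in a NULL-terminated list. */
size_t list_length(NODEPTR p)
{
    size_t n = 0;

    for (; p != NULL; p = p->next)
        n++;
    return n;
}
```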
See also question 2.1.
References: K&R1 Sec. 6.5 p. 101
K&R2 Sec. 6.5 p. 139
ANSI Sec. 3.5.2, Sec. 3.5.2.3, esp. examples
ISO Sec. 6.5.2, Sec. 6.5.2.3
H&S Sec. 5.6.1 pp. 132-3

Question 2.1
What's the difference between these two declarations?
struct x1 { ... };
typedef struct { ... } x2;

The first form declares a structure tag; the second declares a typedef. The main difference is that the
second declaration is of a slightly more abstract type--its users don't necessarily know that it is a
structure, and the keyword struct is not used when declaring instances of it.
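A small sketch of the difference in use; the single int member n stands in for the elided ``...'' and is hypothetical, as is the function total.

```c
struct x1 { int n; };            /* declares a structure tag */
typedef struct { int n; } x2;    /* declares a typedef for an untagged structure */

/* The tag needs the struct keyword; the typedef name stands alone. */
int total(struct x1 a, x2 b)
{
    return a.n + b.n;
}
```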

Question 1.34
I finally figured out the syntax for declaring pointers to functions, but now how do I initialize one?

Use something like


extern int func();
int (*fp)() = func;
When the name of a function appears in an expression like this, it ``decays'' into a pointer (that is, it has
its address implicitly taken), much as an array name does.
An explicit declaration for the function is normally needed, since implicit external function declaration
does not happen in this case (because the function name in the initialization is not part of a function call).
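A minimal runnable version of the initialization above, using a prototype in place of the older extern int func(); declaration; the body of func is hypothetical.

```c
int func(void);              /* explicit declaration, needed before use */

int (*fp)(void) = func;      /* func decays to a pointer to itself */

int func(void)
{
    return 42;               /* hypothetical body */
}
```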
See also question 4.12.

Question 4.12
I've seen different methods used for calling functions via pointers. What's the story?

Originally, a pointer to a function had to be ``turned into'' a ``real'' function, with the * operator (and an
extra pair of parentheses, to keep the precedence straight), before calling:
int r, func(), (*fp)() = func;
r = (*fp)();
It can also be argued that functions are always called via pointers, and that ``real'' function names always
decay implicitly into pointers (in expressions, as they do in initializations; see question 1.34). This
reasoning, made widespread through pcc and adopted in the ANSI standard, means that
r = fp();
is legal and works correctly, whether fp is the name of a function or a pointer to one. (The usage has
always been unambiguous; there is nothing you ever could have done with a function pointer followed by
an argument list except call the function pointed to.) An explicit * is still allowed (and recommended, if
portability to older compilers is important).
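Both calling forms can be compared side by side in a small sketch; twice and call_both_ways are hypothetical names.

```c
static int twice(int n) { return 2 * n; }

/* Calls the same function through a pointer in both styles. */
int call_both_ways(int (*fp)(int), int n)
{
    int r1 = (*fp)(n);   /* original style: explicit * and extra parentheses */
    int r2 = fp(n);      /* ANSI style: the pointer is called directly */

    return r1 == r2 ? r1 : -1;
}
```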
See also question 1.34.
References: K&R1 Sec. 5.12 p. 116
K&R2 Sec. 5.11 p. 120
ANSI Sec. 3.3.2.2
ISO Sec. 6.3.2.2
Rationale Sec. 3.3.2.2
H&S Sec. 5.8 p. 147, Sec. 7.4.3 p. 190

Question 4.11
Does C even have ``pass by reference''?

Not really. Strictly speaking, C always uses pass by value. You can simulate pass by reference yourself,
by defining functions which accept pointers and then using the & operator when calling, and the compiler
will essentially simulate it for you when you pass an array to a function (by passing a pointer instead, see
question 6.4 et al.), but C has nothing truly equivalent to formal pass by reference or C++ reference
parameters. (However, function-like preprocessor macros do provide a form of ``call by name''.)
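A sketch of the two simulations just described: the caller passes addresses with & so the callee can modify its variables, and an array argument arrives as a pointer to its first element. The function names swap and zero_first are hypothetical.

```c
/* Simulated pass by reference: the pointers themselves are passed by value. */
void swap(int *a, int *b)
{
    int tmp = *a;

    *a = *b;
    *b = tmp;
}

/* arr is really an int *, so writes through it reach the caller's array. */
void zero_first(int arr[], int n)
{
    if (n > 0)
        arr[0] = 0;
}
```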
See also questions 4.8 and 20.1.
References: K&R1 Sec. 1.8 pp. 24-5, Sec. 5.2 pp. 91-3
K&R2 Sec. 1.8 pp. 27-8, Sec. 5.2 pp. 91-3
ANSI Sec. 3.3.2.2, esp. footnote 39
ISO Sec. 6.3.2.2
H&S Sec. 9.5 pp. 273-4

Question 6.4
Then why are array and pointer declarations interchangeable as function formal parameters?

It's supposed to be a convenience.


Since arrays decay immediately into pointers, an array is never actually passed to a function. Allowing
pointer parameters to be declared as arrays is simply a way of making it look as though the array was
being passed--a programmer may wish to emphasize that a parameter is traditionally treated as if it were
an array, or that an array (strictly speaking, the address) is traditionally passed. As a convenience,
therefore, any parameter declarations which ``look like'' arrays, e.g.
f(a)
char a[];
{ ... }
are treated by the compiler as if they were pointers, since that is what the function will receive if an array
is passed:
f(a)
char *a;
{ ... }
This conversion holds only within function formal parameter declarations, nowhere else. If the
conversion bothers you, avoid it; many people have concluded that the confusion it causes outweighs the
small advantage of having the declaration ``look like'' the call or the uses within the function.
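A small sketch showing that the two parameter spellings produce functions of the same type, int (char *), so both can be stored in one array of function pointers; the names are hypothetical.

```c
static int first_of_array(char a[]) { return a[0]; }   /* looks like an array */
static int first_of_ptr(char *a)    { return a[0]; }   /* what it really is */

/* Both functions have type int (char *). */
static int (*firsts[2])(char *) = { first_of_array, first_of_ptr };
```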
See also question 6.21.
References: K&R1 Sec. 5.3 p. 95, Sec. A10.1 p. 205
K&R2 Sec. 5.3 p. 100, Sec. A8.6.3 p. 218, Sec. A10.1 p. 226
ANSI Sec. 3.5.4.3, Sec. 3.7.1, Sec. 3.9.6
ISO Sec. 6.5.4.3, Sec. 6.7.1, Sec. 6.9.6
H&S Sec. 9.3 p. 271
CT&P Sec. 3.3 pp. 33-4

Question 6.21
Why doesn't sizeof properly report the size of an array when the array is a parameter to a function?

The compiler pretends that the array parameter was declared as a pointer (see question 6.4), and sizeof
reports the size of the pointer.
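A sketch of the symptom and of the usual remedy, which is to have the caller (who still sees the real array) pass the element count; the function names are hypothetical. Some compilers warn when sizeof is applied to an array parameter.

```c
#include <stddef.h>

/* arr is really an int *, so sizeof reports the size of a pointer. */
size_t size_in_callee(int arr[10])
{
    return sizeof arr;
}

/* The usual remedy: the element count is passed in separately. */
long sum(const int *arr, size_t n)
{
    long total = 0;

    for (size_t i = 0; i < n; i++)
        total += arr[i];
    return total;
}
```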
References: H&S Sec. 7.5.2 p. 195

Question 6.20
How can I use statically- and dynamically-allocated multidimensional arrays interchangeably when
passing them to functions?

There is no single perfect method. Given the declarations
int array[NROWS][NCOLUMNS];
int **array1;               /* ragged */
int **array2;               /* contiguous */
int *array3;                /* "flattened" */
int (*array4)[NCOLUMNS];

with the pointers initialized as in the code fragments in question 6.16, and functions declared as
f1(int a[][NCOLUMNS], int nrows, int ncolumns);
f2(int *aryp, int nrows, int ncolumns);
f3(int **pp, int nrows, int ncolumns);
where f1 accepts a conventional two-dimensional array, f2 accepts a ``flattened'' two-dimensional
array, and f3 accepts a pointer-to-pointer, simulated array (see also questions 6.18 and 6.19), the
following calls should work as expected:
f1(array, NROWS, NCOLUMNS);
f1(array4, nrows, NCOLUMNS);
f2(&array[0][0], NROWS, NCOLUMNS);
f2(*array, NROWS, NCOLUMNS);
f2(*array2, nrows, ncolumns);
f2(array3, nrows, ncolumns);
f2(*array4, nrows, NCOLUMNS);
f3(array1, nrows, ncolumns);
f3(array2, nrows, ncolumns);
The following two calls would probably work on most systems, but involve questionable casts, and work
only if the dynamic ncolumns matches the static NCOLUMNS:
f1((int (*)[NCOLUMNS])(*array2), nrows, ncolumns);
f1((int (*)[NCOLUMNS])array3, nrows, ncolumns);

It must again be noted that passing &array[0][0] (or, equivalently, *array) to f2 is not strictly
conforming; see question 6.19.
If you can understand why all of the above calls work and are written as they are, and if you understand
why the combinations that are not listed would not work, then you have a very good understanding of
arrays and pointers in C.
Rather than worrying about all of this, one approach to using multidimensional arrays of various sizes is
to make them all dynamic, as in question 6.16. If there are no static multidimensional arrays--if all arrays
are allocated like array1 or array2 in question 6.16--then all functions can be written like f3.
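A minimal sketch of the three function styles, each summing the elements so that some of the calls above can be tried out; the summing bodies and the sizes NROWS = 2, NCOLUMNS = 3 are hypothetical. As noted, the f2 call on a true two-dimensional array is not strictly conforming, although it works on typical implementations.

```c
#define NROWS 2
#define NCOLUMNS 3

/* Conventional two-dimensional array (really a pointer to arrays of NCOLUMNS ints). */
int f1(int a[][NCOLUMNS], int nrows, int ncolumns)
{
    int i, j, total = 0;

    for (i = 0; i < nrows; i++)
        for (j = 0; j < ncolumns; j++)
            total += a[i][j];
    return total;
}

/* "Flattened" array: the subscript is computed by hand. */
int f2(int *aryp, int nrows, int ncolumns)
{
    int i, j, total = 0;

    for (i = 0; i < nrows; i++)
        for (j = 0; j < ncolumns; j++)
            total += aryp[i * ncolumns + j];
    return total;
}

/* Pointer-to-pointer, simulated array. */
int f3(int **pp, int nrows, int ncolumns)
{
    int i, j, total = 0;

    for (i = 0; i < nrows; i++)
        for (j = 0; j < ncolumns; j++)
            total += pp[i][j];
    return total;
}
```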

Question 6.16
How can I dynamically allocate a multidimensional array?

It is usually best to allocate an array of pointers, and then initialize each pointer to a dynamically-allocated ``row.'' Here is a two-dimensional example:
#include <stdlib.h>
int **array1 = (int **)malloc(nrows * sizeof(int *));
for(i = 0; i < nrows; i++)
array1[i] = (int *)malloc(ncolumns * sizeof(int));
(In real code, of course, all of malloc's return values would be checked.)
You can keep the array's contents contiguous, while making later reallocation of individual rows
difficult, with a bit of explicit pointer arithmetic:
int **array2 = (int **)malloc(nrows * sizeof(int *));
array2[0] = (int *)malloc(nrows * ncolumns * sizeof(int));
for(i = 1; i < nrows; i++)
array2[i] = array2[0] + i * ncolumns;
In either case, the elements of the dynamic array can be accessed with normal-looking array subscripts:
arrayx[i][j] (for 0 <= i < nrows and 0 <= j < ncolumns).
If the double indirection implied by the above schemes is for some reason unacceptable, you can
simulate a two-dimensional array with a single, dynamically-allocated one-dimensional array:
int *array3 = (int *)malloc(nrows * ncolumns * sizeof(int));
However, you must now perform subscript calculations manually, accessing the i,jth element with
array3[i * ncolumns + j]. (A macro could hide the explicit calculation, but invoking it would
require parentheses and commas which wouldn't look exactly like multidimensional array syntax, and the
macro would need access to at least one of the dimensions, as well. See also question 6.19.)
Finally, you could use pointers to arrays:
int (*array4)[NCOLUMNS] =
    (int (*)[NCOLUMNS])malloc(nrows * sizeof(*array4));
but the syntax starts getting horrific and at most one dimension may be specified at run time.
With all of these techniques, you may of course need to remember to free the arrays (which may take
several steps; see question 7.23) when they are no longer needed, and you cannot necessarily intermix
dynamically-allocated arrays with conventional, statically-allocated ones (see question 6.20, and also
question 6.18).
All of these techniques can also be extended to three or more dimensions.
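A sketch of the contiguous (array2) technique with the malloc checks the text mentions, plus the matching two-step free; alloc2d and free2d are hypothetical helper names, and the sketch uses size_t dimensions rather than int.

```c
#include <stdlib.h>

/* Allocate nrows row pointers plus one contiguous block of ints, then point
   each row pointer into the block. Returns NULL on failure or for a zero
   dimension. (No check for overflow of nrows * ncolumns.) */
int **alloc2d(size_t nrows, size_t ncolumns)
{
    int **array2;
    size_t i;

    if (nrows == 0 || ncolumns == 0)
        return NULL;
    array2 = malloc(nrows * sizeof *array2);
    if (array2 == NULL)
        return NULL;
    array2[0] = malloc(nrows * ncolumns * sizeof **array2);
    if (array2[0] == NULL) {
        free(array2);
        return NULL;
    }
    for (i = 1; i < nrows; i++)
        array2[i] = array2[0] + i * ncolumns;
    return array2;
}

/* Free in two steps: first the element block, then the row pointers. */
void free2d(int **array2)
{
    if (array2 != NULL) {
        free(array2[0]);
        free(array2);
    }
}
```

Because the elements are contiguous, m[i][j] and m[0][i * ncolumns + j] name the same int.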

Question 6.19
How do I write functions which accept two-dimensional arrays when the ``width'' is not known at
compile time?

It's not easy. One way is to pass in a pointer to the [0][0] element, along with the two dimensions, and
simulate array subscripting ``by hand:''
f2(aryp, nrows, ncolumns)
int *aryp;
int nrows, ncolumns;
{ ... array[i][j] is accessed as aryp[i * ncolumns + j] ... }
This function could be called with the array from question 6.18 as
f2(&array[0][0], NROWS, NCOLUMNS);
It must be noted, however, that a program which performs multidimensional array subscripting ``by
hand'' in this way is not in strict conformance with the ANSI C Standard; according to an official
interpretation, the behavior of accessing (&array[0][0])[x] is not defined for x >= NCOLUMNS.
gcc allows local arrays to be declared having sizes which are specified by a function's arguments, but
this is a nonstandard extension.
When you want to be able to use a function on multidimensional arrays of various sizes, one solution is
to simulate all the arrays dynamically, as in question 6.16.
See also questions 6.18, 6.20, and 6.15.
References: ANSI Sec. 3.3.6
ISO Sec. 6.3.6

Question 6.18
My compiler complained when I passed a two-dimensional array to a function expecting a pointer to a
pointer.

The rule (see question 6.3) by which arrays decay into pointers is not applied recursively. An array of
arrays (i.e. a two-dimensional array in C) decays into a pointer to an array, not a pointer to a pointer.
Pointers to arrays can be confusing, and must be treated carefully; see also question 6.13. (The confusion
is heightened by the existence of incorrect compilers, including some old versions of pcc and pcc-derived lints, which improperly accept assignments of multi-dimensional arrays to multi-level
pointers.)
If you are passing a two-dimensional array to a function:
int array[NROWS][NCOLUMNS];
f(array);
the function's declaration must match:
f(int a[][NCOLUMNS])
{ ... }
or
f(int (*ap)[NCOLUMNS])    /* ap is a pointer to an array */
{ ... }
In the first declaration, the compiler performs the usual implicit parameter rewriting of ``array of array''
to ``pointer to array'' (see questions 6.3 and 6.4); in the second form the pointer declaration is explicit.
Since the called function does not allocate space for the array, it does not need to know the overall size,
so the number of rows, NROWS, can be omitted. The ``shape'' of the array is still important, so the column
dimension NCOLUMNS (and, for three- or more dimensional arrays, the intervening ones) must be
retained.
If a function is already declared as accepting a pointer to a pointer, it is probably meaningless to pass a
two-dimensional array directly to it.
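A sketch showing that the array argument arrives as a pointer to an array, so the two declarations above are interchangeable; NCOLUMNS = 4 and the summing bodies of the hypothetical functions sum_a and sum_ap are assumptions for illustration.

```c
#define NCOLUMNS 4

/* Array-style parameter: rewritten by the compiler as int (*a)[NCOLUMNS]. */
int sum_a(int a[][NCOLUMNS], int nrows)
{
    int i, j, total = 0;

    for (i = 0; i < nrows; i++)
        for (j = 0; j < NCOLUMNS; j++)
            total += a[i][j];
    return total;
}

/* Explicit pointer-to-array parameter: the same type, so it can be passed on. */
int sum_ap(int (*ap)[NCOLUMNS], int nrows)
{
    return sum_a(ap, nrows);
}
```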
See also questions 6.12 and 6.15.

References: K&R1 Sec. 5.10 p. 110
K&R2 Sec. 5.9 p. 113
H&S Sec. 5.4.3 p. 126

Question 6.3
So what is meant by the ``equivalence of pointers and arrays'' in C?

Much of the confusion surrounding arrays and pointers in C can be traced to a misunderstanding of this
statement. Saying that arrays and pointers are ``equivalent'' means neither that they are identical nor even
interchangeable.
``Equivalence'' refers to the following key definition:
An lvalue of type array-of-T which appears in an expression decays (with three
exceptions) into a pointer to its first element; the type of the resultant pointer is pointer-to-T.
(The exceptions are when the array is the operand of a sizeof or & operator, or is a string literal
initializer for a character array.)
As a consequence of this definition, the compiler doesn't apply the array subscripting operator [] that
differently to arrays and pointers, after all. In an expression of the form a[i], the array decays into a
pointer, following the rule above, and is then subscripted just as would be a pointer variable in the
expression p[i] (although the eventual memory accesses will be different, as explained in question 6.2).
If you were to assign the array's address to the pointer:
p = a;
then p[3] and a[3] would access the same element.
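A small sketch of the definition above, including the sizeof exception; the function name equivalence_holds is hypothetical.

```c
/* a decays to &a[0] in expressions, but not as the operand of sizeof. */
int equivalence_holds(void)
{
    int a[5] = { 10, 20, 30, 40, 50 };
    int *p = a;                           /* same as p = &a[0] */

    return &p[3] == &a[3]                 /* same element */
        && *(a + 3) == a[3]               /* a[i] means *(a + i) */
        && sizeof a == 5 * sizeof(int)    /* sizeof sees the whole array */
        && sizeof p == sizeof(int *);     /* but only a pointer for p */
}
```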
See also question 6.8.
References: K&R1 Sec. 5.3 pp. 93-6
K&R2 Sec. 5.3 p. 99
ANSI Sec. 3.2.2.1, Sec. 3.3.2.1, Sec. 3.3.6
ISO Sec. 6.2.2.1, Sec. 6.3.2.1, Sec. 6.3.6
H&S Sec. 5.4.1 p. 124

Question 6.2
But I heard that char a[] was identical to char *a.

Not at all. (What you heard has to do with formal parameters to functions; see question 6.4.) Arrays are
not pointers. The array declaration char a[6] requests that space for six characters be set aside, to be
known by the name ``a.'' That is, there is a location named ``a'' at which six characters can sit. The
pointer declaration char *p, on the other hand, requests a place which holds a pointer, to be known by
the name ``p.'' This pointer can point almost anywhere: to any char, or to any contiguous array of
chars, or nowhere (see also questions 5.1 and 1.30).
As usual, a picture is worth a thousand words. The declarations
char a[] = "hello";
char *p = "world";
would initialize data structures which could be represented like this:
   +---+---+---+---+---+---+
a: | h | e | l | l | o |\0 |
   +---+---+---+---+---+---+
   +-----+     +---+---+---+---+---+---+
p: |  *======> | w | o | r | l | d |\0 |
   +-----+     +---+---+---+---+---+---+
It is important to realize that a reference like x[3] generates different code depending on whether x is an
array or a pointer. Given the declarations above, when the compiler sees the expression a[3], it emits
code to start at the location ``a,'' move three past it, and fetch the character there. When it sees the
expression p[3], it emits code to start at the location ``p,'' fetch the pointer value there, add three to the
pointer, and finally fetch the character pointed to. In other words, a[3] is three places past (the start of)
the object named a, while p[3] is three places past the object pointed to by p. In the example above,
both a[3] and p[3] happen to be the character 'l', but the compiler gets there differently.
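The declarations above can be compiled as they stand; this sketch makes the difference in the picture visible through sizeof. The helper names size_of_a and size_of_p are hypothetical.

```c
#include <stddef.h>

static char a[] = "hello";   /* six chars, stored at the location named a */
static char *p = "world";    /* one pointer, pointing at six chars elsewhere */

size_t size_of_a(void) { return sizeof a; }   /* the whole array: 6 */
size_t size_of_p(void) { return sizeof p; }   /* just the pointer */
```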
References: K&R2 Sec. 5.5 p. 104
CT&P Sec. 4.5 pp. 64-5

Question 5.1
What is this infamous null pointer, anyway?

The language definition states that for each pointer type, there is a special value--the ``null pointer''--which is distinguishable from all other pointer values and which is ``guaranteed to compare unequal to a
pointer to any object or function.'' That is, the address-of operator & will never yield a null pointer, nor
will a successful call to malloc. (malloc does return a null pointer when it fails, and this is a typical
use of null pointers: as a ``special'' pointer value with some other meaning, usually ``not allocated'' or
``not pointing anywhere yet.'')
A null pointer is conceptually different from an uninitialized pointer. A null pointer is known not to point
to any object or function; an uninitialized pointer might point anywhere. See also questions 1.30, 7.1, and
7.31.
As mentioned above, there is a null pointer for each pointer type, and the internal values of null pointers
for different types may be different. Although programmers need not know the internal values, the
compiler must always be informed which type of null pointer is required, so that it can make the
distinction if necessary (see questions 5.2, 5.5, and 5.6).
References: K&R1 Sec. 5.4 pp. 97-8
K&R2 Sec. 5.4 p. 102
ANSI Sec. 3.2.2.3
ISO Sec. 6.2.2.3
Rationale Sec. 3.2.2.3
H&S Sec. 5.3.2 pp. 121-3

Question 1.30
What can I safely assume about the initial values of variables which are not explicitly initialized? If
global variables start out as ``zero,'' is that good enough for null pointers and floating-point zeroes?

Variables with static duration (that is, those declared outside of functions, and those declared with the
storage class static) are guaranteed to be initialized (just once, at program startup) to zero, as if the
programmer had typed ``= 0''. Therefore, such variables are initialized to the null pointer (of the correct
type; see also section 5) if they are pointers, and to 0.0 if they are floating-point.
Variables with automatic duration (i.e. local variables without the static storage class) start out
containing garbage, unless they are explicitly initialized. (Nothing useful can be predicted about the
garbage.)
Dynamically-allocated memory obtained with malloc and realloc is also likely to contain garbage,
and must be initialized by the calling program, as appropriate. Memory obtained with calloc is all-bits-0, but this is not necessarily useful for pointer or floating-point values (see question 7.31, and section 5).
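The sketch below illustrates both rules; the variable names and the helper new_node are hypothetical. The static-duration variables need no initializers at all, while the members of a malloc'ed node are set explicitly, because neither malloc's garbage nor calloc's all-bits-0 would guarantee a null next pointer.

```c
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

static int count;             /* static duration: starts out as 0 */
static double total;          /* starts out as 0.0 */
static struct node *head;     /* starts out as a null pointer */
static struct node sentinel;  /* each member: value 0, next null */

/* Report whether the implicit initializations described above happened. */
int statics_start_zeroed(void)
{
    return count == 0 && total == 0.0 && head == NULL
        && sentinel.value == 0 && sentinel.next == NULL;
}

/* Memory from malloc starts out as garbage, so set every member. */
struct node *new_node(int value)
{
    struct node *np = malloc(sizeof *np);

    if (np != NULL) {
        np->value = value;
        np->next = NULL;      /* explicit: a true null pointer */
    }
    return np;
}
```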
References: K&R1 Sec. 4.9 pp. 82-4
K&R2 Sec. 4.9 pp. 85-86
ANSI Sec. 3.5.7, Sec. 4.10.3.1, Sec. 4.10.5.3
ISO Sec. 6.5.7, Sec. 7.10.3.1, Sec. 7.10.5.3
H&S Sec. 4.2.8 pp. 72-3, Sec. 4.6 pp. 92-3, Sec. 4.6.2 pp. 94-5, Sec. 4.6.3 p. 96, Sec. 16.1 p. 386

5. Null Pointers
5.1 What is this infamous null pointer, anyway?
5.2 How do I get a null pointer in my programs?
5.3 Is the abbreviated pointer comparison ``if(p)'' to test for non-null pointers valid?
5.4 What is NULL and how is it #defined?
5.5 How should NULL be defined on a machine which uses a nonzero bit pattern as the internal
representation of a null pointer?
5.6 If NULL were defined as ``((char *)0),'' wouldn't that make function calls which pass an uncast
NULL work?
5.9 If NULL and 0 are equivalent as null pointer constants, which should I use?
5.10 But wouldn't it be better to use NULL, in case the value of NULL changes?
5.12 I use the preprocessor macro "#define Nullptr(type) (type *)0" to help me build null
pointers of the correct type.
5.13 This is strange. NULL is guaranteed to be 0, but the null pointer is not?
5.14 Why is there so much confusion surrounding null pointers?
5.15 I'm confused. I just can't understand all this null pointer stuff.
5.16 Given all the confusion surrounding null pointers, wouldn't it be easier simply to require them to be
represented internally by zeroes?
5.17 Seriously, have any actual machines really used nonzero null pointers?
5.20 What does a run-time ``null pointer assignment'' error mean?

Question 5.4
What is NULL and how is it #defined?

As a matter of style, many programmers prefer not to have unadorned 0's scattered through their
programs. Therefore, the preprocessor macro NULL is #defined (by <stdio.h> or <stddef.h>)
with the value 0, possibly cast to (void *) (see also question 5.6). A programmer who wishes to make
explicit the distinction between 0 the integer and 0 the null pointer constant can then use NULL
whenever a null pointer is required.
Using NULL is a stylistic convention only; the preprocessor turns NULL back into 0 which is then
recognized by the compiler, in pointer contexts, as before. In particular, a cast may still be necessary
before NULL (as before 0) in a function call argument. The table under question 5.2 above applies for
NULL as well as 0 (an unadorned NULL is equivalent to an unadorned 0).
NULL should only be used for pointers; see question 5.9.
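The clearest case where a cast is still needed is a variadic function, where no prototype tells the compiler what pointer type the trailing arguments have. Below is a minimal sketch, assuming a hypothetical function count_strings whose argument list ends with a null pointer (the same convention the POSIX execl function uses).

```c
#include <stdarg.h>
#include <stddef.h>

/* Count the char * arguments that precede the terminating null pointer.
   (Hypothetical function.) */
int count_strings(char *first, ...)
{
    va_list ap;
    char *s;
    int n = 0;

    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, char *))
        n++;
    va_end(ap);
    return n;
}
```

Call it as count_strings("a", "b", (char *)NULL). Leaving the final NULL uncast is exactly the mistake described above: if NULL is #defined as plain 0, an int is passed where a char * is expected, and the two need not have the same size or representation.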
References: K&R1 Sec. 5.4 pp. 97-8
K&R2 Sec. 5.4 p. 102
ANSI Sec. 4.1.5, Sec. 3.2.2.3
ISO Sec. 7.1.6, Sec. 6.2.2.3
Rationale Sec. 4.1.5
H&S Sec. 5.3.2 p. 122, Sec. 11.1 p. 292

Question 5.6
If NULL were defined as follows:
#define NULL ((char *)0)
wouldn't that make function calls which pass an uncast NULL work?

Not in general. The problem is that there are machines which use different internal representations for
pointers to different types of data. The suggested definition would make uncast NULL arguments to
functions expecting pointers to characters work correctly, but pointer arguments of other types would
still be problematical, and legal constructions such as
FILE *fp = NULL;
could fail.
Nevertheless, ANSI C allows the alternate definition
#define NULL ((void *)0)
for NULL. Besides potentially helping incorrect programs to work (but only on machines with
homogeneous pointers, thus questionably valid assistance), this definition may catch programs which use
NULL incorrectly (e.g. when the ASCII NUL character was really intended; see question 5.9).
References: Rationale Sec. 4.1.5

Question 5.9
If NULL and 0 are equivalent as null pointer constants, which should I use?

Many programmers believe that NULL should be used in all pointer contexts, as a reminder that the value
is to be thought of as a pointer. Others feel that the confusion surrounding NULL and 0 is only
compounded by hiding 0 behind a macro, and prefer to use unadorned 0 instead. There is no one right
answer. (See also questions 9.2 and 17.10.) C programmers must understand that NULL and 0 are
interchangeable in pointer contexts, and that an uncast 0 is perfectly acceptable. Any usage of NULL (as
opposed to 0) should be considered a gentle reminder that a pointer is involved; programmers should not
depend on it (either for their own understanding or the compiler's) for distinguishing pointer 0's from
integer 0's.
NULL should not be used when another kind of 0 is required, even though it might work, because doing
so sends the wrong stylistic message. (Furthermore, ANSI allows the definition of NULL to be
((void *)0), which will not work at all in non-pointer contexts.) In particular, do not use NULL when the
ASCII null character (NUL) is desired. Provide your own definition
#define NUL '\0'
if you must.
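Here is a small sketch that keeps the two kinds of zero apart, using the NUL definition above for the character and NULL only for pointers; chop_newline is a hypothetical helper.

```c
#include <stddef.h>

#define NUL '\0'

/* Truncate s at its first newline, if any, and return s.
   (Hypothetical helper.) */
char *chop_newline(char *s)
{
    char *p;

    if (s == NULL)                      /* pointer context: NULL */
        return NULL;
    for (p = s; *p != NUL; p++) {       /* character context: NUL */
        if (*p == '\n') {
            *p = NUL;
            break;
        }
    }
    return s;
}
```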
References: K&R1 Sec. 5.4 pp. 97-8
K&R2 Sec. 5.4 p. 102

Question 5.17
Seriously, have any actual machines really used nonzero null pointers, or different representations for
pointers to different types?

The Prime 50 series used segment 07777, offset 0 for the null pointer, at least for PL/I. Later models used
segment 0, offset 0 for null pointers in C, necessitating new instructions such as TCNP (Test C Null
Pointer), evidently as a sop to all the extant poorly-written C code which made incorrect assumptions.
Older, word-addressed Prime machines were also notorious for requiring larger byte pointers
(char *'s) than word pointers (int *'s).
The Eclipse MV series from Data General has three architecturally supported pointer formats (word,
byte, and bit pointers), two of which are used by C compilers: byte pointers for char * and void *,
and word pointers for everything else.
Some Honeywell-Bull mainframes use the bit pattern 06000 for (internal) null pointers.
The CDC Cyber 180 Series has 48-bit pointers consisting of a ring, segment, and offset. Most users (in
ring 11) have null pointers of 0xB00000000000. It was common on old CDC ones-complement machines
to use an all-one-bits word as a special flag for all kinds of data, including invalid addresses.
The old HP 3000 series uses a different addressing scheme for byte addresses than for word addresses;
like several of the machines above, it therefore uses different representations for char * and void *
pointers than for other pointers.
The Symbolics Lisp Machine, a tagged architecture, does not even have conventional numeric pointers; it
uses the pair <NIL, 0> (basically a nonexistent <object, offset> handle) as a C null pointer.
Depending on the ``memory model'' in use, 8086-family processors (PC compatibles) may use 16-bit
data pointers and 32-bit function pointers, or vice versa.
Some 64-bit Cray machines represent int * in the lower 48 bits of a word; char * additionally uses
the upper 16 bits to indicate a byte address within a word.
References: K&R1 Sec. A14.4 p. 211

Question 5.16
Given all the confusion surrounding null pointers, wouldn't it be easier simply to require them to be
represented internally by zeroes?

If for no other reason, doing so would be ill-advised because it would unnecessarily constrain
implementations which would otherwise naturally represent null pointers by special, nonzero bit patterns,
particularly when those values would trigger automatic hardware traps for invalid accesses.
Besides, what would such a requirement really accomplish? Proper understanding of null pointers does
not require knowledge of the internal representation, whether zero or nonzero. Assuming that null
pointers are internally zero does not make any code easier to write (except for a certain ill-advised usage
of calloc; see question 7.31). Known-zero internal pointers would not obviate casts in function calls,
because the size of the pointer might still be different from that of an int. (If ``nil'' were used to request
null pointers, as mentioned in question 5.14, the urge to assume an internal zero representation would not
even arise.)

Question 7.31
What's the difference between calloc and malloc? Is it safe to take advantage of calloc's zero-filling? Does free work on memory allocated with calloc, or do you need a cfree?

calloc(m, n) is essentially equivalent to
p = malloc(m * n);
if (p != NULL)
    memset(p, 0, m * n);
The zero fill is all-bits-zero, and does not therefore guarantee useful null pointer values (see section 5 of
this list) or floating-point zero values. free is properly used to free the memory allocated by calloc.
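Written out as a function, with the null check and an overflow check that the short version glosses over, the equivalence looks roughly like this. It is a sketch, not any library's actual implementation; zalloc is a hypothetical name, and SIZE_MAX assumes C99's <stdint.h>. Since calloc cannot supply less space than m objects of n bytes each, a request whose product overflows has to fail rather than wrap around.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* A sketch of what calloc(m, n) does; zalloc is a hypothetical name. */
void *zalloc(size_t m, size_t n)
{
    void *p;

    if (n != 0 && m > SIZE_MAX / n)   /* m * n would overflow size_t */
        return NULL;
    p = malloc(m * n);
    if (p != NULL)
        memset(p, 0, m * n);          /* all-bits-zero, nothing more */
    return p;
}
```

Memory from zalloc, like memory from calloc, is released with plain free.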
References: ANSI Sec. 4.10.3 to 4.10.3.2
ISO Sec. 7.10.3 to 7.10.3.2
H&S Sec. 16.1 p. 386, Sec. 16.2 p. 386
PCS Sec. 11 pp. 141,142

Question 7.30
Is it legal to pass a null pointer as the first argument to realloc? Why would you want to?

ANSI C sanctions this usage (and the related realloc(..., 0), which frees), although several earlier
implementations do not support it, so it may not be fully portable. Passing an initially-null pointer to
realloc can make it easier to write a self-starting incremental allocation algorithm.
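A minimal sketch of such a self-starting algorithm, assuming a hypothetical growable array type intvec: because the array pointer starts out null, the very first append needs no special case.

```c
#include <stdlib.h>

/* A growable array of int (hypothetical type). */
struct intvec {
    int *data;          /* a null pointer until the first push */
    size_t len;         /* elements in use */
    size_t cap;         /* elements allocated */
};

/* Append x, growing the array as needed.  Returns 0 on success, -1 if
   memory ran out (the existing contents are then left intact).
   Overflow checks on the new size are omitted for brevity. */
int intvec_push(struct intvec *v, int x)
{
    if (v->len == v->cap) {
        size_t newcap = v->cap != 0 ? 2 * v->cap : 8;
        /* while v->data is still null, this realloc acts like malloc */
        int *newdata = realloc(v->data, newcap * sizeof *newdata);

        if (newdata == NULL)
            return -1;
        v->data = newdata;
        v->cap = newcap;
    }
    v->data[v->len++] = x;
    return 0;
}
```

Start with struct intvec v = {NULL, 0, 0}; and release everything with a single free(v.data) when done.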
References: ANSI Sec. 4.10.3.4
ISO Sec. 7.10.3.4
H&S Sec. 16.3 p. 388

Question 7.27
So can I query the malloc package to find out how big an allocated block is?

Not portably.

Question 7.26
How does free know how many bytes to free?

The malloc/free implementation remembers the size of each block allocated and returned, so it is not
necessary to remind it of the size when freeing.
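How a given malloc stores this information is private to that implementation, which is also why question 7.27's answer is ``not portably.'' As a sketch of the general idea only (not how any particular malloc is written), a program that needs the size can keep its own header in front of each block. sized_malloc, sized_size, and sized_free are hypothetical names, and max_align_t assumes a C11 compiler.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Bookkeeping stored just before each block handed out.  The max_align_t
   member (C11) keeps the caller's part suitably aligned. */
union header {
    size_t size;
    max_align_t align;
};

void *sized_malloc(size_t n)
{
    union header *h;

    if (n > SIZE_MAX - sizeof *h)
        return NULL;
    h = malloc(sizeof *h + n);
    if (h == NULL)
        return NULL;
    h->size = n;        /* remember the size, much as malloc does privately */
    return h + 1;       /* the caller sees only the memory after the header */
}

size_t sized_size(const void *p)
{
    return ((const union header *)p - 1)->size;
}

void sized_free(void *p)
{
    if (p != NULL)
        free((union header *)p - 1);
}
```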

Question 7.25
I have a program which mallocs and later frees a lot of memory, but memory usage (as reported by
ps) doesn't seem to go back down.

Most implementations of malloc/free do not return freed memory to the operating system (if there
is one), but merely make it available for future malloc calls within the same program.

Question 7.24
Must I free allocated memory before the program exits?

You shouldn't have to. A real operating system definitively reclaims all memory when a program exits.
Nevertheless, some personal computers are said not to reliably recover memory, and all that can be
inferred from the ANSI/ISO C Standard is that this is a ``quality of implementation issue.''
References: ANSI Sec. 4.10.3.2
ISO Sec. 7.10.3.2

Question 7.23
I'm allocating structures which contain pointers to other dynamically-allocated objects. When I free a
structure, do I have to free each subsidiary pointer first?

Yes. In general, you must arrange that each pointer returned from malloc be individually passed to
free, exactly once (if it is freed at all).
A good rule of thumb is that for each call to malloc in a program, you should be able to point at the call
to free which frees the memory allocated by that malloc call.
See also question 7.24.
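A sketch of the rule, with a hypothetical struct person that owns two subsidiary allocations: each of the three allocations in person_new has exactly one matching free in person_free.

```c
#include <stdlib.h>
#include <string.h>

/* A structure that owns two separately allocated objects. */
struct person {
    char *name;
    int *scores;
    size_t nscores;
};

/* Three allocations: the structure, its name, and its score array. */
struct person *person_new(const char *name, size_t nscores)
{
    struct person *pp = malloc(sizeof *pp);

    if (pp == NULL)
        return NULL;
    pp->name = malloc(strlen(name) + 1);
    pp->scores = calloc(nscores, sizeof *pp->scores); /* all-bits-0 is 0 for int */
    pp->nscores = nscores;
    if (pp->name == NULL || (pp->scores == NULL && nscores != 0)) {
        free(pp->name);          /* free(NULL) is harmless */
        free(pp->scores);
        free(pp);
        return NULL;
    }
    strcpy(pp->name, name);
    return pp;
}

/* One free per allocation: the members first, while pp is still valid,
   and the structure itself last. */
void person_free(struct person *pp)
{
    if (pp == NULL)
        return;
    free(pp->name);
    free(pp->scores);
    free(pp);
}
```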

Question 7.22
When I call malloc to allocate memory for a local pointer, do I have to explicitly free it?

Yes. Remember that a pointer is different from what it points to. Local variables are deallocated when
the function returns, but in the case of a pointer variable, this means that the pointer is deallocated, not
what it points to. Memory allocated with malloc always persists until you explicitly free it. In general,
for every call to malloc, there should be a corresponding call to free.

Question 7.21
Why isn't a pointer null after calling free?
How unsafe is it to use (assign, compare) a pointer value after it's been freed?

When you call free, the memory pointed to by the passed pointer is freed, but the value of the pointer
in the caller remains unchanged, because C's pass-by-value semantics mean that called functions never
permanently change the values of their arguments. (See also question 4.8.)
A pointer value which has been freed is, strictly speaking, invalid, and any use of it, even if it is not
dereferenced, can theoretically lead to trouble, though as a quality of implementation issue, most
implementations will probably not go out of their way to generate exceptions for innocuous uses of
invalid pointers.
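If you want the pointer to be null afterwards, you have to set it yourself. One common idiom is a macro, since a macro can assign to the caller's variable directly; FREE_AND_NULL is a hypothetical name, not a standard facility.

```c
#include <stdlib.h>

/* Free p, then set the caller's variable itself to a null pointer.
   A macro can do this for any pointer type; a function taking void **
   could not do it portably (see question 4.9).  p is evaluated twice,
   so pass a plain variable, not an expression with side effects. */
#define FREE_AND_NULL(p) (free(p), (p) = NULL)
```

After FREE_AND_NULL(ip);, the test ip == NULL is both true and safe. Any other copies of the old pointer value, however, are just as invalid as before.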
References: ANSI Sec. 4.10.3
ISO Sec. 7.10.3
Rationale Sec. 3.2.2.3

Question 4.8
I have a function which accepts, and is supposed to initialize, a pointer:
void f(ip)
int *ip;
{
    static int dummy = 5;
    ip = &dummy;
}
But when I call it like this:
int *ip;
f(ip);
the pointer in the caller remains unchanged.

Are you sure the function initialized what you thought it did? Remember that arguments in C are passed
by value. The called function altered only the passed copy of the pointer. You'll either want to pass the
address of the pointer (the function will end up accepting a pointer-to-a-pointer), or have the function
return the pointer.
See also questions 4.9 and 4.11.
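Minimal sketches of both fixes, keeping the question's static dummy variable; the names f_pp and f_ret are hypothetical.

```c
/* Fix 1: accept a pointer to the caller's pointer and store through it. */
void f_pp(int **ipp)
{
    static int dummy = 5;

    *ipp = &dummy;          /* modifies the caller's variable */
}

/* Fix 2: return the new pointer value and let the caller assign it. */
int *f_ret(void)
{
    static int dummy = 5;

    return &dummy;
}
```

The caller then writes either int *ip; f_pp(&ip); or ip = f_ret();.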

Question 4.9
Can I use a void ** pointer to pass a generic pointer to a function by reference?

Not portably. There is no generic pointer-to-pointer type in C. void * acts as a generic pointer only
because conversions are applied automatically when other pointer types are assigned to and from
void *'s; these conversions cannot be performed (the correct underlying pointer type is not known) if an
attempt is made to indirect upon a void ** value which points at something other than a void *.
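The portable alternative is to pass the address of a genuine void * and then do the conversion with an ordinary assignment. A sketch with a hypothetical function get_buffer that returns a generic pointer through a void ** parameter:

```c
/* A hypothetical function that hands back a generic pointer through a
   void ** parameter. */
void get_buffer(void **vpp)
{
    static double storage[4] = {1.0, 2.0, 3.0, 4.0};

    *vpp = storage;             /* double * converted to void * */
}

/* The portable way to call it: pass the address of an actual void *,
   then let ordinary assignment convert void * to double *.  Passing
   (void **)&dp, with dp a double *, would not be portable. */
double *get_doubles(void)
{
    void *vp;

    get_buffer(&vp);
    return vp;                  /* void * converted to double * */
}
```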

Question 4.10
I have a function
extern int f(int *);
which accepts a pointer to an int. How can I pass a constant by reference? A call like
f(&5);
doesn't seem to work.

You can't do this directly. You will have to declare a temporary variable, and then pass its address to the
function:
int five = 5;
f(&five);
See also questions 2.10, 4.8, and 20.1.

Question 2.10
How can I pass constant values to functions which accept structure arguments?

C has no way of generating anonymous structure values. You will have to use a temporary structure
variable or a little structure-building function; see question 14.11 for an example. (gcc provides
structure constants as an extension, and the mechanism will probably be added to a future revision of the
C Standard.) See also question 4.10.
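A sketch of the ``little structure-building function'' approach, with a hypothetical struct point and a function manhattan that accepts one by value. The mechanism anticipated above did arrive in C99 as compound literals, which the usage note shows as well.

```c
struct point {
    int x, y;
};

/* A function that accepts a structure argument (hypothetical). */
int manhattan(struct point p)
{
    return (p.x < 0 ? -p.x : p.x) + (p.y < 0 ? -p.y : p.y);
}

/* A little structure-building function. */
struct point makepoint(int x, int y)
{
    struct point p;

    p.x = x;
    p.y = y;
    return p;
}
```

With the helper, a call reads manhattan(makepoint(3, -4)). In C99 and later, manhattan((struct point){3, -4}) does the same job without it, and the same feature covers question 4.10's case as f(&(int){5}).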

Question 2.6
I came across some code that declared a structure like this:
struct name {
    int namelen;
    char namestr[1];
};
and then did some tricky allocation to make the namestr array act like it had several elements. Is this
legal or portable?

This technique is popular, although Dennis Ritchie has called it ``unwarranted chumminess with the C
implementation.'' An official interpretation has deemed that it is not strictly conforming with the C
Standard. (A thorough treatment of the arguments surrounding the legality of the technique is beyond the
scope of this list.) It does seem to be portable to all known implementations. (Compilers which check
array bounds carefully might issue warnings.)
Another possibility is to declare the variable-size element very large, rather than very small; in the case
of the above example:
    ...
    char namestr[MAXSIZE];
    ...
where MAXSIZE is larger than any name which will be stored. However, it looks like this technique is
disallowed by a strict interpretation of the Standard as well.
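C99 later standardized a replacement for this technique, the flexible array member: the last member is declared with empty brackets, and the allocation adds room for the characters. A minimal sketch for the structure above; makename is a hypothetical helper name, and a C99 compiler is assumed.

```c
#include <stdlib.h>
#include <string.h>

struct name {
    int namelen;
    char namestr[];     /* C99 flexible array member: no size given */
};

/* Allocate a struct name holding a copy of newname. */
struct name *makename(const char *newname)
{
    size_t len = strlen(newname);
    struct name *ret = malloc(sizeof(struct name) + len + 1);

    if (ret != NULL) {
        ret->namelen = (int)len;
        memcpy(ret->namestr, newname, len + 1);   /* include the '\0' */
    }
    return ret;
}
```

sizeof(struct name) counts only namelen (plus any padding), which is why the allocation adds len + 1 bytes for the characters and the terminating '\0'.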
References: Rationale Sec. 3.5.4.2

Question 2.4
What's the best way of implementing opaque (abstract) data types in C?

One good way is for clients to use structure pointers (perhaps additionally hidden behind typedefs)
which point to structure types which are not publicly defined.
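A sketch with a hypothetical Counter type. In real use the first part would be a header that clients include and the second part a separate source file; they are shown together here only so the example compiles as one unit. Clients can declare and pass Counter * values but cannot see or touch the members.

```c
#include <stdlib.h>

/* ---- Public part: what a header such as counter.h would contain ---- */
typedef struct counter Counter;     /* incomplete type: members hidden */

Counter *counter_new(void);
void counter_bump(Counter *c);
int counter_value(const Counter *c);
void counter_free(Counter *c);

/* ---- Private part: what only counter.c would contain ---- */
struct counter {
    int value;
};

Counter *counter_new(void)
{
    Counter *c = malloc(sizeof *c);

    if (c != NULL)
        c->value = 0;
    return c;
}

void counter_bump(Counter *c)
{
    c->value++;
}

int counter_value(const Counter *c)
{
    return c->value;
}

void counter_free(Counter *c)
{
    free(c);
}
```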

Question 2.3
Can a structure contain a pointer to itself?

Most certainly. See question 1.14.
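For example, the usual linked-list node (names hypothetical). A pointer to struct node may be declared inside the structure even though the structure type is not yet complete at that point.

```c
#include <stddef.h>

struct node {
    int value;
    struct node *next;      /* pointer to the structure's own type */
};

/* Add up the values in a null-terminated linked list (hypothetical). */
int list_sum(const struct node *np)
{
    int sum = 0;

    for (; np != NULL; np = np->next)
        sum += np->value;
    return sum;
}
```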

Question 2.2
Why doesn't
struct x { ... };
x thestruct;
work?

C is not C++. Typedef names are not automatically generated for structure tags. See also question 2.1.
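The two usual fixes, sketched with a placeholder member a standing in for the question's ``...'' (the member is an assumption for illustration):

```c
struct x {
    int a;
};

struct x thestruct;         /* fix 1: repeat the keyword struct */

typedef struct x x;         /* fix 2: declare x as a typedef name too */
x another;
```

The typedef form works because structure tags and ordinary identifiers live in separate name spaces, so x may serve as both.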

2. Structures, Unions, and Enumerations


2.1 What's the difference between struct x1 { ... }; and typedef struct { ... } x2; ?
2.2 Why doesn't "struct x { ... }; x thestruct;" work?
2.3 Can a structure contain a pointer to itself?
2.4 What's the best way of implementing opaque (abstract) data types in C?
2.6 I came across some code that declared a structure with the last member an array of one element, and
then did some tricky allocation to make it act like the array had several elements. Is this legal or
portable?
2.7 I heard that structures could be assigned to variables and passed to and from functions, but K&R1
says not.
2.8 Why can't you compare structures?
2.9 How are structure passing and returning implemented?
2.10 Can I pass constant values to functions which accept structure arguments?
2.11 How can I read/write structures from/to data files?
2.12 How can I turn off structure padding?
2.13 Why does sizeof report a larger size than I expect for a structure type?
2.14 How can I determine the byte offset of a field within a structure?
2.15 How can I access structure fields by name at run time?
2.18 I have a program which works correctly, but dumps core after it finishes. Why?
2.20 Can I initialize unions?
2.22 What is the difference between an enumeration and a set of preprocessor #defines?
2.24 Is there an easy way to print enumeration values symbolically?

Question 20.1
How can I return multiple values from a function?

Either pass pointers to several locations which the function can fill in, or have the function return a
structure containing the desired values, or (in a pinch) consider global variables. See also questions 2.7,
4.8, and 7.5.
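Sketches of the first two approaches, using integer division as the example; divide_ptrs, divide_struct, and struct quotrem are hypothetical names. (The standard library's div function, which returns a div_t structure with quot and rem members, takes the second approach.)

```c
/* Way 1: the caller passes pointers to locations for the results. */
void divide_ptrs(int num, int den, int *quotp, int *remp)
{
    *quotp = num / den;
    *remp = num % den;
}

/* Way 2: return one structure holding both results. */
struct quotrem {
    int quot;
    int rem;
};

struct quotrem divide_struct(int num, int den)
{
    struct quotrem r;

    r.quot = num / den;
    r.rem = num % den;
    return r;
}
```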

Question 7.5
I have a function that is supposed to return a string, but when it returns to its caller, the returned string is
garbage.

Make sure that the pointed-to memory is properly allocated. The returned pointer should be to a statically-allocated buffer, or to a buffer passed in by the caller, or to memory obtained with malloc, but not to a
local (automatic) array. In other words, never do something like
char *itoa(int n)
{
    char retbuf[20];            /* WRONG */
    sprintf(retbuf, "%d", n);
    return retbuf;              /* WRONG */
}

One fix (which is imperfect, especially if the function in question is called recursively, or if several of its
return values are needed simultaneously) would be to declare the return buffer as
static char retbuf[20];
See also questions 12.21 and 20.1.
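Sketches of the other two fixes the answer lists, a caller-supplied buffer and a malloc'ed one. Note that itoa is not a standard C function; itoa_buf, itoa_alloc, and INT_BUFSIZE are hypothetical names, and the buffer size uses the estimate from question 12.21.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Enough room for any int in decimal: the octal-digit estimate from
   question 12.21, plus a sign and the terminating '\0'. */
#define INT_BUFSIZE ((sizeof(int) * CHAR_BIT + 2) / 3 + 1 + 1)

/* Fix 1: the caller supplies the buffer (at least INT_BUFSIZE chars). */
char *itoa_buf(int n, char *buf)
{
    sprintf(buf, "%d", n);
    return buf;
}

/* Fix 2: allocate the result; the caller must free it. */
char *itoa_alloc(int n)
{
    char *buf = malloc(INT_BUFSIZE);

    if (buf != NULL)
        sprintf(buf, "%d", n);
    return buf;
}
```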
References: ANSI Sec. 3.1.2.4
ISO Sec. 6.1.2.4

Question 12.21
How can I tell how much destination buffer space I'll need for an arbitrary sprintf call? How can I
avoid overflowing the destination buffer with sprintf?

There are not (yet) any good answers to either of these excellent questions, and this represents perhaps
the biggest deficiency in the traditional stdio library.
When the format string being used with sprintf is known and relatively simple, you can usually
predict a buffer size in an ad-hoc way. If the format consists of one or two %s's, you can count the fixed
characters in the format string yourself (or let sizeof count them for you) and add in the result of
calling strlen on the string(s) to be inserted. You can conservatively estimate the size that %d will
expand to with code like:
#include <limits.h>
char buf[(sizeof(int) * CHAR_BIT + 2) / 3 + 1 + 1];
sprintf(buf, "%d", n);
(This code computes the number of characters required for a base-8 representation of a number; a base-10 expansion is guaranteed to take as much room or less.)
When the format string is more complicated, or is not even known until run time, predicting the buffer
size becomes as difficult as reimplementing sprintf, and correspondingly error-prone (and
inadvisable). A last-ditch technique which is sometimes suggested is to use fprintf to print the same
text to a bit bucket or temporary file, and then to look at fprintf's return value or the size of the file
(but see question 19.12).
If there's any chance that the buffer might not be big enough, you won't want to call sprintf without
some guarantee that the buffer will not overflow and overwrite some other part of memory. Several
stdio's (including GNU and 4.4bsd) provide the obvious snprintf function, which can be used like
this:
snprintf(buf, bufsize, "You typed \"%s\"", answer);
and we can hope that a future revision of the ANSI/ISO C Standard will include this function.
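As it happens, snprintf was standardized in C99, along with vsnprintf, and the standard versions answer the first question too: called with a null buffer and a size of 0, they write nothing and return the number of characters the output would need, not counting the '\0'. A minimal sketch, with a hypothetical helper name:

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Format into a newly malloc'ed buffer of exactly the needed size, or
   return a null pointer on failure.  Requires C99 vsnprintf.
   (Hypothetical helper; the caller must free the result.) */
char *format_alloc(const char *fmt, ...)
{
    va_list ap;
    int len;
    char *buf;

    va_start(ap, fmt);
    len = vsnprintf(NULL, 0, fmt, ap);  /* measure only */
    va_end(ap);
    if (len < 0)
        return NULL;

    buf = malloc((size_t)len + 1);      /* + 1 for the '\0' */
    if (buf == NULL)
        return NULL;

    va_start(ap, fmt);                  /* a used va_list must be restarted */
    vsnprintf(buf, (size_t)len + 1, fmt, ap);
    va_end(ap);
    return buf;
}
```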

Question 19.12
How can I find out the size of a file, prior to reading it in?

If the ``size of a file'' is the number of characters you'll be able to read from it in C, it is difficult or
impossible to determine this number exactly.
Under Unix, the stat call will give you an exact answer. Several other systems supply a Unix-like
stat which will give an approximate answer. You can fseek to the end and then use ftell, or perhaps
try fstat, but these tend to have the same sorts of problems: fstat is not portable, and generally tells you
the same thing stat tells you; ftell is not guaranteed to return a byte count except for binary files. Some systems provide
routines called filesize or filelength, but these are not portable, either.
Are you sure you have to determine the file's size in advance? Since the most accurate way of
determining the size of a file as a C program will see it is to open the file and read it, perhaps you can
rearrange the code to learn the size as it reads.
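A sketch of that rearrangement, assuming a hypothetical helper read_all that reads a whole stream into memory, growing its buffer with realloc, and reports how many characters it actually got:

```c
#include <stdio.h>
#include <stdlib.h>

/* Read everything remaining on fp into a malloc'ed, '\0'-terminated
   buffer, storing the number of characters read in *sizep.  Returns a
   null pointer on allocation or read failure.  (Hypothetical helper;
   the caller must free the result.  Overflow checks omitted.) */
char *read_all(FILE *fp, size_t *sizep)
{
    size_t cap = 256, len = 0;
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;
    while ((c = getc(fp)) != EOF) {
        if (len + 1 == cap) {           /* always keep room for '\0' */
            char *newbuf = realloc(buf, 2 * cap);

            if (newbuf == NULL) {
                free(buf);
                return NULL;
            }
            buf = newbuf;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    *sizep = len;       /* the size "as a C program will see it" */
    return buf;
}
```

Because the count comes from actually reading, it reflects any text-mode translations, which a size obtained from stat or ftell might not.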
References: ANSI Sec. 4.9.9.4
ISO Sec. 7.9.9.4
H&S Sec. 15.5.1
PCS Sec. 12 p. 213
POSIX Sec. 5.6.2

Question 19.11
How can I check whether a file exists? I want to warn the user if a requested input file is missing.

It's surprisingly difficult to make this determination reliably and portably. Any test you make can be
invalidated if the file is created or deleted (e.g. by some other process) between the time you make the
test and the time you try to open the file.
Three possible test routines are stat, access, and fopen. (To make an approximate test for file
existence with fopen, just open for reading and close immediately.) Of these, only fopen is widely
portable, and access, where it exists, must be used carefully if the program uses the Unix set-UID
feature.
Rather than trying to predict in advance whether an operation such as opening a file will succeed, it's
often better to try it, check the return value, and complain if it fails. (Obviously, this approach won't work
if you're trying to avoid overwriting an existing file, unless you've got something like the O_EXCL file
opening option available, which does just what you want in this case.)
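A minimal sketch of the try-it-and-check approach; open_input is a hypothetical helper. ISO C does not require fopen to set errno on failure, though POSIX systems do, so the message is hedged accordingly.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Try to open filename for reading and complain if that fails, instead
   of testing for existence first.  errno is only meaningful here on
   systems (such as POSIX ones) where fopen sets it.
   (Hypothetical helper.) */
FILE *open_input(const char *filename)
{
    FILE *fp;

    errno = 0;
    fp = fopen(filename, "r");
    if (fp == NULL)
        fprintf(stderr, "can't open %s: %s\n", filename,
                errno != 0 ? strerror(errno) : "unknown error");
    return fp;
}
```

For the overwrite case, C11 later added an exclusive mode, fopen(name, "wx"), which fails if the file already exists.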
References: PCS Sec. 12 pp. 189,213
POSIX Sec. 5.3.1, Sec. 5.6.2, Sec. 5.6.3

Question 19.10
How can I do graphics?

Once upon a time, Unix had a fairly nice little set of device-independent plot routines described in plot(3)
and plot(5), but they've largely fallen into disuse.
If you're programming for MS-DOS, you'll probably want to use libraries conforming to the VESA or
BGI standards.
If you're trying to talk to a particular plotter, making it draw is usually a matter of sending it the
appropriate escape sequences; see also question 19.9. The vendor may supply a C-callable library, or you
may be able to find one on the net.
If you're programming for a particular window system (Macintosh, X windows, Microsoft Windows),
you will use its facilities; see the relevant documentation or newsgroup or FAQ list.
References: PCS Sec. 5.4 pp. 75-77

Question 19.9
How do I send escape sequences to control a terminal or other device?

If you can figure out how to send characters to the device at all (see question 19.8), it's easy enough to
send escape sequences. In ASCII, the ESC code is 033 (27 decimal), so code like
fprintf(ofd, "\033[J");
sends the sequence ESC [ J .
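A slightly more general sketch, assuming an ANSI (VT100-style) terminal, where ESC [ row ; col H positions the cursor; the helper names are hypothetical.

```c
#include <stdio.h>

#define ESC "\033"      /* ASCII escape: octal 033, decimal 27 */

/* Move the cursor to row, col (both counted from 1) on an
   ANSI/VT100-compatible terminal.  (Hypothetical helper.) */
void move_cursor(FILE *ofd, int row, int col)
{
    fprintf(ofd, ESC "[%d;%dH", row, col);
}

/* Erase from the cursor to the end of the screen: ESC [ J. */
void clear_to_end(FILE *ofd)
{
    fputs(ESC "[J", ofd);
}
```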

Question 19.8
How can I direct output to the printer?

Under Unix, either use popen (see question 19.30) to write to the lp or lpr program, or perhaps open
a special file like /dev/lp. Under MS-DOS, write to the (nonstandard) predefined stdio stream
stdprn, or open the special files PRN or LPT1.
References: PCS Sec. 5.3 pp. 72-74

Question 19.30
How can I invoke another program or command and trap its output?

Unix and some other systems provide a popen routine, which sets up a stdio stream on a pipe connected
to the process running a command, so that the output can be read (or the input supplied). (Also,
remember to call pclose.)
If you can't use popen, you may be able to use system, with the output going to a file which you then
open and read.
If you're using Unix and popen isn't sufficient, you can learn about pipe, dup, fork, and exec.
(One thing that probably would not work, by the way, would be to use freopen.)
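A minimal sketch of the popen approach, assuming a POSIX system; capture_output is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L   /* popen and pclose are POSIX, not ISO C */
#include <stdio.h>

/* Run command, copying up to bufsize - 1 characters of its standard
   output into buf (bufsize must be at least 1).  Returns the status from
   pclose, or -1 if the pipe could not be opened.  (Hypothetical helper.) */
int capture_output(const char *command, char *buf, size_t bufsize)
{
    FILE *pp = popen(command, "r");
    size_t n;

    if (pp == NULL)
        return -1;
    n = fread(buf, 1, bufsize - 1, pp);
    buf[n] = '\0';
    while (getc(pp) != EOF)       /* drain any excess so the command */
        ;                         /* isn't cut off by a closed pipe */
    return pclose(pp);            /* waits for the command to finish */
}
```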
References: PCS Sec. 11 p. 169

Question 19.27
How can I invoke another program (a standalone executable, or an operating system command) from
within a C program?

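Use the library function system, which does exactly that. Note that system's return value is at best the
command's exit status and usually has nothing to do with the output of the command. The C Standard says
only that the value is implementation-defined. Under Unix it is a status in the format returned by wait,
from which the WEXITSTATUS macro extracts the exit status. Note also that system accepts a single string
representing the command to be invoked. If you need to build up a complex command line, you can use
sprintf. See also question 19.30.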
References: K&R1 Sec. 7.9 p. 157
K&R2 Sec. 7.8.4 p. 167, Sec. B6 p. 253
ANSI Sec. 4.10.4.5
ISO Sec. 7.10.4.5
H&S Sec. 19.2 p. 407
PCS Sec. 11 p. 179
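The following minimal sketch builds a command line and then runs it with system. It assumes a Unix-like system, where the returned status is decoded with the <sys/wait.h> macros. The helper name is hypothetical. It uses snprintf (C99) in place of sprintf so that an overlong command is rejected rather than overflowing the buffer.

```c
#define _POSIX_C_SOURCE 200809L	/* for WIFEXITED/WEXITSTATUS */
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: build "prog arg" and run it with system().
   Returns the command's exit status, or -1 on failure.  Decoding
   the status this way is Unix-specific. */
int run_with_arg(const char *prog, const char *arg)
{
	char cmd[256];
	int n, status;

	if(system(NULL) == 0)		/* no command processor available */
		return -1;
	n = snprintf(cmd, sizeof cmd, "%s %s", prog, arg);
	if(n < 0 || (size_t)n >= sizeof cmd)
		return -1;		/* command line too long */
	status = system(cmd);
	if(status == -1 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

The shell interprets the command string. Any argument that comes from a user or a file therefore needs quoting before it is pasted in, or else a different mechanism such as fork and exec should be used.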

Question 19.25
How can I access memory (a memory-mapped device, or graphics memory) located at a certain address?

Set a pointer, of the appropriate type, to the right number (using an explicit cast to assure the compiler
that you really do intend this nonportable conversion):
unsigned int *magicloc = (unsigned int *)0x12345678;
Then, *magicloc refers to the location you want. (Under MS-DOS, you may find a macro like
MK_FP() handy for working with segments and offsets.)
References: K&R1 Sec. A14.4 p. 210
K&R2 Sec. A6.6 p. 199
ANSI Sec. 3.3.4
ISO Sec. 6.3.4
Rationale Sec. 3.3.4
H&S Sec. 6.2.7 pp. 171-2
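The compiler has no way of knowing that a device can change (or react to) such a location. Pointers like magicloc are therefore usually declared volatile as well, so that every access written in the source is actually performed. The sketch below wraps this in hypothetical helpers that take the address as an integer (uintptr_t, from C99's <stdint.h>). To keep it runnable, the test exercises an ordinary variable, because dereferencing a made-up address like 0x12345678 would crash on most systems with memory protection.

```c
#include <stdint.h>

/* Hypothetical helpers for reading and writing a word at a fixed
   address.  volatile tells the compiler that each access must really
   be performed and must not be cached in a register or omitted. */
unsigned int peek_word(uintptr_t addr)
{
	volatile unsigned int *p = (volatile unsigned int *)addr;

	return *p;
}

void poke_word(uintptr_t addr, unsigned int value)
{
	volatile unsigned int *p = (volatile unsigned int *)addr;

	*p = value;
}
```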

Question 19.24
What does the error message ``DGROUP data allocation exceeds 64K'' mean, and what can I do about it?
I thought that using large model meant that I could use more than 64K of data!

Even in large memory models, MS-DOS compilers apparently toss certain data (strings, some initialized
global or static variables) into a default data segment, and it's this segment that is overflowing. Either
use less global data, or, if you're already limiting yourself to reasonable amounts (and if the problem is
due to something like the number of strings), you may be able to coax the compiler into not using the
default data segment for so much. Some compilers place only ``small'' data objects in the default data
segment, and give you a way (e.g. the /Gt option under Microsoft compilers) to configure the threshold
for ``small.''

Question 19.23
How can I allocate arrays or structures bigger than 64K?

A reasonable computer ought to give you transparent access to all available memory. If you're not so
lucky, you'll either have to rethink your program's use of memory, or use various system-specific
techniques.
64K is (still) a pretty big chunk of memory. No matter how much memory your computer has available,
it's asking a lot to be able to allocate huge amounts of it contiguously. (The C Standard does not
guarantee that a single object can be larger than 32K.) Often it's a good idea to use data structures which
don't require that all memory be contiguous. For dynamically-allocated multidimensional arrays, you can
use pointers to pointers, as illustrated in question 6.16. Instead of a large array of structures, you can use
a linked list, or an array of pointers to structures.
If you're using a PC-compatible (8086-based) system, and running up against a 640K limit, consider
using ``huge'' memory model, or expanded or extended memory, or malloc variants such as halloc or
farmalloc, or a 32-bit ``flat'' compiler (e.g. djgpp, see question 18.3), or some kind of a DOS
extender, or another operating system.
References: ANSI Sec. 2.2.4.1
ISO Sec. 5.2.4.1
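Here is a minimal sketch of the pointers-to-pointers approach. Each row comes from a separate malloc call, so no single allocation has to be larger than one row or than the array of row pointers. The function names are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate an nrows x ncols array of doubles as
   nrows separately allocated rows.  Returns NULL on failure (after
   freeing anything already allocated).  Free with free_rows(). */
double **alloc_rows(size_t nrows, size_t ncols)
{
	double **rows = malloc(nrows * sizeof *rows);
	size_t i;

	if(rows == NULL)
		return NULL;
	for(i = 0; i < nrows; i++) {
		rows[i] = malloc(ncols * sizeof *rows[i]);
		if(rows[i] == NULL) {
			while(i > 0)
				free(rows[--i]);
			free(rows);
			return NULL;
		}
	}
	return rows;
}

/* Hypothetical helper: free an array made by alloc_rows. */
void free_rows(double **rows, size_t nrows)
{
	size_t i;

	if(rows == NULL)
		return;
	for(i = 0; i < nrows; i++)
		free(rows[i]);
	free(rows);
}
```

The result is indexed like a true two-dimensional array, as in m[i][j], even though the rows are not contiguous.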

Question 6.1
I had the definition char a[6] in one source file, and in another I declared extern char *a. Why
didn't it work?

The declaration extern char *a simply does not match the actual definition. The type pointer-to-type-T is not the same as array-of-type-T. Use extern char a[].
References: ANSI Sec. 3.5.4.2
ISO Sec. 6.5.4.2
CT&P Sec. 3.3 pp. 33-4, Sec. 4.5 pp. 64-5
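Here is a minimal sketch of a matching definition and declaration. Both halves appear in one listing so that it compiles on its own. In real use they would sit in separate source files, and the declaration would normally go in a header included by both. The file names and the function a_length are hypothetical.

```c
#include <string.h>

/* --- in file1.c: the one definition --- */
char a[6] = "hello";

/* --- in file2.c (or a shared header): a matching declaration.
   Writing extern char *a here instead would make the compiler treat
   the first few bytes of the array's contents as a pointer value
   and follow it. --- */
extern char a[];

/* Code in file2.c can then use a as an ordinary array. */
size_t a_length(void)
{
	return strlen(a);
}
```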

Question 5.20
What does a run-time ``null pointer assignment'' error mean? How do I track it down?

This message, which typically occurs with MS-DOS compilers (see, therefore, section 19), means that
you've written, via a null (perhaps because uninitialized) pointer, to location 0. (See also question 16.8.)
A debugger may let you set a data breakpoint or watchpoint or something on location 0. Alternatively,
you could write a bit of code to stash away a copy of 20 or so bytes from location 0, and periodically
check that the memory at location 0 hasn't changed.
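The "stash a copy and check it periodically" technique can be sketched as a pair of hypothetical helpers. Under MS-DOS you would point them at location 0. On systems with memory protection, merely reading address 0 crashes the program, so the test watches an ordinary buffer instead.

```c
#include <string.h>

#define WATCH_BYTES 20	/* how many bytes of the region to remember */

static unsigned char saved[WATCH_BYTES];

/* Hypothetical helper: remember the current contents of the region. */
void watch_start(const unsigned char *region)
{
	memcpy(saved, region, WATCH_BYTES);
}

/* Hypothetical helper: return nonzero if the region has changed since
   watch_start().  Calling it from various points in the program helps
   narrow down where the stray write happens. */
int watch_changed(const unsigned char *region)
{
	return memcmp(saved, region, WATCH_BYTES) != 0;
}
```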

19. System Dependencies


19.1 How can I read a single character from the keyboard without waiting for the RETURN key?
19.2 How can I find out how many characters are available for reading, or do a non-blocking read?
19.3 How can I display a percentage-done indication that updates itself in place, or show one of those
``twirling baton'' progress indicators?
19.4 How can I clear the screen, or print things in inverse video, or move the cursor?
19.5 How do I read the arrow keys? What about function keys?
19.6 How do I read the mouse?
19.7 How can I do serial (``comm'') port I/O?
19.8 How can I direct output to the printer?
19.9 How do I send escape sequences to control a terminal or other device?
19.10 How can I do graphics?
19.11 How can I check whether a file exists?
19.12 How can I find out the size of a file, prior to reading it in?
19.13 How can a file be shortened in-place without completely clearing or rewriting it?
19.14 How can I insert or delete a line in the middle of a file?
19.15 How can I recover the file name given an open file descriptor?
19.16 How can I delete a file?
19.17 What's wrong with the call "fopen("c:\newdir\file.dat", "r")"?

19.18 How can I increase the allowable number of simultaneously open files?
19.20 How can I read a directory in a C program?
19.22 How can I find out how much memory is available?
19.23 How can I allocate arrays or structures bigger than 64K?
19.24 What does the error message ``DGROUP exceeds 64K'' mean?
19.25 How can I access memory located at a certain address?
19.27 How can I invoke another program from within a C program?
19.30 How can I invoke another program and trap its output?
19.31 How can my program discover the complete pathname to the executable from which it was
invoked?
19.32 How can I automatically locate a program's configuration files in the same directory as the
executable?
19.33 How can a process change an environment variable in its caller?
19.36 How can I read in an object file and jump to routines in it?
19.37 How can I implement a delay, or time a user's response, with sub-second resolution?
19.38 How can I trap or ignore keyboard interrupts like control-C?
19.39 How can I handle floating-point exceptions gracefully?
19.40 How do I... Use sockets? Do networking? Write client/server applications?
19.40b How do I use BIOS calls? How can I write ISR's? How can I create TSR's?
19.41 But I can't use all these nonstandard, system-dependent functions, because my program has to be
ANSI compatible!

Question 19.1
How can I read a single character from the keyboard without waiting for the RETURN key? How can I
stop characters from being echoed on the screen as they're typed?

Alas, there is no standard or portable way to do these things in C. Concepts such as screens and
keyboards are not even mentioned in the Standard, which deals only with simple I/O ``streams'' of
characters.
At some level, interactive keyboard input is usually collected and presented to the requesting program a
line at a time. This gives the operating system a chance to support input line editing
(backspace/delete/rubout, etc.) in a consistent way, without requiring that it be built into every program.
Only when the user is satisfied and presses the RETURN key (or equivalent) is the line made available to
the calling program. Even if the calling program appears to be reading input a character at a time (with
getchar or the like), the first call blocks until the user has typed an entire line, at which point
potentially many characters become available and many character requests (e.g. getchar calls) are
satisfied in quick succession.
When a program wants to read each character immediately as it arrives, its course of action will depend
on where in the input stream the line collection is happening and how it can be disabled. Under some
systems (e.g. MS-DOS, VMS in some modes), a program can use a different or modified set of OS-level
input calls to bypass line-at-a-time input processing. Under other systems (e.g. Unix, VMS in other
modes), the part of the operating system responsible for serial input (often called the ``terminal driver'')
must be placed in a mode which turns off line-at-a-time processing, after which all calls to the usual
input routines (e.g. read, getchar, etc.) will return characters immediately. Finally, a few systems
(particularly older, batch-oriented mainframes) perform input processing in peripheral processors which
cannot be told to do anything other than line-at-a-time input.
Therefore, when you need to do character-at-a-time input (or disable keyboard echo, which is an
analogous problem), you will have to use a technique specific to the system you're using, assuming it
provides one. Since comp.lang.c is oriented towards topics that C does deal with, you will usually get
better answers to these questions by referring to a system-specific newsgroup such as
comp.unix.questions or comp.os.msdos.programmer, and to the FAQ lists for these groups. Note that the
answers are often not unique even across different variants of a system; bear in mind when answering
system-specific questions that the answer that applies to your system may not apply to everyone else's.
However, since these questions are frequently asked here, here are brief answers for some common
situations.
Some versions of curses have functions called cbreak, noecho, and getch which do what you want.
If you're specifically trying to read a short password without echo, you might try getpass. Under Unix,
you can use ioctl to play with the terminal driver modes (CBREAK or RAW under ``classic'' versions;
ICANON, c_cc[VMIN] and c_cc[VTIME] under System V or POSIX systems; ECHO under all
versions), or in a pinch, system and the stty command. (For more information, see <sgtty.h> and
tty(4) under classic versions, <termio.h> and termio(4) under System V, or <termios.h> and
termios(4) under POSIX.) Under MS-DOS, use getch or getche, or the corresponding BIOS
interrupts. Under VMS, try the Screen Management (SMG$) routines, or curses, or issue low-level
$QIO's with the IO$_READVBLK function code (and perhaps IO$M_NOECHO, and others) to ask for
one character at a time. (It's also possible to set character-at-a-time or ``pass through'' modes in the VMS
terminal driver.) Under other operating systems, you're on your own.
(As an aside, note that simply using setbuf or setvbuf to set stdin to unbuffered will not
generally serve to allow character-at-a-time input.)
If you're trying to write a portable program, a good approach is to define your own suite of three
functions to (1) set the terminal driver or input system into character-at-a-time mode (if necessary), (2)
get characters, and (3) return the terminal driver to its initial state when the program is finished. (Ideally,
such a set of functions might be part of the C Standard, some day.) The extended versions of this FAQ
list (see question 20.40) contain examples of such functions for several popular systems.
See also question 19.2.
References: PCS Sec. 10 pp. 128-9, Sec. 10.1 pp. 130-1
POSIX Sec. 7
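On POSIX systems, the three-function suite described above might look like the following minimal sketch. It uses <termios.h> (tcgetattr and tcsetattr) in place of raw ioctl calls. The function names are hypothetical, only one saved state is kept, and error handling is minimal.

```c
#define _POSIX_C_SOURCE 200809L
#include <termios.h>
#include <unistd.h>

static struct termios saved_tio;	/* modes to restore in tty_restore */

/* (1) Put the terminal open on fd into character-at-a-time mode with
   echo off.  Returns 0 on success, -1 on failure (for example, when
   fd is not a terminal). */
int tty_cbreak(int fd)
{
	struct termios tio;

	if(tcgetattr(fd, &saved_tio) == -1)
		return -1;
	tio = saved_tio;
	tio.c_lflag &= ~(ICANON | ECHO);	/* no line editing, no echo */
	tio.c_cc[VMIN] = 1;	/* read returns after one character... */
	tio.c_cc[VTIME] = 0;	/* ...with no timeout */
	return tcsetattr(fd, TCSAFLUSH, &tio) == -1 ? -1 : 0;
}

/* (2) Read one character.  Returns it as an unsigned char value,
   or -1 on end-of-file or error.  read is used in place of getchar
   so that stdio buffering cannot interfere. */
int tty_getc(int fd)
{
	unsigned char c;

	return read(fd, &c, 1) == 1 ? c : -1;
}

/* (3) Restore the modes saved by tty_cbreak. */
int tty_restore(int fd)
{
	return tcsetattr(fd, TCSAFLUSH, &saved_tio) == -1 ? -1 : 0;
}
```

A real program would call tty_restore on every exit path, including after a control-C (see question 19.38). Otherwise the user's terminal is left in the modified mode.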

Question 19.2
How can I find out if there are characters available for reading (and if so, how many)? Alternatively, how
can I do a read that will not block if there are no characters available?

These, too, are entirely operating-system-specific. Some versions of curses have a nodelay function.
Depending on your system, you may also be able to use ``nonblocking I/O'', or a system call named
select or poll, or the FIONREAD ioctl, c_cc[VTIME], or kbhit, or rdchk, or the O_NDELAY
option to open or fcntl. See also question 19.1.
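On systems that have poll (select is an older alternative), you can test whether a read would block and build a read that never blocks. The following is a minimal sketch. The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms milliseconds (0 means
   just check, -1 means wait forever) for fd to become readable.
   Returns 1 if a read would not block, 0 on timeout, -1 on error.
   "Readable" includes end-of-file, where read returns 0. */
int input_ready(int fd, int timeout_ms)
{
	struct pollfd pfd;
	int n;

	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;
	n = poll(&pfd, 1, timeout_ms);
	if(n < 0)
		return -1;
	return n > 0 ? 1 : 0;
}

/* Hypothetical helper: read one byte only if one is available now.
   Returns 1 if a byte was stored in *c, 0 if nothing was waiting,
   -1 on error or end-of-file. */
int read_if_ready(int fd, char *c)
{
	int r = input_ready(fd, 0);

	if(r <= 0)
		return r;
	return read(fd, c, 1) == 1 ? 1 : -1;
}
```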

Question 19.3
How can I display a percentage-done indication that updates itself in place, or show one of those
``twirling baton'' progress indicators?

These simple things, at least, you can do fairly portably. Printing the character '\r' will usually give
you a carriage return without a line feed, so that you can overwrite the current line. The character '\b'
is a backspace, and will usually move the cursor one position to the left.
References: ANSI Sec. 2.2.2
ISO Sec. 5.2.2
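Here is a minimal sketch of both indicators. The function names are hypothetical. The calls to fflush matter because no newline is printed, and a line-buffered stream would otherwise hold on to the output.

```c
#include <stdio.h>

/* Hypothetical helper: overwrite the current line with a percentage.
   '\r' returns to the start of the line without advancing to the
   next one. */
void show_percent(FILE *fp, long done, long total)
{
	int pct = total > 0 ? (int)(done * 100 / total) : 0;

	fprintf(fp, "\r%3d%% done", pct);
	fflush(fp);
}

/* Hypothetical helper: print the next frame of a twirling baton, then
   back up over it with '\b' so that the next frame overwrites it. */
void twirl(FILE *fp, unsigned step)
{
	static const char frames[] = "|/-\\";

	putc(frames[step % 4], fp);
	putc('\b', fp);
	fflush(fp);
}
```

In a loop, call show_percent (or twirl) every so often, and print a final newline when the work is finished.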

Question 19.4
How can I clear the screen?
How can I print things in inverse video?
How can I move the cursor to a specific x, y position?

Such things depend on the terminal type (or display) you're using. You will have to use a library such as
termcap, terminfo, or curses, or some system-specific routines, to perform these operations.
For clearing the screen, a halfway portable solution is to print a form-feed character ('\f'), which will
cause some displays to clear. Even more portable would be to print enough newlines to scroll everything
away. As a last resort, you could use system (see question 19.27) to invoke an operating system clear-screen command.
References: PCS Sec. 5.1.4 pp. 54-60, Sec. 5.1.5 pp. 60-62
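The "print enough newlines" fallback is simple enough to sketch. The function name is hypothetical. Nothing is actually erased: the old output scrolls off the top of the display, and the cursor is left at the bottom rather than the top.

```c
#include <stdio.h>

/* Hypothetical helper for the newline fallback.  lines should be at
   least the height of the display; 24 (a classic terminal) is used
   if lines is not positive.  On displays that clear on a form feed,
   putc('\f', fp) could be sent instead. */
void crude_clear(FILE *fp, int lines)
{
	int i;

	if(lines <= 0)
		lines = 24;
	for(i = 0; i < lines; i++)
		putc('\n', fp);
	fflush(fp);
}
```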

Question 19.5
How do I read the arrow keys? What about function keys?

Terminfo, some versions of termcap, and some versions of curses have support for these non-ASCII
keys. Typically, a special key sends a multicharacter sequence (usually beginning with ESC, '\033');
parsing these can be tricky. (curses will do the parsing for you, if you call keypad first.)
Under MS-DOS, if you receive a character with value 0 (not '0'!) while reading the keyboard, it's a flag
indicating that the next character read will be a code indicating a special key. See any DOS programming
guide for lists of keyboard codes. (Very briefly: the up, left, right, and down arrow keys are 72, 75, 77,
and 80, and the function keys are 59 through 68.)
References: PCS Sec. 5.1.4 pp. 56-7
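Here is a minimal sketch of the parsing. It assumes an ANSI/VT100-compatible terminal, whose arrow keys send ESC [ A through ESC [ D (or ESC O A through ESC O D in "application" mode). The function name and key codes are hypothetical. A real program would also have to cope with a lone ESC keypress, which is usually distinguished from a sequence by a short timeout.

```c
#include <string.h>

/* Hypothetical key codes returned by decode_arrow. */
enum { ARROW_NONE, ARROW_UP, ARROW_DOWN, ARROW_RIGHT, ARROW_LEFT };

/* Hypothetical helper: decode an arrow-key sequence at the start of
   buf (len bytes).  Stores the number of bytes consumed in *used.
   Returns ARROW_NONE if buf does not begin with a complete arrow
   sequence: it may be some other key, or more bytes may still be on
   their way. */
int decode_arrow(const char *buf, size_t len, size_t *used)
{
	static const char finals[] = "ABCD";	/* up, down, right, left */
	const char *p;

	*used = 0;
	if(len < 3 || buf[0] != '\033' || (buf[1] != '[' && buf[1] != 'O'))
		return ARROW_NONE;
	if(buf[2] == '\0' || (p = strchr(finals, buf[2])) == NULL)
		return ARROW_NONE;
	*used = 3;
	return ARROW_UP + (int)(p - finals);
}
```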

Question 19.6
How do I read the mouse?

Consult your system documentation, or ask on an appropriate system-specific newsgroup (but check its
FAQ list first). Mouse handling is completely different under the X window system, MS-DOS, the
Macintosh, and probably every other system.
References: PCS Sec. 5.5 pp. 78-80

Question 19.7
How can I do serial (``comm'') port I/O?

It's system-dependent. Under Unix, you typically open, read, and write a device file in /dev, and use the
facilities of the terminal driver to adjust its characteristics. (See also questions 19.1 and 19.2.) Under MS-DOS, you can use the predefined stream stdaux, or a special file like COM1, or some primitive BIOS
interrupts, or (if you require decent performance) any number of interrupt-driven serial I/O packages.
Several netters recommend the book C Programmer's Guide to Serial Communications, by Joe
Campbell.
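Under Unix, the terminal-driver setup might look like the following minimal sketch, which assumes a POSIX <termios.h>. It configures the line for raw 8-N-1 I/O (eight data bits, no parity, one stop bit) at a chosen speed. The device name and the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: configure an open serial device for raw 8-N-1
   I/O at the given speed (e.g. B9600).  Returns 0 or -1. */
int serial_setup(int fd, speed_t speed)
{
	struct termios tio;

	if(tcgetattr(fd, &tio) == -1)
		return -1;
	if(cfsetispeed(&tio, speed) == -1 || cfsetospeed(&tio, speed) == -1)
		return -1;
	tio.c_cflag &= ~(CSIZE | PARENB | CSTOPB);	/* clear size, parity, 2 stop bits */
	tio.c_cflag |= CS8 | CREAD | CLOCAL;	/* 8 bits, receiver on, ignore modem lines */
	tio.c_iflag &= ~(IXON | IXOFF | ICRNL | INLCR | IGNCR | ISTRIP);
	tio.c_oflag &= ~OPOST;			/* send bytes unmodified */
	tio.c_lflag &= ~(ICANON | ECHO | ECHONL | ISIG | IEXTEN);
	tio.c_cc[VMIN] = 1;
	tio.c_cc[VTIME] = 0;
	return tcsetattr(fd, TCSANOW, &tio) == -1 ? -1 : 0;
}

/* Hypothetical helper: open a device such as "/dev/ttyS0" (names vary
   from system to system) and configure it.  Returns the descriptor,
   or -1 on failure. */
int serial_open(const char *device, speed_t speed)
{
	int fd = open(device, O_RDWR | O_NOCTTY);

	if(fd == -1)
		return -1;
	if(serial_setup(fd, speed) == -1) {
		close(fd);
		return -1;
	}
	return fd;
}
```

After a successful serial_open, ordinary read and write calls on the descriptor transfer bytes over the line.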

Question 19.13
How can a file be shortened in-place without completely clearing or rewriting it?

BSD systems provide ftruncate, several others supply chsize, and a few may provide a (possibly
undocumented) fcntl option F_FREESP. Under MS-DOS, you can sometimes use write(fd, "",
0). However, there is no portable solution, nor a way to delete blocks at the beginning. See also question
19.14.
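On systems that provide ftruncate (POSIX has it as well as BSD), a file that is already open as a stdio stream can be shortened as in the following minimal sketch. The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L	/* for ftruncate, fileno */
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: cut the file open on fp down to newlen bytes.
   Buffered output is flushed first so that it cannot be written
   later, past the new end.  Returns 0 on success, -1 on failure. */
int shorten_stream(FILE *fp, off_t newlen)
{
	if(fflush(fp) == EOF)
		return -1;
	return ftruncate(fileno(fp), newlen) == -1 ? -1 : 0;
}

/* Hypothetical helper: the current size of the file open on fp,
   or -1 on error. */
long stream_size(FILE *fp)
{
	struct stat st;

	if(fstat(fileno(fp), &st) == -1)
		return -1;
	return (long)st.st_size;
}
```

After truncating, reposition the stream (with fseek or rewind) before reading or writing it again.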

Question 19.14
How can I insert or delete a line (or record) in the middle of a file?

Short of rewriting the file, you probably can't. The usual solution is simply to rewrite the file. (Instead of
deleting records, you might consider simply marking them as deleted, to avoid rewriting.) See also
questions 12.30 and 19.13.
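The usual rewrite can be sketched as follows: copy everything except the unwanted line to a new file, then replace the old file with the new one, for example with rename (under Unix, rename replaces an existing file). The function name is hypothetical, and this sketch covers only the copying step.

```c
#include <stdio.h>

/* Hypothetical helper: copy in to out, leaving out line number skip
   (counting from 1).  It works character by character, so lines of
   any length are handled.  Returns 0 on success, -1 on an I/O error. */
int copy_without_line(FILE *in, FILE *out, long skip)
{
	long lineno = 1;
	int c;

	while((c = getc(in)) != EOF) {
		if(lineno != skip && putc(c, out) == EOF)
			return -1;
		if(c == '\n')
			lineno++;
	}
	return ferror(in) ? -1 : 0;
}
```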

Question 12.30
I'm trying to update a file in place, by using fopen mode "r+", reading a certain string, and writing
back a modified string, but it's not working.

Be sure to call fseek before you write. You need it both to seek back to the beginning of the string you're
trying to overwrite, and because the read/write "+" modes require a call to a file-positioning function
(fseek, fsetpos, or rewind) when switching from reading to writing. (When switching from writing to
reading, either fflush or a positioning call will do.) Also, remember that you can only overwrite
characters with the same number of replacement characters; see also question 19.14.
References: ANSI Sec. 4.9.5.3
ISO Sec. 7.9.5.3
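The following minimal sketch shows where the fseek calls go. The function name is hypothetical. For a text stream, pos should be a value previously obtained from ftell. For a binary stream, any byte offset will do.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: on a stream opened with a "+" mode, check that
   the text at offset pos is old and, if so, overwrite it with repl,
   which must be the same length.  Returns 0 on success, -1 if the
   text didn't match or an I/O error occurred. */
int overwrite_at(FILE *fp, long pos, const char *old, const char *repl)
{
	size_t len = strlen(old);
	size_t i;
	int c;

	if(strlen(repl) != len || fseek(fp, pos, SEEK_SET) != 0)
		return -1;
	for(i = 0; i < len; i++)
		if((c = getc(fp)) == EOF || c != (unsigned char)old[i])
			return -1;
	/* Required between reading and writing, and moves back to pos. */
	if(fseek(fp, pos, SEEK_SET) != 0)
		return -1;
	if(fwrite(repl, 1, len, fp) != len)
		return -1;
	return fflush(fp) == EOF ? -1 : 0;
}
```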

Question 12.26
How can I flush pending input so that a user's typeahead isn't read at the next prompt? Will
fflush(stdin) work?

fflush is defined only for output streams. Since its definition of ``flush'' is to complete the writing of
buffered characters (not to discard them), discarding unread input would not be an analogous meaning
for fflush on input streams.
There is no standard way to discard unread characters from a stdio input stream. Nor would such a way be
sufficient, since unread characters can also accumulate in other, OS-level input buffers.
References: ANSI Sec. 4.9.5.2
ISO Sec. 7.9.5.2
H&S Sec. 15.2
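Nothing can portably discard typeahead, but a common partial measure is to read and discard the rest of the current input line, such as the newline that scanf leaves behind. Here is a minimal sketch. The function name is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: read and discard characters up to and
   including the next newline.  Returns '\n' normally, or EOF if the
   input ended first.  Any further lines the user has already typed
   ahead are left alone. */
int skip_rest_of_line(FILE *fp)
{
	int c;

	while((c = getc(fp)) != EOF && c != '\n')
		;
	return c;
}
```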

Question 12.25
What's the difference between fgetpos/fsetpos and ftell/fseek?
What are fgetpos and fsetpos good for?

fgetpos and fsetpos use a special typedef, fpos_t, for representing offsets (positions) in a file.
The type behind this typedef, if chosen appropriately, can represent arbitrarily large offsets, allowing
fgetpos and fsetpos to be used with arbitrarily huge files. ftell and fseek, on the other hand,
use long int, and are therefore limited to offsets which can be represented in a long int. See also
question 1.4.
References: K&R2 Sec. B1.6 p. 248
ANSI Sec. 4.9.1, Secs. 4.9.9.1,4.9.9.3
ISO Sec. 7.9.1, Secs. 7.9.9.1,7.9.9.3
H&S Sec. 15.5 p. 252
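A typical use of fgetpos and fsetpos is to remember a position and return to it later. This minimal sketch peeks ahead in a stream and then restores the saved fpos_t, so it works even at offsets too large for a long. The function name is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: read up to n-1 characters into buf (stopping
   at end-of-file), NUL-terminate it, then put the stream back where
   it was.  Returns the number of characters read, or -1 on error. */
long peek_chars(FILE *fp, char *buf, size_t n)
{
	fpos_t here;
	size_t got = 0;
	int c;

	if(n == 0 || fgetpos(fp, &here) != 0)
		return -1;
	while(got < n - 1 && (c = getc(fp)) != EOF)
		buf[got++] = (char)c;
	buf[got] = '\0';
	if(fsetpos(fp, &here) != 0)	/* also clears any end-of-file indicator */
		return -1;
	return (long)got;
}
```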

Question 1.4
What should the 64-bit type on new, 64-bit machines be?

Some vendors of C products for 64-bit machines support 64-bit long ints. Others fear that too much
existing code is written to assume that ints and longs are the same size, or that one or the other of
them is exactly 32 bits, and introduce a new, nonstandard, 64-bit long long (or __longlong) type
instead.
Programmers interested in writing portable code should therefore insulate their 64-bit type needs behind
appropriate typedefs. Vendors who feel compelled to introduce a new, longer integral type should
advertise it as being ``at least 64 bits'' (which is truly new, a type traditional C does not have), and not
``exactly 64 bits.''
References: ANSI Sec. F.5.6
ISO Sec. G.5.6
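As an illustration of this advice: C99 later standardized long long (at least 64 bits) and <stdint.h>, whose int_least64_t is exactly this kind of "at least 64 bits" type. A program can still hide the choice behind its own typedef, so that only one line changes on a compiler that lacks <stdint.h>. In this minimal sketch, the names big_int and sum_to are hypothetical.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical project-wide typedef: "an integer of at least 64 bits".
   Code uses big_int, never long or long long directly, so this is the
   only line to change on a different compiler. */
typedef int_least64_t big_int;

/* Hypothetical example use: a sum that overflows 32 bits. */
big_int sum_to(big_int n)
{
	return n * (n + 1) / 2;
}
```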

Question 1.1
How do you decide which integer type to use?

If you might need large values (above 32,767 or below -32,767), use long. Otherwise, if space is very
important (i.e. if there are large arrays or many structures), use short. Otherwise, use int. If well-defined overflow characteristics are important and negative values are not, or if you want to steer clear of
sign-extension problems when manipulating bits or bytes, use one of the corresponding unsigned
types. (Beware when mixing signed and unsigned values in expressions, though.)
Although character types (especially unsigned char) can be used as ``tiny'' integers, doing so is
sometimes more trouble than it's worth, due to unpredictable sign extension and increased code size.
(Using unsigned char can help; see question 12.1 for a related problem.)
A similar space/time tradeoff applies when deciding between float and double. None of the above
rules apply if the address of a variable is taken and must have a particular type.
If for some reason you need to declare something with an exact size (usually the only good reason for
doing so is when attempting to conform to some externally-imposed storage layout, but see question
20.5), be sure to encapsulate the choice behind an appropriate typedef.
References: K&R1 Sec. 2.2 p. 34
K&R2 Sec. 2.2 p. 36, Sec. A4.2 pp. 195-6, Sec. B11 p. 257
ANSI Sec. 2.2.4.2.1, Sec. 3.1.2.5
ISO Sec. 5.2.4.2.1, Sec. 6.1.2.5
H&S Secs. 5.1,5.2 pp. 110-114
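For example, a program that reads a file format containing 32-bit unsigned fields might choose the type once, in one place. This minimal sketch uses only the <limits.h> macros available in C89. Under C99 and later, uint32_t from <stdint.h> does the same job. The names u32 and get_le32 are hypothetical.

```c
#include <limits.h>

/* Hypothetical typedef: an unsigned type of exactly 32 bits, chosen
   once so that the rest of the program never needs to know which. */
#if UINT_MAX == 0xFFFFFFFFUL
typedef unsigned int u32;
#elif ULONG_MAX == 0xFFFFFFFFUL
typedef unsigned long u32;
#else
#error "no 32-bit unsigned type found; edit this typedef"
#endif

/* Hypothetical example use: assemble a field stored in the file as
   four little-endian bytes. */
u32 get_le32(const unsigned char *p)
{
	return (u32)p[0] | (u32)p[1] << 8 | (u32)p[2] << 16 | (u32)p[3] << 24;
}
```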

Question 12.1
What's wrong with this code?
char c;
while((c = getchar()) != EOF) ...

For one thing, the variable to hold getchar's return value must be an int. getchar can return all
possible character values, as well as EOF. By passing getchar's return value through a char, either a
normal character might be misinterpreted as EOF, or the EOF might be altered (particularly if type char
is unsigned) and so never seen.
References: K&R1 Sec. 1.5 p. 14
K&R2 Sec. 1.5.1 p. 16
ANSI Sec. 3.1.2.5, Sec. 4.9.1, Sec. 4.9.7.5
ISO Sec. 6.1.2.5, Sec. 7.9.1, Sec. 7.9.7.5
H&S Sec. 5.1.3 p. 116, Sec. 15.1, Sec. 15.6
CT&P Sec. 5.1 p. 70
PCS Sec. 11 p. 157
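Here is a corrected version of the loop, wrapped in a hypothetical function. The test feeds it the byte 0377 (255). On a machine with signed 8-bit chars, that byte stored in a char would typically become -1 and compare equal to EOF, ending the loop early. Stored in an int, it stays distinct from EOF.

```c
#include <stdio.h>

/* Hypothetical helper: count the characters in a stream.  c must be
   an int, because getc returns either an unsigned char value (0 to
   UCHAR_MAX) or the negative value EOF, and only an int can hold
   all of those distinctly. */
long count_chars(FILE *fp)
{
	int c;
	long n = 0;

	while((c = getc(fp)) != EOF)
		n++;
	return n;
}
```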

Question 11.35
People keep saying that the behavior of i = i++ is undefined, but I just tried it on an ANSI-conforming compiler, and got the results I expected.

A compiler may do anything it likes when faced with undefined behavior (and, within limits, with
implementation-defined and unspecified behavior), including doing what you expect. It's unwise to
depend on it, though. See also questions 11.32, 11.33, and 11.34.

Question 11.32
Why won't the Frobozz Magic C Compiler, which claims to be ANSI compliant, accept this code? I
know that the code is ANSI, because gcc accepts it.

Many compilers support a few non-Standard extensions, gcc more so than most. Are you sure that the
code being rejected doesn't rely on such an extension? It is usually a bad idea to perform experiments
with a particular compiler to determine properties of a language; the applicable standard may permit
variations, or the compiler may be wrong. See also question 11.35.

Question 11.31
Does anyone have a tool for converting old-style C programs to ANSI C, or vice versa, or for
automatically generating prototypes?

Two programs, protoize and unprotoize, convert back and forth between prototyped and ``old style''
function definitions and declarations. (These programs do not handle full-blown translation between
``Classic'' C and ANSI C.) These programs are part of the FSF's GNU C compiler distribution; see
question 18.3.
The unproto program (/pub/unix/unproto5.shar.Z on ftp.win.tue.nl) is a filter which sits between the
preprocessor and the next compiler pass, converting most of ANSI C to traditional C on-the-fly.
The GNU GhostScript package comes with a little program called ansi2knr.
Before converting ANSI C back to old-style, beware that such a conversion cannot always be made both
safely and automatically. ANSI C introduces new features and complexities not found in K&R C. You'll
especially need to be careful of prototyped function calls; you'll probably need to insert explicit casts.
See also questions 11.3 and 11.29.
Several prototype generators exist, many as modifications to lint. A program called CPROTO was
posted to comp.sources.misc in March, 1992. There is another program called ``cextract.'' Many vendors
supply simple utilities like these with their compilers. See also question 18.16. (But be careful when
generating prototypes for old functions with ``narrow'' parameters; see question 11.3.)
Finally, are you sure you really need to convert lots of old code to ANSI C? The old-style function
syntax is still acceptable, and a hasty conversion can easily introduce bugs. (See question 11.3.)

Question 18.3
What's a free or cheap C compiler I can use?

A popular and high-quality free C compiler is the FSF's GNU C compiler, or gcc. It is available by
anonymous ftp from prep.ai.mit.edu in directory pub/gnu, or at several other FSF archive sites. An MS-DOS port, djgpp, is also available; it can be found in the Simtel and Oakland archives and probably many
others, usually in a directory like pub/msdos/djgpp/ or simtel/msdos/djgpp/.
There is a shareware compiler called PCC, available as PCC12C.ZIP .
A very inexpensive MS-DOS compiler is Power C from Mix Software, 1132 Commerce Drive,
Richardson, TX 75801, USA, 214-783-6001.
Another recently-developed compiler is lcc, available for anonymous ftp from ftp.cs.princeton.edu in
pub/lcc.
Archives associated with comp.compilers contain a great deal of information about available compilers,
interpreters, grammars, etc. (for many languages). The comp.compilers archives (including an FAQ list),
maintained by the moderator, John R. Levine, are at iecc.com . A list of available compilers and related
resources, maintained by Mark Hopkins, Steven Robenalt, and David Muir Sharnoff, is at ftp.idiom.com
in pub/compilers-list/. (See also the comp.compilers directory in the news.answers archives at
rtfm.mit.edu and ftp.uu.net; see question 20.40.)
See also question 18.16.

Question 18.2
How can I track down these pesky malloc problems?

A number of debugging packages exist to help track down malloc problems; one popular one is Conor
P. Cahill's ``dbmalloc,'' posted to comp.sources.misc in 1992, volume 32. Others are ``leak,'' available in
volume 27 of the comp.sources.unix archives; JMalloc.c and JMalloc.h in the ``Snippets'' collection; and
MEMDEBUG from ftp.crpht.lu in pub/sources/memdebug . See also question 18.16.
A number of commercial debugging tools exist, and can be invaluable in tracking down malloc-related
and other stubborn problems:

Bounds-Checker for DOS, from Nu-Mega Technologies, P.O. Box 7780, Nashua, NH 03060-7780, USA, 603-889-2386.
CodeCenter (formerly Saber-C) from Centerline Software (formerly Saber), 10 Fawcett Street,
Cambridge, MA 02138-1110, USA, 617-498-3000.
Insight, from ParaSoft Corporation, 2500 E. Foothill Blvd., Pasadena, CA 91107, USA, 818-792-9941, insight@parasoft.com .
Purify, from Pure Software, 1309 S. Mary Ave., Sunnyvale, CA 94087, USA, 800-224-7873, infohome@pure.com .
SENTINEL, from AIB Software, 46030 Manekin Plaza, Dulles, VA 20166, USA, 703-430-9247,
800-296-3000, info@aib.com .

Question 18.1
I need some C development tools.

Here is a crude list of some which are available.


a C cross-reference generator
cflow, cxref, calls, cscope, xscope, or ixfw
a C beautifier/pretty-printer
cb, indent, GNU indent, or vgrind
a revision control or configuration management tool
RCS or SCCS
a C source obfuscator (shrouder)
obfus, shroud, or opqcp
a ``make'' dependency generator
makedepend, or try cc -M or cpp -M
tools to compute code metrics
ccount, Metre, lcount, or csize, or see URL http://www.qucis.queensu.ca:1999/SoftwareEngineering/Cmetrics.html ; there is also a package sold by McCabe and Associates
a C lines-of-source counter
this can be done very crudely with the standard Unix utility wc, and considerably better with
grep -c ";"
a prototype generator
see question 11.31
a tool to track down malloc problems
see question 18.2
a ``selective'' C preprocessor
see question 10.18
language translation tools
see questions 11.31 and 20.26
C verifiers (lint)
see question 18.7
a C compiler!
see question 18.3
(This list of tools is by no means complete; if you know of tools not mentioned, you're welcome to
contact this list's maintainer.)
Other lists of tools, and discussion about them, can be found in the Usenet newsgroups comp.compilers
and comp.software-eng.
See also questions 18.16 and 18.3.

Question 10.18
I inherited some code which contains far too many #ifdef's for my taste. How can I preprocess the
code to leave only one conditional compilation set, without running it through the preprocessor and
expanding all of the #include's and #define's as well?

There are programs floating around called unifdef, rmifdef, and scpp (``selective C
preprocessor'') which do exactly this. See question 18.16.

Question 10.16
How can I use a preprocessor #if expression to tell if a machine is big-endian or little-endian?

You probably can't. (Preprocessor arithmetic uses only long integers, and there is no concept of
addressing.) Are you sure you need to know the machine's endianness explicitly? Usually it's better to
write code which doesn't care. See also question 20.9.
References: ANSI Sec. 3.8.1
ISO Sec. 6.8.1
H&S Sec. 7.11.1 p. 225

Question 20.9
How can I determine whether a machine's byte order is big-endian or little-endian?

One way is to use a pointer:


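#include <stdio.h>
int main()
{
	int x = 1;
	/* on a little-endian machine, the first (lowest-addressed) byte of x holds the 1 */
	if(*(char *)&x == 1)
		printf("little-endian\n");
	else	printf("big-endian\n");
	return 0;
}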
It's also possible to use a union.
See also question 10.16.
References: H&S Sec. 6.1.2 pp. 163-4

Question 20.8
How can I implement sets or arrays of bits?

Use arrays of char or int, with a few macros to access the desired bit at the proper index. Here are
some simple macros to use with arrays of char:
#include <limits.h>		/* for CHAR_BIT */

#define BITMASK(b) (1 << ((b) % CHAR_BIT))
#define BITSLOT(b) ((b) / CHAR_BIT)
#define BITSET(a, b) ((a)[BITSLOT(b)] |= BITMASK(b))
#define BITTEST(a, b) ((a)[BITSLOT(b)] & BITMASK(b))

(If you don't have <limits.h>, try using 8 for CHAR_BIT.)


References: H&S Sec. 7.6.7 pp. 211-216

Question 20.6
If I have a char * variable pointing to the name of a function, how can I call that function?

The most straightforward thing to do is to maintain a correspondence table of names and function
pointers:
int func(), anotherfunc();
struct { char *name; int (*funcptr)(); } symtab[] = {
	"func",		func,
	"anotherfunc",	anotherfunc,
};
Then, search the table for the name, and call via the associated function pointer. See also questions 2.15
and 19.36.
References: PCS Sec. 11 p. 168

Question 2.15
How can I access structure fields by name at run time?

Build a table of names and offsets, using the offsetof() macro. The offset of field b in struct a
is
offsetb = offsetof(struct a, b)
If structp is a pointer to an instance of this structure, and field b is an int (with offset as computed
above), b's value can be set indirectly with
*(int *)((char *)structp + offsetb) = value;

Question 2.14
How can I determine the byte offset of a field within a structure?

ANSI C defines the offsetof() macro, which should be used if available; see <stddef.h>. If you
don't have it, one possible implementation is
#define offsetof(type, mem) ((size_t) \
((char *)&((type *)0)->mem - (char *)(type *)0))
This implementation is not 100% portable; some compilers may legitimately refuse to accept it.
See question 2.15 for a usage hint.
References: ANSI Sec. 4.1.5
ISO Sec. 7.1.6
Rationale Sec. 3.5.4.2
H&S Sec. 11.1 pp. 292-3

Question 2.13
Why does sizeof report a larger size than I expect for a structure type, as if there were padding at the
end?

Structures may have this padding (as well as internal padding), if necessary, to ensure that alignment
properties will be preserved when an array of contiguous structures is allocated. Even when the structure
is not part of an array, the end padding remains, so that sizeof can always return a consistent size. See
question 2.12.
References: H&S Sec. 5.6.7 pp. 139-40

C Programming FAQs: Frequently Asked Questions

``I think it's safe to say that no person today can hope to achieve basic life competence
without consulting my work on a regular basis.''
--- Cecil Adams
A major revision and expansion of the comp.lang.c FAQ list was published in late 1995 by Addison-Wesley, to wit:
Author: Steve Summit
Title: C Programming FAQs: Frequently Asked Questions
Publisher: Addison-Wesley
Copyright: 1996
ISBN: 0-201-84519-9
Most technical bookstores, as well as several of the large chains, are carrying this book. If you can't find
it, you can order it (in the U.S., at least) direct from Addison-Wesley by calling 800-282-0693. I'm sure
you could also order it on-line from Amazon.com or other on-line booksellers.
Addison-Wesley has web pages describing this book as well as many others. You can also browse the
content corresponding to the on-line version of the FAQ list (but note that this corresponds to only about
half of the actual book!).
As the Preface mentions, ``it can be as hard to eradicate the last error from a large manuscript as it is to
stamp out the last bug in a program.'' If you've already obtained a copy of the book, you'll want to skim
this errata list. (Updated with C99 changes.)
I'd like to publicly thank Addison-Wesley for their support of the FAQ list, for giving me the opportunity
to put it out in book form, and for being very easy to work with. All first-time authors should have it so
lucky.

C Programming Notes
Introductory C Programming Class Notes, Chapter 1
Steve Summit

These notes are part of the UW Experimental College course on Introductory C Programming. They are
based on notes prepared (beginning in Spring, 1995) to supplement the book The C Programming
Language, by Brian Kernighan and Dennis Ritchie, or K&R as the book and its authors are affectionately
known. (The second edition was published in 1988 by Prentice-Hall, ISBN 0-13-110362-8.) These notes
are now (as of Winter, 1995-6) intended to be stand-alone, although the sections are still cross-referenced
to those of K&R, for the reader who wants to pursue a more in-depth exposition.

Chapter 1: Introduction
Chapter 2: Basic Data Types and Operators
Chapter 3: Statements and Control Flow
Chapter 4: More about Declarations (and Initialization)
Chapter 5: Functions and Program Structure
Chapter 6: Basic I/O
Chapter 7: More Operators
Chapter 8: Strings
Chapter 9: The C Preprocessor
Chapter 10: Pointers
Chapter 11: Memory Allocation
Chapter 12: Input and Output
Chapter 13: Reading the Command Line
Chapter 14: What's Next?


This page by Steve Summit // Copyright 1995-1997 // mail feedback

Chapter 1: Introduction
C is (as K&R admit) a relatively small language, but one which (to its admirers, anyway) wears well. C's
small, unambitious feature set is a real advantage: there's less to learn; there isn't excess baggage in the
way when you don't need it. It can also be a disadvantage: since it doesn't do everything for you, there's a
lot you have to do yourself. (Actually, this is viewed by many as an additional advantage: anything the
language doesn't do for you, it doesn't dictate to you, either, so you're free to do that something however
you want.)
C is sometimes referred to as a ``high-level assembly language.'' Some people think that's an insult, but
it's actually a deliberate and significant aspect of the language. If you have programmed in assembly
language, you'll probably find C very natural and comfortable (although if you continue to focus too
heavily on machine-level details, you'll probably end up with unnecessarily nonportable programs). If
you haven't programmed in assembly language, you may be frustrated by C's lack of certain higher-level
features. In either case, you should understand why C was designed this way: so that seemingly-simple
constructions expressed in C would not expand to arbitrarily expensive (in time or space) machine
language constructions when compiled. If you write a C program simply and succinctly, it is likely to
result in a succinct, efficient machine language executable. If you find that the executable program
resulting from a C program is not efficient, it's probably because of something silly you did, not because
of something the compiler did behind your back which you have no control over. In any case, there's no
point in complaining about C's low-level flavor: C is what it is.
A programming language is a tool, and no tool can perform every task unaided. If you're building a
house, and I'm teaching you how to use a hammer, and you ask how to assemble rafters and trusses into
gables, that's a legitimate question, but the answer has fallen out of the realm of ``How do I use a
hammer?'' and into ``How do I build a house?''. In the same way, we'll see that C does not have built-in
features to perform every function that we might ever need to do while programming.
As mentioned above, C imposes relatively few built-in ways of doing things on the programmer. Some
common tasks, such as manipulating strings, allocating memory, and doing input/output (I/O), are
performed by calling on library functions. Other tasks which you might want to do, such as creating or
listing directories, or interacting with a mouse, or displaying windows or other user-interface elements,
or doing color graphics, are not defined by the C language at all. You can do these things from a C
program, of course, but you will be calling on services which are peculiar to your programming
environment (compiler, processor, and operating system) and which are not defined by the C standard.
Since this course is about portable C programming, it will also be steering clear of facilities not provided
in all C environments.
Another aspect of C that's worth mentioning here is that it is, to put it bluntly, a bit dangerous. C does
not, in general, try hard to protect a programmer from mistakes. If you write a piece of code which will
(through some oversight of yours) do something wildly different from what you intended it to do, up to
and including deleting your data or trashing your disk, and if it is possible for the compiler to compile it,
it generally will. You won't get warnings of the form ``Do you really mean to...?'' or ``Are you sure you
really want to...?''. C is often compared to a sharp knife: it can do a surgically precise job on some
exacting task you have in mind, but it can also do a surgically precise job of cutting off your finger. It's
up to you to use it carefully.
This aspect of C is very widely criticized; it is also used (justifiably) to argue that C is not a good
teaching language. C aficionados love this aspect of C because it means that C does not try to protect
them from themselves: when they know what they're doing, even if it's risky or obscure, they can do it.
Students of C hate this aspect of C because it often seems as if the language is some kind of a conspiracy
specifically designed to lead them into booby traps and ``gotcha!''s.
This is another aspect of the language which it's fairly pointless to complain about. If you take care and
pay attention, you can avoid many of the pitfalls. These notes will point out many of the obvious (and not
so obvious) trouble spots.
1.1 A First Example
1.2 Second Example
1.3 Program Structure

1.1 A First Example


[This section corresponds to K&R Sec. 1.1]
The best way to learn programming is to dive right in and start writing real programs. This way, concepts
which would otherwise seem abstract make sense, and the positive feedback you get from getting even a
small program to work gives you a great incentive to improve it or write the next one.
Diving in with ``real'' programs right away has another advantage, if only pragmatic: if you're using a
conventional compiler, you can't run a fragment of a program and see what it does; nothing will run until
you have a complete (if tiny or trivial) program. You can't learn everything you'd need to write a
complete program all at once, so you'll have to take some things ``on faith'' and parrot them in your first
programs before you begin to understand them. (You can't learn to program just one expression or
statement at a time any more than you can learn to speak a foreign language one word at a time. If all you
know is a handful of words, you can't actually say anything: you also need to know something about the
language's word order and grammar and sentence structure and declension of articles and verbs.)
Besides the occasional necessity to take things on faith, there is a more serious potential drawback of this
``dive in and program'' approach: it's a small step from learning-by-doing to learning-by-trial-and-error,
and when you learn programming by trial-and-error, you can very easily learn many errors. When you're
not sure whether something will work, or you're not even sure what you could use that might work, and
you try something, and it does work, you do not have any guarantee that what you tried worked for the
right reason. You might just have ``learned'' something that works only by accident or only on your
compiler, and it may be very hard to un-learn it later, when it stops working.
Therefore, whenever you're not sure of something, be very careful before you go off and try it ``just to
see if it will work.'' Of course, you can never be absolutely sure that something is going to work before
you try it, otherwise we'd never have to try things. But you should have an expectation that something is
going to work before you try it, and if you can't predict how to do something or whether something
would work and find yourself having to determine it experimentally, make a note in your mind that
whatever you've just learned (based on the outcome of the experiment) is suspect.
The first example program in K&R is the first example program in any language: print or display a
simple string, and exit. Here is my version of K&R's ``hello, world'' program:
#include <stdio.h>
int main()
{
	printf("Hello, world!\n");
	return 0;
}
If you have a C compiler, the first thing to do is figure out how to type this program in and compile it and
run it and see where its output went. (If you don't have a C compiler yet, the first thing to do is to find
one.)
The first line is practically boilerplate; it will appear in almost all programs we write. It asks that some
definitions having to do with the ``Standard I/O Library'' be included in our program; these definitions
are needed if we are to call the library function printf correctly.
The second line says that we are defining a function named main. Most of the time, we can name our
functions anything we want, but the function name main is special: it is the function that will be
``called'' first when our program starts running. The empty pair of parentheses indicates that our main
function accepts no arguments, that is, there isn't any information which needs to be passed in when the
function is called.
The braces { and } surround a list of statements in C. Here, they surround the list of statements making
up the function main.
The line
printf("Hello, world!\n");
is the first statement in the program. It asks that the function printf be called; printf is a library
function which prints formatted output. The parentheses surround printf's argument list: the
information which is handed to it which it should act on. The semicolon at the end of the line terminates
the statement.
(printf's name reflects the fact that C was first developed when Teletypes and other printing terminals
were still in widespread use. Today, of course, video displays are far more common. printf ``prints''
to the standard output, that is, to the default location for program output to go. Nowadays, that's almost
always a video screen or a window on that screen. If you do have a printer, you'll typically have to do
something extra to get a program to print to it.)
printf's first (and, in this case, only) argument is the string which it should print. The string, enclosed
in double quotes "", consists of the words ``Hello, world!'' followed by a special sequence: \n. In
strings, any two-character sequence beginning with the backslash \ represents a single special character.
The sequence \n represents the ``new line'' character, which prints a carriage return or line feed or
whatever it takes to end one line of output and move down to the next. (This program only prints one line
of output, but it's still important to terminate it.)
The second line in the main function is
return 0;
In general, a function may return a value to its caller, and main is no exception. When main returns
(that is, reaches its end and stops functioning), the program is at its end, and the return value from main
tells the operating system (or whatever invoked the program that main is the main function of) whether
it succeeded or not. By convention, a return value of 0 indicates success.
This program may look so absolutely trivial that it seems as if it's not even worth typing it in and trying
to run it, but doing so may be a big (and is certainly a vital) first hurdle. On an unfamiliar computer, it
can be arbitrarily difficult to figure out how to enter a text file containing program source, or how to
compile and link it, or how to invoke it, or what happened after (if?) it ran. The most experienced C
programmers immediately go back to this one, simple program whenever they're trying out a new system
or a new way of entering or building programs or a new way of printing output from within programs. As
Kernighan and Ritchie say, everything else is comparatively easy.
How you compile and run this (or any) program is a function of the compiler and operating system you're
using. The first step is to type it in, exactly as shown; this may involve using a text editor to create a file
containing the program text. You'll have to give the file a name, and all C compilers (that I've ever heard
of) require that files containing C source end with the extension .c. So you might place the program text
in a file called hello.c.
The second step is to compile the program. (Strictly speaking, compilation consists of two steps,
compilation proper followed by linking, but we can overlook this distinction at first, especially because
the compiler often takes care of initiating the linking step automatically.) On many Unix systems, the
command to compile a C program from a source file hello.c is
cc -o hello hello.c
You would type this command at the Unix shell prompt, and it requests that the cc (C compiler) program
be run, placing its output (i.e. the new executable program it creates) in the file hello, and taking its
input (i.e. the source code to be compiled) from the file hello.c.
The third step is to run (execute, invoke) the newly-built hello program. Again on a Unix system, this
is done simply by typing the program's name:
hello
Depending on how your system is set up (in particular, on whether the current directory is searched for
executables, based on the PATH variable), you may have to type
./hello
to indicate that the hello program is in the current directory (as opposed to some ``bin'' directory full
of executable programs, elsewhere).
You may also have your choice of C compilers. On many Unix machines, the cc command is an older
compiler which does not recognize modern, ANSI Standard C syntax. An old compiler will accept the
simple programs we'll be starting with, but it will not accept most of our later programs. If you find
yourself getting baffling compilation errors on programs which you've typed in exactly as they're shown,
it probably indicates that you're using an older compiler. On many machines, another compiler called
acc or gcc is available, and you'll want to use it, instead. (Both acc and gcc are typically invoked the
same as cc; that is, the above cc command would instead be typed, say, gcc -o hello hello.c .)
(One final caveat about Unix systems: don't name your test programs test, because there's already a
standard command called test, and you and the command interpreter will get badly confused if you try
to replace the system's test command with your own, not least because your own almost certainly does
something completely different.)
Under MS-DOS, the compilation procedure is quite similar. The name of the command you type will
depend on your compiler (e.g. cl for the Microsoft C compiler, tc or bcc for Borland's Turbo C, etc.).
You may have to manually perform the second, linking step, perhaps with a command named link or
tlink. The executable file which the compiler/linker creates will have a name ending in .exe (or
perhaps .com), but you can still invoke it by typing the base name (e.g. hello). See your compiler
documentation for complete details; one of the manuals should contain a demonstration of how to enter,
compile, and run a small program that prints some simple output, just as we're trying to describe here.
In an integrated or ``visual'' programming environment, such as those on the Macintosh or under various
versions of Microsoft Windows, the steps you take to enter, compile, and run a program are somewhat
different (and, theoretically, simpler). Typically, there is a way to open a new source window, type
source code into it, give it a file name, and add it to the program (or ``project'') you're building. If
necessary, there will be a way to specify what other source files (or ``modules'') make up the program.
Then, there's a button or menu selection which compiles and runs the program, all from within the
programming environment. (There will also be a way to create a standalone executable file which you
can run from outside the environment.) In a PC-compatible environment, you may have to choose
between creating DOS programs or Windows programs. (If you have troubles pertaining to the printf
function, try specifying a target environment of MS-DOS. Supposedly, some compilers which are
targeted at Windows environments won't let you call printf, because until you call some fancier
functions to request that a window be created, there's no window for printf to print to.) Again, check
the introductory or tutorial manual that came with the programming package; it should walk you through
the steps necessary to get your first program running.

1.2 Second Example


Our second example is of little more practical use than the first, but it introduces a few more
programming language elements:
#include <stdio.h>
/* print a few numbers, to illustrate a simple loop */
int main()
{
	int i;
	for(i = 0; i < 10; i = i + 1)
		printf("i is %d\n", i);
	return 0;
}
As before, the line #include <stdio.h> is boilerplate which is necessary since we're calling the
printf function, and main() and the pair of braces {} indicate and delineate the function named
main we're (again) writing.
The first new line is the line
/* print a few numbers, to illustrate a simple loop */
which is a comment. Anything between the characters /* and */ is ignored by the compiler, but may be
useful to a person trying to read and understand the program. You can add comments anywhere you want
to in the program, to document what the program is, what it does, who wrote it, how it works, what the
various functions are for and how they work, what the various variables are for, etc.
The second new line, down within the function main, is
int i;
which declares that our function will use a variable named i. The variable's type is int, which is a plain
integer.
Next, we set up a loop:
for(i = 0; i < 10; i = i + 1)
The keyword for indicates that we are setting up a ``for loop.'' A for loop is controlled by three
expressions, enclosed in parentheses and separated by semicolons. These expressions say that, in this
case, the loop starts by setting i to 0, that it continues as long as i is less than 10, and that after each
iteration of the loop, i should be incremented by 1 (that is, have 1 added to its value).
Finally, we have a call to the printf function, as before, but with several differences. First, the call to
printf is within the body of the for loop. This means that control flow does not pass once through the
printf call, but instead that the call is performed as many times as are dictated by the for loop. In this
case, printf will be called several times: once when i is 0, once when i is 1, once when i is 2, and so
on until i is 9, for a total of 10 times.
A second difference in the printf call is that the string to be printed, "i is %d\n", contains a percent
sign. Whenever printf sees a percent sign, that is a signal that printf should not simply print the text at
that point, but should instead read another one of its arguments to decide what to print. The
letter after the percent sign tells it what type of argument to expect and how to print it. In this case, the
letter d indicates that printf is to expect an int, and to print it in decimal. Finally, we see that
printf is in fact being called with another argument, for a total of two, separated by commas. The
second argument is the variable i, which is in fact an int, as required by %d. The effect of all of this is
that each time it is called, printf will print a line containing the current value of the variable i:
i is 0
i is 1
i is 2
...
After several trips through the loop, i will eventually equal 9. After that trip through the loop, the third
control expression i = i + 1 will increment its value to 10. The condition i < 10 is no longer true,
so no more trips through the loop are taken. Instead, control flow jumps down to the statement following
the for loop, which is the return statement. The main function returns, and the program is finished.

1.3 Program Structure


We'll have more to say later about program structure, but for now let's observe a few basics. A program
consists of one or more functions; it may also contain global variables. (Our two example programs so
far have contained one function apiece, and no global variables.) At the top of a source file are typically a
few boilerplate lines such as #include <stdio.h>, followed by the definitions (i.e. code) for the
functions. (It's also possible to split up the several functions making up a larger program into several
source files, as we'll see in a later chapter.)
Each function is further composed of declarations and statements, in that order. When a sequence of
statements should act as one (for example, when they should all serve together as the body of a loop)
they can be enclosed in braces (just as for the outer body of the entire function). The simplest kind of
statement is an expression statement, which is an expression (presumably performing some useful
operation) followed by a semicolon. Expressions are further composed of operators, objects (variables),
and constants.
C source code consists of several lexical elements. Some are words, such as for, return, main, and
i, which are either keywords of the language (for, return) or identifiers (names) we've chosen for our
own functions and variables (main, i). There are constants such as 1 and 10 which introduce new
values into the program. There are operators such as =, +, and >, which manipulate variables and values.
There are other punctuation characters (often called delimiters), such as parentheses and squiggly braces
{}, which indicate how the other elements of the program are grouped. Finally, all of the preceding
elements can be separated by whitespace: spaces, tabs, and the ``carriage returns'' between lines.
The source code for a C program is, for the most part, ``free form.'' This means that the compiler does not
care how the code is arranged: how it is broken into lines, how the lines are indented, or whether
whitespace is used between things like variable names and other punctuation. (Lines like #include
<stdio.h> are an exception; they must appear alone on their own lines, generally unbroken. Only
lines beginning with # are affected by this rule; we'll see other examples later.) You can use whitespace,
indentation, and appropriate line breaks to make your programs more readable for yourself and other
people (even though the compiler doesn't care). You can place explanatory comments anywhere in your
program--any text between the characters /* and */ is ignored by the compiler. (In fact, the compiler
pretends that all it saw was whitespace.) Though comments are ignored by the compiler, well-chosen
comments can make a program much easier to read (for its author, as well as for others).
The usage of whitespace is our first style issue. It's typical to leave a blank line between different parts of
the program, to leave a space on either side of operators such as + and =, and to indent the bodies of
loops and other control flow constructs. Typically, we arrange the indentation so that the subsidiary
statements controlled by a loop statement (the ``loop body,'' such as the printf call in our second
example program) are all aligned with each other and placed one tab stop (or some consistent number of
spaces) to the right of the controlling statement. This indentation (like all whitespace) is not required by
the compiler, but it makes programs much easier to read. (However, it can also be misleading, if used
incorrectly or in the face of inadvertent mistakes. The compiler will decide what ``the body of the loop''
is based on its own rules, not the indentation, so if the indentation does not match the compiler's
interpretation, confusion is inevitable.)
To drive home the point that the compiler doesn't care about indentation, line breaks, or other
whitespace, here are a few (extreme) examples: The fragments
for(i = 0; i < 10; i = i + 1)
    printf("%d\n", i);
and
for(i = 0; i < 10; i = i + 1) printf("%d\n", i);
and
for(i=0;i<10;i=i+1)printf("%d\n",i);
and
for(i = 0; i < 10; i = i + 1)
printf("%d\n", i);
and
for
(
i
=
0
;
i
<
10
;
i
=
i
+
1
)
printf
(
"%d\n"
,
i
)
;

and
for
(i=0;
i<10;i=
i+1)printf
("%d\n", i);

are all treated exactly the same way by the compiler.


Some programmers argue forever over the best set of ``rules'' for indentation and other aspects of
programming style, calling to mind the old philosophers' debates about the number of angels that could
dance on the head of a pin. Style issues (such as how a program is laid out) are important, but they're not
something to be too dogmatic about, and there are also other, deeper style issues besides mere layout and
typography. Kernighan and Ritchie take a fairly moderate stance:
Although C compilers do not care about how a program looks, proper indentation and
spacing are critical in making programs easy for people to read. We recommend writing
only one statement per line, and using blanks around operators to clarify grouping. The
position of braces is less important, although people hold passionate beliefs. We have
chosen one of several popular styles. Pick a style that suits you, then use it consistently.
There is some value in having a reasonably standard style (or a few standard styles) for code layout.
Please don't take the above advice to ``pick a style that suits you'' as an invitation to invent your own
brand-new style. If (perhaps after you've been programming in C for a while) you have specific
objections to specific facets of existing styles, you're welcome to modify them, but if you don't have any
particular leanings, you're probably best off copying an existing style at first. (If you want to place your
own stamp of originality on the programs that you write, there are better avenues for your creativity than
inventing a bizarre layout; you might instead try to make the logic easier to follow, or the user interface
easier to use, or the code freer of bugs.)

This page by Steve Summit // Copyright 1995-1997

Chapter 2: Basic Data Types and Operators

The type of a variable determines what kinds of values it may take on. An operator computes new values
out of old ones. An expression consists of variables, constants, and operators combined to perform some
useful computation. In this chapter, we'll learn about C's basic types, how to write constants and declare
variables of these types, and what the basic operators are.
As Kernighan and Ritchie say, ``The type of an object determines the set of values it can have and what
operations can be performed on it.'' This is a fairly formal, mathematical definition of what a type is, but
it is traditional (and meaningful). There are several implications to remember:
1. The ``set of values'' is finite. C's int type cannot represent all of the integers; its float type
cannot represent all floating-point numbers.
2. When you're using an object (that is, a variable) of some type, you may have to remember what
values it can take on and what operations you can perform on it. For example, there are several
operators which play with the binary (bit-level) representation of integers, but these operators are
not meaningful for and may not be applied to floating-point operands.
3. When declaring a new variable and picking a type for it, you have to keep in mind the values and
operations you'll be needing.
In other words, picking a type for a variable is not some abstract academic exercise; it's closely
connected to the way(s) you'll be using that variable.
2.1 Types
2.2 Constants
2.3 Declarations
2.4 Variable Names
2.5 Arithmetic Operators
2.6 Assignment Operators
2.7 Function Calls

This page by Steve Summit // Copyright 1995, 1996

2.1 Types
[This section corresponds to K&R Sec. 2.2]
There are only a few basic data types in C. The first ones we'll be encountering and using are:

char      a character
int       an integer, in the range -32,767 to 32,767
long int  a larger integer (up to +-2,147,483,647)
float     a floating-point number
double    a floating-point number, with more precision and perhaps greater range than float

If you can look at this list of basic types and say to yourself, ``Oh, how simple, there are only a few
types, I won't have to worry much about choosing among them,'' you'll have an easy time with
declarations. (Some masochists wish that the type system were more complicated so that they could
specify more things about each variable, but those of us who would rather not have to specify these extra
things each time are glad that we don't have to.)
The ranges listed above for types int and long int are the guaranteed minimum ranges. On some
systems, either of these types (or, indeed, any C type) may be able to hold larger values, but a program
that depends on extended ranges will not be as portable. Some programmers become obsessed with
knowing exactly what the sizes of data objects will be in various situations, and go on to write programs
which depend on these exact sizes. Determining or controlling the size of an object is occasionally
important, but most of the time we can sidestep size issues and let the compiler do most of the worrying.
(From the ranges listed above, we can determine that type int must be at least 16 bits, and that type
long int must be at least 32 bits. But neither of these sizes is exact; many systems have 32-bit ints,
and some systems have 64-bit long ints.)
You might wonder how the computer stores characters. The answer involves a character set, which is
simply a mapping between some set of characters and some set of small numeric codes. Most machines
today use the ASCII character set, in which the letter A is represented by the code 65, the ampersand & is
represented by the code 38, the digit 1 is represented by the code 49, the space character is represented
by the code 32, etc. (Most of the time, of course, you have no need to know or even worry about these
particular code values; they're automatically translated into the right shapes on the screen or printer when
characters are printed out, and they're automatically generated when you type characters on the keyboard.
Eventually, though, we'll appreciate, and even take some control over, exactly when these translations--from
characters to their numeric codes--are performed.) Character codes are usually small--the largest printing
character in ASCII is the ~ (tilde), whose code is 126 (the only higher ASCII code, 127, belongs to a
non-printing control character, ``delete''). Characters usually fit in a byte, which is usually 8 bits. In C, type
char is defined as occupying one byte, so it is usually 8 bits.

Most of the simple variables in most programs are of types int, long int, or double. Typically,
we'll use int and double for most purposes, and long int any time we need to hold integer values
greater than 32,767. As we'll see, even when we're manipulating individual characters, we'll usually use
an int variable, for reasons to be discussed later. Therefore, we'll rarely use individual variables of type
char, although we'll use plenty of arrays of char.

This page by Steve Summit // Copyright 1995-1997

2.2 Constants
[This section corresponds to K&R Sec. 2.3]
A constant is just an immediate, absolute value found in an expression. The simplest constants are
decimal integers, e.g. 0, 1, 2, 123 . Occasionally it is useful to specify constants in base 8 or base 16
(octal or hexadecimal); this is done by prefixing an extra 0 (zero) for octal, or 0x for hexadecimal: the
constants 100, 0144, and 0x64 all represent the same number. (If you're not using these non-decimal
constants, just remember not to use any leading zeroes. If you accidentally write 0123 intending to get
one hundred and twenty three, you'll get 83 instead, which is 123 base 8.)
We write constants in decimal, octal, or hexadecimal for our convenience, not the compiler's. The
compiler doesn't care; it always converts everything into binary internally, anyway. (There is, however,
no good way to specify constants in source code in binary.)
A constant can be forced to be of type long int by suffixing it with the letter L (in upper or lower
case, although upper case is strongly recommended, because a lower case l looks too much like the digit
1).
A constant that contains a decimal point or the letter e (or both) is a floating-point constant: 3.14, 10.,
.01, 123e4, 123.456e7 . The e indicates multiplication by a power of 10; 123.456e7 is 123.456
times 10 to the 7th, or 1,234,560,000. (Floating-point constants are of type double by default.)
We also have constants for specifying characters and strings. (Make sure you understand the difference
between a character and a string: a character is exactly one character; a string is a set of zero or more
characters; a string containing one character is distinct from a lone character.) A character constant is
simply a single character between single quotes: 'A', '.', '%'. The numeric value of a character
constant is, naturally enough, that character's value in the machine's character set. (In ASCII, for
example, 'A' has the value 65.)
A string is represented in C as a sequence or array of characters. (We'll have more to say about arrays in
general, and strings in particular, later.) A string constant is a sequence of zero or more characters
enclosed in double quotes: "apple", "hello, world", "this is a test".
Within character and string constants, the backslash character \ is special, and is used to represent
characters not easily typed on the keyboard or for various reasons not easily typed in constants. The most
common of these ``character escapes'' are:

\n   a ``newline'' character
\b   a backspace
\r   a carriage return (without a line feed)
\'   a single quote (e.g. in a character constant)
\"   a double quote (e.g. in a string constant)
\\   a single backslash

For example, "he said \"hi\"" is a string constant which contains two double quotes, and '\'' is
a character constant consisting of a (single) single quote. Notice once again that the character constant
'A' is very different from the string constant "A".

This page by Steve Summit // Copyright 1995-1997

2.3 Declarations
[This section corresponds to K&R Sec. 2.4]
Informally, a variable (also called an object) is a place you can store a value. So that you can refer to it
unambiguously, a variable needs a name. You can think of the variables in your program as a set of
boxes or cubbyholes, each with a label giving its name; you might imagine that storing a value ``in'' a
variable consists of writing the value on a slip of paper and placing it in the cubbyhole.
A declaration tells the compiler the name and type of a variable you'll be using in your program. In its
simplest form, a declaration consists of the type, the name of the variable, and a terminating semicolon:
char c;
int i;
float f;
You can also declare several variables of the same type in one declaration, separating them with
commas:
int i1, i2;
Later we'll see that declarations may also contain initializers, qualifiers and storage classes, and that we
can declare arrays, functions, pointers, and other kinds of data structures.
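The placement of declarations is significant. You can't place them just anywhere (i.e. they cannot be
interspersed with the other statements in your program). They must either be placed at the beginning of a
function, or at the beginning of a brace-enclosed block of statements (which we'll learn about in the next
chapter), or outside of any function. (This is the rule of the original ANSI/ISO C standard, C89/C90, which
these notes follow; the later C99 standard relaxed it, so that a declaration may also appear among the
statements of a block, as long as it comes before any use of the variable it declares.) Furthermore, the
placement of a declaration, as well as its storage class, controls several things about its visibility and
lifetime, as we'll see later.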
You may wonder why variables must be declared before use. There are two reasons:
1. It makes things somewhat easier on the compiler; it knows right away what kind of storage to
allocate and what code to emit to store and manipulate each variable; it doesn't have to try to intuit
the programmer's intentions.
2. It forces a bit of useful discipline on the programmer: you cannot introduce variables willy-nilly;
you must think about them enough to pick appropriate types for them. (The compiler's error
messages to you, telling you that you apparently forgot to declare a variable, are as often helpful
as they are a nuisance: they're helpful when they tell you that you misspelled a variable, or forgot
to think about exactly how you were going to use it.)

Although there are a few places where declarations can be omitted (in which case the compiler will
assume an implicit declaration), making use of these removes the advantages of reason 2 above, so I
recommend always declaring everything explicitly.
Most of the time, I recommend writing one declaration per line. For the most part, the compiler doesn't
care what order declarations are in. You can order the declarations alphabetically, or in the order that
they're used, or to put related declarations next to each other. Collecting all variables of the same type
together on one line essentially orders declarations by type, which isn't a very useful order (it's only
slightly more useful than random order).
A declaration for a variable can also contain an initial value. This initializer consists of an equals sign
and an expression, which is usually a single constant:
int i = 1;
int i1 = 10, i2 = 20;

This page by Steve Summit // Copyright 1995-1997

2.4 Variable Names


[This section corresponds to K&R Sec. 2.1]
Within limits, you can give your variables and functions any names you want. These names (the formal
term is ``identifiers'') consist of letters, digits, and underscores. For our purposes, names must begin
with a letter. Theoretically, names can be as long as you want, but extremely long ones get tedious to
type after a while, and the compiler is not required to keep track of extremely long ones perfectly. (What
this means is that if you were to name a variable, say,
supercalafragalisticespialidocious, the compiler might get lazy and pretend that you'd
named it supercalafragalisticespialidocio, such that if you later misspelled it
supercalafragalisticespialidociouz, the compiler wouldn't catch your mistake. Nor would
the compiler necessarily be able to tell the difference if for some perverse reason you deliberately
declared a second variable named supercalafragalisticespialidociouz.)
The capitalization of names in C is significant: the variable names variable, Variable, and
VARIABLE (as well as silly combinations like variAble) are all distinct.
A final restriction on names is that you may not use keywords (the words such as int and for which
are part of the syntax of the language) as the names of variables or functions (or as identifiers of any
kind).

This page by Steve Summit // Copyright 1995, 1996

2.5 Arithmetic Operators


[This section corresponds to K&R Sec. 2.5]
The basic operators for performing arithmetic are the same in many computer languages:

+   addition
-   subtraction
*   multiplication
/   division
%   modulus (remainder)

The - operator can be used in two ways: to subtract two numbers (as in a - b), or to negate one
number (as in -a + b or a + -b).
When applied to integers, the division operator / discards any remainder, so 1 / 2 is 0 and 7 / 4 is
1. But when either operand is a floating-point quantity (type float or double), the division operator
yields a floating-point result, with a potentially nonzero fractional part. So 1 / 2.0 is 0.5, and 7.0 /
4.0 is 1.75.
The modulus operator % gives you the remainder when two integers are divided: 1 % 2 is 1; 7 % 4 is
3. (The modulus operator can only be applied to integers.)
An additional arithmetic operation you might be wondering about is exponentiation. Some languages
have an exponentiation operator (typically ^ or **), but C doesn't. (To square or cube a number, just
multiply it by itself.)
Multiplication, division, and modulus all have higher precedence than addition and subtraction. The term
``precedence'' refers to how ``tightly'' operators bind to their operands (that is, to the things they operate
on). In mathematics, multiplication has higher precedence than addition, so 1 + 2 * 3 is 7, not 9. In
other words, 1 + 2 * 3 is equivalent to 1 + (2 * 3). C is the same way.
All of these operators ``group'' from left to right, which means that when two or more of them have the
same precedence and participate next to each other in an expression, the evaluation conceptually
proceeds from left to right. For example, 1 - 2 - 3 is equivalent to (1 - 2) - 3 and gives -4, not
+2. (``Grouping'' is sometimes called associativity, although the term is used somewhat differently in
programming than it is in mathematics. Not all C operators group from left to right; a few group from
right to left.)
Whenever the default precedence or associativity doesn't give you the grouping you want, you can
always use explicit parentheses. For example, if you wanted to add 1 to 2 and then multiply the result by
3, you could write (1 + 2) * 3.
By the way, the word ``arithmetic'' as used in the title of this section is an adjective, not a noun, and it's
pronounced differently than the noun: the accent is on the third syllable.

This page by Steve Summit // Copyright 1995-1997

2.6 Assignment Operators


[This section corresponds to K&R Sec. 2.10]
The assignment operator = assigns a value to a variable. For example,
x = 1
sets x to 1, and
a = b
sets a to whatever b's value is. The expression
i = i + 1
is, as we've mentioned elsewhere, the standard programming idiom for increasing a variable's value by 1:
this expression takes i's old value, adds 1 to it, and stores it back into i. (C provides several ``shortcut''
operators for modifying variables in this and similar ways, which we'll meet later.)
We've called the = sign the ``assignment operator'' and referred to ``assignment expressions'' because, in
fact, = is an operator just like + or -. C does not have ``assignment statements''; instead, an assignment
like a = b is an expression and can be used wherever any expression can appear. Since it's an
expression, the assignment a = b has a value, namely, the same value that's assigned to a. This value
can then be used in a larger expression; for example, we might write
c = a = b
which is equivalent to
c = (a = b)
and assigns b's value to both a and c. (The assignment operator, therefore, groups from right to left.)
Later we'll see other circumstances in which it can be useful to use the value of an assignment
expression.
It's usually a matter of style whether you initialize a variable with an initializer in its declaration or with
an assignment expression near where you first use it. That is, there's no particular difference between
int a = 10;
and
int a;
/* later... */
a = 10;

This page by Steve Summit // Copyright 1995, 1996

2.7 Function Calls


We'll have much more to say about functions in a later chapter, but for now let's just look at how they're
called. (To review: a function is a piece of code, written by you or by someone else, which
performs some useful, compartmentalizable task.) You call a function by mentioning its name followed
by a pair of parentheses. If the function takes any arguments, you place the arguments between the
parentheses, separated by commas. These are all function calls:
printf("Hello, world!\n")
printf("%d\n", i)
sqrt(144.)
getchar()
The arguments to a function can be arbitrary expressions. Therefore, you don't have to say things like
int sum = a + b + c;
printf("sum = %d\n", sum);
if you don't want to; you can instead collapse it to
printf("sum = %d\n", a + b + c);
Many functions return values, and when they do, you can embed calls to these functions within larger
expressions:
c = sqrt(a * a + b * b)
x = r * cos(theta)
i = f1(f2(j))
The first expression squares a and b, computes the square root of the sum of the squares, and assigns the
result to c. (In other words, it computes a * a + b * b, passes that number to the sqrt function,
and assigns sqrt's return value to c.) The second expression passes the value of the variable theta to
the cos (cosine) function, multiplies the result by r, and assigns the result to x. The third expression
passes the value of the variable j to the function f2, passes the return value of f2 immediately to the
function f1, and finally assigns f1's return value to the variable i.

This page by Steve Summit // Copyright 1995-1997

Chapter 3: Statements and Control Flow


Statements are the ``steps'' of a program. Most statements compute and assign values or call functions,
but we will eventually meet several other kinds of statements as well. By default, statements are executed
in sequence, one after another. We can, however, modify that sequence by using control flow constructs
which arrange that a statement or group of statements is executed only if some condition is true or false,
or executed over and over again to form a loop. (A somewhat different kind of control flow happens
when we call a function: execution of the caller is suspended while the called function proceeds. We'll
discuss functions in chapter 5.)
My definitions of the terms statement and control flow are somewhat circular. A statement is an element
within a program which you can apply control flow to; control flow is how you specify the order in
which the statements in your program are executed. (A weaker definition of a statement might be ``a part
of your program that does something,'' but this definition could as easily be applied to expressions or
functions.)
3.1 Expression Statements
3.2 if Statements
3.3 Boolean Expressions
3.4 while Loops
3.5 for Loops
3.6 break and continue

This page by Steve Summit // Copyright 1995, 1996

3.1 Expression Statements


[This section corresponds to K&R Sec. 3.1]
Most of the statements in a C program are expression statements. An expression statement is simply an
expression followed by a semicolon. The lines
i = 0;
i = i + 1;
and
printf("Hello, world!\n");
are all expression statements. (In some languages, such as Pascal, the semicolon separates statements,
such that the last statement is not followed by a semicolon. In C, however, the semicolon is a statement
terminator; all simple statements are followed by semicolons. The semicolon is also used for a few other
things in C; we've already seen that it terminates declarations, too.)
Expression statements do all of the real work in a C program. Whenever you need to compute new values
for variables, you'll typically use expression statements (and they'll typically contain assignment
operators). Whenever you want your program to do something visible, in the real world, you'll typically
call a function (as part of an expression statement). We've already seen the most basic example: calling
the function printf to print text to the screen. But anything else you might do--read or write a disk file,
talk to a modem or printer, draw pictures on the screen--will also involve function calls. (Furthermore,
the functions you call to do these things are usually different depending on which operating system
you're using. The C language does not define them, so we won't be talking about or using them much.)
Expressions and expression statements can be arbitrarily complicated. They don't have to consist of
exactly one simple function call, or of one simple assignment to a variable. For one thing, many
functions return values, and the values they return can then be used by other parts of the expression. For
example, C provides a sqrt (square root) function, which we might use to compute the hypotenuse of a
right triangle like this:
c = sqrt(a*a + b*b);
To be useful, an expression statement must do something; it must have some lasting effect on the state of
the program. (Formally, a useful statement must have at least one side effect.) The first two sample
expression statements in this section (above) assign new values to the variable i, and the third one calls
printf to print something out, and these are good examples of statements that do something useful.
(To make the distinction clear, we may note that degenerate constructions such as
0;
i;
or
i + 1;
are syntactically valid statements--they consist of an expression followed by a semicolon--but in each
case, they compute a value without doing anything with it, so the computed value is discarded, and the
statement is useless. But if the ``degenerate'' statements in this paragraph don't make much sense to you,
don't worry; it's because they, frankly, don't make much sense.)
It's also possible for a single expression to have multiple side effects, but it's easy for such an expression
to be (a) confusing or (b) undefined. For now, we'll only be looking at expressions (and, therefore,
statements) which do one well-defined thing at a time.

This page by Steve Summit // Copyright 1995, 1996

3.2 if Statements
[This section corresponds to K&R Sec. 3.2]
The simplest way to modify the control flow of a program is with an if statement, which in its simplest
form looks like this:
if(x > max)
    max = x;
Even if you didn't know any C, it would probably be pretty obvious that what happens here is that if x is
greater than max, x gets assigned to max. (We'd use code like this to keep track of the maximum value
of x we'd seen--for each new x, we'd compare it to the old maximum value max, and if the new value
was greater, we'd update max.)
More generally, we can say that the syntax of an if statement is:
if( expression )
    statement
where expression is any expression and statement is any statement.
What if you have a series of statements, all of which should be executed together or not at all depending
on whether some condition is true? The answer is that you enclose them in braces:
if( expression )
{
    statement1
    statement2
    statement3
}
As a general rule, anywhere the syntax of C calls for a statement, you may write a series of statements
enclosed by braces. (You do not need to, and should not, put a semicolon after the closing brace, because
the series of statements enclosed by braces is not itself a simple expression statement.)
An if statement may also optionally contain a second statement, the ``else clause,'' which is to be
executed if the condition is not met. Here is an example:
if(n > 0)
    average = sum / n;
else
{
    printf("can't compute average\n");
    average = 0;
}
The first statement or block of statements is executed if the condition is true, and the second statement or
block of statements (following the keyword else) is executed if the condition is not true. In this
example, we can compute a meaningful average only if n is greater than 0; otherwise, we print a message
saying that we cannot compute the average. The general syntax of an if statement is therefore
if( expression )
    statement1
else
    statement2
(where both statement1 and statement2 may be lists of statements
enclosed in braces).
It's also possible to nest one if statement inside another. (For that matter, it's in general possible to nest
any kind of statement or control flow construct within another.) For example, here is a little piece of code
which decides roughly which quadrant of the compass you're walking into, based on an x value which is
positive if you're walking east, and a y value which is positive if you're walking north:
if(x > 0)
{
    if(y > 0)
        printf("Northeast.\n");
    else
        printf("Southeast.\n");
}
else
{
    if(y > 0)
        printf("Northwest.\n");
    else
        printf("Southwest.\n");
}
When you have one if statement (or loop) nested inside another, it's a very good idea to use explicit
braces {}, as shown, to make it clear (both to you and to the compiler) how they're nested and which
else goes with which if. It's also a good idea to indent the various levels, also as shown, to make the
code more readable to humans. Why do both? You use indentation to make the code visually more
readable to yourself and other humans, but the compiler doesn't pay attention to the indentation (since all
whitespace is essentially equivalent and is essentially ignored). Therefore, you also have to make sure
that the punctuation is right.
Here is an example of another common arrangement of if and else. Suppose we have a variable
grade containing a student's numeric grade, and we want to print out the corresponding letter grade.
Here is code that would do the job:
if(grade >= 90)
    printf("A");
else if(grade >= 80)
    printf("B");
else if(grade >= 70)
    printf("C");
else if(grade >= 60)
    printf("D");
else
    printf("F");
What happens here is that exactly one of the five printf calls is executed, depending on which of the
conditions is true. Each condition is tested in turn, and if one is true, the corresponding statement is
executed, and the rest are skipped. If none of the conditions is true, we fall through to the last one,
printing ``F''.
In the cascaded if/else/if/else/... chain, each else clause is another if statement. This may be
more obvious at first if we reformat the example, including every set of braces and indenting each if
statement relative to the previous one:
if(grade >= 90)
{
    printf("A");
}
else
{
    if(grade >= 80)
    {
        printf("B");
    }
    else
    {
        if(grade >= 70)
        {
            printf("C");
        }
        else
        {
            if(grade >= 60)
            {
                printf("D");
            }
            else
            {
                printf("F");
            }
        }
    }
}
By examining the code this way, it should be obvious that exactly one of the printf calls is executed,
and that whenever one of the conditions is found true, the remaining conditions do not need to be
checked and none of the later statements within the chain will be executed. But once you've convinced
yourself of this and learned to recognize the idiom, it's generally preferable to arrange the statements as
in the first example, without trying to indent each successive if statement one tabstop further out.
(Obviously, you'd run into the right margin very quickly if the chain had just a few more cases!)

This page by Steve Summit // Copyright 1995, 1996 // mail feedback
3.3 Boolean Expressions


An if statement like
if(x > max)
    max = x;
is perhaps deceptively simple. Conceptually, we say that it checks whether the condition x > max is
``true'' or ``false''. The mechanics underlying C's conception of ``true'' and ``false,'' however, deserve
some explanation. We need to understand how true and false values are represented, and how they are
interpreted by statements like if.
As far as C is concerned, a true/false condition can be represented as an integer. (An integer can
represent many values; here we care about only two values: ``true'' and ``false.'' The study of
mathematics involving only two values is called Boolean algebra, after George Boole, a mathematician
who refined this study.) In C, ``false'' is represented by a value of 0 (zero), and ``true'' is represented by
any value that is nonzero. Since there are many nonzero values (at least 65,534, for values of type int),
when we have to pick a specific value for ``true,'' we'll pick 1.
The relational operators such as <, <=, >, and >= are in fact operators, just like +, -, *, and /. The
relational operators take two values, look at them, and ``return'' a value of 1 or 0 depending on whether
the tested relation was true or false. The complete set of relational operators in C is:
<     less than
<=    less than or equal
>     greater than
>=    greater than or equal
==    equal
!=    not equal
For example, 1 < 2 is 1, 3 > 4 is 0, 5 == 5 is 1, and 6 != 6 is 0.


We've now encountered perhaps the most easy-to-stumble-on ``gotcha!'' in C: the equality-testing
operator is ==, not a single =, which is assignment. If you accidentally write
if(a = 0)
(and you probably will at some point; everybody makes this mistake), it will not test whether a is zero,
as you probably intended. Instead, it will assign 0 to a, and then perform the ``true'' branch of the if
statement if a is nonzero. But a will have just been assigned the value 0, so the ``true'' branch will never
be taken! (This could drive you crazy while debugging--you wanted to do something if a was 0, and after
the test, a is 0, whether it was supposed to be or not, but the ``true'' branch is nevertheless not taken.)
The relational operators work with arbitrary numbers and generate true/false values. You can also
combine true/false values by using the Boolean operators, which take true/false values as operands and
compute new true/false values. The three Boolean operators are:
&&    and
||    or
!     not (takes one operand; ``unary'')
The && (``and'') operator takes two true/false values and produces a true (1) result if both operands are
true (that is, if the left-hand side is true and the right-hand side is true). The || (``or'') operator takes two
true/false values and produces a true (1) result if either operand is true. The ! (``not'') operator takes a
single true/false value and negates it, turning false to true and true to false (0 to 1 and nonzero to 0).
For example, to test whether the variable i lies between 1 and 10, you might use
if(1 < i && i < 10)
    ...
Here we're expressing the relation ``i is between 1 and 10'' as ``1 is less than i and i is less than 10.''
It's important to understand why the more obvious expression
if(1 < i < 10)        /* WRONG */
would not work. The expression 1 < i < 10 is parsed by the compiler analogously to 1 + i + 10.
The expression 1 + i + 10 is parsed as (1 + i) + 10 and means ``add 1 to i, and then add the
result to 10.'' Similarly, the expression 1 < i < 10 is parsed as (1 < i) < 10 and means ``see if 1
is less than i, and then see if the result is less than 10.'' But in this case, ``the result'' is 1 or 0, depending
on whether i is greater than 1. Since both 0 and 1 are less than 10, the expression 1 < i < 10 would
always be true in C, regardless of the value of i!
Relational and Boolean expressions are usually used in contexts such as an if statement, where
something is to be done or not done depending on some condition. In these cases what's actually checked
is whether the expression representing the condition has a zero or nonzero value. As long as the
expression is a relational or Boolean expression, the interpretation is just what we want. For example,
when we wrote
if(x > max)
the > operator produced a 1 if x was greater than max, and a 0 otherwise. The if statement interprets 0
as false and 1 (or any nonzero value) as true.
But what if the expression is not a relational or Boolean expression? As far as C is concerned, the
controlling expression (of conditional statements like if) can in fact be any expression: it doesn't have to
``look like'' a Boolean expression; it doesn't have to contain relational or logical operators. All C looks at
(when it's evaluating an if statement, or anywhere else where it needs a true/false value) is whether the
expression evaluates to 0 or nonzero. For example, if you have a variable x, and you want to do
something if x is nonzero, it's possible to write
if(x)
    statement
and the statement will be executed if x is nonzero (since nonzero means ``true'').
This possibility (that the controlling expression of an if statement doesn't have to ``look like'' a Boolean
expression) is both useful and potentially confusing. It's useful when you have a variable or a function
that is ``conceptually Boolean,'' that is, one that you consider to hold a true or false (actually nonzero or
zero) value. For example, if you have a variable verbose which contains a nonzero value when your
program should run in verbose mode and zero when it should be quiet, you can write things like
if(verbose)
    printf("Starting first pass\n");
and this code is both legal and readable, besides which it does what you want. The standard library
contains a function isupper() which tests whether a character is an upper-case letter, so if c is a
character, you might write
if(isupper(c))
    ...
Both of these examples (verbose and isupper()) are useful and readable.
However, you will eventually come across code like
if(n)
    average = sum / n;
where n is just a number. Here, the programmer wants to compute the average only if n is nonzero
(otherwise, of course, the code would divide by 0), and the code works, because, in the context of the if
statement, the trivial expression n is (as always) interpreted as ``true'' if it is nonzero, and ``false'' if it is
zero.
``Coding shortcuts'' like these can seem cryptic, but they're also quite common, so you'll need to be able
to recognize them even if you don't choose to write them in your own code. Whenever you see code like
if(x)
or
if(f())
where x or f() do not have obvious ``Boolean'' names, you can read them as ``if x is nonzero'' or ``if
f() returns nonzero.''
3.4 while Loops


[This section corresponds to half of K&R Sec. 3.5]
Loops generally consist of two parts: one or more control expressions which (not surprisingly) control
the execution of the loop, and the body, which is the statement or set of statements which is executed
over and over.
The most basic loop in C is the while loop. A while loop has one control expression, and executes as
long as that expression is true. This example repeatedly doubles the number 2 (2, 4, 8, 16, ...) and prints
the resulting numbers as long as they are less than 1000:
int x = 2;
while(x < 1000)
{
    printf("%d\n", x);
    x = x * 2;
}
(Once again, we've used braces {} to enclose the group of statements which are to be executed together
as the body of the loop.)
The general syntax of a while loop is
while( expression )
    statement
A while loop starts out like an if statement: if the condition expressed by the expression is true, the
statement is executed. However, after executing the statement, the condition is tested again, and if it's
still true, the statement is executed again. (Presumably, the condition depends on some value which is
changed in the body of the loop.) As long as the condition remains true, the body of the loop is executed
over and over again. (If the condition is false right at the start, the body of the loop is not executed at all.)
As another example, if you wanted to print a number of blank lines, with the variable n holding the
number of blank lines to be printed, you might use code like this:
while(n > 0)
{
    printf("\n");
    n = n - 1;
}
After the loop finishes (when control ``falls out'' of it, due to the condition being false), n will have the
value 0.
You use a while loop when you have a statement or group of statements which may have to be
executed a number of times to complete their task. The controlling expression represents the condition
``the loop is not done'' or ``there's more work to do.'' As long as the expression is true, the body of the
loop is executed; presumably, it makes at least some progress at its task. When the expression becomes
false, the task is done, and the rest of the program (beyond the loop) can proceed. When we think about a
loop in this way, we can see an additional important property: if the expression evaluates to ``false''
before the very first trip through the loop, we make zero trips through the loop. In other words, if the task
is already done (if there's no work to do) the body of the loop is not executed at all. (It's always a good
idea to think about the ``boundary conditions'' in a piece of code, and to make sure that the code will
work correctly when there is no work to do, or when there is a trivial task to do, such as sorting an array
of one number. Experience has shown that bugs at boundary conditions are quite common.)

3.5 for Loops


[This section corresponds to the other half of K&R Sec. 3.5]
Our second loop, which we've seen at least one example of already, is the for loop. The first one we
saw was:
for (i = 0; i < 10; i = i + 1)
    printf("i is %d\n", i);
More generally, the syntax of a for loop is
for( expr1 ; expr2 ; expr3 )
    statement
(Here we see that the for loop has three control expressions. As always, the statement can be a brace-enclosed block.)
Many loops are set up to cause some variable to step through a range of values, or, more generally, to set
up an initial condition and then modify some value to perform each succeeding loop as long as some
condition is true. The three expressions in a for loop encapsulate these conditions:
expr1 sets up the initial condition, expr2 tests whether another trip
through the loop should be taken, and expr3 increments or updates things after each trip
through the loop and prior to the next one. In our first example, we had i = 0 as expr1,
i < 10 as expr2, i = i + 1 as expr3, and the call to printf as
statement, the body of the loop. So the loop began by setting i to 0, proceeded as long as i was less than
10, printed out i's value during each trip through the loop, and added 1 to i between each trip through
the loop.
When a for loop is executed, expr1 is evaluated first. Then
expr2 is evaluated, and if it is true, the body of the loop (statement) is executed. Then
expr3 is evaluated to go to the next step, and expr2 is evaluated again, to
see if there is a next step. During the execution of a for loop, the sequence is:
expr1
expr2
statement
expr3
expr2
statement
expr3
...
expr2
statement
expr3
expr2
The first thing executed is expr1. expr3 is evaluated after every trip
through the loop. The last thing executed is always expr2, because when
expr2 evaluates false, the loop exits.
All three expressions of a for loop are optional. If you leave out expr1, there simply is
no initialization step, and the variable(s) used with the loop had better have been initialized already. If
you leave out expr2, there is no test, and the default for the for loop is that another trip
through the loop should be taken (such that unless you break out of it some other way, the loop runs
forever). If you leave out expr3, there is no increment step.
The semicolons separate the three controlling expressions of a for loop. (These semicolons, by the way,
have nothing to do with statement terminators.) If you leave out one or more of the expressions, the
semicolons remain. Therefore, one way of writing a deliberately infinite loop in C is
for(;;)
    ...
It's useful to compare C's for loop to the equivalent loops in other computer languages you might know.
The C loop
for(i = x; i <= y; i = i + z)
is roughly equivalent to:
for I = X to Y step Z        (BASIC)
do 10 i=x,y,z                (FORTRAN)
for i := x to y              (Pascal)
In C (unlike FORTRAN), if the test condition is false before the first trip through the loop, the loop won't
be traversed at all. In C (unlike Pascal), a loop control variable (in this case, i) is guaranteed to retain its
final value after the loop completes, and it is also legal to modify the control variable within the loop, if
you really want to. (When the loop terminates due to the test condition turning false, the value of the
control variable after the loop will be the first value for which the condition failed, not the last value for
which it succeeded.)
It's also worth noting that a for loop can be used in more general ways than the simple, iterative
examples we've seen so far. The ``control variable'' of a for loop does not have to be an integer, and it
does not have to be incremented by an additive increment. It could be ``incremented'' by a multiplicative
factor (1, 2, 4, 8, ...) if that was what you needed, or it could be a floating-point variable, or it could be
another type of variable which we haven't met yet which would step, not over numeric values, but over
the elements of an array or other data structure. Strictly speaking, a for loop doesn't have to have a
``control variable'' at all; the three expressions can be anything, although the loop will make the most
sense if they are related and together form the expected initialize, test, increment sequence.
The powers-of-two example of the previous section does fit this pattern, so we could rewrite it like this:
int x;
for(x = 2; x < 1000; x = x * 2)
    printf("%d\n", x);
There is no earth-shaking or fundamental difference between the while and for loops. In fact, given
the general for loop
for(expr1; expr2; expr3)
    statement
you could usually rewrite it as a while loop, moving the initialize and increment expressions to
statements before and within the loop:
expr1 ;
while(expr2)
{
    statement
    expr3 ;
}
Similarly, given the general while loop
while(expr)
    statement
you could rewrite it as a for loop:
for(; expr; )
    statement
Another contrast between the for and while loops is that although the test expression
(expr2) is optional in a for loop, it is required in a while loop. If you leave out the
controlling expression of a while loop, the compiler will complain about a syntax error. (To write a
deliberately infinite while loop, you have to supply an expression which is always nonzero. The most
obvious one would simply be while(1).)
If it's possible to rewrite a for loop as a while loop and vice versa, why do they both exist? Which one
should you choose? In general, when you choose a for loop, its three expressions should all manipulate
the same variable or data structure, using the initialize, test, increment pattern. If they don't manipulate
the same variable or don't follow that pattern, wedging them into a for loop buys nothing and a while
loop would probably be clearer. (The reason that one loop or the other can be clearer is simply that, when
you see a for loop, you expect to see an idiomatic initialize/test/increment of a single variable, and if the
for loop you're looking at doesn't end up matching that pattern, you've been momentarily misled.)
3.6 break and continue


[This section corresponds to K&R Sec. 3.7]
Sometimes, due to an exceptional condition, you need to jump out of a loop early, that is, before the main
controlling expression of the loop causes it to terminate normally. Other times, in an elaborate loop, you
may want to jump back to the top of the loop (to test the controlling expression again, and perhaps begin
a new trip through the loop) without playing out all the steps of the current loop. The break and
continue statements allow you to do these two things. (They are, in fact, essentially restricted forms of
goto.)
To put everything we've seen in this chapter together, as well as demonstrate the use of the break
statement, here is a program for printing prime numbers between 1 and 100:
#include <stdio.h>
#include <math.h>

int main()
{
    int i, j;

    printf("%d\n", 2);

    for(i = 3; i <= 100; i = i + 1)
    {
        for(j = 2; j < i; j = j + 1)
        {
            if(i % j == 0)
                break;
            if(j > sqrt(i))
            {
                printf("%d\n", i);
                break;
            }
        }
    }

    return 0;
}
The outer loop steps the variable i through the numbers from 3 to 100; the code tests to see if each
number has any divisors other than 1 and itself. The trial divisor j loops from 2 up to i. j is a divisor of
i if the remainder of i divided by j is 0, so the code uses C's ``remainder'' or ``modulus'' operator % to
make this test. (Remember that i % j gives the remainder when i is divided by j.)
If the program finds a divisor, it uses break to break out of the inner loop, without printing anything.
But if it notices that j has risen higher than the square root of i, without its having found any divisors,
then i must not have any divisors, so i is prime, and its value is printed. (Once we've determined that i
is prime by noticing that j > sqrt(i), there's no need to try the other trial divisors, so we use a
second break statement to break out of the loop in that case, too.)
The simple algorithm and implementation we used here (like many simple prime number algorithms)
does not work for 2, the only even prime number, so the program ``cheats'' and prints out 2 no matter
what, before going on to test the numbers from 3 to 100.
Many improvements to this simple program are of course possible; you might experiment with it. (Did
you notice that the ``test'' expression of the inner loop for(j = 2; j < i; j = j + 1) is in a
sense unnecessary, because the loop always terminates early due to one of the two break statements?)
Chapter 4: More about Declarations (and Initialization)
4.1 Arrays
4.2 Visibility and Lifetime (Global Variables, etc.)
4.3 Default Initialization
4.4 Examples

This page by Steve Summit // Copyright 1995-1997 // mail feedback
4.1 Arrays
So far, we've been declaring simple variables: the declaration
int i;
declares a single variable, named i, of type int. It is also possible to declare an array of several
elements. The declaration
int a[10];
declares an array, named a, consisting of ten elements, each of type int. Simply speaking, an array is a
variable that can hold more than one value. You specify which of the several values you're referring to at
any given time by using a numeric subscript. (Arrays in programming are similar to vectors or matrices
in mathematics.) We can picture the array a above as a row of ten boxes, one for each element.

In C, arrays are zero-based: the ten elements of a 10-element array are numbered from 0 to 9. The
subscript which specifies a single element of an array is simply an integer expression in square brackets.
The first element of the array is a[0], the second element is a[1], etc. You can use these ``array
subscript expressions'' anywhere you can use the name of a simple variable, for example:
a[0] = 10;
a[1] = 20;
a[2] = a[0] + a[1];
Notice that the subscripted array references (i.e. expressions such as a[0] and a[1]) can appear on
either side of the assignment operator.
The subscript does not have to be a constant like 0 or 1; it can be any integral expression. For example,
it's common to loop over all elements of an array:
int i;
for(i = 0; i < 10; i = i + 1)
    a[i] = 0;
This loop sets all ten elements of the array a to 0.
Arrays are a real convenience for many problems, but there is not a lot that C will do with them for you
automatically. In particular, you can neither set all elements of an array at once nor assign one array to
another; both of the assignments
a = 0;            /* WRONG */
and
int b[10];
b = a;            /* WRONG */
are illegal.
To set all of the elements of an array to some value, you must do so one by one, as in the loop example
above. To copy the contents of one array to another, you must again do so one by one:
int b[10];
for(i = 0; i < 10; i = i + 1)
    b[i] = a[i];
Remember that for an array declared
int a[10];
there is no element a[10]; the topmost element is a[9]. This is one reason that zero-based loops are
also common in C. Note that the for loop
for(i = 0; i < 10; i = i + 1)
    ...
does just what you want in this case: it starts at 0, the number 10 suggests (correctly) that it goes through
10 iterations, but the less-than comparison means that the last trip through the loop has i set to 9. (The
comparison i <= 9 would also work, but it would be less clear and therefore poorer style.)
In the little examples so far, we've always looped over all 10 elements of the sample array a. It's
common, however, to use an array that's bigger than strictly necessary, and to use a second variable to
keep track of how many elements of the array are currently in use. For example, we might have an
integer variable
int na;        /* number of elements of a[] in use */
Then, when we wanted to do something with a (such as print it out), the loop would run from 0 to na,
not 10 (or whatever a's size was):
for(i = 0; i < na; i = i + 1)
    printf("%d\n", a[i]);
Naturally, we would have to ensure that na's value was always less than or equal to the number of
elements actually declared in a.
Arrays are not limited to type int; you can have arrays of char or double or any other type.
Here is a slightly larger example of the use of arrays. Suppose we want to investigate the behavior of
rolling a pair of dice. The total roll can be anywhere from 2 to 12, and we want to count how often each
roll comes up. We will use an array to keep track of the counts: a[2] will count how many times we've
rolled 2, etc.
We'll simulate the roll of a die by calling C's random number generation function, rand(). Each time
you call rand(), it returns another pseudo-random integer. The values that rand() returns
typically span a large range, so we'll use C's modulus (or ``remainder'') operator % to produce random
numbers in the range we want. The expression rand() % 6 produces random numbers in the range 0
to 5, and rand() % 6 + 1 produces random numbers in the range 1 to 6.
Here is the program:
#include <stdio.h>
#include <stdlib.h>

int main()
{
    int i;
    int d1, d2;
    int a[13];          /* uses [2..12] */

    for(i = 2; i <= 12; i = i + 1)
        a[i] = 0;

    for(i = 0; i < 100; i = i + 1)
    {
        d1 = rand() % 6 + 1;
        d2 = rand() % 6 + 1;
        a[d1 + d2] = a[d1 + d2] + 1;
    }

    for(i = 2; i <= 12; i = i + 1)
        printf("%d: %d\n", i, a[i]);

    return 0;
}
We include the header <stdlib.h> because it contains the necessary declarations for the rand()
function. We declare the array of size 13 so that its highest element will be a[12]. (We're wasting
a[0] and a[1]; this is no great loss.) The variables d1 and d2 contain the rolls of the two individual
dice; we add them together to decide which cell of the array to increment, in the line
a[d1 + d2] = a[d1 + d2] + 1;
After 100 rolls, we print the array out. Typically (as craps players well know), we'll see more 7's than any other total, and
relatively few 2's and 12's.
(By the way, it turns out that using the % operator to reduce the range of the rand function is not always
a good idea. We'll say more about this problem in an exercise.)
4.1.1 Array Initialization
4.1.2 Arrays of Arrays (``Multidimensional'' Arrays)

4.1.1 Array Initialization


Although it is not possible to assign to all elements of an array at once using an assignment expression, it
is possible to initialize some or all elements of an array when the array is defined. The syntax looks like
this:
int a[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
The list of values, enclosed in braces {}, separated by commas, provides the initial values for successive
elements of the array.
(Under older, pre-ANSI C compilers, you could not always supply initializers for ``local'' arrays inside
functions; you could only initialize ``global'' arrays, those outside of any function. Those compilers are
now rare, so you shouldn't have to worry about this distinction any more. We'll talk more about local and
global variables later in this chapter.)
If there are fewer initializers than elements in the array, the remaining elements are automatically
initialized to 0. For example,
int a[10] = {0, 1, 2, 3, 4, 5, 6};
would initialize a[7], a[8], and a[9] to 0. When an array definition includes an initializer, the array
dimension may be omitted, and the compiler will infer the dimension from the number of initializers. For
example,
int b[] = {10, 11, 12, 13, 14};
would declare, define, and initialize an array b of 5 elements (i.e. just as if you'd typed int b[5]).
Only the dimension is omitted; the brackets [] remain to indicate that b is in fact an array.
In the case of arrays of char, the initializer may be a string constant:
char s1[7] = "Hello,";
char s2[10] = "there,";
char s3[] = "world!";
As before, if the dimension is omitted, it is inferred from the size of the string initializer. (We haven't
covered strings in detail yet--we'll do so in chapter 8--but it turns out that all strings in C are terminated
by a special character with the value 0. Therefore, the array s3 will be of size 7, and the explicitly-sized
s1 does need to be of size at least 7. For s2, the last 4 characters in the array will all end up being this
zero-value character.)
4.1.2 Arrays of Arrays (``Multidimensional'' Arrays)


[This section is optional and may be skipped.]
When we said that ``Arrays are not limited to type int; you can have arrays of... any other type,'' we
meant that more literally than you might have guessed. If you have an ``array of int,'' it means that you
have an array each of whose elements is of type int. But you can have an array each of whose elements
is of type x, where x is any type you choose. In particular, you can have an array each of whose elements
is another array! We can use these arrays of arrays for the same sorts of tasks as we'd use
multidimensional arrays in other computer languages (or matrices in mathematics). Naturally, we are not
limited to arrays of arrays, either; we could have an array of arrays of arrays, which would act like a 3-dimensional array, etc.
The declaration of an array of arrays looks like this:
int a2[5][7];
You have to read complicated declarations like these ``inside out.'' What this one says is that a2 is an
array of 5 somethings, and that each of the somethings is an array of 7 ints. More briefly, ``a2 is an
array of 5 arrays of 7 ints,'' or, ``a2 is an array of array of int.'' In the declaration of a2, the brackets
closest to the identifier a2 tell you what a2 first and foremost is. That's how you know it's an array of 5
arrays of size 7, not the other way around. You can think of a2 as having 5 ``rows'' and 7 ``columns,''
although this interpretation is not mandatory. (You could also treat the ``first'' or inner subscript as ``x''
and the second as ``y.'' Unless you're doing something fancy, all you have to worry about is that the
subscripts when you access the array match those that you used when you declared it, as in the examples
below.)
To illustrate the use of multidimensional arrays, we might fill in the elements of the above array a2 using
this piece of code:
int i, j;
for(i = 0; i < 5; i = i + 1)
{
    for(j = 0; j < 7; j = j + 1)
        a2[i][j] = 10 * i + j;
}
This pair of nested loops sets a2[1][2] to 12, a2[4][1] to 41, etc. Since the first dimension of a2 is 5,
the first subscripting index variable, i, runs from 0 to 4. Similarly, the second subscript varies from 0 to
6.
We could print a2 out (in a two-dimensional way, suggesting its structure) with a similar pair of nested
loops:
for(i = 0; i < 5; i = i + 1)
{
for(j = 0; j < 7; j = j + 1)
printf("%d\t", a2[i][j]);
printf("\n");
}
(The character \t in the printf string is the tab character.)
Just to see more clearly what's going on, we could make the ``row'' and ``column'' subscripts explicit by
printing them, too:
for(j = 0; j < 7; j = j + 1)
printf("\t%d:", j);
printf("\n");
for(i = 0; i < 5; i = i + 1)
{
printf("%d:", i);
for(j = 0; j < 7; j = j + 1)
printf("\t%d", a2[i][j]);
printf("\n");
}
This last fragment would print

        0:      1:      2:      3:      4:      5:      6:
0:      0       1       2       3       4       5       6
1:      10      11      12      13      14      15      16
2:      20      21      22      23      24      25      26
3:      30      31      32      33      34      35      36
4:      40      41      42      43      44      45      46

Finally, there's no reason we have to loop over the ``rows'' first and the ``columns'' second; depending on
what we wanted to do, we could interchange the two loops, like this:
for(j = 0; j < 7; j = j + 1)
{
for(i = 0; i < 5; i = i + 1)
printf("%d\t", a2[i][j]);
printf("\n");
}
Notice that i is still the first subscript and it still runs from 0 to 4, and j is still the second subscript and
it still runs from 0 to 6.



This page by Steve Summit // Copyright 1995-1997 // mail feedback

4.2 Visibility and Lifetime (Global Variables, etc.)


We haven't said so explicitly, but variables are channels of communication within a program. You set a
variable to a value at one point in a program, and at another point (or points) you read the value out
again. The two points may be in adjoining statements, or they may be in widely separated parts of the
program.
How long does a variable last? How widely separated can the setting and fetching parts of the program
be, and how long after a variable is set does it persist? Depending on the variable and how you're using it,
you might want different answers to these questions.
The visibility of a variable determines how much of the rest of the program can access that variable. You
can arrange that a variable is visible only within one part of one function, or in one function, or in one
source file, or anywhere in the program. (We haven't really talked about source files yet; we'll be
exploring them soon.)
Why would you want to limit the visibility of a variable? For maximum flexibility, wouldn't it be handy
if all variables were potentially visible everywhere? As it happens, that arrangement would be too
flexible: everywhere in the program, you would have to keep track of the names of all the variables
declared anywhere else in the program, so that you didn't accidentally re-use one. Whenever a variable
had the wrong value by mistake, you'd have to search the entire program for the bug, because any
statement in the entire program could potentially have modified that variable. You would constantly be
stepping all over yourself by using a common variable name like i in two parts of your program, and
having one snippet of code accidentally overwrite the values being used by another part of the code. The
communication would be sort of like an old party line--you'd always be accidentally interrupting other
conversations, or having your conversations interrupted.
To avoid this confusion, we generally give variables the narrowest or smallest visibility they need. A
variable declared within the braces {} of a function is visible only within that function; variables
declared within functions are called local variables. If another function somewhere else declares a local
variable with the same name, it's a different variable entirely, and the two don't clash with each other.
On the other hand, a variable declared outside of any function is a global variable, and it is potentially
visible anywhere within the program. You use global variables when you do want the communications
path to be able to travel to any part of the program. When you declare a global variable, you will usually
give it a longer, more descriptive name (not something generic like i) so that whenever you use it you
will remember that it's the same variable everywhere.
Another word for the visibility of variables is scope.
How long do variables last? By default, local variables (those declared within a function) have automatic
duration: they spring into existence when the function is called, and they (and their values) disappear
when the function returns. Global variables, on the other hand, have static duration: they last, and the
values stored in them persist, for as long as the program does. (Of course, the values can in general still
be overwritten, so they don't necessarily persist forever.)
Finally, it is possible to split a program up into several source files, for easier maintenance. When several
source files are combined into one program (we'll be seeing how in the next chapter) the compiler must
have a way of correlating the global variables which might be used to communicate between the several
source files. Furthermore, if a global variable is going to be useful for communication, there must be
exactly one of it: you wouldn't want one function in one source file to store a value in one global variable
named globalvar, and then have another function in another source file read from a different global
variable named globalvar. Therefore, a global variable should have exactly one defining instance, in
one place in one source file. If the same variable is to be used anywhere else (i.e. in some other source
file or files), the variable is declared in those other file(s) with an external declaration, which is not a
defining instance. The external declaration says, ``hey, compiler, here's the name and type of a global
variable I'm going to use, but don't define it here, don't allocate space for it; it's one that's defined
somewhere else, and I'm just referring to it here.'' If you accidentally have two distinct defining instances
for a variable of the same name, the compiler (or the linker) will complain that it is ``multiply defined.''
It is also possible to have a variable which is global in the sense that it is declared outside of any
function, but private to the one source file it's defined in. Such a variable is visible to the functions in that
source file but not to any functions in any other source files, even if they try to issue a matching
declaration.
You get any extra control you might need over visibility and lifetime, and you distinguish between
defining instances and external declarations, by using storage classes. A storage class is an extra
keyword at the beginning of a declaration which modifies the declaration in some way. Generally, the
storage class (if any) is the first word in the declaration, preceding the type name. (Strictly speaking, this
ordering has not traditionally been necessary, and you may see some code with the storage class, type
name, and other parts of a declaration in an unusual order.)
We said that, by default, local variables had automatic duration. To give them static duration (so that,
instead of coming and going as the function is called, they persist for as long as the program does), you
precede their declaration with the static keyword:
static int i;
By default, a declaration of a global variable (especially if it specifies an initial value) is the defining
instance. To make it an external declaration, of a variable which is defined somewhere else, you precede
it with the keyword extern:
extern int j;
Finally, to arrange that a global variable is visible only within its containing source file, you precede it
with the static keyword:
static int k;
Notice that the static keyword can do two different things: it adjusts the duration of a local variable
from automatic to static, or it adjusts the visibility of a global variable from truly global to private-to-the-file.
To summarize, we've talked about two different attributes of a variable: visibility and duration. These are
orthogonal, as shown in this table:

                        automatic duration        static duration
local visibility        normal local variables    static local variables
global visibility       N/A                       normal global variables

We can also distinguish between file-scope global variables and truly global variables, based on the
presence or absence of the static keyword.
We can also distinguish between external declarations and defining instances of global variables, based
on the presence or absence of the extern keyword.



This page by Steve Summit // Copyright 1995-1997 // mail feedback

4.3 Default Initialization


The duration of a variable (whether static or automatic) also affects its default initialization.
If you do not explicitly initialize them, automatic-duration variables (that is, local, non-static ones)
are not guaranteed to have any particular initial value; they will typically contain garbage. It is therefore
a fairly serious error to attempt to use the value of an automatic variable which has never been initialized
or assigned to: the program will either work incorrectly, or the garbage value may just happen to be
``correct'' such that the program appears to work correctly! However, the particular value that the
garbage takes on can depend on literally anything: other parts of the program, which compiler
was used, which hardware or operating system the program is running on, the time of day, the phase of
the moon. (Okay, maybe the phase of the moon is a bit of an exaggeration.) So you hardly want to say
that a program which uses an uninitialized variable ``works''; it may seem to work, but it works for the
wrong reason, and it may stop working tomorrow.
Static-duration variables (global and static local), on the other hand, are guaranteed to be initialized
to 0 if you do not use an explicit initializer in the definition.
(Once upon a time, there was another distinction between the initialization of automatic vs. static
variables: you could initialize aggregate objects, such as arrays, only if they had static duration. If your
compiler complains when you try to initialize a local array, it's probably an old, pre-ANSI compiler.
Modern, ANSI-compatible compilers remove this limitation, so it's no longer much of a concern.)



This page by Steve Summit // Copyright 1995-1997 // mail feedback

4.4 Examples
Here is an example demonstrating almost everything we've seen so far:
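int globalvar = 1;              /* defining instance, with an initial value */
extern int anotherglobalvar;    /* external declaration: defined elsewhere */
static int privatevar;          /* private to this source file; starts at 0 */
void f(void)
{
int localvar;                   /* automatic; starts out as garbage */
int localvar2 = 2;              /* automatic; set to 2 on every call */
static int persistentvar;       /* static local; starts at 0, keeps its value */
}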
Here we have six variables, three declared outside and three declared inside of the function f().
globalvar is a global variable. The declaration we see is its defining instance (it happens also to
include an initial value). globalvar can be used anywhere in this source file, and it could be used in
other source files, too (as long as corresponding external declarations are issued in those other source
files).
anotherglobalvar is a second global variable. It is not defined here; the defining instance for it
(and its initialization) is somewhere else.
privatevar is a ``private'' global variable. It can be used anywhere within this source file, but
functions in other source files cannot access it, even if they try to issue external declarations for it. (If
other source files try to declare a global variable called ``privatevar'', they'll get their own; they
won't be sharing this one.) Since it has static duration and receives no explicit initialization,
privatevar will be initialized to 0.
localvar is a local variable within the function f(). It can be accessed only within the function f().
(If any other part of the program declares a variable named ``localvar'', that variable will be distinct
from the one we're looking at here.) localvar is conceptually ``created'' each time f() is called, and
disappears when f() returns. Any value which was stored in localvar last time f() was running
will be lost and will not be available next time f() is called. Furthermore, since it has no explicit
initializer, the value of localvar will in general be garbage each time f() is called.
localvar2 is also local, and everything that we said about localvar applies to it, except that since
its declaration includes an explicit initializer, it will be initialized to 2 each time f() is called.

Finally, persistentvar is again local to f(), but it does maintain its value between calls to f(). It
has static duration but no explicit initializer, so its initial value will be 0.
The defining instances and external declarations we've been looking at so far have all been of simple
variables. There are also defining instances and external declarations of functions, which we'll be looking
at in the next chapter.
(Also, don't worry about static variables for now if they don't make sense to you; they're a relatively
sophisticated concept, which you won't need to use at first.)
The term declaration is a general one which encompasses defining instances and external declarations;
defining instances and external declarations are two different kinds of declarations. Furthermore, either
kind of declaration suffices to inform the compiler of the name and type of a particular variable (or
function). If you have the defining instance of a global variable in a source file, the rest of that source file
can use that variable without having to issue any external declarations. It's only in source files where the
defining instance hasn't been seen that you need external declarations.
You will sometimes hear a defining instance referred to simply as a ``definition,'' and you will sometimes
hear an external declaration referred to simply as a ``declaration.'' These usages are mildly ambiguous, in
that you can't tell out of context whether a ``declaration'' is a generic declaration (that might be a defining
instance or an external declaration) or whether it's an external declaration that specifically is not a
defining instance. (Similarly, there are other constructions that can be called ``definitions'' in C, namely
the definitions of preprocessor macros, structures, and typedefs, none of which we've met.) In these
notes, we'll try to make things clear by using the unambiguous terms defining instance and external
declaration. Elsewhere, you may have to look at the context to determine how the terms ``definition'' and
``declaration'' are being used.



This page by Steve Summit // Copyright 1995-1997 // mail feedback

Chapter 5: Functions and Program Structure

[This chapter corresponds to K&R chapter 4.]
A function is a ``black box'' that we've locked part of our program into. The idea behind a function is that
it compartmentalizes part of the program, and in particular, that the code within the function has some
useful properties:
1. It performs some well-defined task, which will be useful to other parts of the program.
2. It might be useful to other programs as well; that is, we might be able to reuse it (and without
having to rewrite it).
3. The rest of the program doesn't have to know the details of how the function is implemented. This
can make the rest of the program easier to think about.
4. The function performs its task well. It may be written to do a little more than is required by the
first program that calls it, with the anticipation that the calling program (or some other program)
may later need the extra functionality or improved performance. (It's important that a finished
function do its job well; otherwise there might be a reluctance to call it, and it therefore might not
achieve the goal of reusability.)
5. By placing the code to perform the useful task into a function, and simply calling the function in
the other parts of the program where the task must be performed, the rest of the program becomes
clearer: rather than having some large, complicated, difficult-to-understand piece of code repeated
wherever the task is being performed, we have a single simple function call, and the name of the
function reminds us which task is being performed.
6. Since the rest of the program doesn't have to know the details of how the function is implemented,
the rest of the program doesn't care if the function is reimplemented later, in some different way
(as long as it continues to perform its same task, of course!). This means that one part of the
program can be rewritten, to improve performance or add a new feature (or simply to fix a bug),
without having to rewrite the rest of the program.
Functions are probably the most important weapon in our battle against software complexity. You'll want
to learn when it's appropriate to break processing out into functions (and also when it's not), and how to
set up function interfaces to best achieve the qualities mentioned above: reusability, information hiding,
clarity, and maintainability.
5.1 Function Basics
5.2 Function Prototypes
5.3 Function Philosophy
5.4 Separate Compilation--Logistics



This page by Steve Summit // Copyright 1995-1997 // mail feedback

5.1 Function Basics


So what defines a function? It has a name that you call it by, and a list of zero or more arguments or
parameters that you hand to it for it to act on or to direct its work; it has a body containing the actual
instructions (statements) for carrying out the task the function is supposed to perform; and it may give
you back a return value, of a particular type.
Here is a very simple function, which accepts one argument, multiplies it by 2, and hands that value
back:
int multbytwo(int x)
{
int retval;
retval = x * 2;
return retval;
}
On the first line we see the return type of the function (int), the name of the function (multbytwo),
and a list of the function's arguments, enclosed in parentheses. Each argument has both a name and a
type; multbytwo accepts one argument, of type int, named x. The name x is arbitrary, and is used
only within the definition of multbytwo. The caller of this function only needs to know that a single
argument of type int is expected; the caller does not need to know what name the function will use
internally to refer to that argument. (In particular, the caller does not have to pass the value of a variable
named x.)
Next we see, surrounded by the familiar braces, the body of the function itself. This function consists of
one declaration (of a local variable retval) and two statements. The first statement is a conventional
expression statement, which computes and assigns a value to retval, and the second statement is a
return statement, which causes the function to return to its caller, and also specifies the value which
the function returns to its caller.
The return statement can return the value of any expression, so we don't really need the local retval
variable; the function could be collapsed to
int multbytwo(int x)
{
return x * 2;
}
How do we call a function? We've been doing so informally since day one, but now we have a chance to
call one that we've written, in full detail. Here is a tiny skeletal program to call multbytwo:
#include <stdio.h>
extern int multbytwo(int);
int main()
{
int i, j;
i = 3;
j = multbytwo(i);
printf("%d\n", j);
return 0;
}
This looks much like our other test programs, with the exception of the new line
extern int multbytwo(int);
This is an external function prototype declaration. It is an external declaration, in that it declares
something which is defined somewhere else. (We've already seen the defining instance of the function
multbytwo, but maybe the compiler hasn't seen it yet.) The function prototype declaration contains the
three pieces of information about the function that a caller needs to know: the function's name, return
type, and argument type(s). Since we don't care what name the multbytwo function will use to refer to
its first argument, we don't need to mention it. (On the other hand, if a function takes several arguments,
giving them names in the prototype may make it easier to remember which is which, so names may
optionally be used in function prototype declarations.) Finally, to remind us that this is an external
declaration and not a defining instance, the prototype is preceded by the keyword extern.
The presence of the function prototype declaration lets the compiler know that we intend to call this
function, multbytwo. The information in the prototype lets the compiler generate the correct code for
calling the function, and also enables the compiler to check up on our code (by making sure, for example,
that we pass the correct number of arguments to each function we call).
Down in the body of main, the action of the function call should be obvious: the line
j = multbytwo(i);
calls multbytwo, passing it the value of i as its argument. When multbytwo returns, the return value
is assigned to the variable j. (Notice that the value of main's local variable i will become the value of
multbytwo's parameter x; this is absolutely not a problem, and is a normal sort of affair.)
This example is written out in ``longhand,'' to make each step explicit. The variable i isn't really
needed, since we could just as well call
j = multbytwo(3);
And the variable j isn't really needed, either, since we could just as well call
printf("%d\n", multbytwo(3));
Here, the call to multbytwo is a subexpression which serves as the second argument to printf. The
value returned by multbytwo is passed immediately to printf. (Here, as in general, we see the
flexibility and generality of expressions in C. An argument passed to a function may be an arbitrarily
complex subexpression, and a function call is itself an expression which may be embedded as a
subexpression within arbitrarily complicated surrounding expressions.)
We should say a little more about the mechanism by which an argument is passed down from a caller
into a function. Formally, C is call by value, which means that a function receives copies of the values of
its arguments. We can illustrate this with an example. Suppose, in our implementation of multbytwo,
we had gotten rid of the unnecessary retval variable like this:
int multbytwo(int x)
{
x = x * 2;
return x;
}
We might wonder, if we wrote it this way, what would happen to the value of the variable i when we
called
j = multbytwo(i);
When our implementation of multbytwo changes the value of x, does that change the value of i up in
the caller? The answer is no. x receives a copy of i's value, so when we change x we don't change i.
However, there is an exception to this rule. When the argument you pass to a function is not a single
variable, but is rather an array, the function does not receive a copy of the array, and it therefore can
modify the array in the caller. The reason is that it might be too expensive to copy the entire array, and
furthermore, it can be useful for the function to write into the caller's array, as a way of handing back
more data than would fit in the function's single return value. We'll see an example of an array argument
(which the function deliberately writes into) in the next chapter.



This page by Steve Summit // Copyright 1995, 1996 // mail feedback

5.2 Function Prototypes


In modern C programming, it is considered good practice to use prototype declarations for all functions
that you call. As we mentioned, these prototypes help to ensure that the compiler can generate correct
code for calling the functions, as well as allowing the compiler to catch certain mistakes you might make.
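Strictly speaking, however, prototypes are optional. If you call a function for which the compiler has not
seen a prototype, the compiler will do the best it can, assuming that you're calling the function correctly.
(Under the original 1989 ANSI C Standard, a function could even be called without being declared at all;
since the 1999 revision, a function must at least be declared before it is called.)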
If prototypes are a good idea, and if we're going to get in the habit of writing function prototype
declarations for functions we call that we've written (such as multbytwo), what happens for library
functions such as printf? Where are their prototypes? The answer is in that boilerplate line
#include <stdio.h>
we've been including at the top of all of our programs. stdio.h is conceptually a file full of external
declarations and other information pertaining to the ``Standard I/O'' library functions, including
printf. The #include directive (which we'll meet formally in a later chapter) arranges that all of the
declarations within stdio.h are considered by the compiler, rather as if we'd typed them all in
ourselves. Somewhere within these declarations is an external function prototype declaration for
printf, which satisfies the rule that there should be a prototype for each function we call. (For other
standard library functions we call, there will be other ``header files'' to include.)
Finally, one more thing about external function prototype declarations. We've said that the distinction between external
declarations and defining instances of normal variables hinges on the presence or absence of the keyword
extern. The situation is a little bit different for functions. The ``defining instance'' of a function is the
function, including its body (that is, the brace-enclosed list of declarations and statements implementing
the function). An external declaration of a function, even without the keyword extern, looks nothing
like a function definition, because it has no body. Therefore, the keyword extern is optional in function prototype
declarations. If you wish, you can write
int multbytwo(int);
and this is just as good an external function prototype declaration as
extern int multbytwo(int);
(In the first form, without the extern, as soon as the compiler sees the semicolon, it knows it's not
going to see a function body, so the declaration can't be a definition.) You may want to stay in the habit
of using extern in all external declarations, including function declarations, since ``extern = external
declaration'' is an easier rule to remember.



This page by Steve Summit // Copyright 1995, 1996 // mail feedback

5.3 Function Philosophy


What makes a good function? The most important aspect of a good ``building block'' is that it have a
single, well-defined task to perform. When you find that a program is hard to manage, it's often because
it has not been designed and broken up into functions cleanly. Two obvious reasons for moving code
down into a function are:
1. It appeared in the main program several times, such that by making it a function, it can be written just
once, and the several places where it used to appear can be replaced with calls to the new function.
2. The main program was getting too big, so it could be made (presumably) smaller and more
manageable by lopping part of it off and making it a function.
These two reasons are important, and they represent significant benefits of well-chosen functions, but
they are not sufficient to automatically identify a good function. As we've been suggesting, a good
function has at least these two additional attributes:
3. It does just one well-defined task, and does it well.
4. Its interface to the rest of the program is clean and narrow.
Attribute 3 is just a restatement of two things we said above. Attribute 4 says that you shouldn't have to
keep track of too many things when calling a function. If you know what a function is supposed to do,
and if its task is simple and well-defined, there should be just a few pieces of information you have to
give it to act upon, and one or just a few pieces of information which it returns to you when it's done. If
you find yourself having to pass lots and lots of information to a function, or remember details of its
internal implementation to make sure that it will work properly this time, it's often a sign that the
function is not sufficiently well-defined. (A poorly-defined function may be an arbitrary chunk of code
that was ripped out of a main program that was getting too big, such that it essentially has to have access
to all of that main function's local variables.)
The whole point of breaking a program up into functions is so that you don't have to think about the
entire program at once; ideally, you can think about just one function at a time. We say that a good
function is a ``black box,'' which is supposed to suggest that the ``container'' it's in is opaque--callers
can't see inside it (and the function inside can't see out). When you call a function, you only have to
know what it does, not how it does it. When you're writing a function, you only have to know what it's
supposed to do, and you don't have to know why or under what circumstances its caller will be calling it.
(When designing a function, we should perhaps think about the callers just enough to ensure that the
function we're designing will be easy to call, and that we aren't accidentally setting things up so that
callers will have to think about any internal details.)

Some functions may be hard to write (if they have a hard job to do, or if it's hard to make them do it truly
well), but that difficulty should be compartmentalized along with the function itself. Once you've written
a ``hard'' function, you should be able to sit back and relax and watch it do that hard work on call from
the rest of your program. It should be pleasant to notice (in the ideal case) how much easier the rest of the
program is to write, now that the hard work can be deferred to this workhorse function.
(In fact, if a difficult-to-write function's interface is well-defined, you may be able to get away with
writing a quick-and-dirty version of the function first, so that you can begin testing the rest of the
program, and then go back later and rewrite the function to do the hard parts. As long as the function's
original interface anticipated the hard parts, you won't have to rewrite the rest of the program when you
fix the function.)
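Here is a minimal sketch of that idea, using a hypothetical function (an integer square root) that isn't part of the original notes. The first version is quick and dirty but correct for small inputs. The second is a later rewrite with exactly the same interface. They are shown under two names only so they can be compared; in practice the rewrite would simply replace the first version, and no caller would need to change.

```c
/* Hypothetical example: the integer square root of n (n >= 0),
   rounded down. Both versions have the same interface. */

/* First, quick-and-dirty version: try 0, 1, 2, ... in turn.
   Slow for large n (and (r + 1) * (r + 1) can overflow near INT_MAX),
   but simple enough to get right immediately. */
int isqrt_quick(int n)
{
    int r = 0;
    while((r + 1) * (r + 1) <= n)
        r = r + 1;
    return r;
}

/* Later rewrite: binary search for the answer within [lo, hi].
   Same name pattern, parameter, and meaning of the result. */
int isqrt(int n)
{
    int lo = 0, hi = n;
    while(lo < hi)
    {
        int mid = lo + (hi - lo + 1) / 2;  /* round up, so the loop always makes progress */
        if(mid <= n / mid)                 /* same as mid * mid <= n, without overflow */
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}
```

Because both functions promise the same thing, a caller written and tested against isqrt_quick keeps working unchanged once the faster version takes its place.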
What I've been trying to say in the preceding few paragraphs is that functions matter for reasons far more
important than just saving typing. Sometimes, we'll write a function which we only call once,
just because breaking it out into a function makes things clearer and easier.
If you find that difficulties pervade a program, that the hard parts can't be buried inside black-box
functions and then forgotten about; if you find that there are hard parts which involve complicated
interactions among multiple functions, then the program probably needs redesigning.
For the purposes of explanation, we've seemed so far to be talking only about ``main programs,'' the
functions they call, and the rationale behind moving some piece of code down out of a ``main program''
into a function. But in reality, there's obviously no need to restrict ourselves to a two-tier scheme. Any
function we find ourselves writing will often be appropriately written in terms of sub-functions, sub-sub-functions, etc. (Furthermore, the ``main program,'' main(), is itself just a function.)
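As a small, hypothetical illustration of more than two tiers: below, mean() is written in terms of sum(), and variance() in terms of mean(). Each function stays short, and none of them has to know how the others do their jobs. (The function names are ours, chosen for the example.)

```c
/* Add up the n elements of a[]. */
double sum(double a[], int n)
{
    double total = 0.0;
    int i;
    for(i = 0; i < n; i = i + 1)
        total = total + a[i];
    return total;
}

/* Average of a[]; relies on sum(). The caller must pass n > 0. */
double mean(double a[], int n)
{
    return sum(a, n) / n;
}

/* Population variance (average squared distance from the mean);
   relies on mean(), which in turn relies on sum(). */
double variance(double a[], int n)
{
    double m = mean(a, n);
    double acc = 0.0;
    int i;
    for(i = 0; i < n; i = i + 1)
        acc = acc + (a[i] - m) * (a[i] - m);
    return acc / n;
}
```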

This page by Steve Summit // Copyright 1995, 1996 // mail feedback

5.4 Separate Compilation--Logistics


When a program consists of many functions, it can be convenient to split them up into several source
files. Among other things, this means that when a change is made, only the source file containing the
change has to be recompiled, not the whole program.
The job of putting the pieces of a program together and producing the final executable falls to a tool
called the linker. (We may or may not need to invoke the linker explicitly; a compiler often invokes it
automatically, as needed.) The linker looks through all of the pieces making up the program, sorting out
the external declarations and defining instances. The compiler has noted the definitions made by each
source file, as well as the declarations of things used by each source file but (presumably) defined
elsewhere. For each thing (global variable or function) used but not defined by one piece of the program,
the linker looks for another piece which does define that thing.
The logistics of writing a program in several source files, and then compiling and linking all of the
source files together, depend on the programming environment you're using. We'll cover two
possibilities, depending on whether you're using a traditional command-line compiler or a newer
integrated development environment (IDE) or other graphical user interface (GUI) compiler.
When using a command-line compiler, there are usually two main steps involved in building an
executable program from one or more source files. First, each source file is compiled, resulting in an
object file containing the machine instructions (generated by the compiler) corresponding to just the code
in that source file. Second, the various object files are linked together, with each other and with libraries
containing code for functions which you did not write (such as printf), to produce a final, executable
program.
Under Unix, the cc command can perform one or both steps. So far, we've been using extremely simple
invocations of cc such as
cc -o hello hello.c
This invocation compiles a single source file, hello.c, links it, and places the executable in a file
named hello.
Suppose we have a program which we're trying to build from three separate source files, x.c, y.c, and
z.c. We could compile all three of them, and link them together, all at once, with the command
cc -o myprog x.c y.c z.c
Alternatively, we could compile them separately: the -c option to cc tells it to compile only, but not to
link. Instead of building an executable, it merely creates an object file, with a name ending in .o, for
each source file compiled. So the three commands


cc -c x.c
cc -c y.c
cc -c z.c
would compile x.c, y.c, and z.c and create object files x.o, y.o, and z.o. Then, the three object
files could be linked together using
cc -o myprog x.o y.o z.o
When the cc command is given an .o file, it knows that it does not have to compile it (it's an object file,
already compiled); it just sends it through to the link process.
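To make this concrete, here is a minimal sketch of how y.c and z.c might divide up some work. The function names twice and add_twice are hypothetical. In a real project each part would live in its own file and be compiled separately with cc -c; they are shown here in one listing, with the file boundaries marked in comments, so the example compiles in one piece. When the files are separate, it's the linker that matches the call to twice() in z.c with its definition in y.c.

```c
/* ---- declarations every file that calls these functions needs ---- */
int twice(int n);
int add_twice(int a, int b);

/* ---- y.c: defines twice() ---- */
int twice(int n)
{
    return 2 * n;
}

/* ---- z.c: defines add_twice(), which uses twice() from y.c ---- */
int add_twice(int a, int b)
{
    return twice(a) + twice(b);
}

/* ---- x.c would contain main(), which calls add_twice() ---- */
```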
Above we mentioned that the second, linking step also involves pulling in library functions. Normally,
the functions from the Standard C library are linked in automatically. Occasionally, you must request a
library manually; one common situation under Unix is that the math functions tend to be in a separate
math library, which is requested by using -lm on the command line. Since the libraries must typically be
searched after your program's own object files are linked (so that the linker knows which library
functions your program uses), any -l option must appear after the names of your files on the command
line. For example, to link the object file mymath.o (previously compiled with cc -c mymath.c)
together with the math library, you might use
cc -o mymathprog mymath.o -lm
(The l in the -l option is the lower case ell, for library; it is not the digit 1.)
Everything we've said about cc also applies to most other Unix C compilers. (Many of you will be using
gcc, the FSF's GNU C Compiler.)
There are command-line compilers for MS-DOS systems which work similarly. For example, the
Microsoft C compiler comes with a CL (``compile and link'') command, which works almost the same as
Unix cc. You can compile and link in one step:
cl hello.c
or you can compile only:
cl /c hello.c
creating an object file named hello.obj which you can link later.

The preceding has all been about command-line compilers. If you're using some kind of integrated
development environment, such as Borland's Turbo C or the Microsoft Programmer's Workbench or
Visual C or Think C or Codewarrior, most of the mechanical details are taken care of for you. (There's
also less I can say here about these environments, because they're all different.) Typically you define a
``project,'' and there's a way to specify the list of files (modules) which make up your project. The
modules might be source files which you typed in or obtained elsewhere, or they might be source files
which you created within the environment (perhaps by requesting a ``New source file,'' and typing it in).
Typically, the programming environment has a single ``build'' button which does whatever's required to
build (and perhaps even execute) your program. There may also be configuration windows in which you
can specify compiler options (such as whether you'd like it to accept C or C++). ``See your manual for
details.''

Chapter 6: Basic I/O


So far, we've been using printf to do output, and we haven't had a way of doing any input. In this
chapter, we'll learn a bit more about printf, and we'll begin learning about character-based input and
output.
6.1 printf
6.2 Character Input and Output
6.3 Reading Lines
6.4 Reading Numbers

This page by Steve Summit // Copyright 1995-1997 // mail feedback

6.1 printf
printf's name comes from print formatted. It generates output under the control of a format string (its first
argument) which consists of literal characters to be printed and also special character sequences--format specifiers--which request that other arguments be fetched, formatted, and inserted into the string. Our very first program was
nothing more than a call to printf, printing a constant string:
printf("Hello, world!\n");
Our second program also featured a call to printf:
printf("i is %d\n", i);
In that case, whenever printf ``printed'' the string "i is %d", it did not print it verbatim; it replaced the two
characters %d with the value of the variable i.
There are quite a number of format specifiers for printf. Here are the basic ones:

%d     print an int argument in decimal
%ld    print a long int argument in decimal
%c     print a character
%s     print a string
%f     print a float or double argument
%e     same as %f, but use exponential notation
%g     use %e or %f, whichever is better
%o     print an int argument in octal (base 8)
%x     print an int argument in hexadecimal (base 16)
%%     print a single %

It is also possible to specify the width and precision of numbers and strings as they are inserted (somewhat like
FORTRAN format statements); we'll present those details in a later chapter. (Very briefly, for those who are
curious: a notation like %3d means to print an int in a field at least 3 spaces wide; a notation like %5.2f means
to print a float or double in a field at least 5 spaces wide, with two places to the right of the decimal.)
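One way to check what a given specifier produces, without reading the screen, is snprintf (a standard C99 function that takes the same format specifiers as printf but writes the result into a character array). The sketch below wraps that in two hypothetical helper functions that report whether formatting a value gives the expected text.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format an int with fmt, the way printf would,
   and return 1 if the result matches expected. */
int int_formats_as(const char *fmt, int value, const char *expected)
{
    char buf[64];
    snprintf(buf, sizeof buf, fmt, value);
    return strcmp(buf, expected) == 0;
}

/* The same idea for float or double values (%f, %e, %g). */
int double_formats_as(const char *fmt, double value, const char *expected)
{
    char buf[64];
    snprintf(buf, sizeof buf, fmt, value);
    return strcmp(buf, expected) == 0;
}
```

For example, int_formats_as("%3d", 7, "  7") is true (7 padded to a field 3 wide), and so is double_formats_as("%5.2f", 3.14159, " 3.14").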
To illustrate with a few more examples: the call
printf("%c %d %f %e %s %d%%\n", '1', 2, 3.14, 56000000., "eight", 9);
would print
1 2 3.140000 5.600000e+07 eight 9%

The call
printf("%d %o %x\n", 100, 100, 100);
would print
100 144 64
Successive calls to printf just build up the output a piece at a time, so the calls
printf("Hello, ");
printf("world!\n");
would also print Hello, world! (on one line of output).
Earlier we learned that C represents characters internally as small integers corresponding to the characters' values
in the machine's character set (typically ASCII). This means that there isn't really much difference between a
character and an integer in C; most of the difference is in whether we choose to interpret an integer as an integer or
a character. printf is one place where we get to make that choice: %d prints an integer value as a string of digits
representing its decimal value, while %c prints the character corresponding to a character set value. So the lines
char c = 'A';
int i = 97;
printf("c = %c, i = %d\n", c, i);
would print c as the character A and i as the number 97. But if, on the other hand, we called
printf("c = %d, i = %c\n", c, i);
we'd see the decimal value (printed by %d) of the character 'A', followed by the character (whatever it is) which
happens to have the decimal value 97.
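The following sketch checks both of those printf calls by formatting them into a buffer with snprintf instead of printing them. It assumes the ASCII character set, where 'A' has the value 65 and the value 97 is the character a; on another character set the expected strings would differ. The helper name formats_as is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format c and i with fmt and compare the
   result with expected. Returns 1 on a match. */
int formats_as(const char *fmt, char c, int i, const char *expected)
{
    char buf[64];
    snprintf(buf, sizeof buf, fmt, c, i);  /* a char argument works for %c or %d */
    return strcmp(buf, expected) == 0;
}
```

With ASCII, formats_as("c = %c, i = %d", 'A', 97, ...) gives "c = A, i = 97", while the swapped format "c = %d, i = %c" gives "c = 65, i = a".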
You have to be careful when calling printf. It has no way of knowing how many arguments you've passed it or
what their types are other than by looking for the format specifiers in the format string. If there are more format
specifiers (that is, more % signs) than there are arguments, or if the arguments have the wrong types for the format
specifiers, printf can misbehave badly, often printing nonsense numbers or (even worse) numbers which
mislead you into thinking that some other part of your program is broken.
Because of some automatic conversion rules which we haven't covered yet, you have a small amount of latitude in
the types of the expressions you pass as arguments to printf. The argument for %c may be of type char or
int, and the argument for %d may be of type char or int. The string argument for %s may be a string constant,
an array of characters, or a pointer to some characters (though we haven't really covered strings or pointers yet).
Finally, the arguments corresponding to %e, %f, and %g may be of types float or double. But other
combinations do not work reliably: %d will not print a long int or a float or a double; %ld will not print
an int; %e, %f, and %g will not print an int.
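Here is a minimal sketch of pairing each specifier with an argument of a type it accepts, again using snprintf so the results can be checked. Where an int needs to be printed with %f, it uses an explicit conversion (a cast, written (double)n) so that printf really receives a double. The function name is hypothetical, and the value 120 for 'x' assumes ASCII.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical check: returns 1 if each specifier/argument pair
   formats as expected. */
int matched_types_ok(void)
{
    char buf[64];
    long big = 100000L;
    int n = 3;
    char ch = 'x';

    snprintf(buf, sizeof buf, "%ld", big);       /* a long needs %ld, not %d */
    if(strcmp(buf, "100000") != 0)
        return 0;

    snprintf(buf, sizeof buf, "%f", (double)n);  /* convert the int before using %f */
    if(strcmp(buf, "3.000000") != 0)
        return 0;

    snprintf(buf, sizeof buf, "%d %c", ch, ch);  /* a char works with %d or %c */
    if(strcmp(buf, "120 x") != 0)                /* 120 assumes ASCII */
        return 0;

    return 1;
}
```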

6.2 Character Input and Output


[This section corresponds to K&R Sec. 1.5]
Unless a program can read some input, it's hard to keep it from doing exactly the same thing every time
it's run, and thus being rather boring after a while.
The most basic way of reading input is by calling the function getchar. getchar reads one character
from the ``standard input,'' which is usually the user's keyboard, but which can sometimes be redirected
by the operating system. getchar returns (rather obviously) the character it reads, or, if there are no
more characters available, the special value EOF (``end of file'').
A companion function is putchar, which writes one character to the ``standard output.'' (The standard
output is, again not surprisingly, usually the user's screen, although it, too, can be redirected. printf,
like putchar, prints to the standard output; in fact, you can imagine that printf calls putchar to
actually print each of the characters it formats.)
Using these two functions, we can write a very basic program to copy the input, a character at a time, to
the output:
#include <stdio.h>

/* copy input to output */
int main(void)
{
    int c;

    c = getchar();
    while(c != EOF)
    {
        putchar(c);
        c = getchar();
    }
    return 0;
}
This code is straightforward, and I encourage you to type it in and try it out. It reads one character, and if
it is not the EOF code, enters a while loop, printing one character and reading another, as long as the
character read is not EOF. The loop itself is simple, but there's one mystery surrounding the
declaration of the variable c: if it holds characters, why is it an int?
We said that a char variable could hold integers corresponding to character set values, and that an int
could hold integers of more arbitrary values (guaranteed at least +-32767). Since most character sets contain a few
hundred characters (nowhere near 32767), an int variable can in general comfortably hold all char
values, and then some. Therefore, there's nothing wrong with declaring c as an int. But in fact, it's
important to do so, because getchar can return every character value, plus that special, non-character
value EOF, indicating that there are no more characters. Type char is only guaranteed to be able to hold
all the character values; it is not guaranteed to be able to hold this ``no more characters'' value without
possibly mixing it up with some actual character value. (It's like trying to cram five pounds of books into
a four-pound box, or 13 eggs into a carton that holds a dozen.) Therefore, you should always remember to
use an int for anything you assign getchar's return value to.
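To see why the int matters, here is a sketch of the same copying loop written for any input and output stream, using getc and putc (the standard library's stream versions of getchar and putchar), so that it can be tried on a temporary file. The name copy_stream is hypothetical. If the input contains the byte 0xFF, getc returns 255, which an int holds without confusion; squeezed into a plain char (signed on many machines), it could come out as -1 and be mistaken for EOF.

```c
#include <stdio.h>

/* Hypothetical helper: copy every character from in to out and
   return how many were copied. c is an int so that it can hold
   every character value and also the distinct value EOF. */
long copy_stream(FILE *in, FILE *out)
{
    int c;
    long count = 0;

    c = getc(in);
    while(c != EOF)
    {
        putc(c, out);
        count = count + 1;
        c = getc(in);
    }
    return count;
}
```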
When you run the character copying program, and it begins copying its input (your typing) to its output
(your screen), you may find yourself wondering how to stop it. It stops when it receives end-of-file
(EOF), but how do you send EOF? The answer depends on what kind of computer you're using. On Unix
and Unix-related systems, it's almost always control-D. On MS-DOS machines, it's control-Z followed by
the RETURN key. Under Think C on the Macintosh, it's control-D, just like Unix. On other systems, you
may have to do some research to learn how to send EOF.
(Note, too, that the character you type to generate an end-of-file condition from the keyboard is not the
same as the special EOF value returned by getchar. The EOF value returned by getchar is a code
indicating that the input system has detected an end-of-file condition, whether it's reading the keyboard or
a file or a magnetic tape or a network connection or anything else. In a disk file, at least, there is not likely
to be any character in the file corresponding to EOF; as far as your program is concerned, EOF indicates
the absence of any more characters to read.)
Another excellent thing to know when doing any kind of programming is how to terminate a runaway
program. If a program is running forever waiting for input, you can usually stop it by sending it an end-of-file, as above, but if it's running forever not waiting for something, you'll have to take more drastic
measures. Under Unix, control-C (or, occasionally, the DELETE key) will terminate the current program,
almost no matter what. Under MS-DOS, control-C or control-BREAK will sometimes terminate the
current program, but by default MS-DOS only checks for control-C when it's looking for input, so an
infinite loop can be unkillable. There's a DOS command,
break on
which tells DOS to look for control-C more often, and I recommend using this command if you're doing
any programming. (If a program is in a really tight infinite loop under MS-DOS, there can be no way of
killing it short of rebooting.) On the Mac, try command-period or command-option-ESCAPE.
Finally, don't be disappointed (as I was) the first time you run the character copying program. You'll type
a character, and see it on the screen right away, and assume it's your program working, but it's only your
computer echoing every key you type, as it always does. When you hit RETURN, a full line of characters
is made available to your program. It then zips several times through its loop, reading and printing all the
characters in the line in quick succession. In other words, when you run this program, it will probably
seem to copy the input a line at a time, rather than a character at a time. You may wonder how a program
could instead read a character right away, without waiting for the user to hit RETURN. That's an excellent
question, but unfortunately the answer is rather complicated, and beyond the scope of our discussion here.
(Among other things, how to read a character right away is one of the things that's not defined by the C
language, and it's not defined by any of the standard library functions, either. How to do it depends on
which operating system you're using.)
Stylistically, the character-copying program above can be said to have one minor flaw: it contains two
calls to getchar, one which reads the first character and one which reads (by virtue of the fact that it's
in the body of the loop) all the other characters. This seems inelegant and perhaps unnecessary, and it can
also be risky: if there were more things going on within the loop, and if we ever changed the way we read
characters, it would be easy to change one of the getchar calls but forget to change the other one. Is
there a way to rewrite the loop so that there is only one call to getchar, responsible for reading all the
characters? Is there a way to read a character, test it for EOF, and assign it to the variable c, all at the
same time?
There is. It relies on the fact that the assignment operator, =, is just another operator in C. An assignment
is not (necessarily) a standalone statement; it is an expression, and it has a value (the value that's assigned
to the variable on the left-hand side), and it can therefore participate in a larger, surrounding expression.
Therefore, most C programmers would write the character-copying loop like this:
while((c = getchar()) != EOF)
    putchar(c);
What does this mean? The function getchar is called, as before, and its return value is assigned to the
variable c. Then the value is immediately compared against the value EOF. Finally, the true/false value of
the comparison controls the while loop: as long as the value is not EOF, the loop continues executing,
but as soon as an EOF is received, no more trips through the loop are taken, and it exits. The net result is
that the call to getchar happens inside the test at the top of the while loop, and doesn't have to be
repeated before the loop and within the loop (more on this in a bit).
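The same idiom works for any loop that reads characters. As a sketch, here is a hypothetical function that counts the lines in a stream, with getc standing in for getchar so that it can read from any open file; there is exactly one call to getc, inside the loop test.

```c
#include <stdio.h>

/* Hypothetical helper: count the newline characters in fp. */
long count_lines(FILE *fp)
{
    int c;
    long lines = 0;

    while((c = getc(fp)) != EOF)
        if(c == '\n')
            lines = lines + 1;
    return lines;
}
```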
Stated another way, the syntax of a while loop is always
while( expression ) ...
A comparison (using the != operator) is of course an expression; the syntax is
expression != expression

And an assignment is an expression; the syntax is


expression = expression
What we're seeing is just another example of the fact that expressions can be combined with essentially
limitless generality and therefore infinite variety. The left-hand side of the != operator (its first
expression) is the (sub)expression c = getchar(), and the combined expression is the expression
needed by the while loop.
The extra parentheses around
(c = getchar())
are important: they are there because the precedence of the != operator is higher than that of the =
operator. If we (incorrectly) wrote
while(c = getchar() != EOF)	/* WRONG */
the compiler would interpret it as


while(c = (getchar() != EOF))
That is, it would assign the result of the != operator (1 or 0) to the variable c, which is not what we want: c would never hold the character that was read.
(``Precedence'' refers to the rules for which operators are applied to their operands in which order, that is,
to the rules controlling the default grouping of expressions and subexpressions. For example, the
multiplication operator * has higher precedence than the addition operator +, which means that the
expression a + b * c is parsed as a + (b * c). We'll have more to say about precedence later.)
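The difference is easy to demonstrate. In this sketch (both function names are hypothetical), each function reads one character from a stream and returns the value its test left in c. With the parentheses, c is the character itself; without them, c is only the 1 or 0 produced by !=. Many compilers will warn about the second form, which is part of the point.

```c
#include <stdio.h>

/* Correct grouping: c receives the character, and then c is compared with EOF. */
int right_way(FILE *fp)
{
    int c;
    if((c = getc(fp)) != EOF)
        return c;       /* the character that was read */
    return EOF;
}

/* WRONG grouping: this means c = (getc(fp) != EOF). */
int wrong_way(FILE *fp)
{
    int c;
    if(c = getc(fp) != EOF)
        return c;       /* c is 1 here, not the character read */
    return c;           /* c is 0 here, at end of file */
}
```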
The line
while((c = getchar()) != EOF)
epitomizes the cryptic brevity which C is notorious for. You may find this terseness infuriating (and
you're not alone!), and it can certainly be carried too far, but bear with me for a moment while I defend it.
The simple example we've been discussing illustrates the tradeoffs well. We have four things to do:
1. call getchar,
2. assign its return value to a variable,
3. test the return value against EOF, and
4. process the character (in this case, print it out again).


We can't eliminate any of these steps. We have to assign getchar's value to a variable (we can't just use
it directly) because we have to do two different things with it (test, and print). Therefore, compressing the
assignment and test into the same line is the only good way of avoiding two distinct calls to getchar.
You may not agree that the compressed idiom is better for being more compact or easier to read, but the
fact that there is now only one call to getchar is a real virtue.
Don't think that you'll have to write compressed lines like
while((c = getchar()) != EOF)
right away, or in order to be an ``expert C programmer.'' But, for better or worse, most experienced C
programmers do like to use these idioms (whether they're justified or not), so you'll need to be able to at
least recognize and understand them when you're reading other people's code.

6.3 Reading Lines


It's often convenient for a program to process its input not a character at a time but rather a line at a time,
that is, to read an entire line of input and then act on it all at once. The standard C library has a couple of
functions for reading lines, but they have a few awkward features, so we're going to learn more about
character input (and about writing functions in general) by writing our own function to read one line. Here
it is:
#include <stdio.h>

/* Read one line from standard input, */
/* copying it to line array (but no more than max chars). */
/* Does not place terminating \n in line array. */
/* Returns line length, or 0 for empty line, or EOF for end-of-file. */
/* (If your <stdio.h> also declares the POSIX getline function, the */
/* compiler may report conflicting types; renaming this one avoids that.) */

int getline(char line[], int max)
{
    int nch = 0;
    int c;
    max = max - 1;          /* leave room for '\0' */

    while((c = getchar()) != EOF)
    {
        if(c == '\n')
            break;

        if(nch < max)
        {
            line[nch] = c;
            nch = nch + 1;
        }
    }

    if(c == EOF && nch == 0)
        return EOF;

    line[nch] = '\0';
    return nch;
}
As the comment indicates, this function will read one line of input from the standard input, placing it into
the line array. The size of the line array is given by the max argument; the function will never write
more than max characters into line.

The main body of the function is a getchar loop, much as we used in the character-copying program. In
the body of this loop, however, we're storing the characters in an array (rather than immediately printing
them out). Also, we're only reading one line of characters, then stopping and returning.
There are several new things to notice here.
First of all, the getline function accepts an array as a parameter. As we've said, array parameters are an
exception to the rule that functions receive copies of their arguments--in the case of arrays, the function
does have access to the actual array passed by the caller, and can modify it. Since the function is
accessing the caller's array, not creating a new one to hold a copy, the function does not have to declare
the argument array's size; it's set by the caller. (Thus, the brackets in ``char line[]'' are empty.)
However, so that we won't overflow the caller's array by reading too long a line into it, we allow the caller
to pass along the size of the array, which we promise not to exceed.
Second, we see an example of the break statement. The top of the loop looks like our earlier character-copying loop--it stops when it reaches EOF--but we only want this loop to read one line, so we also stop
(that is, break out of the loop) when we see the \n character signifying end-of-line. An equivalent loop,
without the break statement, would be
while((c = getchar()) != EOF && c != '\n')
{
    if(nch < max)
    {
        line[nch] = c;
        nch = nch + 1;
    }
}
We haven't learned about the internal representation of strings yet, but it turns out that strings in C are
simply arrays of characters, which is why we are reading the line into an array of characters. The end of a
string is marked by the special character, '\0'. To make sure that there's always room for that character,
on our way in we subtract 1 from max, the argument that tells us how many characters we may place in
the line array. When we're done reading the line, we store the end-of-string character '\0' at the end
of the string we've just built in the line array.
Finally, there's one subtlety in the code which isn't too important for our purposes now but which you
may wonder about: it's arranged to handle the possibility that a few characters (i.e. the apparent beginning
of a line) are read, followed immediately by an EOF, without the usual \n end-of-line character. (That's
why we return EOF only if we received EOF and we hadn't read any characters first.)
In any case, the function returns the length (number of characters) of the line it read, not including the \n.
(Therefore, it returns 0 for an empty line.) Like getchar, it returns EOF when there are no more lines to
read. (It happens that EOF is a negative number, so it will never match the length of a line that getline
has read.)
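To check these return values without typing at the keyboard, here is a sketch of the same function adapted to read from any stream with getc. We've called it fgetline, a hypothetical name, partly so that it can't collide with the POSIX getline function that some systems declare in <stdio.h>. Its contract is the one described above: at most max-1 characters plus '\0' are stored, the \n is dropped, and EOF is returned only when end-of-file arrives before any characters.

```c
#include <stdio.h>

/* Hypothetical stream version of getline: read one line from fp
   into line[], storing at most max-1 characters plus '\0'.
   Returns the line length, 0 for an empty line, or EOF. */
int fgetline(FILE *fp, char line[], int max)
{
    int nch = 0;
    int c;
    max = max - 1;                 /* leave room for '\0' */

    while((c = getc(fp)) != EOF && c != '\n')
    {
        if(nch < max)              /* extra characters are read but discarded */
        {
            line[nch] = c;
            nch = nch + 1;
        }
    }
    if(c == EOF && nch == 0)
        return EOF;
    line[nch] = '\0';
    return nch;
}
```

With input "hello\n\nab" (no final newline) and a 4-character array, successive calls return 3 (line holds "hel"), 0 (an empty line), 2 ("ab", ended by EOF rather than \n), and finally EOF.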
Here is an example of a test program which calls getline, reading the input a line at a time and then
printing each line back out:
#include <stdio.h>

extern int getline(char [], int);

int main(void)
{
    char line[256];

    while(getline(line, 256) != EOF)
        printf("you typed \"%s\"\n", line);
    return 0;
}
The notation char [] in the function prototype for getline says that getline accepts as its first
argument an array of char. When the program calls getline, it is careful to pass along the actual size
of the array. (You might notice a potential problem: since the number 256 appears in two places, if we
ever decide that 256 is too small, and that we want to be able to read longer lines, we could easily change
one of the instances of 256, and forget to change the other one. Later we'll learn ways of solving--that is,
avoiding--this sort of problem.)

6.4 Reading Numbers


The getline function of the previous section reads one line from the user, as a string. What if we want
to read a number? One straightforward way is to read a string as before, and then immediately convert
the string to a number. The standard C library contains a number of functions for doing this. The simplest
to use are atoi(), which converts a string to an integer, and atof(), which converts a string to a
floating-point number. (Both of these functions are declared in the header <stdlib.h>, so you should
#include that header at the top of any file using these functions.) You could read an integer from the
user like this:
#include <stdlib.h>
char line[256];
int n;
printf("Type an integer:\n");
getline(line, 256);
n = atoi(line);
Now the variable n contains the number typed by the user. (This assumes that the user did type a valid
number, and that getline did not return EOF.)
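Here is a self-contained sketch of the read-a-line-then-convert approach. So that it stands alone, it uses the standard library's fgets to read the line (in place of our getline), and it returns a caller-supplied fallback value at end of file; the function name read_int_or is hypothetical. Note that atoi quietly returns 0 for a line such as "abc", which is why the text assumes the user typed a valid number.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read one line from fp and convert it with atoi.
   Returns fallback at end of file. */
int read_int_or(FILE *fp, int fallback)
{
    char line[256];

    if(fgets(line, sizeof line, fp) == NULL)   /* NULL means end of file (or an error) */
        return fallback;
    return atoi(line);   /* skips leading blanks, stops at the first non-digit */
}
```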
Reading a floating-point number is similar:
#include <stdlib.h>
char line[256];
double x;
printf("Type a floating-point number:\n");
getline(line, 256);
x = atof(line);
(atof is actually declared as returning type double, but you could also use it with a variable of type
float, because in general, C automatically converts between float and double as needed.)
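A small sketch of that automatic conversion: the hypothetical function below stores atof's double result in a float variable, and C converts it on assignment (possibly losing some precision for values a float can't represent exactly).

```c
#include <stdlib.h>

/* Hypothetical helper: convert a string to a float using atof. */
float to_float(const char *s)
{
    float x = atof(s);   /* atof returns double; it is converted to float here */
    return x;
}
```

For example, to_float("0.25") yields 0.25, and to_float("  1.5e2") yields 150.0, since atof skips leading blanks and accepts exponential notation.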
Another way of reading in numbers, which you're likely to see in other books on C, involves the scanf
function, but it has several problems, so we won't discuss it for now. (Superficially, scanf seems simple
enough, which is why it's often used, especially in textbooks. The trouble is that to perform input reliably
using scanf is not nearly as easy as it looks, especially when you're not sure what the user is going to
type.)

Chapter 7: More Operators


In this chapter we'll meet some (though still not all) of C's more advanced arithmetic operators. The ones
we'll meet here have to do with making common patterns of operations easier.
It's extremely common in programming to have to increment a variable by 1, that is, to add 1 to it. (For
example, if you're processing each element of an array, you'll typically write a loop with an index or
pointer variable stepping through the elements of the array, and you'll increment the variable each time
through the loop.) The classic way to increment a variable is with an assignment like
i = i + 1
Such an assignment is perfectly common and acceptable, but it has a few slight problems:
1. As we've mentioned, it looks a little odd, especially from an algebraic perspective.
2. If the object being incremented is not a simple variable, the idiom can become cumbersome to
type, and correspondingly more error-prone. For example, the expression
a[i+j+2*k] = a[i+j+2*k] + 1
is a bit of a mess, and you may have to look closely to see that the similar-looking expression
a[i+j+2*k] = a[i+j+2+k] + 1
probably has a mistake in it.
3. Since incrementing things is so common, it might be nice to have an easier way of doing it.
In fact, C provides not one but two other, simpler ways of incrementing variables and performing other
similar operations.
7.1 Assignment Operators
7.2 Increment and Decrement Operators
7.3 Order of Evaluation


7.1 Assignment Operators


[This section corresponds to K&R Sec. 2.10]
The first and more general way is that any time you have the pattern
v = v op e
where v is any variable (or anything like a[i]), op is any of the binary arithmetic operators we've seen
so far, and e is any expression, you can replace it with the simplified
v op= e
For example, you can replace the expressions
i = i + 1
j = j - 10
k = k * (n + 1)
a[i] = a[i] / b
with
i += 1
j -= 10
k *= n + 1
a[i] /= b

In an example in a previous chapter, we used the assignment


a[d1 + d2] = a[d1 + d2] + 1;
to count the rolls of a pair of dice. Using +=, we could simplify this expression to
a[d1 + d2] += 1;
As these examples show, you can use the ``op='' form with any of the arithmetic operators (and with
several other operators that we haven't seen yet). The expression, e, does not have to be the constant 1; it
can be any expression. You don't always need as many explicit parentheses when using the op=
operators: the expression

k *= n + 1
is interpreted as
k = k * (n + 1)
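To see these equivalences in action, here is a small sketch that applies the four op= forms from the table above to some sample values. The function name op_assign_demo and the particular starting values are hypothetical, chosen only so the results are easy to check by hand.

```c
/* Hypothetical demonstration: applies the four op= forms from the
   text to sample values and stores the results in out[]. */
void op_assign_demo(int out[4])
{
    int i = 1, j = 50, k = 3, n = 4;
    int a[3] = { 0, 0, 100 };
    int b = 5;

    i += 1;          /* i = i + 1        -> 2  */
    j -= 10;         /* j = j - 10       -> 40 */
    k *= n + 1;      /* k = k * (n + 1)  -> 15, not (3 * 4) + 1 = 13 */
    a[i] /= b;       /* i is now 2, so a[2] = a[2] / b -> 20 */

    out[0] = i;
    out[1] = j;
    out[2] = k;
    out[3] = a[2];
}
```

Notice in particular the k *= n + 1 line: the whole right-hand side is computed first, and only then is k multiplied by it.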


7.2 Increment and Decrement Operators


[This section corresponds to K&R Sec. 2.8]
The assignment operators of the previous section let us replace v = v op e with v op= e, so that we didn't
have to mention v twice. In the most common cases, namely when we're adding or subtracting the
constant 1 (that is, when op is + or - and e is 1), C provides another set of shortcuts: the autoincrement
and autodecrement operators. In their simplest forms, they look like this:
++i     add 1 to i
--j     subtract 1 from j

These correspond to the slightly longer i += 1 and j -= 1, respectively, and also to the fully
``longhand'' forms i = i + 1 and j = j - 1.
The ++ and -- operators apply to one operand (they're unary operators). The expression ++i adds 1 to
i, and stores the incremented result back in i. This means that these operators don't just compute new
values; they also modify the value of some variable. (They share this property, modifying some variable, with the assignment operators; we can say that these operators all have side effects. That is, they have
some effect, on the side, other than just computing a new value.)
The incremented (or decremented) result is also made available to the rest of the expression, so an
expression like
k = 2 * ++i
means ``add one to i, store the result back in i, multiply it by 2, and store that result in k.'' (This is a
pretty meaningless expression; our actual uses of ++ later will make more sense.)
Both the ++ and -- operators have an unusual property: they can be used in two ways, depending on
whether they are written to the left or the right of the variable they're operating on. In either case, they
increment or decrement the variable they're operating on; the difference concerns whether it's the old or
the new value that's ``returned'' to the surrounding expression. The prefix form ++i increments i and
returns the incremented value. The postfix form i++ increments i, but returns the prior, non-incremented
value. Rewriting our previous example slightly, the expression
k = 2 * i++
means ``take i's old value and multiply it by 2, increment i, store the result of the multiplication in k.''
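To make the difference concrete, here is a minimal sketch (the helper names double_prefix and double_postfix are hypothetical) that evaluates both forms starting from the same value of i. Each helper returns k and reports i's final value through a second argument.

```c
/* Hypothetical helpers contrasting prefix and postfix ++.
   Each returns k and stores i's final value through *final_i. */
int double_prefix(int i, int *final_i)
{
    int k = 2 * ++i;    /* i is incremented, and the NEW value is doubled */
    *final_i = i;
    return k;
}

int double_postfix(int i, int *final_i)
{
    int k = 2 * i++;    /* the OLD value is doubled; i is still incremented */
    *final_i = i;
    return k;
}
```

Starting from i = 5, double_prefix gives k = 12 and double_postfix gives k = 10, but in both cases i ends up as 6.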
The distinction between the prefix and postfix forms of ++ and -- will probably seem strained at first,
but it will make more sense once we begin using these operators in more realistic situations.
For example, our getline function of the previous chapter used the statements
line[nch] = c;
nch = nch + 1;
as the body of its inner loop. Using the ++ operator, we could simplify this to
line[nch++] = c;
We wanted to increment nch after deciding which element of the line array to store into, so the postfix
form nch++ is appropriate.
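Here is a sketch of how that inner loop might look with the ++ operator folded in. It is not the chapter 6 getline itself: to keep the example self-contained, the hypothetical function copy_line reads characters from a string rather than from getchar, stopping at a newline or at the end of the string.

```c
/* Hypothetical helper: copies characters from src into line[],
   stopping at a newline or at the end of src, and storing at most
   max - 1 characters so there is room for the terminating \0.
   Returns the number of characters stored. */
int copy_line(const char src[], char line[], int max)
{
    int nch = 0;
    int pos = 0;
    char c;

    while((c = src[pos++]) != '\0' && c != '\n')
    {
        if(nch < max - 1)
            line[nch++] = c;   /* store at line[nch], then advance nch */
    }
    line[nch] = '\0';
    return nch;
}
```

Characters beyond the limit are read but discarded, much as a line that is too long for its buffer would be.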
Notice that it only makes sense to apply the ++ and -- operators to variables (or to other ``containers,''
such as a[i]). It would be meaningless to say something like
1++
or
(2+3)++
The ++ operator doesn't just mean ``add one''; it means ``add one to a variable'' or ``make a variable's
value one more than it was before.'' But (2+3) is not a variable, it's an expression, so there's no place for
++ to store the incremented result. (A compiler will reject both of these examples.)
Another unfortunate example is
i = i++;
which some confused programmers sometimes write, presumably because they want to be extra sure that
i is incremented by 1. But i++ all by itself is sufficient to increment i by 1; the extra (explicit)
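assignment to i is unnecessary and in fact counterproductive, meaningless, and incorrect. (Because it modifies i twice in one expression, its behavior is undefined, for reasons we'll see in section 7.3.) If you want to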
increment i (that is, add one to it, and store the result back in i), either use
i = i + 1;
or
i += 1;
or
++i;

or
i++;
Don't try to use some bizarre combination.
Did it matter whether we used ++i or i++ in this last example? Remember, the difference between the
two forms is what value (either the old or the new) is passed on to the surrounding expression. If there is
no surrounding expression, if the ++i or i++ appears all by itself, to increment i and do nothing else,
you can use either form; it makes no difference. (Two ways that an expression can appear ``all by itself,''
with ``no surrounding expression,'' are when it is an expression statement terminated by a semicolon, as
above, or when it is one of the controlling expressions of a for loop.) For example, both the loops
for(i = 0; i < 10; ++i)
    printf("%d\n", i);
and
for(i = 0; i < 10; i++)
    printf("%d\n", i);
will behave exactly the same way and produce exactly the same results. (In real code, postfix increment is
probably more common, though prefix definitely has its uses, too.)
In the preceding section, we simplified the expression
a[d1 + d2] = a[d1 + d2] + 1;
from a previous chapter down to
a[d1 + d2] += 1;
Using ++, we could simplify it still further to
a[d1 + d2]++;
or
++a[d1 + d2];
(Again, in this case, both are equivalent.)
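As a sketch of the dice-counting loop written with ++, the hypothetical function below enumerates all 36 outcomes of a pair of six-sided dice instead of rolling them randomly, so that the counts are predictable. The array must have room for subscripts 0 through 12.

```c
/* Hypothetical helper: counts how many of the 36 possible rolls of
   two six-sided dice produce each total. counts[] needs 13 cells,
   for subscripts 0..12 (0 and 1 simply stay 0). */
void count_all_rolls(int counts[13])
{
    int d1, d2, s;

    for(s = 0; s <= 12; s++)
        counts[s] = 0;

    for(d1 = 1; d1 <= 6; d1++)
        for(d2 = 1; d2 <= 6; d2++)
            counts[d1 + d2]++;   /* same as counts[d1 + d2] += 1 */
}
```

Afterwards counts[7] is 6 (the most common total) and counts[2] and counts[12] are each 1.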


We'll see more examples of these operators in the next section and in the next chapter.


7.3 Order of Evaluation


[This section corresponds to K&R Sec. 2.12]
When you start using the ++ and -- operators in larger expressions, you end up with expressions which
do several things at once, i.e., they modify several different variables at more or less the same time.
When you write such an expression, you must be careful not to have the expression ``pull the rug out
from under itself'' by assigning two different values to the same variable, or by assigning a new value to a
variable at the same time that another part of the expression is trying to use the value of that variable.
Actually, we had already started writing expressions which did several things at once even before we met
the ++ and -- operators. The expression
(c = getchar()) != EOF
assigns getchar's return value to c, and compares it to EOF. The ++ and -- operators make it much
easier to cram a lot into a small expression: the example
line[nch++] = c;
from the previous section assigned c to line[nch], and incremented nch. We'll eventually meet
expressions which do three things at once, such as
a[i++] = b[j++];
which assigns b[j] to a[i], and increments i, and increments j.
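An expression like a[i++] = b[j++] is perfectly well-defined, because the two increments apply to two different variables. Here is a minimal sketch of a copying loop built around it; the function name copy_range and its parameters are hypothetical.

```c
/* Hypothetical helper: copies n elements of b, starting at b[j],
   into a, starting at a[i]. Each a[i++] = b[j++] modifies two
   DIFFERENT variables, so its meaning is well-defined.
   Returns the final value of i. */
int copy_range(int a[], int i, const int b[], int j, int n)
{
    int count;
    for(count = 0; count < n; count++)
        a[i++] = b[j++];   /* assign b[j] to a[i], then bump both */
    return i;
}
```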
If you're not careful, though, it's easy for this sort of thing to get out of hand. Can you figure out exactly
what the expression
a[i++] = b[i++];    /* WRONG */
should do? I can't, and here's the important part: neither can the compiler. We know that the definition of
postfix ++ is that the former value, before the increment, is what goes on to participate in the rest of the
expression, but the expression a[i++] = b[i++] contains two ++ operators. Which of them happens
first? Does this expression assign the old ith element of b to the new ith element of a, or vice versa? No
one knows.
When the order of evaluation matters but is not well-defined (that is, when we can't say for sure which
order the compiler will evaluate the various dependent parts in) we say that the meaning of the
expression is undefined, and if we're smart we won't write the expression in the first place. (Why would
anyone ever write an ``undefined'' expression? Because sometimes, the compiler happens to evaluate it in
the order a programmer wanted, and the programmer assumes that since it works, it must be okay.)
For example, suppose we carelessly wrote this loop:
int i, a[10];
i = 0;
while(i < 10)
    a[i] = i++;    /* WRONG */

It looks like we're trying to set a[0] to 0, a[1] to 1, etc. But what if the increment i++ happens before
the compiler decides which cell of the array a to store the (unincremented) result in? We might end up
setting a[1] to 0, a[2] to 1, etc., instead. Since, in this case, we can't be sure which order things would
happen in, we simply shouldn't write code like this. In this case, what we're doing matches the pattern of
a for loop, anyway, which would be a better choice:
for(i = 0; i < 10; i++)
    a[i] = i;
Now that the increment i++ isn't crammed into the same expression that's setting a[i], the code is
perfectly well-defined, and is guaranteed to do what we want.
In general, you should be wary of ever trying to second-guess the order an expression will be evaluated
in, with two exceptions:
1. You can obviously assume that precedence will dictate the order in which binary operators are
applied. This typically says more than just what order things happen in; it also says what the
expression actually means. (In other words, the precedence of * over + says more than that the
multiplication ``happens first'' in 1 + 2 * 3; it says that the answer is 7, not 9.)
2. Although we haven't mentioned it yet, it is guaranteed that the logical operators && and || are
evaluated left-to-right, and that the right-hand side is not evaluated at all if the left-hand side
determines the outcome.
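The left-to-right guarantee of && is what makes a common idiom safe: test that a subscript is in range before using it. In this sketch (is_set is a hypothetical name), a[i] is never read when i is out of range, because the right-hand side of && is skipped as soon as the left-hand side is false.

```c
/* Hypothetical helper: returns 1 if i is a valid subscript for an
   array of n elements AND a[i] is nonzero, otherwise 0. Since &&
   evaluates left to right and stops early, a[i] is only examined
   after both range checks have succeeded. */
int is_set(const int a[], int n, int i)
{
    return i >= 0 && i < n && a[i] != 0;
}
```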
To look at one more example, it might seem that the code
int i = 7;
printf("%d\n", i++ * i++);
would have to print 56, because no matter which order the increments happen in, 7*8 is 8*7 is 56. But
++ just says that the increment happens later, not that it happens immediately, so this code could print 49
(if the compiler chose to perform the multiplication first, and both increments later). And, it turns out that
ambiguous expressions like this are such a bad idea that the ANSI C Standard does not require compilers
to do anything reasonable with them at all. Theoretically, the above code could end up printing 42, or
8923409342, or 0, or crashing your computer.
Programmers sometimes mistakenly imagine that they can write an expression which tries to do too
much at once and then predict exactly how it will behave based on ``order of evaluation.'' For example,
we know that multiplication has higher precedence than addition, which means that in the expression
i + j * k
j will be multiplied by k, and then i will be added to the result. Informally, we often say that the
multiplication happens ``before'' the addition. That's true in this case, but it doesn't say as much as we
might think about a more complicated expression, such as
i++ + j++ * k++
In this case, besides the addition and multiplication, i, j, and k are all being incremented. We can not
say which of them will be incremented first; it's the compiler's choice. (In particular, it is not necessarily
the case that j++ or k++ will happen first; the compiler might choose to save i's value somewhere and
increment i first, even though it will have to keep the old value around until after it has done the
multiplication.)
In the preceding example, it probably doesn't matter which variable is incremented first. It's not too hard,
though, to write an expression where it does matter. In fact, we've seen one already: the ambiguous
assignment a[i++] = b[i++]. We still don't know which i++ happens first. (We can not assume,
based on the right-to-left behavior of the = operator, that the right-hand i++ will happen first.) But if we
had to know what a[i++] = b[i++] really did, we'd have to know which i++ happened first.
Finally, note that parentheses don't dictate overall evaluation order any more than precedence does.
Parentheses override precedence and say which operands go with which operators, and they therefore
affect the overall meaning of an expression, but they don't say anything about the order of subexpressions
or side effects. We could not ``fix'' the evaluation order of any of the expressions we've been discussing
by adding parentheses. If we wrote
i++ + (j++ * k++)
we still wouldn't know which of the increments would happen first. (The parentheses would force the
multiplication to happen before the addition, but precedence already would have forced that, anyway.) If
we wrote
(i++) * (i++)
the parentheses wouldn't force the increments to happen before the multiplication or in any well-defined
order; this parenthesized version would be just as undefined as i++ * i++ was.
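What does fix such an expression is splitting it into separate statements, so that each side effect is complete before the next one begins. Here is a sketch, assuming (for illustration only) that the intent of i++ * i++ was ``multiply i by its successor, and advance i by 2''; the function name product_of_next_two is hypothetical.

```c
/* Hypothetical helper: each increment happens in its own declaration,
   so the order is explicit and the result is well-defined. */
int product_of_next_two(int *ip)
{
    int first = (*ip)++;    /* old value (e.g. 7); *ip becomes 8 */
    int second = (*ip)++;   /* now 8; *ip becomes 9 */
    return first * second;  /* 56, with no undefined behavior */
}
```

Called with a pointer to an int holding 7, this returns 56 and leaves the variable holding 9, on every compiler.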
There's a line from Kernighan & Ritchie, which I am fond of quoting when discussing these issues [Sec.
2.12, p. 54]:
The moral is that writing code that depends on order of evaluation is a bad programming
practice in any language. Naturally, it is necessary to know what things to avoid, but if you
don't know how they are done on various machines, you won't be tempted to take
advantage of a particular implementation.
The first edition of K&R said
...if you don't know how they are done on various machines, that innocence may help to
protect you.
I actually prefer the first edition wording. Many textbooks encourage you to write small programs to find
out how your compiler implements some of these ambiguous expressions, but it's just one step from
writing a small program to find out, to writing a real program which makes use of what you've just
learned. But you don't want to write programs that work only under one particular compiler, that take
advantage of the way that one compiler (but perhaps no other) happens to implement the undefined
expressions. It's fine to be curious about what goes on ``under the hood,'' and many of you will be curious
enough about what's going on with these ``forbidden'' expressions that you'll want to investigate them,
but please keep very firmly in mind that, for real programs, the very easiest way of dealing with
ambiguous, undefined expressions (which one compiler interprets one way and another interprets another
way and a third crashes on) is not to write them in the first place.

Chapter 8: Strings
Strings in C are represented by arrays of characters. The end of the string is marked with a special
character, the null character, which is simply the character with the value 0. (The null character has no
relation except in name to the null pointer. In the ASCII character set, the null character is named NUL.)
The null or string-terminating character is represented by another character escape sequence, \0. (We've
seen it once already, in the getline function of chapter 6.)
Because C has no built-in facilities for manipulating entire arrays (copying them, comparing them, etc.),
it also has very few built-in facilities for manipulating strings.
In fact, C's only truly built-in string-handling is that it allows us to use string constants (also called string
literals) in our code. Whenever we write a string, enclosed in double quotes, C automatically creates an
array of characters for us, containing that string, terminated by the \0 character. For example, we can
declare and define an array of characters, and initialize it with a string constant:
char string[] = "Hello, world!";
In this case, we can leave out the dimension of the array, since the compiler can compute it for us based
on the size of the initializer (14, including the terminating \0). This is the only case where the compiler
sizes a string array for us, however; in other cases, it will be necessary that we decide how big the arrays
and other data structures we use to hold strings are.
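One way to confirm the size the compiler chose is with the sizeof operator, which counts every element of the array, including the terminating \0. The standard strlen function (described later in this chapter) counts only the characters before the \0. The two function names in this sketch are hypothetical.

```c
#include <string.h>

/* Hypothetical helpers: the compiler sizes string[] from its
   initializer, so sizeof reports 14 (13 characters plus \0),
   while strlen reports 13. */
size_t hello_array_size(void)
{
    char string[] = "Hello, world!";
    return sizeof string;
}

size_t hello_string_length(void)
{
    char string[] = "Hello, world!";
    return strlen(string);
}
```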
To do anything else with strings, we must typically call functions. The C library contains a few basic
string manipulation functions, and to learn more about strings, we'll be looking at how these functions
might be implemented.
Since C never lets us assign entire arrays, we use the strcpy function to copy one string to another:
#include <string.h>
char string1[] = "Hello, world!";
char string2[20];
strcpy(string2, string1);
The destination string is strcpy's first argument, so that a call to strcpy mimics an assignment
expression (with the destination on the left-hand side). Notice that we had to allocate string2 big
enough to hold the string that would be copied to it. Also, at the top of any source file where we're using
the standard library's string-handling functions (such as strcpy) we must include the line

#include <string.h>
which contains external declarations for these functions.
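Here is the strcpy fragment fleshed out into a complete function (copy_is_independent is a hypothetical name). It copies string1 into the larger string2, then changes the copy, to show that strcpy copies the characters themselves: afterwards the two arrays are entirely separate.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if modifying the copy leaves the
   original string untouched, as it should. */
int copy_is_independent(void)
{
    char string1[] = "Hello, world!";
    char string2[20];             /* 20 >= 14, so the copy fits */

    strcpy(string2, string1);     /* destination first, like string2 = string1 */
    string2[0] = 'J';             /* modify only the copy */

    return string1[0] == 'H' && strcmp(string2, "Jello, world!") == 0;
}
```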
Since C won't let us compare entire arrays, either, we must call a function to do that, too. The standard
library's strcmp function compares two strings, and returns 0 if they are identical, or a negative number
if the first string is alphabetically ``less than'' the second string, or a positive number if the first string is
``greater.'' (Roughly speaking, what it means for one string to be ``less than'' another is that it would
come first in a dictionary or telephone book, although there are a few anomalies.) Here is an example:
char string3[] = "this is";
char string4[] = "a test";
if(strcmp(string3, string4) == 0)
    printf("strings are equal\n");
else
    printf("strings are different\n");
This code fragment will print ``strings are different''. Notice that strcmp does not return a Boolean,
true/false, zero/nonzero answer, so it's not a good idea to write something like
if(strcmp(string3, string4))
    ...
because it will behave backwards from what you might reasonably expect. (Nevertheless, if you start
reading other people's code, you're likely to come across conditionals like if(strcmp(a, b)) or
even if(!strcmp(a, b)). The first does something if the strings are unequal; the second does
something if they're equal. You can read these more easily if you pretend for a moment that strcmp's
name were strdiff, instead.)
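Since strcmp promises only the sign of its result (negative, zero, or positive), not any particular value, code that needs a definite answer should test the sign rather than compare against -1 or 1. Here is a sketch of a small hypothetical wrapper, compare_sign, that does exactly that.

```c
#include <string.h>

/* Hypothetical helper: reduces strcmp's result to exactly -1, 0,
   or 1 by testing its sign. */
int compare_sign(const char a[], const char b[])
{
    int r = strcmp(a, b);
    if(r < 0)
        return -1;     /* a sorts before b */
    else if(r == 0)
        return 0;      /* the strings are identical */
    else
        return 1;      /* a sorts after b */
}
```

With the strings from the example above, compare_sign("this is", "a test") is 1, since 't' comes after 'a'.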
Another standard library function is strcat, which concatenates strings. It does not concatenate two
strings together and give you a third, new string; what it really does is append one string onto the end of
another. (If it gave you a new string, it would have to allocate memory for it somewhere, and the
standard library string functions generally never do that for you automatically.) Here's an example:
char string5[20] = "Hello, ";
char string6[] = "world!";
printf("%s\n", string5);
strcat(string5, string6);
printf("%s\n", string5);

The first call to printf prints ``Hello, '', and the second one prints ``Hello, world!'', indicating that the
contents of string6 have been tacked on to the end of string5. Notice that we declared string5
with extra space, to make room for the appended characters.
If you have a string and you want to know its length (perhaps so that you can check whether it will fit in
some other array you've allocated for it), you can call strlen, which returns the length of the string (i.e.
the number of characters in it), not including the \0:
char string7[] = "abc";
int len = strlen(string7);
printf("%d\n", len);
Finally, you can print strings out with printf using the %s format specifier, as we've been doing in
these examples already (e.g. printf("%s\n", string5);).
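Because strcat does no checking of its own, strlen is the natural tool for making sure an append will fit before performing it. This is a sketch under one assumption: the caller passes the destination array's total size, since (as the text notes) we must always know how big our arrays are. The function name append_if_fits is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: appends src to dest only if the combined
   string, plus its \0, fits in destsize characters. Returns 1 if it
   appended, 0 if it refused and left dest unchanged. */
int append_if_fits(char dest[], int destsize, const char src[])
{
    if(strlen(dest) + strlen(src) + 1 > (size_t)destsize)
        return 0;
    strcat(dest, src);
    return 1;
}
```

With string5 declared as char string5[20] = "Hello, ", appending "world!" needs 7 + 6 + 1 = 14 characters, so it fits.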
Since a string is just an array of characters, all of the string-handling functions we've just seen can be
written quite simply, using no techniques more complicated than the ones we already know. In fact, it's
quite instructive to look at how these functions might be implemented. Here is a version of strcpy:
mystrcpy(char dest[], char src[])
{
    int i = 0;
    while(src[i] != '\0')
    {
        dest[i] = src[i];
        i++;
    }
    dest[i] = '\0';
}
We've called it mystrcpy instead of strcpy so that it won't clash with the version that's already in the
standard library. Its operation is simple: it looks at characters in the src string one at a time, and as long
as they're not \0, assigns them, one by one, to the corresponding positions in the dest string. When it's
done, it terminates the dest string by appending a \0. (After exiting the while loop, i is guaranteed to
have a value one greater than the subscript of the last character in src.) For comparison, here's a way of
writing the same code, using a for loop:
for(i = 0; src[i] != '\0'; i++)
    dest[i] = src[i];
dest[i] = '\0';
Yet a third possibility is to move the test for the terminating \0 character out of the for loop header and
into the body of the loop, using an explicit if and break statement, so that we can perform the test
after the assignment and therefore use the assignment inside the loop to copy the \0 to dest, too:
for(i = 0; ; i++)
{
    dest[i] = src[i];
    if(src[i] == '\0')
        break;
}
(There are in fact many, many ways to write strcpy. Many programmers like to combine the
assignment and test, using an expression like (dest[i] = src[i]) != '\0'. This is actually the
same sort of combined operation as we used in our getchar loop in chapter 6.)
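Here is a sketch of that combined style (mystrcpy2 is a hypothetical name, and the void return type is our choice). The assignment inside the while condition copies each character, including the final \0, and the loop stops immediately after the \0 has been copied.

```c
/* Hypothetical strcpy variant using a combined assignment and test. */
void mystrcpy2(char dest[], const char src[])
{
    int i = 0;
    while((dest[i] = src[i]) != '\0')
        i++;
}
```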
Here is a version of strcmp:
mystrcmp(char str1[], char str2[])
{
    int i = 0;
    while(1)
    {
        if(str1[i] != str2[i])
            return str1[i] - str2[i];
        if(str1[i] == '\0' || str2[i] == '\0')
            return 0;
        i++;
    }
}
Characters are compared one at a time. If two characters in one position differ, the strings are different,
and we are supposed to return a value less than zero if the first string (str1) is alphabetically less than
the second string. Since characters in C are represented by their numeric character set values, and since
most reasonable character sets assign values to characters in alphabetical order, we can simply subtract
the two differing characters from each other: the expression str1[i] - str2[i] will yield a
negative result if the i'th character of str1 is less than the corresponding character in str2. (As it
turns out, this will behave a bit strangely when comparing upper- and lower-case letters, but it's the
traditional approach, which the standard versions of strcmp tend to use.) If the characters are the same,
we continue around the loop, unless the characters we just compared were (both) \0, in which case
we've reached the end of both strings, and the strings are equal. Notice that we used what may at first
appear to be an infinite loop--the controlling expression is the constant 1, which is always true. What
actually happens is that the loop runs until one of the two return statements breaks out of it (and the
entire function). Note also that when one string is longer than the other, the first test will notice this
(because one string will contain a real character at the [i] location, while the other will contain \0, and
these are not equal) and the return value will be computed by subtracting the real character's value from
0, or vice versa. (Thus the shorter string will be treated as ``less than'' the longer.)
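One refinement worth knowing about: the C Standard specifies that strcmp orders characters as if they were unsigned char. On machines where plain char is signed, the subtraction in mystrcmp can give the ``wrong'' sign for characters outside the basic character set. Here is a sketch of a variant that converts each character to unsigned char first, and also declares its int return type explicitly. The name mystrcmp_u is hypothetical.

```c
/* Hypothetical mystrcmp variant comparing characters as unsigned char,
   the way the standard strcmp is specified to order them. */
int mystrcmp_u(const char str1[], const char str2[])
{
    int i = 0;
    while(1)
    {
        unsigned char c1 = (unsigned char)str1[i];
        unsigned char c2 = (unsigned char)str2[i];
        if(c1 != c2)
            return c1 - c2;   /* negative if str1 sorts first */
        if(c1 == '\0')        /* equal so far, and both at the end */
            return 0;
        i++;
    }
}
```

Since c1 and c2 are known to be equal when the second test is reached, checking c1 alone for \0 is enough.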
Finally, here is a version of strlen:
int mystrlen(char str[])
{
    int i;
    for(i = 0; str[i] != '\0'; i++)
        {}
    return i;
}
In this case, all we have to do is find the \0 that terminates the string, and it turns out that the three
control expressions of the for loop do all the work; there's nothing left to do in the body. Therefore, we
use an empty pair of braces {} as the loop body. Equivalently, we could use a null statement, which is
simply a semicolon:
for(i = 0; str[i] != '\0'; i++)
    ;
Empty loop bodies can be a bit startling at first, but they're not unheard of.
Everything we've looked at so far has come out of C's standard libraries. As one last example, let's write
a substr function, for extracting a substring out of a larger string. We might call it like this:
char string8[] = "this is a test";
char string9[10];
substr(string9, string8, 5, 4);
printf("%s\n", string9);
The idea is that we'll extract a substring of length 4, starting at character 5 (0-based) of string8, and
copy the substring to string9. Just as with strcpy, it's our responsibility to declare the destination
string (string9) big enough. Here is an implementation of substr. Not surprisingly, it's quite similar
to strcpy:

substr(char dest[], char src[], int offset, int len)
{
    int i;
    for(i = 0; i < len && src[offset + i] != '\0'; i++)
        dest[i] = src[i + offset];
    dest[i] = '\0';
}
If you compare this code to the code for mystrcpy, you'll see that the only differences are that
characters are fetched from src[offset + i] instead of src[i], and that the loop stops when len
characters have been copied (or when the src string runs out of characters, whichever comes first).
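The substr above trusts its caller to pass an offset that lies within src; if offset is past the end of the string, it reads beyond the \0. Here is a sketch of a more defensive variant. Its name, substr_checked, its void return type, and its choice of producing an empty string for an out-of-range offset are all our own assumptions.

```c
/* Hypothetical defensive substr: checks offset against the length of
   src before copying, and yields "" when offset is out of range. */
void substr_checked(char dest[], const char src[], int offset, int len)
{
    int i;
    int srclen = 0;

    while(src[srclen] != '\0')   /* find the length of src */
        srclen++;
    if(offset < 0 || offset > srclen)
    {
        dest[0] = '\0';          /* offset out of range: empty result */
        return;
    }
    for(i = 0; i < len && src[offset + i] != '\0'; i++)
        dest[i] = src[offset + i];
    dest[i] = '\0';
}
```

With the call from the example, substr_checked(string9, string8, 5, 4) leaves "is a" in string9.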
In this chapter, we've been careless about declaring the return types of the string functions, and (with the
exception of mystrlen) they haven't returned values. The real string functions do return values, but
they're of type ``pointer to character,'' which we haven't discussed yet.
When working with strings, it's important to keep firmly in mind the differences between characters and
strings. We must also occasionally remember how characters are represented, and how character values
relate to integers.
As we have had several occasions to mention, a character is represented internally as a small integer,
with a value depending on the character set in use. For example, we might find that 'A' had the value
65, that 'a' had the value 97, and that '+' had the value 43. (These are, in fact, the values in the ASCII
character set, which most computers use. However, you don't need to learn these values, because the vast
majority of the time, you use character constants to refer to characters, and the compiler worries about
the values for you. Using character constants in preference to raw numeric values also makes your
programs more portable.)
As we may also have mentioned, there is a big difference between a character and a string, even a string
which contains only one character (other than the \0). For example, 'A' is not the same as "A". To
drive home this point, let's illustrate it with a few examples.
If you have a string:
char string[] = "hello, world!";
you can modify its first character by saying
string[0] = 'H';
(Of course, there's nothing magic about the first character; you can modify any character in the string in
this way. Be aware, though, that it is not always safe to modify strings in-place like this; we'll say more
about the modifiability of strings in a later chapter on pointers.) Since you're replacing a character, you
want a character constant, 'H'. It would not be right to write
string[0] = "H";    /* WRONG */
because "H" is a string (an array of characters), not a single character. (The destination of the
assignment, string[0], is a char, but the right-hand side is a string; these types don't match.)
On the other hand, when you need a string, you must use a string. To print a single newline, you could
call
printf("\n");
It would not be correct to call
printf('\n');    /* WRONG */
printf always wants a string as its first argument. (As one final example, putchar wants a single
character, so putchar('\n') would be correct, and putchar("\n") would be incorrect.)
We must also remember the difference between characters and integers. If we treat the character '1' as an
integer, perhaps by saying
int i = '1';
we will probably not get the value 1 in i; we'll get the value of the character '1' in the machine's
character set. (In ASCII, it's 49.) When we do need to find the numeric value of a digit character (or to go
the other way, to get the digit character with a particular value) we can make use of the fact that, in any
character set used by C, the values for the digit characters, whatever they are, are contiguous. In other
words, no matter what values '0' and '1' have, '1' - '0' will be 1 (and, obviously, '0' - '0'
will be 0). So, for a variable c holding some digit character, the expression
c - '0'
gives us its value. (Similarly, for an integer value i, i + '0' gives us the corresponding digit
character, as long as 0 <= i <= 9.)
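Here are those two conversions packaged as small functions (the names digit_value and digit_char are hypothetical). Both rely only on the guarantee that the digit characters are contiguous, so they work in any character set, not just ASCII.

```c
/* Hypothetical helpers for converting between digit characters and
   their numeric values. */
int digit_value(char c)       /* e.g. '7' -> 7; assumes c is a digit */
{
    return c - '0';
}

char digit_char(int i)        /* e.g. 7 -> '7'; assumes 0 <= i <= 9 */
{
    return (char)(i + '0');
}
```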
Just as the character '1' is not the integer 1, the string "123" is not the integer 123. When we have a
string of digits, we can convert it to the corresponding integer by calling the standard function atoi:
#include <stdlib.h>
char string[] = "123";
int i = atoi(string);
int j = atoi("456");
Later we'll learn how to go in the other direction, to convert an integer into a string. (One way, as long as
what you want to do is print the number out, is to call printf, using %d in the format string.)
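To see how atoi can work, here is a sketch that combines the c - '0' trick with a running total: each new digit multiplies the value so far by 10 and adds the digit's value. This hypothetical myatoi is deliberately simpler than the real atoi: it accepts only a run of digits (no leading blanks, no sign) and does no overflow checking.

```c
/* Hypothetical simplified atoi: converts the leading run of decimal
   digits in str to an int, stopping at the first non-digit. */
int myatoi(const char str[])
{
    int i;
    int value = 0;
    for(i = 0; str[i] >= '0' && str[i] <= '9'; i++)
        value = 10 * value + (str[i] - '0');
    return value;
}
```

For "123", value goes from 0 to 1, then 12, then 123.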

Chapter 9: The C Preprocessor


Conceptually, the ``preprocessor'' is a translation phase that is applied to your source code before the
compiler proper gets its hands on it. (Once upon a time, the preprocessor was a separate program, much
as the compiler and linker may still be separate programs today.) Generally, the preprocessor performs
textual substitutions on your source code, in three sorts of ways:

File inclusion: inserting the contents of another file into your source file, as if you had typed it all
in there.
Macro substitution: replacing instances of one piece of text with another.
Conditional compilation: arranging that, depending on various circumstances, certain parts of
your source code are seen or not seen by the compiler at all.

The next three sections will introduce these three preprocessing functions.
The syntax of the preprocessor is different from the syntax of the rest of C in several respects. First of all,
the preprocessor is ``line based.'' Each of the preprocessor directives we're going to learn about (all of
which begin with the # character) must begin at the beginning of a line, and each ends at the end of the
line. (The rest of C treats line ends as just another whitespace character, and doesn't care how your
program text is arranged into lines.) Secondly, the preprocessor does not know about the structure of C (about functions, statements, or expressions). It is possible to play strange tricks with the preprocessor to
turn something which does not look like C into C (or vice versa). It's also possible to run into problems
when a preprocessor substitution does not do what you expected it to, because the preprocessor does not
respect the structure of C statements and expressions (but you expected it to). For the simple uses of the
preprocessor we'll be discussing, you shouldn't have any of these problems, but you'll want to be careful
before doing anything tricky or outrageous with the preprocessor. (As it happens, playing tricky and
outrageous games with the preprocessor is considered sporting in some circles, but it rapidly gets out of
hand, and can lead to bewilderingly impenetrable programs.)
9.1 File Inclusion
9.2 Macro Definition and Substitution
9.3 Conditional Compilation

This page by Steve Summit // Copyright 1995, 1996

9.1 File Inclusion


[This section corresponds to K&R Sec. 4.11.1]
A line of the form
#include <filename.h>
or
#include "filename.h"
causes the contents of the file filename.h to be read, parsed, and compiled at that point. (After
filename.h is processed, compilation continues on the line following the #include line.) For
example, suppose you got tired of retyping external function prototypes such as
extern int getline(char [], int);
at the top of each source file. You could instead place the prototype in a header file, perhaps
getline.h, and then simply place
#include "getline.h"
at the top of each source file where you called getline. (You might not find it worthwhile to create an
entire header file for a single function, but if you had a package of several related functions, it might be
very useful to place all of their declarations in one header file.) As we may have mentioned, that's exactly
what the Standard header files such as stdio.h are--collections of declarations (including external
function prototype declarations) having to do with various sets of Standard library functions. When you
use #include to read in a header file, you automatically get the prototypes and other declarations it
contains, and you should use header files, precisely so that you will get the prototypes and other
declarations they contain.
The difference between the <> and "" forms is where the preprocessor searches for filename.h. As a
general rule, it searches for files enclosed in <> in central, standard directories, and it searches for files
enclosed in "" in the ``current directory,'' or the directory containing the source file that's doing the
including. Therefore, "" is usually used for header files you've written, and <> is usually used for
headers which are provided for you (which someone else has written).
The extension ``.h'', by the way, simply stands for ``header,'' and reflects the fact that #include
directives usually sit at the top (head) of your source files, and contain global declarations and definitions
which you would otherwise put there. (That extension is not mandatory--you can theoretically name your
own header files anything you wish--but .h is traditional, and recommended.)


As we've already begun to see, the reason for putting something in a header file, and then using
#include to pull that header file into several different source files, is when the something (whatever it
is) must be declared or defined consistently in all of the source files. If, instead of using a header file, you
typed the something in to each of the source files directly, and the something ever changed, you'd have to
edit all those source files, and if you missed one, your program could fail in subtle (or serious) ways due
to the mismatched declarations (i.e. due to the incompatibility between the new declaration in one source
file and the old one in a source file you forgot to change). Placing common declarations and definitions
into header files means that if they ever change, they only have to be changed in one place, which is a
much more workable system.
What should you put in header files?

External declarations of global variables and functions. We said that a global variable must have
exactly one defining instance, but that it can have external declarations in many places. We said
that it was a grave error to issue an external declaration in one place saying that a variable or
function has one type, when the defining instance in some other place actually defines it with
another type. (If the two places are two source files, separately compiled, the compiler will
probably not even catch the discrepancy.) If you put the external declarations in a header file,
however, and include the header wherever it's needed, the declarations are virtually guaranteed to
be consistent. It's a good idea to include the header in the source file where the defining instance
appears, too, so that the compiler can check that the declaration and definition match. (That is, if
you ever change the type, you do still have to change it in two places: in the source file where the
defining instance occurs, and in the header file where the external declaration appears. But at least
you don't have to change it in an arbitrary number of places, and, if you've set things up correctly,
the compiler can catch any remaining mistakes.)
Preprocessor macro definitions (which we'll meet in the next section).
Structure definitions (which we haven't seen yet).
Typedef declarations (which we haven't seen yet).

However, there are a few things not to put in header files:

Defining instances of global variables. If you put these in a header file, and include the header file
in more than one source file, the variable will end up multiply defined.
Function bodies (which are also defining instances). You don't want to put these in headers for the
same reason--it's likely that you'll end up with multiple copies of the function and hence
``multiply defined'' errors. People sometimes put commonly-used functions in header files and
then use #include to bring them (once) into each program where they use that function, or use
#include to bring together the several source files making up a program, but both of these are
poor ideas. It's much better to learn how to use your compiler or linker to combine
separately-compiled object files.
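Since a single listing can't show two files, here is a sketch that places what a header might contain and what the corresponding source file would add one after the other. The names counter.h, counter, bump_counter, and COUNTER_LIMIT are all hypothetical. In a real project the first part would live in counter.h, and counter.c (along with every other file that uses the counter) would pull it in with #include "counter.h".

```c
/* ---- Part 1: what the hypothetical header counter.h might contain ---- */
#define COUNTER_LIMIT 10		/* macro definition */
extern int counter;			/* external declaration, not a definition */
extern int bump_counter(int step);	/* function prototype */

/* ---- Part 2: what counter.c would add after #include "counter.h" ---- */
int counter = 0;			/* the one defining instance */

/* The one function body: add step to counter, but never exceed the limit. */
int bump_counter(int step)
{
	counter += step;
	if (counter > COUNTER_LIMIT)
		counter = COUNTER_LIMIT;
	return counter;
}
```

Because counter.c sees both the header's declarations and its own definitions, the compiler can check that they agree: if bump_counter were defined to return long, say, it would report conflicting types instead of letting the mismatch slip through to other files.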

Since header files typically contain only external declarations, and should not contain function bodies,
you have to understand just what does and doesn't happen when you #include a header file. The
header file may provide the declarations for some functions, so that the compiler can generate correct
code when you call them (and so that it can make sure that you're calling them correctly), but the header
file does not give the compiler the functions themselves. The actual functions will be combined into your
program at the end of compilation, by the part of the compiler called the linker. The linker may have to
get the functions out of libraries, or you may have to tell the compiler/linker where to find them. In
particular, if you are trying to use a third-party library containing some useful functions, the library will
often come with a header file describing those functions. Using the library is therefore a two-step
process: you must #include the header in the files where you call the library functions, and you must
tell the linker to read in the functions from the library itself.

This page by Steve Summit // Copyright 1995-1997

9.2 Macro Definition and Substitution


[This section corresponds to K&R Sec. 4.11.2]
A preprocessor line of the form
#define name text
defines a macro with the given name, having as its value the given replacement text. After that (for the
rest of the current source file), wherever the preprocessor sees that name, it will replace it with the
replacement text. The name follows the same rules as ordinary identifiers (it can contain only letters,
digits, and underscores, and may not begin with a digit). Since macros behave quite differently from
normal variables (or functions), it is customary to give them names which are all capital letters (or at
least which begin with a capital letter). The replacement text can be absolutely anything--it's not
restricted to numbers, or simple strings, or anything.
The most common use for macros is to propagate various constants around and to make them more self-documenting. We've been saying things like
char line[100];
...
getline(line, 100);
but this is neither readable nor reliable; it's not necessarily obvious what all those 100's scattered around
the program are, and if we ever decide that 100 is too small for the size of the array to hold lines, we'll
have to remember to change the number in two (or more) places. A much better solution is to use a
macro:
#define MAXLINE 100
char line[MAXLINE];
...
getline(line, MAXLINE);
Now, if we ever want to change the size, we only have to do it in one place, and it's more obvious what
the words MAXLINE sprinkled through the program mean than the magic numbers 100 did.
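As a runnable sketch of MAXLINE in use, here is a line-reading helper built on the Standard function fgets. The getline function from earlier chapters isn't shown here (and some C libraries already supply a function named getline with different arguments), so this sketch uses a hypothetical read_line instead; its name and its return convention are assumptions made for the example.

```c
#include <stdio.h>
#include <string.h>

#define MAXLINE 100	/* the one place where the line size is written down */

/* Hypothetical helper: read one line from fp into line[], which has room
   for max characters (including the '\0'), strip the trailing newline if
   there is one, and return the line's length, or -1 at end of file. */
int read_line(FILE *fp, char line[], int max)
{
	if (fgets(line, max, fp) == NULL)
		return -1;
	line[strcspn(line, "\n")] = '\0';	/* overwrite the newline, if any */
	return (int)strlen(line);
}
```

A caller would declare char line[MAXLINE]; and call read_line(stdin, line, MAXLINE), so editing the single #define line changes both the array size and the limit passed to read_line.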
Since the replacement text of a preprocessor macro can be anything, it can also be an expression,
although you have to realize that, as always, the text is substituted (and perhaps evaluated) later. No
evaluation is performed when the macro is defined. For example, suppose that you write something like
#define A 2
#define B 3
#define C A + B
(this is a pretty meaningless example, but the situation does come up in practice). Then, later, suppose
that you write
int x = C * 2;
If A, B, and C were ordinary variables, you'd expect x to end up with the value 10. But let's see what
happens.
The preprocessor always substitutes text for macros exactly as you have written it. So it first substitutes
the replacement text for the macro C, resulting in
int x = A + B * 2;
Then it substitutes the macros A and B, resulting in
int x = 2 + 3 * 2;
Only when the preprocessor is done doing all this substituting does the compiler get into the act. But
when it evaluates that expression (using the normal precedence of multiplication over addition), it ends
up initializing x with the value 8!
To guard against this sort of problem, it is always a good idea to include explicit parentheses in the
definitions of macros which contain expressions. If we were to define the macro C as
#define C (A + B)
then the declaration of x would ultimately expand to
int x = (2 + 3) * 2;
and x would be initialized to 10, as we probably expected.
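Here is the A, B, C example gathered into compilable form, with the unparenthesized and parenthesized versions of C side by side. The names C_BAD, C_GOOD, bad_value, and good_value are made up for this sketch. Many Unix-style compilers accept an -E option that prints the preprocessor's output, which lets you see the expansions directly.

```c
#define A 2
#define B 3
#define C_BAD  A + B	/* no parentheses: substituted as bare text */
#define C_GOOD (A + B)	/* parenthesized: substituted as one unit */

/* C_BAD * 2 becomes 2 + 3 * 2, which is 8 */
int bad_value(void)
{
	return C_BAD * 2;
}

/* C_GOOD * 2 becomes (2 + 3) * 2, which is 10 */
int good_value(void)
{
	return C_GOOD * 2;
}
```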
Notice that there does not have to be (and in fact there usually is not) a semicolon at the end of a
#define line. (This is just one of the ways that the syntax of the preprocessor is different from the rest
of C.) If you accidentally type
#define MAXLINE 100;     /* WRONG */
then when you later declare
char line[MAXLINE];
the preprocessor will expand it to
char line[100;];     /* WRONG */
which is a syntax error. This is what we mean when we say that the preprocessor doesn't know much of
anything about the syntax of C--in this last example, the value or replacement text for the macro
MAXLINE was the 4 characters 1 0 0 ; , and that's exactly what the preprocessor substituted (even
though it didn't make any sense).
Simple macros like MAXLINE act sort of like little variables, whose values are constant (or constant
expressions). It's also possible to have macros which look like little functions (that is, you invoke them
with what looks like function call syntax, and they expand to replacement text which is a function of the
actual arguments they are invoked with) but we won't be looking at these yet.

This page by Steve Summit // Copyright 1995-1997

9.3 Conditional Compilation


[This section corresponds to K&R Sec. 4.11.3]
The last preprocessor directive we're going to look at is #ifdef. If you have the sequence
#ifdef name
program text
#else
more program text
#endif
in your program, the code that gets compiled depends on whether a preprocessor macro by that name is
defined or not. If it is (that is, if there has been a #define line for a macro called name), then
``program text'' is compiled and ``more program text'' is ignored. If the macro is not defined, ``more
program text'' is compiled and ``program text'' is ignored. This looks a lot like an if statement, but it
behaves completely differently: an if statement controls which statements of your program are executed
at run time, but #ifdef controls which parts of your program actually get compiled.
Just as for the if statement, the #else in an #ifdef is optional. There is a companion directive
#ifndef, which compiles code if the macro is not defined (so the ``#else clause'' of an
#ifndef directive is the part compiled if the macro is defined). There is also an #if directive which
compiles code depending on whether a compile-time expression is true or false. (The expressions which
are allowed in an #if directive are somewhat restricted, however, so we won't talk much about #if
here.)
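As a small sketch of #ifndef and #if working together, the lines below supply a default for MAXLINE only if nothing has defined it already (for example, a compiler option such as -DMAXLINE=200), and then use #if to refuse to compile with an unreasonably small value. The limit of 10 and the helper max_line_length are arbitrary choices for illustration; #error is the Standard directive that stops compilation with a message.

```c
#ifndef MAXLINE
#define MAXLINE 100	/* default, used only if MAXLINE isn't defined yet */
#endif

#if MAXLINE < 10	/* a compile-time integer comparison */
#error "MAXLINE is too small"
#endif

/* Hypothetical helper: report the line size this file was compiled with. */
int max_line_length(void)
{
	return MAXLINE;
}
```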
Conditional compilation is useful in two general classes of situations:

You are trying to write a portable program, but the way you do something is different depending
on what compiler, operating system, or computer you're using. You place different versions of
your code, one for each situation, between suitable #ifdef directives, and when you compile the
program in a particular environment, you arrange to have the macro names defined which select
the variants you need in that environment. (For this reason, compilers usually have ways of letting
you define macros from the invocation command line or in a configuration file, and many also
predefine certain macro names related to the operating system, processor, or compiler in use. That
way, you don't have to edit the #define lines in your code each time you compile it in
a different environment.)
For example, in ANSI C, the function to delete a file is remove. On older Unix systems,
however, the function was called unlink. So if filename is a variable containing the name of
a file you want to delete, and if you want to be able to compile the program under these older
Unix systems, you might write
#ifdef unix
unlink(filename);
#else
remove(filename);
#endif
Then, you could place the line
#define unix
at the top of the file when compiling under an old Unix system. (Since all you're using the macro
unix for is to control the #ifdef, you don't need to give it any replacement text at all. Any
definition for a macro, even if the replacement text is empty, causes an #ifdef to succeed.)

(In fact, in this example, you wouldn't even need to define the macro unix at all, because C
compilers on old Unix systems tend to predefine it for you, precisely so you can make tests like
these.)
You want to compile several different versions of your program, with different features present in
the different versions. You bracket the code for each feature with #ifdef directives, and (as for
the previous case) arrange to have the right macros defined or not to build the version you want to
build at any given time. This way, you can build the several different versions from the same
source code. (One common example is whether you turn debugging statements on or off. You can
bracket each debugging printout with #ifdef DEBUG and #endif, and then turn on
debugging only when you need it.)
For example, you might use lines like this:
#ifdef DEBUG
printf("x is %d\n", x);
#endif
to print out the value of the variable x at some point in your program to see if it's what you
expect. To enable debugging printouts, you insert the line
#define DEBUG
at the top of the file, and to turn them off, you delete that line, but the debugging printouts quietly
remain in your code, temporarily deactivated, but ready to reactivate if you find yourself needing
them again later. (Also, instead of inserting and deleting the #define line, you might use a
compiler flag such as -DDEBUG to define the macro DEBUG from the compiler invocation line.)

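Here is the DEBUG idea inside a complete function. The function sum_to is a hypothetical example: compiled normally it prints nothing, while compiling with -DDEBUG (or adding #define DEBUG above it) turns on a trace of each step.

```c
#include <stdio.h>

/* Add up 1 + 2 + ... + n.  With DEBUG defined, also trace each step. */
int sum_to(int n)
{
	int i;
	int total = 0;

	for (i = 1; i <= n; i++) {
		total += i;
#ifdef DEBUG
		printf("i is %d, total is %d\n", i, total);
#endif
	}
	return total;
}
```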
Conditional compilation can be very handy, but it can also get out of hand. When large chunks of the
program are completely different depending on, say, what operating system the program is being
compiled for, it's often better to place the different versions in separate source files, and then only use
one of the files (corresponding to one of the versions) to build the program on any given system. Also, if
you are using an ANSI Standard compiler and you are writing ANSI-compatible code, you usually won't
need so much conditional compilation, because the Standard specifies exactly how the compiler must do
certain things, and exactly which library functions it must provide, so you don't have to work so hard to
accommodate the old variations among compilers and libraries.
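To round out the remove/unlink example from earlier in this section, here is a sketch that hides the #ifdef inside one function, so the rest of the program just calls delete_file. The name delete_file is hypothetical, and so is the assumption that the Unix system declares unlink in <unistd.h> (where POSIX systems declare it today; very old systems may differ). Note also that some current compilers (gcc on Unix-like systems, for instance) define the plain name unix only in their non-strict modes, although they always define __unix__.

```c
#include <stdio.h>	/* remove */
#ifdef unix
#include <unistd.h>	/* unlink, on POSIX-style systems */
#endif

/* Hypothetical wrapper: delete the named file, returning 0 on success. */
int delete_file(const char *filename)
{
#ifdef unix
	return unlink(filename);	/* the old Unix name */
#else
	return remove(filename);	/* the ANSI C name */
#endif
}
```

Either branch returns 0 when the file is deleted, so callers don't need to know which one was compiled.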

This page by Steve Summit // Copyright 1995-1997

Chapter 10: Pointers


Pointers are often thought to be the most difficult aspect of C. It's true that many people have various
problems with pointers, and that many programs founder on pointer-related bugs. Actually, though, many
of the problems are not so much with the pointers per se but rather with the memory they point to, and
more specifically, when there isn't any valid memory which they point to. As long as you're careful to
ensure that the pointers in your programs always point to valid memory, pointers can be useful, powerful,
and relatively trouble-free tools. (We'll talk about memory allocation in the next chapter.)
[This chapter is the only one in this series that contains any graphics. If you are using a text-only
browser, there are a few figures you won't be able to see.]
A pointer is a variable that points at, or refers to, another variable. That is, if we have a pointer variable
of type ``pointer to int,'' it might point to the int variable i, or to the third cell of the int array a.
Given a pointer variable, we can ask questions like, ``What's the value of the variable that this pointer
points to?''
Why would we want to have a variable that refers to another variable? Why not just use that other
variable directly? The answer is that a level of indirection can be very useful. (Indirection is just another
word for the situation when one variable refers to another.)
Imagine a club which elects new officers each year. In its clubroom, it might have a set of mailboxes for
each member, along with special mailboxes for the president, secretary, and treasurer. The bank doesn't
mail statements to the treasurer under the treasurer's name; it mails them to ``treasurer,'' and the
statements go to the mailbox marked ``treasurer.'' This way, the bank doesn't have to change the mailing
address it uses every year. The mailboxes labeled ``president,'' ``treasurer,'' and ``secretary'' are a little bit
like pointers--they don't refer to people directly.
If we make the analogy that a mailbox holding letters is like a variable holding numbers, then mailboxes
for the president, secretary, and treasurer aren't quite like pointers, because they're still mailboxes which
in principle could hold letters directly. But suppose that mail is never actually put in those three
mailboxes: suppose each of the officers' mailboxes contains a little marker listing the name of the
member currently holding that office. When you're sorting mail, and you have a letter for the treasurer,
you first go to the treasurer's mailbox, but rather than putting the letter there, you read the name on the
marker there, and put the mail in the mailbox for that person. Similarly, if the club is poorly organized,
and the treasurer stops doing his job, and you're the president, and one day you get a call from the bank
saying that the club's account is in arrears and the treasurer hasn't done anything about it and asking if
you, the president, can look into it; and if the club is so poorly organized that you've forgotten who the
treasurer is, you can go to the treasurer's mailbox, read the name on the marker there, and go to that
mailbox (which is probably overflowing) to find all the treasury-related mail.

We could say that the markers in the mailboxes for the president, secretary, and treasurer were pointers
to other mailboxes. In an analogous way, pointer variables in C contain pointers to other variables or
memory locations.
10.1 Basic Pointer Operations
10.2 Pointers and Arrays; Pointer Arithmetic
10.3 Pointer Subtraction and Comparison
10.4 Null Pointers
10.5 ``Equivalence'' between Pointers and Arrays
10.6 Arrays and Pointers as Function Arguments
10.7 Strings
10.8 Example: Breaking a Line into ``Words''

This page by Steve Summit // Copyright 1995, 1996

10.1 Basic Pointer Operations


[This section corresponds to K&R Sec. 5.1]
The first things to do with pointers are to declare a pointer variable, set it to point somewhere, and finally
manipulate the value that it points to. A simple pointer declaration looks like this:
int *ip;
This declaration looks like our earlier declarations, with one obvious difference: that asterisk. The asterisk
means that ip, the variable we're declaring, is not of type int, but rather of type pointer-to-int.
(Another way of looking at it is that *ip, which as we'll see is the value pointed to by ip, will be an
int.)
We may think of setting a pointer variable to point to another variable as a two-step process: first we
generate a pointer to that other variable, then we assign this new pointer to the pointer variable. We can
say (but we have to be careful when we're saying it) that a pointer variable has a value, and that its value
is ``pointer to that other variable''. This will make more sense when we see how to generate pointer
values.
Pointers (that is, pointer values) are generated with the ``address-of'' operator &, which we can also think
of as the ``pointer-to'' operator. We demonstrate this by declaring (and initializing) an int variable i, and
then setting ip to point to it:
int i = 5;
ip = &i;
The assignment expression ip = &i; contains both parts of the ``two-step process'': &i generates a
pointer to i, and the assignment operator assigns the new pointer to (that is, places it ``in'') the variable
ip. Now ip ``points to'' i, which we can illustrate with this picture:
[Figure: ip's box holds an arrow pointing at i's box, which contains 5.]
i is a variable of type int, so the value in its box is a number, 5. ip is a variable of type pointer-to-int,
so the ``value'' in its box is an arrow pointing at another box. Referring once again back to the ``two-step
process'' for setting a pointer variable: the & operator draws us the arrowhead pointing at i's box, and the
assignment operator =, with the pointer variable ip on its left, anchors the other end of the arrow in ip's
box.
We discover the value pointed to by a pointer using the ``contents-of'' operator, *. Placed in front of a
pointer, the * operator accesses the value pointed to by that pointer. In other words, if ip is a pointer,
then the expression *ip gives us whatever it is that's in the variable or location pointed to by ip. For
example, we could write something like
printf("%d\n", *ip);
which would print 5, since ip points to i, and i is (at the moment) 5.
(You may wonder how the asterisk * can be the pointer contents-of operator when it is also the
multiplication operator. There is no ambiguity here: it is the multiplication operator when it sits between
two variables, and it is the contents-of operator when it sits in front of a single variable. The situation is
analogous to the minus sign: between two variables or expressions it's the subtraction operator, but in
front of a single operand or expression it's the negation operator. Technical terms you may hear for these
distinct roles are unary and binary: a binary operator applies to two operands, usually on either side of it,
while a unary operator applies to a single operand.)
The contents-of operator * does not merely fetch values through pointers; it can also set values through
pointers. We can write something like
*ip = 7;
which means ``set whatever ip points to to 7.'' Again, the * tells us to go to the location pointed to by ip,
but this time, the location isn't the one to fetch from--we're on the left-hand side of an assignment
operator, so *ip tells us the location to store to. (The situation is no different from array subscripting
expressions such as a[3] which we've already seen appearing on both sides of assignments.)
The result of the assignment *ip = 7 is that i's value is changed to 7, and the picture changes to:
[Figure: ip's arrow still points at i's box, which now contains 7.]
If we called printf("%d\n", *ip) again, it would now print 7.


At this point, you may be wondering why we're going through this rigamarole--if we wanted to set i to 7,
why didn't we do it directly? We'll begin to explore that next, but first let's notice the difference between
changing a pointer (that is, changing what variable it points to) and changing the value at the location it
points to. When we wrote *ip = 7, we changed the value pointed to by ip, but if we declare another
variable j:
int j = 3;
and write
ip = &j;
we've changed ip itself. The picture now looks like this:
[Figure: ip's arrow now points at j's box, which contains 3; i's box still contains 7.]
We have to be careful when we say that a pointer assignment changes ``what the pointer points to.'' Our
earlier assignment
*ip = 7;
changed the value pointed to by ip, but this more recent assignment
ip = &j;
has changed what variable ip points to. It's true that ``what ip points to'' has changed, but this time, it
has changed for a different reason. Neither i (which is still 7) nor j (which is still 3) has changed. (What
has changed is ip's value.) If we again call
printf("%d\n", *ip);
this time it will print 3.
We can also assign pointer values to other pointer variables. If we declare a second pointer variable:
int *ip2;
then we can say
ip2 = ip;
Now ip2 points where ip does; we've essentially made a ``copy'' of the arrow:
[Figure: ip and ip2 both hold arrows pointing at j's box (3); i's box still contains 7.]
Now, if we set ip to point back to i again:


ip = &i;
the two arrows point to different places:
[Figure: ip's arrow points at i (7), while ip2's arrow still points at j (3).]
We can now see that the two assignments


ip2 = ip;
and
*ip2 = *ip;
do two very different things. The first would make ip2 again point to where ip points (in other words,
back to i again). The second would store, at the location pointed to by ip2, a copy of the value pointed
to by ip; in other words (if ip and ip2 still point to i and j respectively) it would set j to i's value, or
7.
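Here is the whole sequence of assignments from this section collected into one function, as a sketch you can trace on paper or step through in a debugger. The function name final_j is made up for this example; it returns j's value at the end, which the last assignment sets to i's value, 7.

```c
/* Replay this section's assignments and return j's final value. */
int final_j(void)
{
	int i = 5, j = 3;
	int *ip, *ip2;

	ip = &i;	/* ip points to i */
	*ip = 7;	/* changes i (not ip): i is now 7 */
	ip = &j;	/* changes ip: it now points to j; i and j unchanged */
	ip2 = ip;	/* copies the arrow: ip2 points to j too */
	ip = &i;	/* ip points back to i */
	*ip2 = *ip;	/* copies the value: j gets i's value, 7 */
	return j;
}
```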
It's important to keep very clear in your mind the distinction between a pointer and what it points to. The
two are like apples and oranges (or perhaps oil and water); you can't mix them. You can't ``set ip to 5'' by
writing something like
ip = 5;     /* WRONG */
5 is an integer, but ip is a pointer. You probably wanted to ``set the value pointed to by ip to 5,'' which
you express by writing
*ip = 5;
Similarly, you can't ``see what ip is'' by writing
printf("%d\n", ip);     /* WRONG */
Again, ip is a pointer-to-int, but %d expects an int. To print what ip points to, use
printf("%d\n", *ip);
Finally, a few more notes about pointer declarations. The * in a pointer declaration is related to, but
different from, the contents-of operator *. After we declare a pointer variable
int *ip;
the expression
ip = &i
sets what ip points to (that is, which location it points to), while the expression
*ip = 5
sets the value of the location pointed to by ip. On the other hand, if we declare a pointer variable and
include an initializer:
int *ip3 = &i;
we're setting the initial value for ip3, which is where ip3 will point, so that initial value is a pointer. (In
other words, the * in the declaration int *ip3 = &i; is not the contents-of operator, it's the indicator
that ip3 is a pointer.)
If you have a pointer declaration containing an initialization, and you ever have occasion to break it up
into a simple declaration and a conventional assignment, do it like this:
int *ip3;
ip3 = &i;
Don't write
int *ip3;
*ip3 = &i;
or you'll be trying to mix oil and water again.
Also, when we write
int *ip;
although the asterisk affects ip's type, it goes with the identifier name ip, not with the type int on the
left. To declare two pointers at once, the declaration looks like
int *ip1, *ip2;
Some people write pointer declarations like this:
int* ip;
This works for one pointer, because C essentially ignores whitespace. But if you ever write
int* ip1, ip2;     /* PROBABLY WRONG */
it will declare one pointer-to-int ip1 and one plain int ip2, which is probably not what you meant.
What is all of this good for? If it was just for changing variables like i from 5 to 7, it would not be good
for much. What it's good for, among other things, is when for various reasons we don't know exactly
which variable we want to change, just like the bank didn't know exactly which club member it wanted to
send the statement to.
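As a small preview of that idea, here is a sketch in which the code that does the updating doesn't know in advance which variable it will change; it just follows a pointer, much as the mail sorter follows the name on the treasurer's marker. The function bump_smaller is hypothetical, and it receives its two pointers as function arguments, a technique this chapter covers in more detail later.

```c
/* Hypothetical helper: add 1 to whichever of *a and *b is smaller
   (to *a on a tie).  The update is written once, through target. */
void bump_smaller(int *a, int *b)
{
	int *target;	/* will point at one of the two variables */

	if (*a <= *b)
		target = a;
	else
		target = b;

	*target = *target + 1;	/* change whichever variable target points to */
}
```

For example, with int i = 5, j = 3; the call bump_smaller(&i, &j) leaves i at 5 and sets j to 4.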

This page by Steve Summit // Copyright 1995, 1996

10.2 Pointers and Arrays; Pointer Arithmetic


[This section corresponds to K&R Sec. 5.3]
Pointers do not have to point to single variables. They can also point at the cells of an array. For
example, we can write
int *ip;
int a[10];
ip = &a[3];
and we would end up with ip pointing at the fourth cell of the array a (remember, arrays are 0-based, so
a[0] is the first cell). We could illustrate the situation like this:
[Figure: the array a drawn as ten boxes, a[0] through a[9], with ip's arrow pointing at a[3].]
We'd use this ip just like the one in the previous section: *ip gives us what ip points to, which in this
case will be the value in a[3].
Once we have a pointer pointing into an array, we can start doing pointer arithmetic. Given that ip is a
pointer to a[3], we can add 1 to ip:
ip + 1
What does it mean to add one to a pointer? In C, it gives a pointer to the cell one farther on, which in this
case is a[4]. To make this clear, let's assign this new pointer to another pointer variable:
ip2 = ip + 1;
Now the picture looks like this:
[Figure: ip pointing at a[3], and ip2 pointing at a[4].]
If we now do
*ip2 = 4;
we've set a[4] to 4. But it's not necessary to assign a new pointer value to a pointer variable in order to
use it; we could also compute a new pointer value and use it immediately:
*(ip + 1) = 5;
In this last example, we've changed a[4] again, setting it to 5. The parentheses are needed because the
unary ``contents of'' operator * has higher precedence than (i.e., binds more tightly than) the addition
operator. If we wrote *ip + 1, without the parentheses, we'd be fetching the value pointed to by ip,
and adding 1 to that value. The expression *(ip + 1), on the other hand, accesses the value one past
the one pointed to by ip.
Given that we can add 1 to a pointer, it's not surprising that we can add and subtract other numbers as
well. If ip still points to a[3], then
*(ip + 3) = 7;
sets a[6] to 7, and
*(ip - 2) = 4;
sets a[1] to 4.
Up above, we added 1 to ip and assigned the new pointer to ip2, but there's no reason we can't add one
to a pointer, and change the same pointer:
ip = ip + 1;
Now ip points one past where it used to (to a[4], if we hadn't changed it in the meantime). The
shortcuts we learned in a previous chapter all work for pointers, too: we could also increment a pointer
using
ip += 1;
or
ip++;
Of course, pointers are not limited to ints. It's quite common to use pointers to other types, especially
char. Here are the innards of the mystrcmp function we saw in a previous chapter, rewritten to use
pointers. (mystrcmp, you may recall, compares two strings, character by character.)
char *p1 = &str1[0], *p2 = &str2[0];

while(1)
{
    if(*p1 != *p2)
        return *p1 - *p2;
    if(*p1 == '\0' || *p2 == '\0')
        return 0;
    p1++;
    p2++;
}
The autoincrement operator ++ (like its companion, --) makes it easy to do two things at once. We've
seen idioms like a[i++] which accesses a[i] and simultaneously increments i, leaving it referencing
the next cell of the array a. We can do the same thing with pointers: an expression like *ip++ lets us
access what ip points to, while simultaneously incrementing ip so that it points to the next element. The
preincrement form works, too: *++ip increments ip, then accesses what it points to. Similarly, we can
use notations like *ip-- and *--ip.
As another example, here is the strcpy (string copy) loop from a previous chapter, rewritten to use
pointers:
char *dp = &dest[0], *sp = &src[0];
while(*sp != '\0')
    *dp++ = *sp++;
*dp = '\0';
(One question that comes up is whether the expression *p++ increments p or what it points to. The
answer is that it increments p. To increment what p points to, you can use (*p)++.)
When you're doing pointer arithmetic, you have to remember how big the array the pointer points into is,
so that you don't ever point outside it. If the array a has 10 elements, you can't access a[50] or a[-1]
or even a[10] (remember, the valid subscripts for a 10-element array run from 0 to 9). Similarly, if a
has 10 elements and ip points to a[3], you can't compute or access ip + 10 or ip - 5. (There is
one special case: you can, in this case, compute, but not access, a pointer to the nonexistent element just
beyond the end of the array, which in this case is &a[10]. This becomes useful when you're doing
pointer comparisons, which we'll look at next.)

This page by Steve Summit // Copyright 1995, 1996 // mail feedback

10.3 Pointer Subtraction and Comparison


As we've seen, you can add an integer to a pointer to get a new pointer, pointing somewhere beyond the
original (as long as it's in the same array). For example, you might write
ip2 = ip1 + 3;
Applying a little algebra, you might wonder whether
ip2 - ip1 = 3
and the answer is, yes. When you subtract two pointers, as long as they point into the same array, the
result is the number of elements separating them. You can also ask (again, as long as they point into the
same array) whether one pointer is greater or less than another: one pointer is ``greater than'' another if it
points beyond where the other one points. You can also compare pointers for equality and inequality: two
pointers are equal if they point to the same variable or to the same cell in an array, and are (obviously)
unequal if they don't. (When testing for equality or inequality, the two pointers do not have to point into
the same array.)
One common use of pointer comparisons is when copying arrays using pointers. Here is a code fragment
which copies 10 elements from array1 to array2, using pointers. It uses an end pointer, ep, to keep
track of when it should stop copying.
int array1[10], array2[10];
int *ip1, *ip2 = &array2[0];
int *ep = &array1[10];
for(ip1 = &array1[0]; ip1 < ep; ip1++)
    *ip2++ = *ip1;
As we mentioned, there is no element array1[10], but it is legal to compute a pointer to this
(nonexistent) element, as long as we only use it in pointer comparisons like this (that is, as long as we
never try to fetch or store the value that it points to).

This page by Steve Summit // Copyright 1995-1997 // mail feedback

10.4 Null Pointers


We said that the value of a pointer variable is a pointer to some other variable. There is one other value a pointer may
have: it may be set to a null pointer. A null pointer is a special pointer value that is known not to point anywhere.
What this means is that no other valid pointer, to any other variable or array cell or anything else, will ever compare
equal to a null pointer.
The most straightforward way to ``get'' a null pointer in your program is by using the predefined constant NULL,
which is defined for you by several standard header files, including <stdio.h>, <stdlib.h>, and
<string.h>. To initialize a pointer to a null pointer, you might use code like
#include <stdio.h>
int *ip = NULL;
and to test it for a null pointer before inspecting the value pointed to you might use code like
if(ip != NULL)
    printf("%d\n", *ip);
It is also possible to refer to the null pointer by using a constant 0, and you will see some code that sets null pointers
by simply doing
int *ip = 0;
(In fact, NULL is a preprocessor macro which typically has the value, or replacement text, 0.)
Furthermore, since the definition of ``true'' in C is a value that is not equal to 0, you will see code that tests for non-null pointers with abbreviated code like
if(ip)
    printf("%d\n", *ip);
This has the same meaning as our previous example; if(ip) is equivalent to if(ip != 0) and to if(ip != NULL).
All of these uses are legal, and although I recommend that you use the constant NULL for clarity, you will come
across the other forms, so you should be able to recognize them.
You can use a null pointer as a placeholder to remind yourself (or, more importantly, to help your program remember)
that a pointer variable does not point anywhere at the moment and that you should not use the ``contents of'' operator
on it (that is, you should not try to inspect what it points to, since it doesn't point to anything). A function that returns
pointer values can return a null pointer when it is unable to perform its task. (A null pointer used in this way is
analogous to the EOF value that functions like getchar return.)
As an example, let us write our own version of the standard library function strstr, which looks for one string
within another, returning a pointer to the first place where it is found, or a null pointer if it is not found. Here is the function, using the
obvious brute-force algorithm: at every character of the input string, the code checks whether the pattern
string matches there:
#include <stddef.h>

char *mystrstr(char input[], char pat[])
{
    char *start, *p1, *p2;
    for(start = &input[0]; *start != '\0'; start++)
    {                               /* for each position in input string... */
        p1 = pat;                   /* prepare to check for pattern string there */
        p2 = start;
        while(*p1 != '\0')
        {
            if(*p1 != *p2)          /* characters differ */
                break;
            p1++;
            p2++;
        }
        if(*p1 == '\0')             /* found match */
            return start;
    }

    return NULL;
}
The start pointer steps over each character position in the input string. At each character, the inner loop checks
for a match there, by using p1 to step over the pattern string (pat), and p2 to step over the input string (starting at
start). We compare successive characters until either (a) we reach the end of the pattern string (*p1 == '\0'),
or (b) we find two characters which differ. When we're done with the inner loop, if we reached the end of the pattern
string (*p1 == '\0'), it means that all preceding characters matched, and we found a complete match for the
pattern starting at start, so we return start. Otherwise, we go around the outer loop again, to try another starting
position. If we run out of those (if *start == '\0'), without finding a match, we return a null pointer.
Notice that the function is declared as returning (and does in fact return) a pointer-to-char.
We can use mystrstr (or its standard library counterpart strstr) to determine whether one string contains
another:
if(mystrstr("Hello, world!", "lo") == NULL)
    printf("no\n");
else
    printf("yes\n");
In general, C does not initialize pointers to null for you, and it never tests pointers to see if they are null before using
them. If one of the pointers in your programs points somewhere some of the time but not all of the time, an excellent
convention to use is to set it to a null pointer when it doesn't point anywhere valid, and to test to see if it's a null
pointer before using it. But you must use explicit code to set it to NULL, and to test it against NULL. (In other words,
just setting an unused pointer variable to NULL doesn't guarantee safety; you also have to check for the null value
before using the pointer.) On the other hand, if you know that a particular pointer variable is always valid, you don't
have to insert a paranoid test against NULL before using it.

This page by Steve Summit // Copyright 1995, 1996 // mail feedback

10.5 ``Equivalence'' between Pointers and Arrays


There are a number of similarities between arrays and pointers in C. If you have an array
int a[10];
you can refer to a[0], a[1], a[2], etc., or to a[i] where i is an int. If you declare a pointer
variable ip and set it to point to the beginning of an array:
int *ip = &a[0];
you can refer to *ip, *(ip+1), *(ip+2), etc., or to *(ip+i) where i is an int.
There are also differences, of course. You cannot assign one array to another; the code
int a[10], b[10];
a = b;                  /* WRONG */
is illegal. As we've seen, though, you can assign one pointer variable to another:
int *ip1, *ip2;
ip1 = &a[0];
ip2 = ip1;
Pointer assignment is straightforward; the pointer on the left is simply made to point wherever the pointer
on the right does. We haven't copied the data pointed to (there's still just one copy, in the same place);
we've just made two pointers point to that one place.
The similarities between arrays and pointers end up being quite useful, and in fact C builds on the
similarities, leading to what is called ``the equivalence of arrays and pointers in C.'' When we speak of
this ``equivalence'' we do not mean that arrays and pointers are the same thing (they are in fact quite
different), but rather that they can be used in related ways, and that certain operations may be used
between them.
The first such operation is that it is possible to (apparently) assign an array to a pointer:
int a[10];
int *ip;
ip = a;
What can this mean? In that last assignment ip = a, aren't we mixing apples and oranges again? It
turns out that we are not; C defines the result of this assignment to be that ip receives a pointer to the
first element of a. In other words, it is as if you had written
ip = &a[0];
The second facet of the equivalence is that you can use the ``array subscripting'' notation [i] on
pointers, too. If you write
ip[3]
it is just as if you had written
*(ip + 3)
So when you have a pointer that points to a block of memory, such as an array or a part of an array, you
can treat that pointer ``as if'' it were an array, using the convenient [i] notation. In other words, at the
beginning of this section when we talked about *ip, *(ip+1), *(ip+2), and *(ip+i), we could
have written ip[0], ip[1], ip[2], and ip[i]. As we'll see, this can be quite useful (or at least
convenient).
The third facet of the equivalence (which is actually a more general version of the first one we
mentioned) is that whenever you mention the name of an array in a context where the ``value'' of the
array would be needed, C automatically generates a pointer to the first element of the array, as if you had
written &array[0]. When you write something like
int a[10];
int *ip;
ip = a + 3;
it is as if you had written
ip = &a[0] + 3;
which (and you might like to convince yourself of this) gives the same result as if you had written
ip = &a[3];
For example, if the character array
char string[100];
contains some string, here is another way to find its length:
int len;
char *p;
for(p = string; *p != '\0'; p++)
    ;
len = p - string;
After the loop, p points to the '\0' terminating the string. The expression p - string is equivalent
to p - &string[0], and gives the length of the string. (Of course, we could also call strlen; in
fact here we've essentially written another implementation of strlen.)

This page by Steve Summit // Copyright 1995, 1996 // mail feedback

10.6 Arrays and Pointers as Function Arguments


[This section corresponds to K&R Sec. 5.2]
Earlier, we learned that functions in C receive copies of their arguments. (This means that C uses call by
value; it means that a function can modify one of its arguments without modifying the value in the
caller.) We didn't say so at the time, but when a function is called, the copies of the arguments are made
as if by assignment. But since arrays can't be assigned, how can a function receive an array as an
argument? The answer will explain why arrays are an apparent exception to the rule that functions cannot
modify their arguments.
We've been regularly calling a function getline like this:
char line[100];
getline(line, 100);
with the intention that getline read the next line of input into the character array line. But in the
previous section, we learned that when we mention the name of an array in an expression, the
compiler generates a pointer to its first element. So the call above is as if we had written
char line[100];
getline(&line[0], 100);
In other words, the getline function does not receive an array of char at all; it actually receives a
pointer to char!
As we've seen throughout this chapter, it's straightforward to manipulate the elements of an array using
pointers, so there's no particular insurmountable difficulty if getline receives a pointer. One question
remains, though: we had been defining getline with its line parameter declared as an array:
int getline(char line[], int max)
{
...
}
We mentioned that we didn't have to specify a size for the line parameter, with the explanation that
getline really used the array in its caller, where the actual size was specified. But that declaration
certainly does look like an array--how can it work when getline actually receives a pointer?
The answer is that the C compiler does a little something behind your back. It knows that whenever you
mention an array name in an expression, it (the compiler) generates a pointer to the array's first element.
Therefore, it knows that a function can never actually receive an array as a parameter. Therefore,
whenever it sees you defining a function that seems to accept an array as a parameter, the compiler
quietly pretends that you had declared it as accepting a pointer, instead. The definition of getline
above is compiled exactly as if it had been written
int getline(char *line, int max)
{
...
}
Let's look at how getline might be written if we thought of its first parameter (argument) as a pointer,
instead:
int getline(char *line, int max)
{
    int nch = 0;
    int c;
    max = max - 1;              /* leave room for '\0' */

#ifndef FGETLINE
    while((c = getchar()) != EOF)
#else
    while((c = getc(fp)) != EOF)
#endif
    {
        if(c == '\n')
            break;

        if(nch < max)
        {
            *(line + nch) = c;
            nch = nch + 1;
        }
    }

    if(c == EOF && nch == 0)
        return EOF;

    *(line + nch) = '\0';
    return nch;
}
But, as we've learned, we can also use ``array subscript'' notation with pointers, so we could rewrite the
pointer version of getline like this:
int getline(char *line, int max)
{
    int nch = 0;
    int c;
    max = max - 1;              /* leave room for '\0' */

#ifndef FGETLINE
    while((c = getchar()) != EOF)
#else
    while((c = getc(fp)) != EOF)
#endif
    {
        if(c == '\n')
            break;

        if(nch < max)
        {
            line[nch] = c;
            nch = nch + 1;
        }
    }

    if(c == EOF && nch == 0)
        return EOF;

    line[nch] = '\0';
    return nch;
}
But this is exactly what we'd written before (see chapter 6, Sec. 6.3), except that the declaration of the
line parameter is different. In other words, within the body of the function, it hardly matters whether
we thought line was an array or a pointer, since we can use array subscripting notation with both arrays
and pointers.
These games that the compiler is playing with arrays and pointers may seem bewildering at first, and it
may seem faintly miraculous that everything comes out in the wash when you declare a function like
getline that seems to accept an array. The equivalence in C between arrays and pointers can be
confusing, but it does work and is one of the central features of C. If the games which the compiler plays
(pretending that you declared a parameter as a pointer when you thought you declared it as an array)
bother you, you can do two things:
1. Continue to pretend that functions can receive arrays as parameters; declare and use them that
way, but remember that, unlike with other arguments, a function can modify the caller's copy of an
argument that (seems to be) an array.
2. Realize that arrays are always passed to functions as pointers, and always declare your functions
as accepting pointers.

This page by Steve Summit // Copyright 1995, 1996 // mail feedback

10.7 Strings
Because of the ``equivalence'' of arrays and pointers, it is extremely common to refer to and manipulate
strings as character pointers, or char *'s. It is so common, in fact, that it is easy to forget that strings
are arrays, and to imagine that they're represented by pointers. (Actually, in the case of strings, it may not
even matter that much if the distinction gets a little blurred; there's certainly nothing wrong with referring
to a character pointer, suitably initialized, as a ``string.'') Let's look at a few of the implications:
1. Any function that manipulates a string will actually accept it as a char * argument. The caller
may pass an array containing a string, but the function will receive a pointer to the array's
(string's) first element (character).
2. The %s format in printf expects a character pointer.
3. Although you have to use strcpy to copy a string from one array to another, you can use simple
pointer assignment to assign a string to a pointer. The string being assigned might either be in an
array or pointed to by another pointer. In other words, given
char string[] = "Hello, world!";
char *p1, *p2;
both
p1 = string
and
p2 = p1
are legal. (Remember, though, that when you assign a pointer, you're making a copy of the pointer
but not of the data it points to. In the first example, p1 ends up pointing to the string in string.
In the second example, p2 ends up pointing to the same string as p1. In any case, after a pointer
assignment, if you ever change the string (or other data) pointed to, the change is ``visible'' to both
pointers.)
4. Many programs manipulate strings exclusively using character pointers, never explicitly declaring
any actual arrays. As long as these programs are careful to allocate appropriate memory for the
strings, they're perfectly valid and correct.
When you start working heavily with strings, however, you have to be aware of one subtle fact.
When you initialize a character array with a string constant:
char string[] = "Hello, world!";
you end up with an array containing the string, and you can modify the array's contents to your heart's
content:
string[0] = 'J';
However, it's possible to use string constants (the formal term is string literals) at other places in your
code. Since they're arrays, the compiler generates pointers to their first elements when they're used in
expressions, as usual. That is, if you say
char *p1 = "Hello";
int len = strlen("world");
it's almost as if you'd said
char internal_string_1[] = "Hello";
char internal_string_2[] = "world";
char *p1 = &internal_string_1[0];
int len = strlen(&internal_string_2[0]);
Here, the arrays named internal_string_1 and internal_string_2 are supposed to suggest
the fact that the compiler is actually generating little unnamed arrays every time you use a string
constant in your code. However, the subtle fact is that the arrays which are ``behind'' the string constants
are not necessarily modifiable: trying to modify one has undefined behavior, and in particular the compiler may store them in read-only memory.
Therefore, if you write
char *p3 = "Hello, world!";
p3[0] = 'J';
your program may crash, because it may try to store a value (in this case, the character 'J') into
nonwritable memory.
The moral is that whenever you're building or modifying strings, you have to make sure that the memory
you're building or modifying them in is writable. That memory should either be an array you've
allocated, or some memory which you've dynamically allocated by the techniques which we'll see in the
next chapter. Make sure that no part of your program will ever try to modify a string which is actually
one of the unnamed, unwritable arrays which the compiler generated for you in response to one of your
string constants. (The only exception is array initialization, because if you write to such an array, you're
writing to the array, not to the string literal which you used to initialize the array.)

This page by Steve Summit // Copyright 1995, 1996 // mail feedback

10.8 Example: Breaking a Line into ``Words''


In an earlier assignment, an ``extra credit'' version of a problem asked you to write a little checkbook
balancing program that accepted a series of lines of the form
deposit 1000
check 10
check 12.34
deposit 50
check 20
It was a surprising nuisance to do this in an ad hoc way, using only the tools we had at the time. It was
easy to read each line, but it was cumbersome to break it up into the word (``deposit'' or ``check'') and the
amount.
I find it very convenient to use a more general approach: first, break lines like these into a series of
whitespace-separated words, then deal with each word separately. To do this, we will use an array of
pointers to char, which we can also think of as an ``array of strings,'' since a string is an array of char,
and a pointer-to-char can easily point at a string. Here is the declaration of such an array:
char *words[10];
This is the first complicated C declaration we've seen: it says that words is an array of 10 pointers to
char. We're going to write a function, getwords, which we can call like this:
int nwords;
nwords = getwords(line, words, 10);
where line is the line we're breaking into words, words is the array to be filled in with the (pointers to
the) words, and nwords (the return value from getwords) is the number of words which the function
finds. (As with getline, we tell the function the size of the array so that if the line should happen to
contain more words than that, it won't overflow the array).
Here is the definition of the getwords function. It finds the beginning of each word, places a pointer to
it in the array, finds the end of that word (which is signified by at least one whitespace character) and
terminates the word by placing a '\0' character after it. (The '\0' character will overwrite the first
whitespace character following the word.) Note that the original input string is therefore modified by
getwords: if you were to try to print the input line after calling getwords, it would appear to contain
only its first word (because of the first inserted '\0').
#include <stddef.h>
#include <ctype.h>

int getwords(char *line, char *words[], int maxwords)
{
    char *p = line;
    int nwords = 0;

    while(1)
    {
        /* the (unsigned char) casts keep isspace's argument in range */
        while(isspace((unsigned char)*p))
            p++;

        if(*p == '\0')
            return nwords;

        words[nwords++] = p;

        while(!isspace((unsigned char)*p) && *p != '\0')
            p++;

        if(*p == '\0')
            return nwords;

        *p++ = '\0';

        if(nwords >= maxwords)
            return nwords;
    }
}
Each time through the outer while loop, the function tries to find another word. First it skips over
whitespace (which might be leading spaces on the line, or the space(s) separating this word from the
previous one). The isspace function is new: it's in the standard library, declared in the header file
<ctype.h>, and it returns nonzero (``true'') if the character you hand it is a whitespace character (a space,
a tab, a newline, or any of the other standard whitespace characters).
When the function finds a non-whitespace character, it has found the beginning of another word, so it
places the pointer to that character in the next cell of the words array. Then it steps through the word,
looking at non-whitespace characters, until it finds another whitespace character, or the \0 at the end of
the line. If it finds the \0, it's done with the entire line; otherwise, it changes the whitespace character to
a \0, to terminate the word it's just found, and continues. (If it's found as many words as will fit in the
words array, it returns prematurely.)
Each time it finds a word, the function increments the number of words (nwords) it has found. Since
arrays in C start at [0], the number of words the function has found so far is also the index of the cell in
the words array where the next word should be stored. The function actually assigns the next word and
increments nwords in one expression:
words[nwords++] = p;
You should convince yourself that this arrangement works, and that (in this case) the preincrement form
words[++nwords] = p;            /* WRONG */
would not behave as desired.
When the function is done (when it finds the \0 terminating the input line, or when it runs out of cells in
the words array) it returns the number of words it has found.
Here is a complete example of calling getwords:
char line[] = "this is a test";
char *words[10];
int nwords;
int i;
nwords = getwords(line, words, 10);
for(i = 0; i < nwords; i++)
    printf("%s\n", words[i]);

This page by Steve Summit // Copyright 1995, 1996 // mail feedback

Chapter 11: Memory Allocation


In this chapter, we'll meet malloc, C's dynamic memory allocation function, and we'll cover dynamic
memory allocation in some detail.
As we begin doing dynamic memory allocation, we'll begin to see (if we haven't seen it already) what
pointers can really be good for. Many of the pointer examples in the previous chapter (those which used
pointers to access arrays) didn't do all that much for us that we couldn't have done using arrays.
However, when we begin doing dynamic memory allocation, pointers are the only way to go, because
what malloc returns is a pointer to the memory it gives us. (Due to the equivalence between pointers
and arrays, though, we will still be able to think of dynamically allocated regions of storage as if they
were arrays, and even to use array-like subscripting notation on them.)
You have to be careful with dynamic memory allocation. malloc operates at a pretty ``low level''; you
will often find yourself having to do a certain amount of work to manage the memory it gives you. If you
don't keep accurate track of the memory which malloc has given you, and the pointers of yours which
point to it, it's all too easy to accidentally use a pointer which points ``nowhere'', with generally
unpleasant results. (The basic problem is that if you assign a value to the location pointed to by a pointer:
*p = 0;
and the pointer p points ``nowhere'' (strictly speaking, it does point somewhere, just not where you
wanted it to), then that ``somewhere'' is where the 0 gets written. If the ``somewhere'' is memory which
is in use by some other part of your program, or, even worse, if the operating system has not protected
itself from you and ``somewhere'' is in fact in use by the operating system, things could get ugly.)
11.1 Allocating Memory with malloc
11.2 Freeing Memory
11.3 Reallocating Memory Blocks
11.4 Pointer Safety

This page by Steve Summit // Copyright 1995, 1996 // mail feedback

11.1 Allocating Memory with malloc


[This section corresponds to parts of K&R Secs. 5.4, 5.6, 6.5, and 7.8.5]
A problem with many simple programs, including in particular little teaching programs such as we've
been writing so far, is that they tend to use fixed-size arrays which may or may not be big enough. We
have an array of 100 ints for the numbers which the user enters and wishes to find the average of--what
if the user enters 101 numbers? We have an array of 100 chars which we pass to getline to receive
the user's input--what if the user types a line of 200 characters? If we're lucky, the relevant parts of the
program check how much of an array they've used, and print an error message or otherwise gracefully
abort before overflowing the array. If we're not so lucky, a program may sail off the end of an array,
overwriting other data and behaving quite badly. In either case, the user doesn't get his job done. How can
we avoid the restrictions of fixed-size arrays?
The answers all involve the standard library function malloc. Very simply, a call to malloc(n) returns a
pointer to n bytes of memory which we can do anything we want with. If we didn't want to read a line of
input into a fixed-size array, we could use malloc instead. Here's the first step:
#include <stdlib.h>
char *line;
int linelen = 100;
line = malloc(linelen);
/* incomplete -- malloc's return value not checked */
getline(line, linelen);
malloc is declared in <stdlib.h>, so we #include that header in any program that calls malloc.
A ``byte'' in C is, by definition, an amount of storage suitable for storing one character, so the above
invocation of malloc gives us exactly as many chars as we ask for. We could illustrate the resulting
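pointer like this:
[Figure: the pointer variable line, pointing to the first of 100 unnamed chars allocated by malloc]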
The 100 bytes of memory (not all of which are shown) pointed to by line are those allocated by
malloc. (They are brand-new memory, conceptually a bit different from the memory which the compiler
arranges to have allocated automatically for our conventional variables. The 100 boxes in the figure don't
have a name next to them, because they're not storage for a variable we've declared.)
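As a rough sketch of the line-reading example above, here is a hypothetical helper, read_line_alloc, that gets its buffer from malloc. Since the notes' own getline isn't shown here, the sketch assumes the standard fgets in its place; fgets keeps the trailing newline, so the code strips it. Unlike the fragment above, it checks malloc's return value (explained later in this section) and hands unneeded memory back with free (section 11.2).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: malloc a buffer of linelen chars and read one
   line from fp into it, using the standard fgets as a stand-in for
   the notes' getline. Returns the buffer (which the caller should
   eventually free), or NULL at end of input or if malloc fails. */
char *read_line_alloc(FILE *fp, int linelen)
{
    char *line = malloc(linelen);      /* linelen bytes = linelen chars */
    if (line == NULL)
        return NULL;                   /* out of memory */
    if (fgets(line, linelen, fp) == NULL) {
        free(line);                    /* nothing read; give it back */
        return NULL;
    }
    line[strcspn(line, "\n")] = '\0';  /* strip the newline fgets keeps */
    return line;
}
```

A caller might loop with while((s = read_line_alloc(stdin, 100)) != NULL), freeing s after each line. Note that a null return here means either end of input or out of memory; a real program might want to tell those apart. Lines longer than linelen - 1 characters still come back in pieces, just as with a fixed-size array.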
As a second example, we might have occasion to allocate a piece of memory, and to copy a string into it
with strcpy:
char *p = malloc(15);
/* incomplete -- malloc's return value not checked */
strcpy(p, "Hello, world!");
When copying strings, remember that all strings have a terminating \0 character. If you use strlen to
count the characters in a string for you, that count will not include the trailing \0, so you must add one
before calling malloc:
char *somestring, *copy;
...
copy = malloc(strlen(somestring) + 1);
/* +1 for \0 */
/* incomplete -- malloc's return value not checked */
strcpy(copy, somestring);
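Packaged as a function, the strlen-plus-one idiom might look like the following sketch. The name dupstr is made up for illustration (many systems provide a similar function called strdup, but it is not available everywhere). It also checks malloc's return value, which the fragments above leave out.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of s, or NULL if malloc
   fails. The +1 makes room for the terminating '\0', which strlen
   does not count. The caller should eventually free the copy. */
char *dupstr(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    if (copy == NULL)
        return NULL;
    strcpy(copy, s);
    return copy;
}
```

For example, dupstr("Hello, world!") asks malloc for 14 bytes: 13 characters plus the \0.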
What if we're not allocating characters, but integers? If we want to allocate 100 ints, how many bytes is
that? If we know how big ints are on our machine (i.e. depending on whether we're using a 16- or 32-bit
machine) we could try to compute it ourselves, but it's much safer and more portable to let C compute it
for us. C has a sizeof operator, which computes the size, in bytes, of a variable or type. It's just what
we need when calling malloc. To allocate space for 100 ints, we could call
int *ip = malloc(100 * sizeof(int));
The use of the sizeof operator tends to look like a function call, but it's really an operator, and it does
its work at compile time.
Since we can use array indexing syntax on pointers, we can treat a pointer variable after a call to malloc
almost exactly as if it were an array. In particular, after the above call to malloc initializes ip to point at
storage for 100 ints, we can access ip[0], ip[1], ... up to ip[99]. This way, we can get the effect
of an array even if we don't know until run time how big the ``array'' should be. (In a later section we'll
see how we might deal with the case where we're not even sure at the point we begin using it how big an
``array'' will eventually have to be.)
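Here is a small sketch of the sizeof idiom and of subscripting a malloc'd pointer as if it were an array. The function name squares and what it stores are purely illustrative; the point is that n need not be known until run time.

```c
#include <stdlib.h>

/* Hypothetical example: allocate room for n ints (n > 0, known only at
   run time) and fill it using ordinary array subscripts. Returns NULL
   if malloc fails; otherwise the caller should eventually free it. */
int *squares(int n)
{
    int *ip = malloc(n * sizeof(int));   /* n ints, not n bytes */
    int i;
    if (ip == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        ip[i] = i * i;                   /* ip[0] through ip[n-1] */
    return ip;
}
```

A call such as squares(100) gives storage for 100 ints, accessed as ip[0] through ip[99], exactly like the 100-int example above.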
Our examples so far have all had a significant omission: they have not checked malloc's return value.
Obviously, no real computer has an infinite amount of memory available, so there is no guarantee that
malloc will be able to give us as much memory as we ask for. If we call malloc(100000000), or if
we call malloc(10) 10,000,000 times, we're probably going to run out of memory.
When malloc is unable to allocate the requested memory, it returns a null pointer. A null pointer,
remember, points definitively nowhere. It's a ``not a pointer'' marker; it's not a pointer you can use. (As
we said in section 9.4, a null pointer can be used as a failure return from a function that returns pointers,
and malloc is a perfect example.) Therefore, whenever you call malloc, it's vital to check the returned
pointer before using it! If you call malloc, and it returns a null pointer, and you go off and use that null
pointer as if it pointed somewhere, your program probably won't last long. Instead, a program should
immediately check for a null pointer, and if it receives one, it should at the very least print an error
message and exit, or perhaps figure out some way of proceeding without the memory it asked for. But it
cannot go on to use the null pointer it got back from malloc in any way, because that null pointer by
definition points nowhere. (``It cannot use a null pointer in any way'' means that the program cannot use
the * or [] operators on such a pointer value, or pass it to any function that expects a valid pointer.)
A call to malloc, with an error check, typically looks something like this:
int *ip = malloc(100 * sizeof(int));
if(ip == NULL)
{
    printf("out of memory\n");
    exit(1);        /* or return, if the caller can cope */
}
After printing the error message, this code should return to its caller, or exit from the program entirely; it
cannot proceed with the code that would have used ip.
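If a program calls malloc in many places, repeating this check gets tedious. One common approach, sketched here with a hypothetical helper (not a standard library function), is to wrap malloc in a function that does the check once, for programs where printing a message and exiting really is the right response. It returns void *, the generic pointer type described in section 11.3, so its result can be assigned to any kind of data pointer.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper: call malloc and, if it returns a null pointer,
   print a message and exit, so callers never see a null pointer.
   nbytes should be nonzero, since malloc(0) may legitimately return
   a null pointer. */
void *checked_malloc(size_t nbytes)
{
    void *p = malloc(nbytes);
    if (p == NULL) {
        printf("out of memory\n");
        exit(1);
    }
    return p;
}
```

With it, the call above shrinks to int *ip = checked_malloc(100 * sizeof(int)); with no separate test. Code that should recover instead of exiting would still call malloc directly.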
Of course, in our examples so far, we've still limited ourselves to ``fixed size'' regions of memory,
because we've been calling malloc with fixed arguments like 10 or 100. (Our call to getline is still
limited to 100-character lines, or whatever number we set the linelen variable to; our ip variable still
points at only 100 ints.) However, since the sizes are now values which can in principle be determined
at run-time, we've at least moved beyond having to recompile the program (with a bigger array) to
accommodate longer lines, and with a little more work, we could arrange that the ``arrays'' automatically
grew to be as large as required. (For example, we could write something like getline which could read
the longest input line actually seen.) We'll begin to explore this possibility in a later section.



This page by Steve Summit // Copyright 1995-1997 // mail feedback


11.2 Freeing Memory


Memory allocated with malloc lasts as long as you want it to. It does not automatically disappear when
a function returns, as automatic-duration variables do, but it does not have to remain for the entire
duration of your program, either. Just as you can use malloc to control exactly when and how much
memory you allocate, you can also control exactly when you deallocate it.
In fact, many programs use memory on a transient basis. They allocate some memory, use it for a while,
but then reach a point where they don't need that particular piece any more. Because memory is not
inexhaustible, it's a good idea to deallocate (that is, release or free) memory you're no longer using.
Dynamically allocated memory is deallocated with the free function. If p contains a pointer previously
returned by malloc, you can call
free(p);
which will ``give the memory back'' to the stock of memory (sometimes called the ``arena'' or ``pool'')
from which malloc requests are satisfied. Calling free is sort of the ultimate in recycling: it costs you
almost nothing, and the memory you give back is immediately usable by other parts of your program.
(Theoretically, it may even be usable by other programs.)
(Freeing unused memory is a good idea, but it's not mandatory. When your program exits, any memory
which it has allocated but not freed should be automatically released. If your computer were to somehow
``lose'' memory just because your program forgot to free it, that would indicate a problem or deficiency
in your operating system.)
Naturally, once you've freed some memory you must remember not to use it any more. After calling
free(p);
it is probably the case that p still points at the same memory. However, since we've given it back, it's
now ``available,'' and a later call to malloc might give that memory to some other part of your
program. If the variable p is a global variable or will otherwise stick around for a while, one good way to
record the fact that it's not to be used any more would be to set it to a null pointer:
free(p);
p = NULL;
Now we don't even have the pointer to the freed memory any more, and (as long as we check to see that
p is non-NULL before using it), we won't misuse any memory via the pointer p.
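The following sketch shows the free-then-NULL habit for a long-lived pointer. The names saved, save_string, forget_string, and saved_length are hypothetical. It also relies on a guarantee of the standard free function: passing it a null pointer is allowed and does nothing, which is why forget_string is safe to call twice.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical long-lived pointer, like the global p described above. */
static char *saved = NULL;

/* Replace the saved string with a malloc'd copy of s. Returns 0 on
   success, or -1 (leaving the old string alone) if malloc fails. */
int save_string(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, s);
    free(saved);          /* release the old copy (free(NULL) is a no-op) */
    saved = copy;
    return 0;
}

/* Free the saved string and record that it is gone. */
void forget_string(void)
{
    free(saved);
    saved = NULL;         /* a null pointer marks "nothing here now" */
}

/* Length of the saved string, or -1 if there isn't one. */
int saved_length(void)
{
    if (saved == NULL)    /* check before using the pointer */
        return -1;
    return (int)strlen(saved);
}
```

Because every function tests saved against NULL before touching it, no code path can use the freed memory.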


When thinking about malloc, free, and dynamically-allocated memory in general, remember again
the distinction between a pointer and what it points to. If you call malloc to allocate some memory, and
store the pointer which malloc gives you in a local pointer variable, what happens when the function
containing the local pointer variable returns? If the local pointer variable has automatic duration (which
is the default, unless the variable is declared static), it will disappear when the function returns. But
for the pointer variable to disappear says nothing about the memory pointed to! That memory still exists
and, as far as malloc and free are concerned, is still allocated. The only thing that has disappeared is
the pointer variable you had which pointed at the allocated memory. (Furthermore, if it contained the
only copy of the pointer you had, once it disappears, you'll have no way of freeing the memory, and no
way of using it, either. Using memory and freeing memory both require that you have at least one pointer
to the memory!)
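The difference between losing the pointer and keeping it shows up in this sketch (both functions are hypothetical examples). leaky allocates memory that becomes unreachable the moment it returns; make_filled instead returns its pointer, so the caller ends up holding the copy it needs to use the memory and later free it.

```c
#include <stdlib.h>

/* Leaky: buf holds the only pointer to the block, and buf disappears
   when the function returns. The block stays allocated, but nothing
   can use or free it any more. (n should be at least 1.) */
void leaky(int n)
{
    char *buf = malloc(n);
    if (buf != NULL)
        buf[0] = '\0';
    /* no free(buf) and no return buf: the memory is lost */
}

/* Better: return the pointer, so the caller keeps a copy of it and
   can use the n-character string and later free it. Returns NULL if
   malloc fails. */
char *make_filled(int n, char ch)
{
    char *buf = malloc(n + 1);         /* +1 for the '\0' */
    int i;
    if (buf == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        buf[i] = ch;
    buf[n] = '\0';
    return buf;
}
```

In make_filled the local variable buf still disappears on return, but by then its value has been copied back to the caller, so the allocated memory is not orphaned.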



This page by Steve Summit // Copyright 1995-1997 // mail feedback


11.3 Reallocating Memory Blocks


Sometimes you're not sure at first how much memory you'll need. For example, if you need to store a series
of items you read from the user, and if the only way to know how many there are is to read them until the
user types some ``end'' signal, you'll have no way of knowing, as you begin reading and storing the first few,
how many you'll have seen by the time you do see that ``end'' marker. You might want to allocate room for,
say, 100 items, and if the user enters a 101st item before entering the ``end'' marker, you might wish for a
way to say ``uh, malloc, remember those 100 items I asked for? Could I change my mind and have 200
instead?''
In fact, you can do exactly this, with the realloc function. You hand realloc an old pointer (such as
you received from an initial call to malloc) and a new size, and realloc does what it can to give you a
chunk of memory big enough to hold the new size. For example, if we wanted the ip variable from an
earlier example to point at 200 ints instead of 100, we could try calling
ip = realloc(ip, 200 * sizeof(int));
Since you always want each block of dynamically-allocated memory to be contiguous (so that you can treat
it as if it were an array), you and realloc have to worry about the case where realloc can't make the
old block of memory bigger ``in place,'' but rather has to relocate it elsewhere in order to find enough
contiguous space for the new requested size. realloc does this by returning a new pointer. If realloc
was able to make the old block of memory bigger, it returns the same pointer. If realloc has to go
elsewhere to get enough contiguous memory, it returns a pointer to the new memory, after copying your old
data there. (In this case, after it makes the copy, it frees the old block.) Finally, if realloc can't find
enough memory to satisfy the new request at all, it returns a null pointer. Therefore, you usually don't want
to overwrite your old pointer with realloc's return value until you've tested it to make sure it's not a null
pointer. You might use code like this:
int *newp;
newp = realloc(ip, 200 * sizeof(int));
if(newp != NULL)
ip = newp;
else
{
printf("out of memory\n");
/* exit or return */
/* but ip still points at 100 ints */
}
If realloc returns something other than a null pointer, it succeeded, and we set ip to what it returned.
(We've either set ip to what it used to be or to a new pointer, but in either case, it points to where our data is
now.) If realloc returns a null pointer, however, we hang on to our old pointer in ip which still points at
our original 100 values.
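The same careful pattern can be packaged in a helper. resize_ints is a hypothetical name, and passing the allocation count by pointer is a design choice for this sketch: the count is updated only if realloc succeeds, so on failure the caller still has both the old block and an accurate size.

```c
#include <stdlib.h>

/* Hypothetical helper: try to resize the block ip (currently holding
   *nalloc ints) to newalloc ints. On success, returns the (possibly
   moved) block and updates *nalloc. On failure, returns NULL and
   leaves *nalloc alone; ip still points at the old, intact block. */
int *resize_ints(int *ip, int *nalloc, int newalloc)
{
    int *newp = realloc(ip, newalloc * sizeof(int));
    if (newp == NULL)
        return NULL;          /* old block and old size still valid */
    *nalloc = newalloc;
    return newp;
}
```

A caller would write newp = resize_ints(ip, &nalloc, 200); and assign ip = newp only if newp is non-null, just as in the code above. (realloc also accepts a null pointer as its first argument, in which case it behaves like malloc.)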


Putting this all together, here is a piece of code which reads lines of text from the user, treats each line as an
integer by calling atoi, and stores each integer in a dynamically-allocated ``array'':
#define MAXLINE 100
char line[MAXLINE];
int *ip;
int nalloc, nitems;
nalloc = 100;
ip = malloc(nalloc * sizeof(int));          /* initial allocation */
if(ip == NULL)
{
    printf("out of memory\n");
    exit(1);
}
nitems = 0;
while(getline(line, MAXLINE) != EOF)
{
    if(nitems >= nalloc)
    {                                       /* increase allocation */
        int *newp;
        nalloc += 100;
        newp = realloc(ip, nalloc * sizeof(int));
        if(newp == NULL)
        {
            printf("out of memory\n");
            exit(1);
        }
        ip = newp;
    }
    ip[nitems++] = atoi(line);
}
We use two different variables to keep track of the ``array'' pointed to by ip. nalloc is how many
elements we've allocated, and nitems is how many of them are in use. Whenever we're about to store
another item in the ``array,'' if nitems >= nalloc, the old ``array'' is full, and it's time to call realloc
to make it bigger.
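For experimenting without the notes' getline, here is a self-contained sketch of the same loop as a function, read_ints (a hypothetical name), that reads from any file pointer (chapter 12) using the standard fgets. It makes one deliberate change: instead of adding 100 elements each time, it starts with room for 4 and doubles the allocation whenever it fills up, a common alternative that keeps the number of realloc calls small for large inputs. Also unlike the fragment above, it frees the old block and returns NULL on failure rather than exiting.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAXLINE 100

/* Hypothetical sketch: read one integer per line from fp into a
   dynamically allocated "array", growing it with realloc as needed.
   Stores the count in *np and returns the array (the caller frees
   it), or returns NULL if memory runs out. */
int *read_ints(FILE *fp, int *np)
{
    char line[MAXLINE];
    int nalloc = 4;                 /* start small; doubled as needed */
    int nitems = 0;
    int *ip = malloc(nalloc * sizeof(int));
    if (ip == NULL)
        return NULL;
    while (fgets(line, MAXLINE, fp) != NULL) {
        if (nitems >= nalloc) {     /* "array" is full: enlarge it */
            int *newp = realloc(ip, 2 * nalloc * sizeof(int));
            if (newp == NULL) {
                free(ip);           /* give up, releasing the old block */
                return NULL;
            }
            ip = newp;
            nalloc *= 2;
        }
        ip[nitems++] = atoi(line);
    }
    *np = nitems;
    return ip;
}
```

Called as ip = read_ints(stdin, &n), it plays the role of the loop above; with ten input lines the allocation grows from 4 to 8 to 16 elements.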
Finally, we might ask what the return type of malloc and realloc is, if they are able to return pointers to
char or pointers to int or (though we haven't seen it yet) pointers to any other type. The answer is that
both of these functions are declared (in <stdlib.h>) as returning a type we haven't seen, void * (that is,
pointer to void). We haven't really seen type void, either, but what's going on here is that void * is
specially defined as a ``generic'' pointer type, which may be used with (strictly speaking, assigned to or
from) any data pointer type.
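A brief sketch of what ``generic'' means in practice: a void * can be assigned from, and back to, other data pointer types without a cast, which is exactly what happens with malloc's return value. void_pointer_demo is a hypothetical name; the function simply returns 1 if every value came through intact.

```c
#include <stdlib.h>

/* void * converts to and from other data pointer types without a
   cast, which is why malloc's result can be assigned directly.
   Returns 1 if the values read back match, -1 if malloc fails. */
int void_pointer_demo(void)
{
    int x = 42;
    void *vp = &x;                 /* int * -> void * */
    int *ip = vp;                  /* void * -> int *, same address */
    double *dp = malloc(3 * sizeof(double));   /* void * -> double * */
    int ok;
    if (dp == NULL)
        return -1;
    dp[2] = 1.5;
    ok = (*ip == 42 && dp[2] == 1.5);
    free(dp);                      /* double * -> void * for free */
    return ok;
}
```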



This page by Steve Summit // Copyright 1995-1997 // mail feedback


11.4 Pointer Safety


At the beginning of the previous chapter, we said that the hard thing about pointers is not so much
manipulating them as ensuring that the memory they point to is valid. When a pointer doesn't point
where you think it does, if you inadvertently access or modify the memory it points to, you can damage
other parts of your program, or (in some cases) other programs or the operating system itself!
When we use pointers to simple variables, as in section 10.1, there's not much that can go wrong. When
we use pointers into arrays, as in section 10.2, and begin moving the pointers around, we have to be more
careful, to ensure that the roving pointers always stay within the bounds of the array(s). When we begin
passing pointers to functions, and especially when we begin returning them from functions (as in the
strstr function of section 10.4) we have to be more careful still, because the code using the pointer
may be far removed from the code which owns or allocated the memory.
One particular problem concerns functions that return pointers. Where is the memory to which the
returned pointer points? Is it still around by the time the function returns? The strstr function returns
either a null pointer (which points definitively nowhere, and which the caller presumably checks for) or it
returns a pointer which points into the input string, which the caller supplied, which is pretty safe. One
thing a function must not do, however, is return a pointer to one of its own, local, automatic-duration
arrays. Remember that automatic-duration variables (which includes all non-static local variables),
including automatic-duration arrays, are deallocated and disappear when the function returns. If a
function returns a pointer to a local array, that pointer will be invalid by the time the caller tries to use it.
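To make the rule concrete, this sketch shows the broken pattern (in a comment only, since its result is unusable) and two safe alternatives: letting the caller supply the array, or allocating the result with malloc so it outlives the function. The function names are hypothetical, and the 24-byte buffer is an assumption sized for ints of up to 64 bits.

```c
#include <stdio.h>
#include <stdlib.h>

/* WRONG, shown only in this comment: buf is an automatic array, so it
   disappears when the function returns and the caller is left holding
   an invalid pointer.

       char *bad_itoa(int n)
       {
           char buf[24];
           sprintf(buf, "%d", n);
           return buf;
       }
*/

/* Safe alternative 1: the caller supplies the array, so it still
   exists after the call (much as strstr's caller supplies the string). */
char *itoa_into(int n, char buf[], int bufsize)
{
    snprintf(buf, bufsize, "%d", n);
    return buf;
}

/* Safe alternative 2: allocate the result with malloc; it survives
   the return, and the caller must free it. Returns NULL on failure. */
char *itoa_alloc(int n)
{
    char *buf = malloc(24);    /* assumed enough for any int up to 64 bits */
    if (buf == NULL)
        return NULL;
    snprintf(buf, 24, "%d", n);
    return buf;
}
```

The first alternative puts the burden of sizing the array on the caller; the second puts the burden of freeing it there instead.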
Finally, when we're doing dynamic memory allocation with malloc, realloc, and free, we have to
be most careful of all. Dynamic allocation gives us a lot more flexibility in how our programs use
memory, although with that flexibility comes the responsibility that we manage dynamically allocated
memory carefully. The possibilities for misdirected pointers and associated mayhem are greatest in
programs that make heavy use of dynamic memory allocation. You can reduce these possibilities by
designing your program in such a way that it's easy to ensure that pointers are used correctly and that
memory is always allocated and deallocated correctly. (If, on the other hand, your program is designed in
such a way that meeting these guarantees is a tedious nuisance, sooner or later you'll forget or neglect to,
and maintenance will be a nightmare.)



This page by Steve Summit // Copyright 1995, 1996 // mail feedback


Chapter 12: Input and Output


So far, we've been calling printf to print formatted output to the ``standard output'' (wherever that is).
We've also been calling getchar to read single characters from the ``standard input,'' and putchar to
write single characters to the standard output. ``Standard input'' and ``standard output'' are two predefined
I/O streams which are implicitly available to us. In this chapter we'll learn how to take control of input
and output by opening our own streams, perhaps connected to data files, which we can read from and
write to.
12.1 File Pointers and fopen
12.2 I/O with File Pointers
12.3 Predefined Streams
12.4 Closing Files
12.5 Example: Reading a Data File



This page by Steve Summit // Copyright 1995, 1996 // mail feedback


12.1 File Pointers and fopen


[This section corresponds to K&R Sec. 7.5]
How will we specify that we want to access a particular data file? It would theoretically be possible to
mention the name of a file each time it was desired to read from or write to it. But such an approach
would have a number of drawbacks. Instead, the usual approach (and the one taken in C's stdio library) is
that you mention the name of the file once, at the time you open it. Thereafter, you use some little token (in this case, the file pointer) which keeps track (both for your sake and the library's) of which file you're
talking about. Whenever you want to read from or write to one of the files you're working with, you
identify that file by using its file pointer (that is, the file pointer you obtained when you opened the file).
As we'll see, you store file pointers in variables just as you store any other data you manipulate, so it is
possible to have several files open, as long as you use distinct variables to store the file pointers.
You declare a variable to store a file pointer like this:
FILE *fp;
The type FILE is predefined for you by <stdio.h>. It is a data structure which holds the information
the standard I/O library needs to keep track of the file for you. For historical reasons, you declare a
variable which is a pointer to this FILE type. The name of the variable can (as for any variable) be
anything you choose; it is traditional to use the letters fp in the variable name (since we're talking about
a file pointer). If you were reading from two files at once you'd probably use two file pointers:
FILE *fp1, *fp2;
If you were reading from one file and writing to another you might declare an input file pointer and an
output file pointer:
FILE *ifp, *ofp;
Like any pointer variable, a file pointer isn't any good until it's initialized to point to something.
(Actually, no variable of any type is much good until you've initialized it.) To actually open a file, and
receive the ``token'' which you'll store in your file pointer variable, you call fopen. fopen accepts a
file name (as a string) and a mode value indicating among other things whether you intend to read or
write this file. (The mode argument is also a string.) To open the file input.dat for reading you might
call
ifp = fopen("input.dat", "r");
The mode string "r" indicates reading. Mode "w" indicates writing, so we could open output.dat
for output like this:


ofp = fopen("output.dat", "w");
The other values for the mode string are less frequently used. The third major mode is "a" for append.
(If you use "w" to write to a file which already exists, its old contents will be discarded.) You may also
add a + character to the mode string to indicate that you want to both read and write, or a b character to
indicate that you want to do ``binary'' (as opposed to text) I/O.
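The following sketch shows the practical difference between "w" and "a". The helper names put_line and count_lines are hypothetical, and they use fprintf, getc, and fclose, which are covered in sections 12.2 through 12.4. Each fopen is checked, for reasons explained next.

```c
#include <stdio.h>

/* Hypothetical helper: open filename with the given mode ("w" to
   start over, "a" to add to the end), write one line of text, and
   close the file. Returns 0, or -1 if fopen fails. */
int put_line(const char *filename, const char *mode, const char *text)
{
    FILE *ofp = fopen(filename, mode);
    if (ofp == NULL)
        return -1;
    fprintf(ofp, "%s\n", text);
    fclose(ofp);
    return 0;
}

/* Count the newline characters in filename, opened with mode "r",
   or return -1 if it cannot be opened. */
int count_lines(const char *filename)
{
    FILE *ifp = fopen(filename, "r");
    int c, n = 0;
    if (ifp == NULL)
        return -1;
    while ((c = getc(ifp)) != EOF)
        if (c == '\n')
            n++;
    fclose(ifp);
    return n;
}
```

With any writable file name, put_line with "w" followed by put_line with "a" leaves two lines in the file, while "w" twice leaves only the last line, since "w" discards the old contents.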
One thing to beware of when opening files is that it's an operation which may fail. The requested file
might not exist, or it might be protected against reading or writing. (These possibilities ought to be
obvious, but it's easy to forget them.) fopen returns a null pointer if it can't open the requested file, and
it's important to check for this case before going off and using fopen's return value as a file pointer.
Every call to fopen will typically be followed with a test, like this:
ifp = fopen("input.dat", "r");
if(ifp == NULL)
{
    printf("can't open file\n");
    exit(1);        /* or return */
}
If fopen returns a null pointer, and you store it in your file pointer variable and go off and try to do I/O
with it, your program will typically crash.
It's common to collapse the call to fopen and the assignment in with the test:
if((ifp = fopen("input.dat", "r")) == NULL)
{
    printf("can't open file\n");
    exit(1);        /* or return */
}
You don't have to write these ``collapsed'' tests if you're not comfortable with them, but you'll see them
in other people's code, so you should be able to read them.



This page by Steve Summit // Copyright 1995-1997 // mail feedback


12.2 I/O with File Pointers


For each of the I/O library functions we've been using so far, there's a companion function which accepts
an additional file pointer argument telling it where to read from or write to. The companion function to
printf is fprintf, and the file pointer argument comes first. To print a string to the output.dat
file we opened in the previous section, we might call
fprintf(ofp, "Hello, world!\n");
The companion function to getchar is getc, and the file pointer is its only argument. To read a
character from the input.dat file we opened in the previous section, we might call
int c;
c = getc(ifp);
The companion function to putchar is putc, and the file pointer argument comes last. To write a
character to output.dat, we could call
putc(c, ofp);
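Combining getc and putc gives a loop that copies one stream to another. copy_stream is a hypothetical name for this sketch; note that c is declared int, not char, so that it can hold the EOF value as well as every possible character.

```c
#include <stdio.h>

/* Copy every character from ifp to ofp using getc and putc, the
   file-pointer companions of getchar and putchar. Returns the number
   of characters copied. */
long copy_stream(FILE *ifp, FILE *ofp)
{
    int c;          /* int, not char, so it can hold EOF */
    long n = 0;
    while ((c = getc(ifp)) != EOF) {
        putc(c, ofp);
        n++;
    }
    return n;
}
```

Given the ifp and ofp opened in section 12.1, copy_stream(ifp, ofp) would copy input.dat to output.dat.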
Our own getline function calls getchar and so always reads the standard input. We could write a
companion fgetline function which reads from an arbitrary file pointer:
#include <stdio.h>
/* Read one line from fp, */
/* copying it to line array (but no more than max chars). */
/* Does not place terminating \n in line array. */
/* Returns line length, or 0 for empty line, or EOF for end-of-file. */
int fgetline(FILE *fp, char line[], int max)
{
    int nch = 0;
    int c;
    max = max - 1;              /* leave room for '\0' */
    while((c = getc(fp)) != EOF)
    {
        if(c == '\n')
            break;
        if(nch < max)
        {
            line[nch] = c;
            nch = nch + 1;
        }
    }
    if(c == EOF && nch == 0)
        return EOF;
    line[nch] = '\0';
    return nch;
}
Now we could read one line from ifp by calling
char line[MAXLINE];
...
fgetline(ifp, line, MAXLINE);



This page by Steve Summit // Copyright 1995-1997 // mail feedback


12.3 Predefined Streams


Besides the file pointers which we explicitly open by calling fopen, there are also three predefined
streams. stdin is a constant file pointer corresponding to standard input, and stdout is a constant file
pointer corresponding to standard output. Both of these can be used anywhere a file pointer is called for;
for example, getchar() is the same as getc(stdin) and putchar(c) is the same as putc(c,
stdout). The third predefined stream is stderr. Like stdout, stderr is typically connected to
the screen by default. The difference is that stderr is not redirected when the standard output is
redirected. For example, under Unix or MS-DOS, when you invoke
program > filename
anything printed to stdout is redirected to the file filename, but anything printed to stderr still
goes to the screen. The intent behind stderr is that it is the ``standard error output''; error messages
printed to it will not disappear into an output file. For example, a more realistic way to print an error
message when a file can't be opened would be
if((ifp = fopen(filename, "r")) == NULL)
{
    fprintf(stderr, "can't open file %s\n", filename);
    exit(1);        /* or return */
}
where filename is a string variable indicating the file name to be opened. Not only is the error
message printed to stderr, but it is also more informative in that it mentions the name of the file that
couldn't be opened. (We'll see another example in the next chapter.)
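If a program opens files in several places, the fopen-and-complain pattern can be put in one function. This sketch (open_or_complain is a hypothetical name) prints to stderr but leaves the decision to exit or return to its caller, so it can be used both in main and in lower-level functions.

```c
#include <stdio.h>

/* Hypothetical helper: open filename with the given mode, or print an
   informative message to stderr (so it is not lost if stdout is
   redirected) and return NULL. The caller decides what to do next. */
FILE *open_or_complain(const char *filename, const char *mode)
{
    FILE *fp = fopen(filename, mode);
    if (fp == NULL)
        fprintf(stderr, "can't open file %s\n", filename);
    return fp;
}
```

A caller would still test the result: if((ifp = open_or_complain(filename, "r")) == NULL) followed by an exit or return.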



This page by Steve Summit // Copyright 1995-1997 // mail feedback


12.4 Closing Files


Although you can open multiple files, there's a limit to how many you can have open at once. If your
program will open many files in succession, you'll want to close each one as you're done with it;
otherwise the standard I/O library could run out of the resources it uses to keep track of open files.
Closing a file simply involves calling fclose with the file pointer as its argument:
fclose(fp);
Calling fclose arranges that (if the file was open for output) any last, buffered output is finally written
to the file, and that those resources used by the operating system (and the C library) for this file are
released. If you forget to close a file, it will be closed automatically when the program exits normally (by returning from main or calling exit).
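As a sketch of the ``open many files in succession'' case, the hypothetical function below totals the characters in a list of files, closing each one before opening the next, so only one is open at a time. It also checks fclose's return value, which is 0 on success and EOF on error.

```c
#include <stdio.h>

/* Hypothetical sketch: total the characters in n files, opening them
   one at a time and closing each as soon as we're done with it.
   Files that can't be opened are reported on stderr and skipped. */
long total_chars(const char *names[], int n)
{
    long total = 0;
    int i, c;
    for (i = 0; i < n; i++) {
        FILE *fp = fopen(names[i], "r");
        if (fp == NULL) {
            fprintf(stderr, "can't open %s\n", names[i]);
            continue;
        }
        while ((c = getc(fp)) != EOF)
            total++;
        if (fclose(fp) == EOF)      /* fclose returns EOF on failure */
            fprintf(stderr, "error closing %s\n", names[i]);
    }
    return total;
}
```

However long the list of names, this loop never holds more than one file open.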



This page by Steve Summit // Copyright 1995-1997 // mail feedback


12.5 Example: Reading a Data File


Suppose you had a data file consisting of rows and columns of numbers:
1   2   34
5   6   78
9  10  112

Suppose you wanted to read these numbers into an array. (Actually, the array will be an array of arrays,
or a ``multidimensional'' array; see section 4.1.2.) We can write code to do this by putting together
several pieces: the fgetline function we just showed, and the getwords function from chapter 10.
Assuming that the data file is named input.dat, the code would look like this:
#define MAXLINE 100
#define MAXROWS 10
#define MAXCOLS 10
int array[MAXROWS][MAXCOLS];
char *filename = "input.dat";
FILE *ifp;
char line[MAXLINE];
char *words[MAXCOLS];
int nrows = 0;
int n;
int i;
ifp = fopen(filename, "r");
if(ifp == NULL)
{
    fprintf(stderr, "can't open %s\n", filename);
    exit(EXIT_FAILURE);
}
while(fgetline(ifp, line, MAXLINE) != EOF)
{
    if(nrows >= MAXROWS)
    {
        fprintf(stderr, "too many rows\n");
        exit(EXIT_FAILURE);
    }
    n = getwords(line, words, MAXCOLS);
    for(i = 0; i < n; i++)
        array[nrows][i] = atoi(words[i]);
    nrows++;
}
fclose(ifp);
Each trip through the loop reads one line from the file, using fgetline. Each line is broken up into
``words'' using getwords; each ``word'' is actually one number. The numbers are however still
represented as strings, so each one is converted to an int by calling atoi before being stored in the
array. The code checks for two different error conditions (failure to open the input file, and too many
lines in the input file) and if one of these conditions occurs, it prints an error message, and exits. The
exit function is a Standard library function which terminates your program. It is declared in
<stdlib.h>, and accepts one argument, which will be the exit status of the program.
EXIT_FAILURE is a code, also defined by <stdlib.h>, which indicates that the program failed.
Success is indicated by a code of EXIT_SUCCESS, or simply 0. (These values can also be returned from
main(); calling exit with a particular status value is essentially equivalent to returning that same
status value from main.)
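Since getwords (chapter 10) isn't reproduced here, this sketch substitutes the standard functions fgets and strtok to split each line into whitespace-separated numbers; that substitution, and the name read_table, are assumptions for illustration rather than the notes' own code. Like the fragment above, it gives up when there are more than MAXROWS rows, but it reports that by returning -1 instead of calling exit, leaving the choice to the caller.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXLINE 100
#define MAXROWS 10
#define MAXCOLS 10

/* Hypothetical stand-in for the fgetline/getwords combination: read
   whitespace-separated integers, one row per line, into array.
   Returns the number of rows read, or -1 if there are more than
   MAXROWS. Numbers beyond MAXCOLS on a line are ignored. */
int read_table(FILE *ifp, int array[][MAXCOLS])
{
    char line[MAXLINE];
    int nrows = 0;
    while (fgets(line, MAXLINE, ifp) != NULL) {
        char *word;
        int i = 0;
        if (nrows >= MAXROWS)
            return -1;                  /* too many rows */
        /* strtok splits line in place, one "word" per call */
        for (word = strtok(line, " \t\n"); word != NULL && i < MAXCOLS;
             word = strtok(NULL, " \t\n"))
            array[nrows][i++] = atoi(word);
        nrows++;
    }
    return nrows;
}
```

strtok modifies the buffer it is given, which is harmless here because line is a local scratch array. For the three-row data file shown above, read_table would return 3, with array[2][2] equal to 112.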



This page by Steve Summit // Copyright 1995-1997 // mail feedback


Chapter 13: Reading the Command Line


[This section corresponds to K&R Sec. 5.10]
We've mentioned several times that a program is rarely useful if it does exactly the same thing every time
you run it. Another way of giving a program some variable input to work on is by invoking it with
command line arguments.
(We should probably admit that command line user interfaces are a bit old-fashioned, and currently
somewhat out of favor. If you've used Unix or MS-DOS, you know what a command line is, but if your
experience is confined to the Macintosh or Microsoft Windows or some other Graphical User Interface,
you may never have seen a command line. In fact, if you're learning C on a Mac or under Windows, it
can be tricky to give your program a command line at all. Think C for the Macintosh provides a way; I'm
not sure about other compilers. If your compilation environment doesn't provide an easy way of
simulating an old-fashioned command line, you may skip this chapter.)
C's model of the command line is that it consists of a sequence of words, typically separated by
whitespace. Your main program can receive these words as an array of strings, one word per string. In
fact, the C run-time startup code is always willing to pass you this array, and all you have to do to receive
it is to declare main as accepting two parameters, like this:
int main(int argc, char *argv[])
{
...
}
When main is called, argc will be a count of the number of command-line arguments, and argv will
be an array (``vector'') of the arguments themselves. Since each word is a string which is represented as a
pointer-to-char, argv is an array-of-pointers-to-char. Since we are not defining the argv array, but
merely declaring a parameter which references an array somewhere else (namely, in main's caller, the
run-time startup code), we do not have to supply an array dimension for argv. (Actually, since functions
never receive arrays as parameters in C, argv can also be thought of as a pointer-to-pointer-to-char, or
char **. But multidimensional arrays and pointers to pointers can be confusing, and we haven't
covered them, so we'll talk about argv as if it were an array.) (Also, there's nothing magic about the
names argc and argv. You can give main's two parameters any names you like, as long as they have
the appropriate types. The names argc and argv are traditional.)
The first program to write when playing with argc and argv is one which simply prints its arguments:
#include <stdio.h>

int main(int argc, char *argv[])
{
    int i;
    for(i = 0; i < argc; i++)
        printf("arg %d: %s\n", i, argv[i]);
    return 0;
}
(This program is essentially the Unix or MS-DOS echo command.)
If you run this program, you'll discover that the set of ``words'' making up the command line includes the
command you typed to invoke your program (that is, the name of your program). In other words,
argv[0] typically points to the name of your program, and argv[1] is the first argument.
There are no hard-and-fast rules for how a program should interpret its command line. There is one set of
conventions for Unix, another for MS-DOS, another for VMS. Typically you'll loop over the arguments,
perhaps treating some as option flags and others as actual arguments (input files, etc.), interpreting or
acting on each one. Since each argument is a string, you'll have to use strcmp or the like to match
arguments against any patterns you might be looking for. Remember that argc contains the number of
words on the command line, and that argv[0] is the command name, so if argc is 1, there are no
arguments to inspect. (You'll never want to look at argv[i], for i >= argc, because it will be a null
or invalid pointer.)
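Here is a minimal sketch of such a loop. It assumes a hypothetical program that accepts a -v (``verbose'') flag and treats the first argument not beginning with - as the start of its file names. The function name parse_options and the -v flag are illustrative choices, not a standard convention.

```c
#include <stdio.h>
#include <string.h>

/*
 * parse_options (hypothetical helper): scan argv[1] .. argv[argc-1].
 * Each "-v" sets *verbose; any other argument starting with '-' is
 * reported as unknown.  Returns the index of the first non-option
 * argument (equal to argc if there are none).
 */
int parse_options(int argc, char *argv[], int *verbose)
{
    int i;

    *verbose = 0;
    for(i = 1; i < argc; i++)
    {
        if(argv[i][0] != '-')
            break;              /* first real argument, e.g. a file name */
        if(strcmp(argv[i], "-v") == 0)
            *verbose = 1;
        else
            fprintf(stderr, "%s: unknown option %s\n", argv[0], argv[i]);
    }
    return i;
}
```

A main function would call it as first = parse_options(argc, argv, &verbose); and then process argv[first] through argv[argc-1] as file names.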
As another example, also illustrating fopen and the file I/O techniques of the previous chapter, here is a
program which copies one or more input files to its standard output. Since ``standard output'' is usually
the screen by default, this is therefore a useful program for displaying files. (It's analogous to the
obscurely-named Unix cat command, and to the MS-DOS type command.) You might also want to
compare this program to the character-copying program of section 6.2.
#include <stdio.h>

int main(int argc, char *argv[])
{
    int i;
    FILE *fp;
    int c;
    for(i = 1; i < argc; i++)
    {
        fp = fopen(argv[i], "r");
        if(fp == NULL)
        {
            fprintf(stderr, "cat: can't open %s\n", argv[i]);
            continue;
        }
        while((c = getc(fp)) != EOF)
            putchar(c);
        fclose(fp);
    }
    return 0;
}
As a historical note, the Unix cat program is so named because it can be used to concatenate two files
together, like this:
cat a b > c
This illustrates why it's a good idea to print error messages to stderr: if they went to the standard output,
they would be redirected into the file c along with the data. The ``can't open file'' message in this
example includes the name of the program as well as the name of the file.
Yet another piece of information which it's usually appropriate to include in error messages is the reason
why the operation failed, if known. For operating system problems, such as inability to open a file, a
code indicating the error is often stored in the global variable errno. The standard library function
strerror will convert an errno value to a human-readable error message string. Therefore, an even
more informative error message printout would be
fp = fopen(argv[i], "r");
if(fp == NULL)
fprintf(stderr, "cat: can't open %s: %s\n",
argv[i], strerror(errno));
If you use code like this, #include <errno.h> to get the declaration of errno, and
<string.h> to get the declaration of strerror().
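Putting those pieces together, here is a sketch of a helper that opens a file and reports any failure in that style (open_or_warn is a hypothetical name). One caution: ISO C does not actually require fopen to set errno (POSIX systems do), so the sketch clears errno first to avoid reporting a stale value, and on some systems the reason text may not be very informative.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * open_or_warn (hypothetical helper): open filename for reading.
 * On failure, print "progname: can't open filename: reason" to
 * stderr and return NULL.
 */
FILE *open_or_warn(const char *progname, const char *filename)
{
    FILE *fp;

    errno = 0;                  /* so a stale errno value isn't reported */
    fp = fopen(filename, "r");
    if(fp == NULL)
        fprintf(stderr, "%s: can't open %s: %s\n",
                progname, filename, strerror(errno));
    return fp;
}
```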

This page by Steve Summit // Copyright 1995-1997 // mail feedback

http://www.eskimo.com/~scs/cclass/notes/sx13.html (3 of 3) [22/07/2003 5:32:50 PM]

Chapter 14: What's Next?


This last handout contains a brief list of the significant topics in C which we have not covered, and which
you'll want to investigate further if you want to know all of C.
Types and Declarations
Operators
Statements
Functions
C Preprocessor
Standard Library Functions

This page by Steve Summit // Copyright 1995, 1996 // mail feedback

http://www.eskimo.com/~scs/cclass/notes/sx14.html [22/07/2003 5:32:51 PM]

Types and Declarations


We have not talked about the void, short int, and long double types. void is a type with no
values, used as a placeholder to indicate functions that do not return values or that accept no arguments,
and in the ``generic'' pointer type void * that can point to anything. short int is an integer type
that might use less space than a plain int; long double is a floating-point type that might have even
more range or precision than plain double.
The char type and the various sizes of int also have ``unsigned'' versions, which are declared using the
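keyword unsigned. Unsigned types cannot hold negative values, but their behavior on overflow is guaranteed:
arithmetic on an n-bit unsigned type wraps around modulo 2**n, whereas overflow of a signed type is undefined.
(Whether a plain char is signed or unsigned is implementation-defined; you can use the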
keyword signed to force a character type to contain signed characters.) Unsigned types are also useful
when manipulating individual bits and bytes, when ``sign extension'' might otherwise be a problem.
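A small sketch of both points, assuming 8-bit chars; wrap_add and byte_value are made-up names.

```c
#include <limits.h>     /* UINT_MAX, UCHAR_MAX */

/* Unsigned arithmetic wraps around modulo 2**n instead of overflowing. */
unsigned int wrap_add(unsigned int a, unsigned int b)
{
    return a + b;       /* UINT_MAX + 1 is guaranteed to be 0 */
}

/*
 * Return the byte value (0 .. UCHAR_MAX) of a char.  If plain char is
 * signed, a byte like 0xFF would otherwise sign-extend to -1 when
 * converted to int; going through unsigned char avoids that.
 */
int byte_value(char c)
{
    return (unsigned char)c;
}
```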
Two additional type qualifiers const and volatile allow you to declare variables (or pointers to
data) which you promise not to change, or which might change in unexpected ways behind the program's
back.
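For example, a const-qualified pointer parameter promises that a function only reads the string it is handed, and a volatile flag is the traditional way for a signal handler to tell the rest of the program that something happened. The function and variable names below are illustrative; sig_atomic_t, signal, and raise come from <signal.h>.

```c
#include <signal.h>
#include <stddef.h>

/* const: count_char promises not to modify the string s points to. */
size_t count_char(const char *s, char c)
{
    size_t n = 0;

    for(; *s != '\0'; s++)
        if(*s == c)
            n++;
    return n;
}

/*
 * volatile: this flag may change "behind the program's back" (in a
 * signal handler), so the compiler must re-read it every time.
 */
static volatile sig_atomic_t got_signal = 0;

void note_signal(int sig)
{
    (void)sig;
    got_signal = 1;
}

int signal_seen(void)
{
    return got_signal != 0;
}
```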
There are user-defined structure and union types. A structure or struct is a ``record'' consisting of one
or more values of one or more types grouped together into one entity which can be manipulated as a
whole. A union is a type which, at any one time, can hold a value from one of a specified set of types.
There are user-defined enumeration types (``enum'') which are like integers but which always contain
values from some fixed, predefined set, and for which the values are referred to by name instead of by
number.
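The three often appear together. In this sketch (all names are hypothetical), an enum records which member of a union currently holds a valid value, and a struct bundles the two so they can be handled as one entity.

```c
/* Which kind of shape a struct shape holds. */
enum shape_kind { CIRCLE, RECTANGLE };

struct shape {
    enum shape_kind kind;       /* says which union member is valid */
    union {
        double radius;                          /* if kind == CIRCLE */
        struct { double width, height; } rect;  /* if kind == RECTANGLE */
    } u;
};

double area(const struct shape *s)
{
    switch(s->kind)
    {
    case CIRCLE:
        return 3.14159265358979 * s->u.radius * s->u.radius;
    case RECTANGLE:
        return s->u.rect.width * s->u.rect.height;
    }
    return 0.0;     /* not reached for valid kinds */
}
```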
Pointers can point to functions as well as to data types.
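A minimal sketch: apply (a hypothetical name) accepts a pointer to any function taking two ints and returning an int, and calls whichever function it was given.

```c
/* Two ordinary functions with the same signature. */
int add(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }

/* op is a pointer to a function taking two ints and returning int. */
int apply(int (*op)(int, int), int a, int b)
{
    return op(a, b);        /* (*op)(a, b) means the same thing */
}
```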
Types can be arbitrarily complicated, when you start using multiple levels of pointers, arrays, functions,
structures, and/or unions. Eventually, it's important to understand the concept of a declarator: in the
declaration
int i, *ip, *fpi();
we have the base type int and three declarators i, *ip, and *fpi(). The declarator gives the name of
a variable (or function) and also indicates whether it is a simple variable or a pointer, array, function, or
some more elaborate combination (array of pointers, function returning pointer, etc.). In the example, i
is declared to be a plain int, ip is declared to be a pointer to int, and fpi is declared to be a function
returning pointer to int. (Complicated declarators may also contain parentheses for grouping, since
there's a precedence hierarchy in declarators as well as expressions: [] for arrays and () for functions
have higher precedence than * for pointers.)
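The fpi declarator from that example can be made concrete, along with two more declarations that show why parentheses matter. All names here are illustrative.

```c
static int counter = 42;

/* fpi: () binds tighter than *, so this is a function returning pointer to int. */
int *fpi(void)
{
    return &counter;
}

int row[3] = {1, 2, 3};

int *ap[3];             /* [] binds tighter than *: array of 3 pointers to int */
int (*pa)[3] = &row;    /* parentheses force *: pointer to an array of 3 ints */
```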
We have not said much about pointers to pointers, or arrays of arrays (i.e. multidimensional arrays), or
the ramifications of array/pointer equivalence on multidimensional arrays. (In particular, a reference to
an array of arrays does not generate a pointer to a pointer; it generates a pointer to an array. You cannot
pass a multidimensional array to a function which accepts pointers to pointers.)
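For example, an int m[2][3] passed to a function decays to a pointer to its first row, whose type is int (*)[3] (pointer to array of 3 ints), not int **. A sketch of a function accepting such an array; sum_rows is a hypothetical name, and the column count 3 has to be fixed in the parameter type.

```c
/*
 * a points to the first of nrows rows, each an array of 3 ints.
 * The parameter could equally be written int a[][3]; either way the
 * column count must be known.  int **a would NOT work here.
 */
int sum_rows(int (*a)[3], int nrows)
{
    int i, j, total = 0;

    for(i = 0; i < nrows; i++)
        for(j = 0; j < 3; j++)
            total += a[i][j];
    return total;
}
```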
Variables can be declared with a hint that they be placed in high-speed CPU registers, for efficiency.
(These hints are rarely needed or used today, because modern compilers do a good job of register
allocation by themselves, without hints.)
A mechanism called typedef lets you define aliases (new and perhaps more convenient names) for other types.
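Two typical uses, with made-up names:

```c
/* A shorter, more descriptive name for an existing type. */
typedef unsigned long Count;

/* A name for a structure type, so users needn't write "struct point". */
typedef struct point {
    int x, y;
} Point;

/* The new names can be used like any built-in type name. */
Point make_point(int x, int y)
{
    Point p;

    p.x = x;
    p.y = y;
    return p;
}
```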

This page by Steve Summit // Copyright 1995-1997 // mail feedback

http://www.eskimo.com/~scs/cclass/notes/sx14a.html (2 of 2) [22/07/2003 5:32:53 PM]

Operators
The bitwise operators &, |, ^, and ~ operate on integers thought of as binary numbers or strings of bits.
The & operator is bitwise AND, the | operator is bitwise OR, the ^ operator is bitwise exclusive-OR
(XOR), and the ~ operator is a bitwise negation or complement. (&, |, and ^ are ``binary'' in that they
take two operands; ~ is unary.) These operators let you work with the individual bits of a variable; one
common use is to treat an integer as a set of single-bit flags. You might define the 3rd (2**2) bit as the
``verbose'' flag bit by defining
#define VERBOSE 4
Then you can ``turn the verbose bit on'' in an integer variable flags by executing
flags = flags | VERBOSE;
or
flags |= VERBOSE;
and turn it off with
flags = flags & ~VERBOSE;
or
flags &= ~VERBOSE;
and test whether it's set with
if(flags & VERBOSE)
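Collected into one place, with a second flag added for contrast, the idioms look like this. TRACE and the three helper functions are hypothetical; using unsigned int for the flags word is a common choice that sidesteps sign issues.

```c
#define VERBOSE 4       /* bit 2: 2**2 */
#define TRACE   8       /* bit 3: 2**3 */

unsigned int set_flag(unsigned int flags, unsigned int f)   { return flags | f; }
unsigned int clear_flag(unsigned int flags, unsigned int f) { return flags & ~f; }
int is_set(unsigned int flags, unsigned int f)              { return (flags & f) != 0; }
```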
The left-shift and right-shift operators << and >> let you shift an integer left or right by some number of
bit positions; for example, value << 2 shifts value left by two bits.
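Shifts combine naturally with & to pull a multi-bit field out of a word. A sketch, with a hypothetical get_field function; it sticks to unsigned operands because right-shifting a negative signed value is implementation-defined, left-shifting one is undefined, and shifting by the width of the type or more is undefined.

```c
/*
 * Extract the width-bit field that starts at bit position shift.
 * Assumes 0 < width and 0 <= shift, both less than the number of
 * bits in an unsigned int, so every shift count is in range.
 */
unsigned int get_field(unsigned int word, int shift, int width)
{
    unsigned int mask = (1u << width) - 1;  /* width low-order 1 bits */

    return (word >> shift) & mask;
}
```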
The ?: or conditional operator (also called the ``ternary operator'') essentially lets you embed an
if/then statement in an expression. The assignment
a = expr ? b : c;
is roughly equivalent to
if(expr)
    a = b;
else
    a = c;
Since you can use ?: anywhere in an expression, it can do things that if/then can't, or that would be
cumbersome with if/then. For example, the function call
f(a, b, c ? d : e);
is roughly equivalent to
if(c)
    f(a, b, d);
else
    f(a, b, e);
(Exercise: what would the call
g(a, b, c ? d : e, h ? i : j, k);
be equivalent to?)
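Two common idioms, with illustrative function names: picking the larger of two values, and choosing between two strings right inside a printf-style argument list. The sketch uses snprintf, the size-limited relative of sprintf added in C99.

```c
#include <stdio.h>

int larger(int a, int b)
{
    return a > b ? a : b;
}

/* Writes e.g. "1 file" or "3 files" into buf. */
void count_files(char *buf, size_t size, int n)
{
    snprintf(buf, size, "%d file%s", n, n == 1 ? "" : "s");
}
```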
The comma operator lets you put two separate expressions where one is required; the expressions are
evaluated one after the other, left to right, and the value of the whole expression is the value of the
right-hand one. The most common use for comma operators is when you want multiple
variables controlling a for loop, for example:
for(i = 0, j = 10; i < j; i++, j--)
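A fuller sketch of that kind of loop: reversing an array in place while i walks up from the start and j walks down from the end (reverse is a hypothetical name). Note that the commas in int i, j, tmp; and between function arguments are separators, not comma operators.

```c
/* Reverse the n elements of a[] in place. */
void reverse(int a[], int n)
{
    int i, j, tmp;

    for(i = 0, j = n - 1; i < j; i++, j--)
    {
        tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```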
A cast operator allows you to explicitly force conversion of a value from one type to another. A cast
consists of a type name in parentheses. For example, you could convert an int to a double by typing
int i = 10;
double d;
d = (double)i;
(In this case, though, the cast is redundant, since this is a conversion that C would have performed for
you automatically if you'd just said d = i.) You use explicit casts in those circumstances where C
does not do a needed conversion automatically. One example is division: if you're dividing two integers
and you want a floating-point result, you must explicitly force at least one of the operands to floating-point;
otherwise C will perform an integer division and will discard the remainder. The code
int i = 1, j = 2;
double d = i / j;
will set d to 0, but
d = (double)i / j;
will set d to 0.5. You can also ``cast to void'' to explicitly indicate that you're ignoring a function's
return value, as in
(void)fclose(fp);
or
(void)printf("Hello, world!\n");
(Usually, it's a bad idea to ignore return values, but in some cases it's essentially inevitable, and the
(void) cast keeps some compilers from issuing warnings every time you ignore a value.)
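A sketch of the division case (average is a hypothetical name): the integers are summed as integers, and the one conversion to floating-point happens at the division.

```c
/*
 * Average of the n ints in a[] (n > 0).  Without the cast, sum / n
 * would be an integer division and the fractional part would be lost.
 */
double average(const int a[], int n)
{
    int i;
    long sum = 0;

    for(i = 0; i < n; i++)
        sum += a[i];
    return (double)sum / n;
}
```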
There's a precise, mildly elaborate set of rules which C uses for converting values automatically, in the
absence of explicit casts.
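One consequence that often surprises people: when an int meets an unsigned int in a comparison, the int is converted to unsigned. So -1 < 1u compares UINT_MAX with 1, and the result is false. A sketch, with made-up function names; many compilers will warn about the first function, which is rather the point.

```c
/*
 * The comparison a < b converts a to unsigned int first,
 * so a negative a turns into a large positive value.
 */
int naive_less(int a, unsigned int b)
{
    return a < b;
}

/* Handle the negative case separately, then compare as unsigned. */
int safe_less(int a, unsigned int b)
{
    return a < 0 || (unsigned int)a < b;
}
```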
The . and -> operators let you access the members (components) of structures and unions.

This page by Steve Summit // Copyright 1995-1997 // mail feedback

http://www.eskimo.com/~scs/cclass/notes/sx14b.html (3 of 3) [22/07/2003 5:32:55 PM]

Statements
The switch statement allows you to jump to one of a number of numeric case labels depending on the
value of an expression; it's more convenient than a long if/else chain. (However, you can use
switch only when the expression is integral and all of the case labels are compile-time constants.)
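A sketch (char_class is a hypothetical name). Several case labels can share one body; note that execution falls through from one case into the next unless something like break or, as here, return ends it.

```c
/* Classify a character. */
const char *char_class(int c)
{
    switch(c)
    {
    case ' ':
    case '\t':
    case '\n':
        return "whitespace";    /* three labels share this body */
    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9':
        return "digit";
    default:
        return "other";
    }
}
```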
The do/while loop is a loop that tests its controlling expression at the bottom of the loop, so that the
body of the loop always executes once even if the condition is initially false. (C's do/while loop is
therefore like Pascal's repeat/until loop, while C's while loop is like Pascal's while/do loop.)
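A sketch of a loop that needs that property (count_digits is a hypothetical name): even 0 has one digit, and because the test is at the bottom, the body runs once before n is checked.

```c
/* Count the decimal digits in n. */
int count_digits(unsigned long n)
{
    int digits = 0;

    do {
        digits++;
        n /= 10;
    } while(n != 0);
    return digits;
}
```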
Finally, when you really need to write ``spaghetti code,'' C does have the all-purpose goto statement,
and labels to go to.
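One use of goto that many C programmers accept is escaping from nested loops, since break leaves only the innermost one. A sketch, with hypothetical names:

```c
/*
 * Search a rows-by-3 grid for target.  On success store its position
 * in *r and *c and return 1; otherwise return 0.
 */
int find_in_grid(int grid[][3], int rows, int target, int *r, int *c)
{
    int i, j;

    for(i = 0; i < rows; i++)
        for(j = 0; j < 3; j++)
            if(grid[i][j] == target)
                goto found;     /* break would exit only the inner loop */
    return 0;

found:
    *r = i;
    *c = j;
    return 1;
}
```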

This page by Steve Summit // Copyright 1995-1997 // mail feedback

http://www.eskimo.com/~scs/cclass/notes/sx14c.html [22/07/2003 5:32:57 PM]

Functions
Functions can't return arrays, and it's tricky to write a function as if it returns an array (perhaps by
simulating the array with a pointer) because you have to be careful about allocating the memory that the
returned pointer points to.
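One common approach is to allocate the memory with malloc and make the caller responsible for freeing it. A sketch (repeat_char is a made-up name); returning a pointer to a local array instead would be a bug, since the local array ceases to exist when the function returns.

```c
#include <stdlib.h>

/*
 * Return a newly allocated string of n copies of c (n >= 0),
 * or NULL if malloc fails.  The caller must free() the result.
 */
char *repeat_char(char c, int n)
{
    char *p = malloc(n + 1);    /* +1 for the terminating \0 */
    int i;

    if(p == NULL)
        return NULL;
    for(i = 0; i < n; i++)
        p[i] = c;
    p[n] = '\0';
    return p;
}
```

Another approach is to have the caller supply the array and its size, as fgets does.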
The functions we've written have all accepted a well-defined, fixed number of arguments. printf
accepts a variable number of arguments (depending on how many % signs there are in the format string)
but we haven't seen how to declare and write functions that do this.
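The tools for this are in <stdarg.h>. A minimal sketch (sum_ints is a hypothetical name): a variadic function has no built-in way of knowing how many arguments it received, so this one takes the count as its first, fixed argument, much as printf works out the count from its format string.

```c
#include <stdarg.h>

/* Return the sum of the count int arguments that follow count. */
int sum_ints(int count, ...)
{
    va_list ap;
    int i, total = 0;

    va_start(ap, count);        /* count is the last fixed parameter */
    for(i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```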

This page by Steve Summit // Copyright 1995, 1996 // mail feedback

http://www.eskimo.com/~scs/cclass/notes/sx14d.html [22/07/2003 5:32:58 PM]

C Preprocessor
If you're careful, it's possible (and can be useful) to use #include within a header file, so that you end
up with ``nested header files.''
It's possible to use #define to define ``function-like'' macros that accept arguments; the expansion of
the macro can therefore depend on the arguments it's ``invoked'' with.
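The classic pitfall is operator precedence in the expansion; parenthesizing every use of the argument, and the whole body, avoids it. The macro names are illustrative.

```c
/* Risky: BAD_SQUARE(1 + 2) expands to 1 + 2 * 1 + 2, which is 5. */
#define BAD_SQUARE(x) x * x

/* Better: SQUARE(1 + 2) expands to ((1 + 2) * (1 + 2)), which is 9. */
#define SQUARE(x) ((x) * (x))

/*
 * SQUARE still evaluates its argument twice, so SQUARE(i++) is
 * undefined behavior; use a function when arguments have side effects.
 */
```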
Two special preprocessing operators # and ## let you control the expansion of macro arguments in
fancier ways.
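A sketch of both operators (the macro names are illustrative): # turns a macro argument into a string literal, and ## pastes two tokens into one.

```c
#include <stdio.h>

/* #: turn the macro argument into a string literal. */
#define STRINGIFY(x) #x

/* ##: paste two tokens together into one identifier. */
#define MAKE_NAME(prefix, n) prefix##n

/* A debugging helper: prints the expression's text and its value. */
#define SHOW_INT(expr) printf(#expr " = %d\n", (expr))

int MAKE_NAME(counter, 1) = 100;    /* declares an int named counter1 */
```

For example, SHOW_INT(2 + 3) prints 2 + 3 = 5.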
The preprocessor directive #if lets you conditionally include (or, with #else, conditionally not
include) a section of code depending on some arbitrary compile-time expression. (#if can also do the
same macro-definedness tests as #ifdef and #ifndef, because the expression can use a defined()
operator.)
Other preprocessing directives are #elif, #error, #line, and #pragma.
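A sketch of #if, #elif, #error, and defined() working together. VERBOSITY and NO_BANNER are hypothetical macros you might set when compiling (for example with -DVERBOSITY=2 on many Unix compilers); VERBOSITY defaults to 1 here.

```c
#ifndef VERBOSITY
#define VERBOSITY 1             /* default when not set at compile time */
#endif

#if VERBOSITY < 0
#error "VERBOSITY must not be negative"
#endif

#if VERBOSITY >= 2
#define MODE "chatty"
#elif VERBOSITY == 1
#define MODE "normal"
#else
#define MODE "quiet"
#endif

/* defined() lets #if combine definedness tests with && and ||. */
#if defined(MODE) && !defined(NO_BANNER)
#define BANNER "mode: " MODE
#else
#define BANNER ""
#endif

const char *banner(void)
{
    return BANNER;
}
```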
There are a few predefined preprocessor macros, some required by the C standard, others perhaps
defined by particular compilation environments. These are useful for conditional compilation (#ifdef,
#ifndef).
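The standard ones include __FILE__ and __LINE__ (the current source file name and line number), __DATE__ and __TIME__ (when the file was compiled), and __STDC__; compilers supporting C95 or later also define __STDC_VERSION__. A sketch using them both for conditional compilation and in a diagnostic message; c_version and WARN are hypothetical names.

```c
#include <stdio.h>

/* Which revision of the C standard does the compiler claim to follow? */
const char *c_version(void)
{
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
    return "C11 or later";
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
    return "C99";
#else
    return "before C99";
#endif
}

/* Print a message tagged with the file and line where WARN is used. */
#define WARN(msg) fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, (msg))
```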

This page by Steve Summit // Copyright 1995, 1996 // mail feedback

http://www.eskimo.com/~scs/cclass/notes/sx14e.html [22/07/2003 5:33:00 PM]

Standard Library Functions


C's standard library contains many features and functions which we haven't seen.
We've seen many of printf's formatting capabilities, but not all. Besides format specifier characters for
a few types we haven't seen, you can also control the width, precision, justification (left or right) and a
few other attributes of printf's format conversions. (In their full complexity, printf formats are
about as elaborate and powerful as FORTRAN format statements.)
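A few of those controls in action. The sketch formats into a buffer with snprintf (the size-limited relative of sprintf added in C99) so the result can be checked; format_row is a hypothetical name.

```c
#include <stdio.h>

/*
 * %-10s  left-justify the string in a 10-character field
 * %5d    right-justify the integer in a 5-character field
 * %8.2f  8-character field, 2 digits after the decimal point
 */
int format_row(char *buf, size_t size, const char *name, int qty, double price)
{
    return snprintf(buf, size, "%-10s|%5d|%8.2f", name, qty, price);
}
```

The same format string works with printf directly, which is a convenient way of lining up columns in a table.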
A scanf function lets you do ``formatted input'' analogous to printf's formatted output. scanf reads
from the standard input; a variant fscanf reads from a specified file pointer.
The sprintf and sscanf functions let you ``print'' and ``read'' to and from in-memory strings instead
of files. We've seen that atoi lets you convert a numeric string into an integer; the inverse operation can
be performed with sprintf:
int i = 10;
char str[12];   /* room for any 32-bit int: sign, 10 digits, and \0 */
sprintf(str, "%d", i);
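Going the other direction, sscanf parses a string. Its return value is the number of items it successfully converted, which is worth checking. A sketch; parse_pair is a hypothetical name, and the width in %31s keeps the name from overflowing a 32-byte buffer.

```c
#include <stdio.h>

/*
 * Parse "name quantity" from line, e.g. "apples 12".
 * name must have room for at least 32 characters.
 * Returns 1 on success, 0 if the line doesn't match.
 */
int parse_pair(const char *line, char *name, int *qty)
{
    return sscanf(line, "%31s %d", name, qty) == 2;
}
```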
We've used printf and fprintf to write formatted output, and getchar, getc, putchar, and
putc to read and write characters. There are also functions gets, fgets, puts, and fputs for
reading and writing lines (though we rarely need these, especially if we're using our own getline and
maybe fgetline), and also fread and fwrite for reading or writing arbitrary numbers of
characters. (Never use gets: it has no way of knowing how big your buffer is, so a long input line will
overflow it. It was removed from the language in C11; use fgets instead.)
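Because gets cannot be told how big its buffer is, fgets is the line-reading function to use. It stops at a newline or when the buffer is full, and it keeps the newline, which callers often want to strip. A sketch, with a hypothetical read_line name:

```c
#include <stdio.h>
#include <string.h>

/*
 * Read one line from fp into buf (at most size-1 characters),
 * removing the trailing newline if there is one.
 * Returns buf, or NULL at end of file or on error.
 */
char *read_line(char *buf, int size, FILE *fp)
{
    if(fgets(buf, size, fp) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';     /* chop the newline, if any */
    return buf;
}
```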
It's possible to ``un-read'' a character, that is, to push it back on an input stream, with ungetc. (This is
useful if you accidentally read one character too far, and would prefer that some other part of your
program read that character instead.)
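A sketch of the typical use (read_number is a hypothetical name): read digits until a non-digit appears, then push that character back for whoever reads next. Only one character of pushback is guaranteed.

```c
#include <ctype.h>
#include <stdio.h>

/*
 * Read a run of decimal digits from fp and return its value
 * (overflow is not checked).  The first non-digit is pushed back
 * with ungetc.  Returns -1 if no digits were found.
 */
long read_number(FILE *fp)
{
    int c;
    long n = -1;

    while((c = getc(fp)) != EOF && isdigit(c))
        n = (n < 0 ? 0 : n * 10) + (c - '0');
    if(c != EOF)
        ungetc(c, fp);          /* let the next reader see it */
    return n;
}
```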
You can use the ftell, fseek, and rewind functions to jump around in files, performing random
access (as opposed to sequential) I/O.
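One common use is finding a file's size: seek to the end, ask where you are, then go back. A sketch, with a hypothetical file_size function. Caveats: ISO C doesn't promise that SEEK_END is meaningful for binary streams, and for text streams the value ftell returns is only guaranteed useful for passing back to fseek, so this technique works on typical Unix and Windows systems rather than being guaranteed everywhere.

```c
#include <stdio.h>

/*
 * Return the size in bytes of the file open on fp, or -1 on error.
 * The current position is restored before returning.
 */
long file_size(FILE *fp)
{
    long here, size;

    if((here = ftell(fp)) < 0)
        return -1;
    if(fseek(fp, 0L, SEEK_END) != 0)
        return -1;
    size = ftell(fp);
    fseek(fp, here, SEEK_SET);      /* go back where we were */
    return size;
}
```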
The feof and ferror functions will tell you whether you got EOF due to an actual end-of-file
condition or due to a read error of some sort. You can clear errors and end-of-file conditions with
clearerr.
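A sketch of the usual pattern (count_chars is a hypothetical name): loop until getc returns EOF, then use ferror to find out whether that was a real end of file.

```c
#include <stdio.h>

/*
 * Count the characters remaining in fp.  getc returns EOF both at
 * end-of-file and on a read error, so afterwards we ask which it was.
 * Returns the count, or -1 if a read error occurred.
 */
long count_chars(FILE *fp)
{
    long n = 0;

    while(getc(fp) != EOF)
        n++;
    if(ferror(fp))
    {
        clearerr(fp);       /* reset the error and end-of-file indicators */
        return -1;
    }
    return n;               /* feof(fp) is now true */
}
```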
You can open files in ``binary'' mode, or for simultaneous reading and writing. (These options involve
extra characters appended to fopen's mode string: b for binary, + for read/write.)
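Binary mode and fread/fwrite often go together. This sketch writes an array of ints as raw bytes and reads them back; it uses tmpfile, which creates a temporary file opened in "wb+" mode (binary, read and write), the same mode fopen(name, "wb+") would give a named file. save_and_reload is a hypothetical name. Files written this way are generally readable only on machines with the same int size and byte order.

```c
#include <stdio.h>

/*
 * Write n ints from src to a temporary binary file, then read them
 * back into dst.  Returns the number of ints read back, or -1 if
 * the temporary file couldn't be created.
 */
int save_and_reload(const int src[], int dst[], size_t n)
{
    FILE *fp = tmpfile();           /* opened in "wb+" mode */
    size_t got;

    if(fp == NULL)
        return -1;
    fwrite(src, sizeof src[0], n, fp);
    rewind(fp);                     /* switch from writing to reading */
    got = fread(dst, sizeof dst[0], n, fp);
    fclose(fp);
    return (int)got;
}
```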
There are several more string functions in <string.h>. A second set of string functions strncpy,
strncat, and strncmp all accept a third argument telling them to stop after n characters if they
haven't found the \0 marking the end of the string. A third set of ``mem'' functions, including memcpy
and memcmp, operate on blocks of memory which aren't necessarily strings and where \0 is not treated
as a terminator. The strchr and strrchr functions find characters in strings. There is a motley
collection of ``span'' and ``scan'' functions, strspn, strcspn, and strpbrk, for searching out or
skipping over sequences of characters all drawn from a specified set of characters. The strtok function
aids in breaking up a string into words or ``tokens,'' much like our own getwords function.
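A sketch of strtok doing the job of getwords (split_words is a hypothetical name). strtok writes \0 characters into the string it scans, so it must be given a modifiable array rather than a string literal, and because it remembers its position internally, two strtok loops can't be interleaved.

```c
#include <string.h>

/*
 * Break line into whitespace-separated words, storing pointers to
 * them in words[] (at most max of them).  Returns the number found.
 * line is modified: strtok overwrites each separator with \0.
 */
int split_words(char *line, char *words[], int max)
{
    int n = 0;
    char *p = strtok(line, " \t\n");    /* first call: pass the string */

    while(p != NULL && n < max)
    {
        words[n++] = p;
        p = strtok(NULL, " \t\n");      /* later calls: pass NULL */
    }
    return n;
}
```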
The header file <ctype.h> contains several functions which let you classify and manipulate
characters: check for letters or digits, convert between upper- and lower-case, etc.
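A sketch with made-up function names. Note the casts to unsigned char: the <ctype.h> functions accept only EOF or values representable as unsigned char, so passing a plain char that happens to be negative is undefined.

```c
#include <ctype.h>

/* Convert s to upper case in place. */
void str_upper(char *s)
{
    for(; *s != '\0'; s++)
        *s = toupper((unsigned char)*s);
}

/* Count the decimal digits in s. */
int digit_count(const char *s)
{
    int n = 0;

    for(; *s != '\0'; s++)
        if(isdigit((unsigned char)*s))
            n++;
    return n;
}
```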
A host of mathematical functions are defined in the header file <math.h>. (As we've mentioned,
besides including <math.h>, you may on some Unix systems have to ask for a special library
containing the math functions while compiling/linking.)
There's a random-number generator, rand, and a way to ``seed'' it, srand. rand returns integers from
0 up to RAND_MAX (where RAND_MAX is a constant #defined in <stdlib.h>). One way of getting
random integers from 1 to n is to call
(int)(rand() / (RAND_MAX + 1.0) * n) + 1
Another way is
rand() / (RAND_MAX / n + 1) + 1
It seems like it would be simpler to just say
rand() % n + 1
but this method is imperfect (or rather, it's imperfect if n is a power of two and your system's
implementation of rand() is imperfect, as all too many of them are).
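Wrapped up as a function (rand_between is a hypothetical name), using the first formula. Writing RAND_MAX + 1.0 rather than RAND_MAX + 1 makes the arithmetic floating-point, which matters on systems where RAND_MAX equals INT_MAX, because there RAND_MAX + 1 would overflow.

```c
#include <stdlib.h>

/* Return a pseudo-random integer from 1 to n inclusive (n > 0). */
int rand_between(int n)
{
    /* rand() / (RAND_MAX + 1.0) is in [0, 1), so the result is in [1, n]. */
    return (int)(rand() / (RAND_MAX + 1.0) * n) + 1;
}
```

Call srand once at the start of the program, for instance srand((unsigned)time(NULL)) with <time.h> included, if you want a different sequence on each run; without it, rand behaves as if srand(1) had been called.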
Several functions let you interact with the operating system under which your program is running. The
exit function returns control to the operating system immediately, terminating your program and
returning an ``exit status.'' The getenv function allows you to read your operating system's or process's
``environment variables'' (if any). The system function allows you to invoke an operating-system
command (i.e. another program) from within your program.
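getenv returns a null pointer when the variable isn't set, so supplying a fallback is a common pattern. A sketch; the function name is made up.

```c
#include <stdlib.h>

/* Return the value of environment variable name, or fallback if unset. */
const char *env_or_default(const char *name, const char *fallback)
{
    const char *value = getenv(name);

    return value != NULL ? value : fallback;
}
```

For exit, <stdlib.h> also defines EXIT_SUCCESS and EXIT_FAILURE, which are portable values for the exit status.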
The qsort function allows you to sort an array (of any type); you supply a comparison function (via a
function pointer) which knows how to compare two array elements, and qsort does the rest. The
bsearch function allows you to search for elements in sorted arrays; it, too, operates in terms of a
caller-supplied comparison function.
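A sketch of both (compare_ints and sort_and_find are hypothetical names). The comparison function receives pointers to two elements, as const void *, and returns a negative, zero, or positive value; writing (x > y) - (x < y) rather than x - y avoids overflow when the values are far apart.

```c
#include <stdlib.h>

/* Comparison function for qsort and bsearch on an array of ints. */
int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;

    return (x > y) - (x < y);
}

/* Sort a[0..n-1] in place, then report whether key is present. */
int sort_and_find(int a[], size_t n, int key)
{
    qsort(a, n, sizeof a[0], compare_ints);
    return bsearch(&key, a, n, sizeof a[0], compare_ints) != NULL;
}
```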
Several functions (time, ctime, gmtime, localtime, asctime, mktime, difftime, and strftime) allow you to
determine the current date and time, print dates and times, and perform other date/time manipulations.
For example, to print today's date in a program, you can write
#include <stdio.h>
#include <time.h>
int main(void)
{
    time_t now = time((time_t *)NULL);
    printf("It's %.24s\n", ctime(&now));    /* %.24s drops ctime's trailing \n */
    return 0;
}
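For more control over the format, strftime fills a buffer according to its own % directives. In this sketch the date fields are filled in by hand, so the output is predictable; in real use the struct tm would come from localtime(&now). format_date is a hypothetical name.

```c
#include <string.h>
#include <time.h>

/*
 * Format a date as YYYY-MM-DD into buf.  Returns the number of
 * characters written, or 0 if buf was too small.
 */
size_t format_date(char *buf, size_t size, int year, int month, int day)
{
    struct tm t;

    memset(&t, 0, sizeof t);
    t.tm_year = year - 1900;    /* struct tm counts years from 1900 */
    t.tm_mon = month - 1;       /* ...and months from 0 */
    t.tm_mday = day;
    return strftime(buf, size, "%Y-%m-%d", &t);
}
```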
The header file <stdarg.h> lets you manipulate variable-length function argument lists (such as the
ones printf is called with). Additional members of the printf family of functions let you write your
own